Forget privacy. Data is expiring, and we should be worried

Recording our history for future generations is easier than ever. However, digital data is far more vulnerable than physical artefacts

Old hard drives. Huge amounts of data are lost whenever a computer is scrapped

While privacy has been a central concern of computer users in recent years, we should be equally worried about the preservation of our data, writes Aaron Fronda

We spend an obscene amount of time on the internet, often on some form of social media. Whether it be Facebook, Twitter or Instagram, these platforms are where we share and store our every thought and photo. They act as a digital map for the brief blip in which we exist on this planet. Facebook has replaced the scrapbook. Websites and blogs have become our digital diaries and textbooks. But while we have become comfortable (perhaps worryingly so) with uploading every second of our lives, do we ever stop to think about how safe the information we store in these digital journals really is? Not just from cyber-attacks by hackers, but also from the steady decay of time.


Digital data is vulnerable – far more so than physical artefacts such as books or photographs. The average life of a web page is only about 100 days before it is altered or deleted. The shelf life of data stored on any hard drive – depending on how it is maintained – is roughly five years. After that, as a result of things such as magnetic field breakdown and advancements in the hardware, there is an increasing probability that the information will begin to deteriorate or become unreadable. Lost forever.

At the same time, libraries are quickly going the way of the dodo, with visitor numbers falling year on year. Their gradual extinction seems natural when you consider we have access to everything at the click of a button or a tap of a screen. But there are some, such as Brewster Kahle, founder of the Internet Archive, who fear that, if we do not act now, we could lose large pieces of our past, plunging us into an Orwellian world of the perpetual present.

The Internet Archive
“Anyone who has ever lost data because the hard drive in their computer died knows that, while it is easy to put data on a hard drive, it isn’t easy to keep it backed up and safe,” says Alexis Rossi, Director of Web Services at the Internet Archive, a non-profit organisation that aims to build a digital library of websites and other cultural artefacts. Workers store data on at least two hard drives, which are kept in separate physical locations to maximise the chances of at least one copy surviving any catastrophic event. Content is audited periodically to ensure files are complete. They even remake files in new formats when they become popular, making sure to preserve the originals.
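For readers who keep their own backups, the same idea of holding two copies and auditing them can be sketched in a few lines of Python. This is only an illustration of the general technique, not the Internet Archive's actual process: the function names and the folder layout are assumptions made for the example, and SHA-256 checksums are used here simply as a common way of spotting silent corruption.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_replicas(primary: Path, mirror: Path) -> dict[str, list[str]]:
    """Compare every file under `primary` with its counterpart under `mirror`.

    Hypothetical helper: both arguments are assumed to be folders that
    should hold identical copies of the same collection.
    """
    report = {"ok": [], "missing": [], "mismatch": []}
    for source in sorted(p for p in primary.rglob("*") if p.is_file()):
        relative = source.relative_to(primary)
        copy = mirror / relative
        if not copy.is_file():
            # The second copy has been lost entirely.
            report["missing"].append(relative.as_posix())
        elif sha256_of(source) != sha256_of(copy):
            # Both copies exist but their contents no longer agree.
            report["mismatch"].append(relative.as_posix())
        else:
            report["ok"].append(relative.as_posix())
    return report
```

Calling `audit_replicas(Path("drive_a"), Path("drive_b"))` on two backup folders (the folder names are placeholders) returns the relative paths that match, are missing from the second copy, or differ between the two. A mismatch shows only that the copies disagree, not which one has decayed, and files that exist only in the second folder are not reported; a fuller audit would also check each copy against a checksum recorded when the file was first stored.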

But simply keeping data effectively backed up and stored may not be enough. “We are also working with software archivists to emulate hardware and operating systems in order to replay old pieces of software”, says Rossi. “The Internet Arcade is the latest example of a collection like this.” Developing emulators to run old video games highlights how retaining a means of reading stored digital data is just as important as the information itself. Someone may have an abundance of compact discs lying around their home, each one a different album from a prolific artist, but without a CD player to read them, the information stored on them is as good as gone.

When people buy, or in reality rent, music from the iTunes store, they rarely consider that, if Apple (heaven forbid) were to go into liquidation and its devices were left to fade, their music catalogues would, over time, become inaccessible too.

“There seems to be a perception that once you put something on the internet, it will always be available”, explains Rossi. “We have seen many examples where this is not true.” Rossi points to sites such as MobileMe (to which users uploaded large numbers of photos in a similar fashion to Facebook) and Friendster (which was once where millions of people conducted their online social lives). “The companies who ran those services decided to close them down, and much of that data would have disappeared if archivists around the world hadn’t done their best to save what they could. We make the saved versions of these sites available through archive.org to try to prevent that history and work from being lost.” This immense amount of data is stored on what the Internet Archive calls ‘petaboxes’, which are purpose-built for high-density storage.

Digitising the past
Digital data is not the only concern of the Internet Archive. It has also amassed a huge physical collection, with 1.5 million books, 50,000 VHS tapes, 100,000 LPs and thousands of reels of film, which its workers digitise with purpose-built scanners. While the work they do is incredible, what is revolutionary is that all this information is freely accessible. You do not even need a login. In just a few clicks, you can be reading, watching or playing anything in the library’s archives. This freedom of information is key to the vision of the Internet Archive’s curators. “Access drives preservation”, says Rossi. “People around the world should be able to learn and build upon the work of previous generations, but they can only do that if they can have access to it.”

It is sad to think that we have less than one percent of the written documents from Rome and less than 0.1 percent of those from Ancient Egypt – but look at how we have built on that limited information and how it has shaped our modern world. Our treatment of the data stored in digital libraries is indicative of how complacent we have become about our recent history. Without the work of archivists, future generations may have 0.01 percent of our history. Information empowers us. Let’s hold onto it.