The myth of perpetual digital storage

One aspect of data archival that is often overlooked is the set of less obvious challenges related to long-term digital storage. Is digital data forever? Let's look at some of the challenges to that.

Since the advent of digital computers we have always had the need for data storage.

While it might appear trivial to the non-professional to plonk a terabyte's worth of files onto a hard disk or tape and label it a backup, the storage administrator will tell you that there is more to it than meets the eye.

There are myriad issues to consider, such as the available archival media; whether to make the archive near-line or off-line, on-site or off-site; capturing incremental changes; backup frequency and time windows; and software to guarantee that data is captured in a consistent state – the list goes on.

Another consideration that might be even less obvious is the set of challenges related to long-term digital storage.

Many compliance laws now require that data be stored for up to seven years, and there is no guarantee that this period will not increase in the future. With organizations generating more data than ever, I believe long-term storage to be an increasingly pertinent concern.

Is digital data forever? Today we want to look at some of the challenges to that.

Software and/or format obsolescence

Admittedly, this is an area that does not result in the outright loss of digital data. Nevertheless, it could still pose a very serious obstacle to being able to access or view the archived data in a meaningful way.

For example, digital camera companies have their own proprietary “RAW” formats for recording the raw data from their cameras. Because these formats are often undocumented, vast amounts of data could be lost should the companies cease to exist or stop supporting a particular format.

Another example that is closer to home is an upgrade from, say, Microsoft Exchange 2003 to Microsoft Exchange 2007. How do you access your archived Microsoft Exchange 2003 data store after the upgrade? Not without a lot of trouble, for sure.

Shift it back to Exchange 2000, or to an older version of a non-mainstream software package, and you could have a very real problem on your hands.

What about earlier iterations of your ERP/CRM software that were subsequently upgraded or tweaked to support more features? Where does that leave your earlier data backups should you need to reference certain information?

Media faults

Media faults occur more often than most people realize. Consider the fact that administrators worth their salt rely heavily on RAID to guarantee data redundancy.

However, in a joint research paper titled “A fresh look at the reliability of long-term digital storage,” a number of researchers from Stanford, Harvard, Intel and HP Labs found that factors unrelated to the capabilities of RAID can cause it to fail.

For example, faulty power supply units (PSUs) resulted in a large number of machine resets – affecting a number of hard disks simultaneously. And yes, I have personally seen a faulty PSU fry the hard disk that it was attached to.

Also, the typical recommendation from RAID manufacturers is to use the same model and capacity of hard disk to build the array. As a result, administrators typically purchase all the requisite hard disks as part of the same tender, for convenience as well as economy. This often results in hard disks that literally came off the same manufacturing line.

The convenience of buying hard disks from the same batch obscures the fact that disks with the same firmware, from the same manufacturing line and subjected to the same usage environment, are a recipe for failures that occur very close together.

I know of a Web host that does hosting for a few thousand domains. They have taken to proactively replacing their hard disks with new ones after 3-5 years of use because, “They tend to fail together.”
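One way to spot this kind of exposure before it bites is to check how many disks in an array share the same origin. The following is only a minimal sketch: the Disk record, its batch field and the shared_origin_groups helper are hypothetical names made up for illustration, and the inventory data would have to come from your own purchase records or drive labels.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Disk:
    """One drive in an array (hypothetical inventory record)."""
    serial: str
    model: str
    firmware: str
    batch: str  # assumed purchase or lot identifier, taken from your own records


def shared_origin_groups(disks, threshold=2):
    """Count disks per (model, firmware, batch) and keep groups of at least `threshold`."""
    counts = Counter((d.model, d.firmware, d.batch) for d in disks)
    return {key: n for key, n in counts.items() if n >= threshold}


if __name__ == "__main__":
    array = [
        Disk("S1", "ModelX", "1.0", "lot-A"),
        Disk("S2", "ModelX", "1.0", "lot-A"),
        Disk("S3", "ModelX", "1.0", "lot-A"),
        Disk("S4", "ModelY", "2.3", "lot-B"),
    ]
    for (model, firmware, batch), n in shared_origin_groups(array).items():
        print(f"{n} disks share model={model} firmware={firmware} batch={batch}")
```

A large group does not mean the disks will fail, only that a single cause could plausibly affect all of them at around the same time.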

Another particularly troublesome area is the degradation of stored data known as “bit rot.” It is particularly deadly because it happens without any warning, and it often leaves data irrecoverable because of bit faults.

The most familiar example of this is probably the CD-ROM. Although the discs were sold as reliable for decades, cases abound of them failing after just two to five years, even when stored as per the manufacturer’s recommendations.
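Because bit rot gives no warning, the usual defence is to record a checksum for every file when the archive is written and to re-verify those checksums periodically. The sketch below shows the idea using SHA-256 from Python's standard library. The function names (sha256_of, build_manifest, find_damaged) are my own and purely illustrative, and the approach only detects damage; repairing it still requires a second good copy.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map every file under root (by relative path) to its digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_damaged(root, manifest):
    """List files that are missing or whose contents no longer match the manifest."""
    root = Path(root)
    damaged = []
    for name, expected in manifest.items():
        path = root / name
        if not path.is_file() or sha256_of(path) != expected:
            damaged.append(name)
    return damaged
```

In practice the manifest would be saved alongside the archive, and ideally on separate media as well, at backup time, and find_damaged would be run on a schedule so that a corrupted file is noticed while another copy still exists.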

Media and/or hardware obsolescence

Think about it: 5¼-inch floppy disks are no longer being manufactured, and most vendors are not even putting 3½-inch floppy drives into newer computers.

Recently I came across an old SCSI hard disk. However, because all the newer servers featured SAS (Serial Attached SCSI) connections, I had a problem accessing the data on that SCSI disk.

Not convinced yet? Do Iomega’s “Zip” and “Jaz” drives ring a bell? I should know – I just threw away a broken Zip drive the other week. I wonder if there are any used Zip media lying around.

How about a tape drive that just conked out, after which your company decided, against your advice, to upgrade to a new model that utilizes a different tape cartridge?
