Since the advent of digital computers, we have needed data storage.
While it might appear trivial to the non-professional to plonk that terabyte worth of files onto a hard disk or tape and label it a backup, the storage administrator will tell you that there is more to it than meets the eye.
There is a myriad of issues to consider: available archival media; whether storage should be near-line or off-line, on-site or off-site; capturing incremental changes; backup frequency and time windows; software to guarantee that data is captured in a consistent state – the list goes on.
Another consideration that might be even less obvious is the set of challenges related to long-term digital storage.
Most compliance laws now require that data be stored for up to seven years, but there is no guarantee that this will not increase in the future. With organizations generating more data than ever, I believe it to be an increasingly pertinent concern.
Is digital data forever? Today we want to look at some of the challenges to that.
Software and/or format obsolescence
Admittedly, this is an area that does not result in the outright loss of digital data. Nevertheless, it could still pose a very serious obstacle to accessing or viewing the archived data in a meaningful way.
For example, digital camera companies have their own proprietary “RAW” formats for recording the raw data from their cameras. Because these formats are often undocumented, vast amounts of data could be lost should the companies cease to exist or stop supporting a particular format.
A problem closer to home can be seen in an upgrade from, say, Microsoft Exchange 2003 to Microsoft Exchange 2007. How do you access your archived Exchange 2003 data store after the upgrade? Not without a lot of trouble, for sure.
Shift back to Exchange 2000, or to an older version of non-mainstream software, and you could have a very real problem on your hands.
What about earlier iterations of your ERP/CRM software that were subsequently upgraded or tweaked to support more features? Where does that leave your earlier data backups should you need to reference certain information?
Media faults
Media faults occur more often than most people realize. Consider the fact that administrators worth their salt rely heavily on RAID to guarantee data redundancy.
However, in a joint research paper titled “A fresh look at the reliability of long-term digital storage,” researchers from Stanford, Harvard, Intel, and HP Labs discovered that factors unrelated to the capabilities of RAID can cause it to fail.
For example, faulty PSUs (power supply units) resulted in a large number of machine resets – affecting a number of hard disks simultaneously. And yes, I have personally seen a faulty PSU fry the hard disk it was attached to.
Also, the typical recommendation from RAID manufacturers is to use hard disks of the same model and capacity to build the array. As a result, administrators typically purchase all the requisite hard disks as part of the same tender, for convenience as well as economics. This often results in hard disks that literally came off the same manufacturing line.
The trouble with buying from the same batch is that hard disks with the same firmware, from the same manufacturing line, and subjected to the same usage environment are a recipe for failures that cluster closely together.
I know of a Web host that does hosting for a few thousand domains. They have taken to proactively replacing their hard disks with new ones after 3-5 years of use because, “They tend to fail together.”
Another particularly troublesome area is the degradation of data known as “bit rot.” This is particularly deadly because it happens without any warning, and often results in irrecoverable data caused by bit faults.
The most familiar example of this would probably be CD-ROMs. Despite being sold as reliable for decades, cases abound of them failing just after two to five years even when stored as per manufacturer’s recommendations.
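One way to at least catch bit rot before it spreads into your only good copy is to store checksums alongside the archive and verify them periodically. The sketch below is merely illustrative, assuming a simple directory-based archive; the function and manifest names are my own, not from any particular backup product:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large archive files don't exhaust memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a checksum for every file under the archive directory."""
    return {str(p.relative_to(root)): sha256_of(p)
            for p in root.rglob("*") if p.is_file()}


def find_rotten(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return files whose current checksum no longer matches the manifest."""
    return [name for name, digest in manifest.items()
            if sha256_of(root / name) != digest]
```

Run `build_manifest` when the archive is written, keep the manifest somewhere safe, and re-run `find_rotten` on a schedule; any mismatch flags silent corruption while a clean replica may still exist elsewhere.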
Media and/or hardware obsolescence
Think about it: 5¼-inch floppy disks are no longer being manufactured, and most vendors are not even putting 3½-inch floppy drives into newer computers.
Recently I came across an old SCSI hard disk. However, because all the newer servers featured SAS (Serial Attached SCSI) connections, I had trouble accessing the data on that disk.
Not convinced yet? Do Iomega’s “Zip” and “Jaz” drives ring a bell? I should know – I threw away a broken Zip drive just the other week. I wonder if there is any used Zip media still lying around.
How about a tape drive that conks out, after which your company decides, against your advice, to upgrade to a new model that uses a different tape cartridge?
Malicious attacks
A discussion on the preservation of digital data for extended periods would not be complete without a consideration of malicious third parties, or hackers bent on creating havoc.
The threat of deliberate attempts to erase data is very real. Particularly vulnerable here are near-line and on-line storage systems.
Yet because attacks don’t have to originate from external parties, even off-line storage without a proper system of controls can be susceptible.
Attacks can range from destruction, censorship, modification, and theft to disruption of services. Motivations can range from ideological, political, financial, or legal factors to bragging rights or employee dissatisfaction.
It might be of interest to note that because we are talking about long-term storage of data, the probability of a successful malicious attack actually increases with time!
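To make that intuition concrete: if each year carries some small, independent chance of a successful attack, the cumulative probability of at least one success over n years is 1 − (1 − p)ⁿ, which climbs steadily with time. A quick illustration, where the 2% annual figure is purely hypothetical:

```python
def cumulative_attack_probability(p_per_year: float, years: int) -> float:
    """Probability of at least one successful attack over the period,
    assuming independent attempts with the same annual success rate."""
    return 1 - (1 - p_per_year) ** years


# With a purely hypothetical 2% annual risk, watch the risk compound:
for years in (1, 7, 25):
    p = cumulative_attack_probability(0.02, years)
    print(f"{years:>2} years: {p:.1%}")
```

Even a modest annual risk compounds: over a seven-year retention window the odds are several times the single-year figure, which is exactly why long retention periods deserve extra scrutiny.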
Data loss due to organizational faults
It is very possible to lose data due to poor corporate governance or sloppy leadership.
To share a personal anecdote: a company I worked in had the source code for a production system deleted by a new staff member. The new hire resigned shortly after joining but, before he left, performed a routine format and system restore on the workstation he had been using.
Unbeknownst to everyone, thanks to 100% turnover in the department, that machine actually contained the only copy of the above-mentioned source code.
Nobody knew enough to stop him, and it was practically impossible to piece the project and source files back into a semblance of the original after what he did. As a result, the source code was effectively lost.
The previous team leader should have ensured that there were separate backups of such important data. By the same token, perhaps management should have made a bigger effort to retain some members of the earlier team, or pressed for a proper handover.
Insufficient funding
In a perfect world with an unlimited budget, we would all be running RAID with hot spares synchronized across three geographically dispersed data centers – probably with a robotic tape library performing nightly data archival at each data center, using a different tape every single day.
But storage being a cost center within the greater cost center of IT, the typical storage administrator often has to grapple with the problem of insufficient funding.
More often than not, compromises have to be made which can severely reduce the effectiveness of backups. Perhaps more worryingly, the nature of this particular fault often precludes upper management from knowing about it.
I will be doing a follow-up piece on ways to mitigate some of the above barriers to long-term digital storage. If you have any suggestions on that, you are most welcome to email me directly at paulmah (-at-) gmail.com
Have you come up against any of the six problem areas to long-term digital storage highlighted above? Why not tell us more about it?
Bibliography: “A fresh look at the reliability of long-term digital storage” (published August 31, 2005)