Storage

The myth of perpetual digital storage

One aspect of data archival that is not often thought about is the set of less obvious challenges related to long-term digital storage. Is digital data forever? Let's look at some of the challenges to that.

Since the advent of digital computers we have always had the need for data storage.

While it might appear trivial to the non-professional to plonk that terabyte worth of files into a hard disk or tape and label it a backup, the storage administrator will tell you that there is more than meets the eye.

There are a myriad of issues to be considered, such as the available archival media, whether to make it near-line or off-line, on-site or off-site, how to capture incremental changes, backup frequency and time windows, and software to guarantee that data is captured in a consistent state. The list goes on.

Another consideration that might be even less obvious is the set of challenges related to long-term digital storage.

Many compliance laws now require that data be stored for up to seven years, and there is no guarantee that this will not increase in future. With organizations generating more data than ever, I believe this to be an increasingly pertinent concern.

Is digital data forever? Today we want to look at some of the challenges to that.

Software and/or format obsolescence

Admittedly, this is an area that does not result in the outright loss of digital data. Nevertheless, it could still pose a very serious obstacle to being able to access or view the archived data in a meaningful way.

For example, digital camera companies have their own proprietary “RAW” formats for recording the raw data from their cameras. Because these formats are often undocumented, vast amounts of data could be lost should the companies cease to exist or stop supporting a particular format.

Another area that is closer to home is an upgrade from, say, Microsoft Exchange 2003 to Microsoft Exchange 2007. How do you access your archived Microsoft Exchange 2003 data store after the upgrade? Not without a lot of trouble, for sure.

Shift it back to Exchange 2000 or to an older version of a non-mainstream software and you could have a very real problem on hand.

What about earlier iterations of your ERP/CRM software that were subsequently upgraded or tweaked to support more features? Where does that leave your earlier data backups should you need to reference certain information?

Media faults

Media faults occur more often than most people realize. Consider the fact that administrators worth their salt rely heavily on RAID to guarantee data redundancy.

However, in a joint research paper titled “A fresh look at the reliability of long-term digital storage,” a number of researchers from Stanford, Harvard, Intel and HP Labs found that factors not related to the capabilities of RAID can cause it to fail.

For example, faulty PSUs (power supply units) resulted in a large number of machine resets, affecting a number of hard disks simultaneously. And yes, I have personally seen a faulty PSU fry the hard disk that it was attached to.

Also, the typical recommendation from RAID manufacturers is to use the same model and capacity of hard disk to build the array. As a result, administrators typically purchase all the requisite hard disks as part of the same tender, for convenience as well as economics. This often results in hard disks that literally came off the same manufacturing line.

Buying hard disks from the same batch overlooks the fact that disks with the same firmware, from the same manufacturing line, and subjected to the same usage environment are a recipe for failures that occur very close together.

I know of a Web host that does hosting for a few thousand domains. They have taken to proactively replacing their hard disks with new ones after 3-5 years of use because, “They tend to fail together.”

Another particularly troublesome area is the degradation of data called “bit rot.” This is particularly deadly because it happens without any warning, and often results in irrecoverable data caused by bit faults.

The most familiar example of this would probably be CD-ROMs. Despite being sold as reliable for decades, cases abound of them failing after just two to five years, even when stored as per the manufacturer’s recommendations.
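Because bit rot gives no warning, one way to at least notice it is to record a checksum for every file at archive time and re-check those checksums periodically. What follows is only a minimal sketch of that idea in Python, assuming the archive is an ordinary directory of files. The function names (sha256_of, build_manifest, verify_manifest) are hypothetical and chosen for illustration, not taken from any particular backup product.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so large archive files are not
        # loaded into memory all at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root, manifest):
    """Return sorted relative paths that are missing or whose digest changed."""
    root = Path(root)
    damaged = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        # A missing file and a changed digest are both reported as damage.
        if not target.is_file() or sha256_of(target) != expected:
            damaged.append(rel_path)
    return sorted(damaged)
```

In practice the manifest would be saved (for example as JSON) on separate media from the archive itself, since a manifest stored on the same disc can rot along with the data. Note that this only detects damage. Repairing it still requires a second, intact copy.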

Media and/or hardware obsolescence

Think about it: 5¼-inch floppy disks are no longer being manufactured, and most vendors are not even putting 3½-inch floppy drives into newer computers.

Recently I came across an old SCSI hard disk. However, because all the newer servers featured SAS (Serial Attached SCSI) connections, I had a problem accessing the data on that SCSI disk.

Not convinced yet? Do Iomega’s “Zip” and “Jaz” drives ring a bell? I should know – I just threw away a broken Zip drive the other week. I wonder if there are any used Zip media lying around.

How about tape drives that just conked out, after which your company decided, against your advice, to upgrade to a new model that utilizes a different tape cartridge?

<Next page - Malicious attack>

About

Paul Mah is a writer and blogger who lives in Singapore, where he has worked for a number of years in various capacities within the IT industry. Paul enjoys tinkering with tech gadgets, smartphones, and networking devices.

7 comments
JohnMcGrew

...that these are issues worth spending money on. Very few, even within IT truly appreciate what true longevity and redundancy really costs. And it's not just about hardware; I loved the story about how employee turn-over resulted in the loss of critical data. I wonder how much it ultimately cost to resolve that mess. Selling management or owners on the value of true long-term storage is a difficult fight, since the return on investment is almost impossible to quantify. A dozen years or so ago I was on a tour of NASA-JPL, going through huge rooms filled with tape containing data from nearly every US space mission ever flown. I was astounded to learn that perhaps less than 10% of all that data had ever been analyzed, and that no comprehensive catalog of that data existed. (Just like the corporate world frequently does, Congress loves allocating money to hardware and missions that return pretty pictures, but doesn't care much about what happens to the data afterwards; which ironically enough is where the real science is) Data that isn't indexed or cataloged for all practical purposes doesn't exist. I thought it was a complete shame that we'd spent all that money on collecting that data, and that over 90% of it was going to be lost to "bit rot" before anyone figured out that it was worth anything, or even there.

NickNielsen

I just trashed almost 250 floppies (of over 500, both 5-1/4 AND 3-1/2) that had been stored inside a magnetically shielded, humidity-controlled container. The 5-1/4" floppies all went out due to media obsolescence and, hopefully, redundancy. (They were all labeled as earlier versions of code and data saved on the 3-1/2" diskettes.) In almost every case, the 3-1/2" floppies failed due to unrecoverable errors in track 0, sector 0. Talk about your bit rot.

paulmah

Have you come up against any of the six problem areas for long-term digital storage that I highlighted above?

sgt_shultz

isn't 'important' and 'floppy' mutually exclusive? i liked your article, i liked the tidbit about drives tending to fail together. all the other points i take with a large grain of salt (about what i'm worth) as you do not have evidence, only anecdotes. the troubleshooter in me begs nick to try to read the floppys with a drive of about the same age, even better is to store them with the drive that created them. which is the moral of the story imho. but as he said, the data wasn't valuable after all... if i had more ambition, i'd post this question: have you ever been asked to recover data from old systems that was a) mission critical and b) harder than 'a hassle' to recover. i have not and in my 15 plus years have not heard of a single case. sometimes my biggest contribution is to advocate for the secure destruction of old backup tapes, obsolete servers with raid etc. it is easy to keep backing it up, preserving it. harder to identify it and put a value on it for the business. time and storage are money. that is the real shame of old storage media. it just keeps taking up space and being re-inventoried for years. this article does not address any real issues but was interesting to me. too easy to say sky is falling. any trained records manager, which is what we all are, knows that data has a value and a lifetime. it is admins job to document same and provide for destruction in documented orderly way at end of life time. (sound of soap box being scraped back under kitchen counter)

john3347

I am retired and only have my personal collection of music and family photos to worry about. I save to CD/DVD for reasons of both cost and convenience. I have totally lost all data on CDs after 5 years of storage and have lost some data on others after only 3 years. I date discs when I record them and copy them every 3 years which works ok for only a few GB of material. I do still have my original Zip 100 drive and information on the disks is still good, BUT my documents generated with Publisher 7 (if memory serves me correct) are effectively worthless. Anybody still got a copy of Publisher 7 kicking around? What is this; two of the six problems?

S,David

Heh. I have in my personal collection punch cards with no reader, DEC tape with no drive, SSHD 5.25 floppys, 8 inch floppys, cassette tapes, 60Mb tape cartridges, and the list goes on. Also, working drives for which new media is almost impossible to find. At the office we have ST506, RLL, and ESDI drives with projects from long gone programmers, but no controller cards. From my desk I can see two IBM midrange systems with three different tape formats, a slew of PCs with three different DAT formats, and two machines with 8mm tape drives in two different formats. I used to try and migrate the data forward to new formats, but found that, like papers on my desk, if it has not been needed after a period of time, throw it away and it will never be missed.

NickNielsen

I attempted to read the floppies using the same drive on the same PC on which the floppies were originally created. My intention was to copy the data to hard drive, then transfer it via drive transplant to my current live system. The PC had been boxed up less than a year after the diskettes were created and was not used since. Ultimately, I attempted to read the floppies on three different systems using four different drives and seven different OS as well as two drive recovery utilities. "Unable to read track 0, sector 0" means the same in MS-DOS, FreeDOS, Windows 2000, FreeBSD, Ubuntu, PCLinuxOS, and Mandriva. It also means the same to GetDataBack and R-Studio; if you can't read the diskette, you can't get the data. I don't recall saying the data was unimportant in my original post. I can live without it, but I would rather have had it available.