
10 things you should know about long-term data archiving

Archiving data is a far bigger challenge than performing ordinary backups. Here are some factors to keep in mind so you don't wind up with a collection of obsolete or irretrievable junk.

Today, almost every organization archives at least some of its data. Some do so to comply with federal regulations, while others use archiving to meet their internal business requirements. Regardless of an organization's reason for archiving data, the process can be trickier than it might appear at first. Unlike a typical backup, archives must be able to stand the test of time. Given the rapid pace at which IT evolves, longevity can be a tall order. The following list of considerations will help you improve the long-term usefulness of your archives.


1: Storage medium

The first thing to take into account is the storage medium you use for your archives. Since they will be stored for long periods of time, you must choose a type of media that will last as long as your retention policy dictates.

Tapes tend to become demagnetized over time, which can lead to data loss. As a result, tapes are rated according to their durability. A good quality tape should last for 10 years or more. In contrast, optical storage media will last indefinitely.

2: Storage device

Another major consideration is whether the storage device you are using for your archives will be accessible in a few years. For example, 15 years ago, I stored my archives on Zip disks. They were a good choice at the time because they were relatively inexpensive and you could fit a whopping 100 MB of data on a single disk.

Today, though, Zip disks are pretty much extinct. I still have my old Zip drive, but it connects to a PC via a parallel port. Like the Zip drives themselves, parallel ports are also extinct, so I can't read the data from the Zip disks.

Unfortunately, there's no way to predict which types of storage devices will stand the test of time. Even so, it is important to try to pick those that have the best chance of being supported over the long term.

3: Revisiting old archives

On a similar note, your archive policies as well as the storage mechanisms you use for archiving data will undoubtedly change over time. So be sure you review your archives at least once a year to see if anything needs to be migrated to a different storage medium.

For example, about 10 years ago, I realized that Zip drives were becoming extinct, so I transferred all of my archives to CD. Today, I store most of my archives on DVD, but because modern DVD drives will also read CDs, I haven't needed to move my extremely old archives off CD and onto DVD.

4: Data usability

One major problem I have seen in the real world is archived data that's in an obsolete format. For example, a few years ago I helped someone restore some document files that had been archived in the early 1990s. Although I was able to recover the data relatively easily, the documents were created by an application called PFS Write. The PFS Write file format was widely supported in the late 80s and early 90s, but today, there aren't any applications around that can read the files.

To avoid situations like this, you might find it helpful to archive not only data, but also copies of the installation media for the applications that created the data. If you use this approach, don't forget to also archive copies of any necessary license keys.

5: Redundancy

When data is ready to be moved to the archives, many organizations simply write the data to tape and then store the tape some place safe. The problem is that the tape is often the only copy of the archived data.

I once did some work for an organization whose standard practice was to write its archives to tape and store the tapes in a fireproof vault. The vault was of good quality, and the tapes actually survived a flood even though the vault was submerged for a few days. A couple of years later, the organization needed to restore something off one of the archive tapes, only to find that the tape was bad. My point is that even the most elaborate systems for protecting tapes will do nothing to guard against something as simple as a defective tape. Your only defense against this type of situation is data redundancy.
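
One way to make redundancy useful in practice is to record a checksum for every file when the archive is written, then check each copy against that record from time to time. The following is only a minimal sketch under assumptions: it presumes the archive copies can be mounted as ordinary directories, and the function names (build_manifest, find_damaged) are hypothetical, not part of any archival product.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file path (relative to root, POSIX style) to its digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_damaged(manifest, copy_root):
    """List files in a copy that are missing or no longer match the manifest."""
    copy_root = Path(copy_root)
    damaged = []
    for relative_path, expected in manifest.items():
        candidate = copy_root / relative_path
        if not candidate.is_file() or sha256_of(candidate) != expected:
            damaged.append(relative_path)
    return damaged
```

If find_damaged reports a problem on one copy, the file can be restored from a second copy that still matches the manifest, which is only possible when a second copy exists.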

6: Selective archiving

Consider what should be archived. Sure, you want to archive your data -- but not all data is equally important. For example, you will probably want to archive your financial records indefinitely, but is it really necessary to preserve your telephone call logs for all eternity? Determine what types of data are present in your organization and the useful lifespan for each data type. Then, design your archival policy around it.
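
A retention policy of this kind can be written down as a simple lookup table. The sketch below is illustrative only: the data type names and the two-year figure for call logs are made-up assumptions, not recommendations, and is_due_for_disposal is a hypothetical helper.

```python
from datetime import date

# Hypothetical retention periods in years; None means "keep indefinitely".
RETENTION_YEARS = {
    "financial_records": None,
    "telephone_call_logs": 2,
}


def is_due_for_disposal(data_type, archived_on, today):
    """Return True when an item has outlived its retention period."""
    years = RETENTION_YEARS[data_type]
    if years is None:
        return False
    try:
        expiry = archived_on.replace(year=archived_on.year + years)
    except ValueError:
        # Archived on Feb 29 and the expiry year is not a leap year:
        # treat March 1 as the expiry date.
        expiry = archived_on.replace(
            year=archived_on.year + years, month=3, day=1
        )
    return today >= expiry
```

Keeping the table separate from the logic means the policy can be reviewed and changed without touching the code that applies it.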

7: Retrieval method

As you design your archival system, remember that over time, the archives will probably grow to an enormous size. So you need an efficient way of retrieving data from the archives should the need arise.

It might be simple to dump your archive data to tape, for example, but how well are your tapes indexed? If you aren't sure, ask yourself how much work would be involved in locating and retrieving a file that was archived three years ago. If you don't even know where to begin, it's time to consider a different method for archiving your data. Many commercial archival products provide a Web interface that simplifies the task of searching the archives for data.
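
Even without a commercial product, a small index that records which tape or disc holds each file goes a long way. Here is a minimal sketch using Python's built-in sqlite3 module. The table layout, the function names, and the media labels such as TAPE-0042 are assumptions made up for illustration.

```python
import sqlite3


def open_index(db_path=":memory:"):
    """Open (or create) the index database and make sure the table exists."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS archive_index ("
        " file_path TEXT NOT NULL,"
        " media_label TEXT NOT NULL,"
        " archived_on TEXT NOT NULL)"
    )
    return conn


def record_file(conn, file_path, media_label, archived_on):
    """Note which piece of media a file was written to, and when."""
    conn.execute(
        "INSERT INTO archive_index VALUES (?, ?, ?)",
        (file_path, media_label, archived_on),
    )
    conn.commit()


def locate(conn, name_fragment):
    """Find every archived file whose path contains the given fragment."""
    rows = conn.execute(
        "SELECT file_path, media_label, archived_on FROM archive_index"
        " WHERE file_path LIKE ? ORDER BY archived_on",
        (f"%{name_fragment}%",),
    )
    return rows.fetchall()
```

For example, after record_file(conn, "finance/2021/ledger.xlsx", "TAPE-0042", "2021-03-31"), a call to locate(conn, "ledger") returns that path together with the tape label, so you know which tape to pull from the vault. The index itself should be backed up separately from the media it describes.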

8: Space considerations

Because your archives can become huge, you must plan for the long-term retention of all of that data. If you are archiving your data to removable media, capacity planning might be as simple as making sure there is enough free space in the vault to hold all of those tapes, and making sure that there is room in your IT budget to continue purchasing tapes. If you archive data to a network server, the capacity planning process will likely be much more important because of the limited amount of data that can be stored online.
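
For removable media, the arithmetic behind that kind of capacity planning is straightforward. The sketch below assumes linear growth, which is a simplification, and every figure passed to it is a placeholder you would replace with your own numbers. The function name tapes_needed is hypothetical.

```python
import math


def tapes_needed(current_tb, yearly_growth_tb, years, tape_capacity_tb, copies=1):
    """Estimate how many tapes an archive will occupy after a number of years.

    Assumes the archive grows by a fixed amount each year and that
    `copies` full copies of the archive are kept for redundancy.
    """
    total_tb = current_tb + yearly_growth_tb * years
    tapes_per_copy = math.ceil(total_tb / tape_capacity_tb)
    return tapes_per_copy * copies
```

An archive of 10 TB growing by 4 TB a year reaches 30 TB after five years. On 2.5 TB tapes that is 12 tapes, or 24 if you keep two copies, and both the vault space and the tape budget need to allow for that.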

9: Restoring to an isolated environment

As you develop your archive policy, you should stipulate how the data should be restored. My advice is to restore the data to an isolated environment whenever possible. I once saw a Fortune 500 company accidentally introduce a virus onto their file servers because they restored some infected archive files.

10: Online vs. offline storage

One last consideration is whether to store your archives online (on a dedicated archive server) or offline (on removable media). There are advantages and disadvantages to each method.

Storing data online keeps the data readily accessible. But the sheer volume of the archived data may make online retention impractical. Furthermore, data that is stored online may be vulnerable to theft, tampering, corruption, etc.

Offline storage enables you to store an unlimited amount of data. However, the data is not readily accessible, and it may prove to be difficult to restore the data should the need arise years from now.

About

Brien Posey is a seven-time Microsoft MVP. He has written thousands of articles and written or contributed to dozens of books on a variety of IT subjects.

16 comments
gregdinn

The most important information ever recorded was stored on paper. I suggest that you use this old and reliable technology that has stood the test of time. At least print important things like customer information and, of course, accounting. Pictures can be printed as well.

JTdot

With step 4, I would probably recommend storing the files in an open format rather than just storing the installers for the app they were created in. Years down the track, machine architecture may not support these applications, and you may not be able to access your data without a purpose-built machine.

tranieri2

I've seen cables that will convert parallel to another connector (USB?). You may want to look into that!

DonG43

Excellent article. It really points out that there are more factors to archiving than when, where and what media.

DonG43

Re. item 5: I once heard a saying I use in all my classes: "Backups are worthless... restores are priceless."

genenem

Over the last 21 years I have tried all sorts of backup devices and media. ONLY one has stood the test of time -- optical media (CD and DVD). The reason? Standards! Thanks for a thoughtful piece on an often misunderstood and neglected part of data management. Genene Miller Data Assurance Technical Associates www.data-a.com Do you know about yVolumes?

Who Am I Really

We do all of our data archiving with a simplified setup and policy. The policy is: Migrate -> Migrate -> Migrate.

Our policy is simple: use the previous generation of storage medium and migrate to the next when it becomes cost effective. For example, all our old data used to be on data CDs, some of which are no longer readable except on the system that created them, specifically those CDs created by an ancient version of Roxio Direct CD. (As there were no Win98 or Win2K systems available, I had to install the Direct CD module onto an XP workstation to access the data, and the daily BSODs were maddening while retrieving the data for migration.)

I then migrated all the data to DVDs of the 4.7 GB variety, as they can be had for 15-30 cents each now. When BD discs drop to a similar price per GB, I'll begin migrating the data to that storage medium.

zackers

Businesses face a lot of legal considerations over what MUST be archived, especially since Sarbanes-Oxley in 2002. The new financial reform act of 2010 may also impose additional archiving requirements.

TechrepLath

I'm afraid I must firmly disagree with this point. In fact, it is the first thing I will tell any customer/user: "Do NOT expect CDs to last forever!" I usually give a safe 8-year average estimate. Now, good storage (cool, dark, etc.) will get you a great deal more, but I've had CDs go CRC inconsistent or even totally blank in less than 2 years. For important stuff, I will choose depending on volume and privacy issues only. High volume, high privacy goes on multiple drives on different machines, with only sporadic synchronization to off-site personal storage or an external HD. Low volume, low privacy is synchronized daily to offsite managed servers over ssh using rsync. NEVER will I use a CD/DVD for anything other than short-term storage or high-volume data transport.

Asakku

I wonder about the quality of this article as it fails to mention the importance of storing data in an open format. Doing this will pretty much ensure that you can read it later, without having to store an old program that may or may not run together with the data.

zackers

Second that on optical. Note that there are various types of optical. Write-once will generally last longer than RW optical. Even the best are generally rated for no more than 10 years, though I did read a while back of special (expensive) optical disks that were supposed to last 20 years. There are actually some optical types that have a shorter rated life than tape!

dunworthdl

Item 4 addresses this sufficiently although he doesn't specifically mention using an "open" format. While unlikely, even open formats can fall into disuse or be surpassed by the evolution of a standard. As a systems administrator you can't depend on backwards compatibility so you will need to make sure that you have a program that will read your data.

smfieldssr

Agreed. Optical media does suffer from instability with time. The record itself seems to last well enough, but the reflective surface the media depends on (aluminum base, I think) oxidizes with time, the reflectivity is lost, and the laser has nothing to bounce back to the reader - hence, no record.

Who Am I Really

The reflective layer in normal CD-R and DVD+/-R discs is either silver or gold. The silver ones are for general use; the gold ones (read: expensive!) are supposed to be archival quality (some are even sold as 100-year archival quality). Aluminium is used in the recording layer of CD-RW and DVD+/-RW discs. The silver reflective layer of the standard CD/DVD is more susceptible to oxidization and tarnishing (silver turning black, etc.) than the gold used in the "Gold Archival" CD/DVDs.

CD/DVDs in use or improperly stored in areas with relatively extreme climates (high heat, humidity, etc.) will perish more quickly than CD/DVDs used or stored in more moderate climates.

Another thing that affects the overall life expectancy of a CD/DVD is the quality of the burn, which is generally determined by the hardware, the media, and the speed at which the data is "burned" (written) to the media. For example, a quick one-off high-speed burn on high-quality CD/DVD media will likely not last nearly as long as a burn on general-purpose media written at a lower speed.

Here is a photo of a 650MB multi-session data CD with the second-from-the-end session burned at max speed and the others all done at 4x: http://i256.photobucket.com/albums/hh195/RichardFDisk/BadBurn.png

Sources: mostly compiled from memory of information I have gathered from different CD/DVD programs' user guides. My job has been in Web and CD/DVD production, mastering, and archiving for the last 9+ years.

BilboRT

DVDs are aluminum based. DVD-Rs are NOT. The active layer is a crystalline layer that blocks the light as it passes to the reflective aluminum surface. My own testing has shown that multiple reads of a DVD-R will erase or corrupt this layer. About 5 reads can do it. Try it yourself. Let us be logical here. If crystals tend to grow naturally, and DVD-Rs are written by applying heat (via the laser), then how can anyone claim they are archival?
