
10 things you should know about long-term data archiving

Archiving data is a far bigger challenge than performing ordinary backups. Here are some factors to keep in mind so you don't wind up with a collection of obsolete or irretrievable junk.

Today, almost every organization archives at least some of its data. Some do so to comply with federal regulations, while others use archiving to facilitate their internal business requirements. Regardless of an organization's reason for archiving data, the process can be trickier than it might appear at first. Unlike typical backups, archives must be able to stand the test of time. Given the rapid pace at which IT evolves, longevity can be a tall order. The following list of considerations will help you improve the long-term usefulness of your archives.

1: Storage medium

The first thing to take into account is the storage medium you use for your archives. Since they will be stored for long periods of time, you must choose a type of media that will last as long as your retention policy dictates.

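Tapes tend to become demagnetized over time, which can lead to data loss. As a result, tapes are rated according to their durability. A good quality tape should last for 10 years or more. Optical storage media are often expected to last longer, but they are not immune to aging either: recordable discs in particular can degrade over time, so check the manufacturer's rated lifespan rather than assuming a disc will last indefinitely.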

2: Storage device

Another major consideration is whether the storage device you are using for your archives will be accessible in a few years. For example, 15 years ago, I stored my archives on Zip disks. They were a good choice at the time because they were relatively inexpensive and you could fit a whopping 100 MB of data on a single disk.

Today, though, Zip disks are pretty much extinct. I still have my old Zip drive, but it connects to a PC via a parallel port. Like the Zip drives themselves, parallel ports are also extinct, so I can't read the data from the Zip disks.

Unfortunately, there's no way to predict which types of storage devices will stand the test of time. Even so, it is important to try to pick those that have the best chance of being supported over the long term.

3: Revisiting old archives

On a similar note, your archive policies as well as the storage mechanisms you use for archiving data will undoubtedly change over time. So be sure you review your archives at least once a year to see if anything needs to be migrated to a different storage medium.

For example, about 10 years ago, I realized that Zip drives were becoming extinct, so I transferred all of my archives to CD. Today, I store most of my archives on DVD, but because modern DVD drives will also read CDs, I haven't needed to move my extremely old archives off CD and onto DVD.
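A yearly review like this can be partly automated. The following is a minimal sketch, not a definitive tool: the `ArchiveMedium` record, its fields, the rated-life figures, and the two-year safety margin are all assumptions made for illustration. It flags any medium that is nearing its rated lifespan or that no longer has a working drive to read it.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ArchiveMedium:
    """One piece of archive media (hypothetical inventory record)."""
    label: str
    media_type: str
    written_on: date
    rated_life_years: int   # assumed to come from the manufacturer's rating
    drive_available: bool   # do we still own a working, connectable drive?


def needs_migration(medium: ArchiveMedium, today: date,
                    safety_margin_years: int = 2) -> bool:
    """Return True if the medium should be copied to newer storage."""
    age_years = (today - medium.written_on).days / 365.25
    nearing_end = age_years >= medium.rated_life_years - safety_margin_years
    return nearing_end or not medium.drive_available


def review(inventory: list[ArchiveMedium], today: date) -> list[str]:
    """Return the labels of all media flagged for migration."""
    return [m.label for m in inventory if needs_migration(m, today)]
```

Running `review` against the full inventory once a year gives a short list of media to migrate before they become unreadable.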

4: Data usability

One major problem I have seen in the real world is archived data that's in an obsolete format. For example, a few years ago I helped someone restore some document files that had been archived in the early 1990s. Although I was able to recover the data relatively easily, the documents were created by an application called PFS Write. The PFS Write file format was widely supported in the late 80s and early 90s, but today, there aren't any applications around that can read the files.

To avoid situations like this, you might find it helpful to archive not only data, but also copies of the installation media for the applications that created the data. If you use this approach, don't forget to also archive copies of any necessary license keys.

5: Redundancy

When data is ready to be moved to the archives, many organizations simply write the data to tape and then store the tape some place safe. The problem is that the tape is often the only copy of the archived data.

I once did some work for an organization whose standard practice was to write its archives to tape and store the tapes in a fireproof vault. The vault was of good quality, and the tapes actually survived a flood even though the vault was submerged for a few days. A couple of years later, the organization needed to restore something off one of the archive tapes, only to find that the tape was bad. My point is that even the most elaborate systems for protecting tapes will do nothing to guard against something as simple as a defective tape. Your only defense against this type of situation is data redundancy.
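Redundancy only helps if you know that at least one copy is still good. One common way to check is to record a checksum when the archive is written and compare every copy against it later. The sketch below assumes the copies are reachable as ordinary files and that a SHA-256 digest was stored at archive time; the function names are illustrative, not part of any archival product.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copies(expected_digest: str, copies: list[Path]) -> dict[Path, bool]:
    """Map each copy to True if it exists and matches the recorded digest."""
    results = {}
    for copy in copies:
        results[copy] = copy.is_file() and sha256_of(copy) == expected_digest
    return results
```

If any copy fails the check, it can be replaced from a copy that passes, which is exactly what a single tape in a vault cannot offer.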

6: Selective archiving

Consider what should be archived. Sure, you want to archive your data — but not all data is equally important. For example, you will probably want to archive your financial records indefinitely, but is it really necessary to preserve your telephone call logs for all eternity? Determine what types of data are present in your organization and the useful lifespan for each data type. Then, design your archival policy around it.
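A policy like this can be written down as a simple table of data types and retention periods. The sketch below is only an illustration: the type names and the two-year figure for call logs are hypothetical, and real retention periods depend on your own regulatory and business requirements.

```python
from datetime import date
from typing import Optional

# Hypothetical retention periods in years; None means "keep indefinitely".
RETENTION_YEARS: dict[str, Optional[int]] = {
    "financial_records": None,
    "telephone_call_logs": 2,
}


def is_expired(data_type: str, archived_on: date, today: date) -> bool:
    """Return True if an item of this type has outlived its retention period."""
    years = RETENTION_YEARS[data_type]
    if years is None:
        return False  # kept indefinitely
    try:
        expires_on = archived_on.replace(year=archived_on.year + years)
    except ValueError:
        # Archived on Feb 29 and the target year is not a leap year.
        expires_on = archived_on.replace(year=archived_on.year + years, day=28)
    return today >= expires_on
```

Looking up an unknown data type raises a `KeyError` on purpose, so that data with no assigned lifespan is noticed rather than silently kept or discarded.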

7: Retrieval method

As you design your archival system, remember that over time, the archives will probably grow to an enormous size. So you need an efficient way of retrieving data from the archives should the need arise.

It might be simple to dump your archive data to tape, for example, but how well are your tapes indexed? If you aren't sure, ask yourself how much work would be involved in locating and retrieving a file that was archived three years ago. If you don't even know where to begin, it's time to consider a different method for archiving your data. Many commercial archival products provide a Web interface that simplifies the task of searching the archives for data.
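If you are not using a commercial product, even a small index kept alongside the archives makes that three-year-old file much easier to find. The following is a minimal sketch using Python's built-in `sqlite3` module; the table name, column names, and media labels are assumptions chosen for illustration.

```python
import sqlite3


def open_index(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the archive index database."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS archive_index (
            file_path   TEXT NOT NULL,
            archived_on TEXT NOT NULL,  -- ISO date, e.g. 2021-03-15
            media_label TEXT NOT NULL   -- label written on the tape or disc
        )
        """
    )
    return conn


def record_file(conn: sqlite3.Connection, file_path: str,
                archived_on: str, media_label: str) -> None:
    """Note which piece of media a file was written to."""
    conn.execute(
        "INSERT INTO archive_index VALUES (?, ?, ?)",
        (file_path, archived_on, media_label),
    )
    conn.commit()


def find_media(conn: sqlite3.Connection, name_fragment: str) -> list[tuple]:
    """Return (file_path, archived_on, media_label) rows matching a name."""
    rows = conn.execute(
        "SELECT file_path, archived_on, media_label FROM archive_index "
        "WHERE file_path LIKE ? ORDER BY archived_on",
        (f"%{name_fragment}%",),
    )
    return rows.fetchall()
```

The index itself should be archived redundantly too, since losing it leaves you searching the media by hand.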

8: Space considerations

Because your archives can become huge, you must plan for the long-term retention of all of that data. If you are archiving your data to removable media, capacity planning might be as simple as making sure there is enough free space in the vault to hold all of those tapes, and making sure that there is room in your IT budget to continue purchasing tapes. If you archive data to a network server, capacity planning will likely be much more demanding because of the limited amount of data that can be stored online.
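For removable media, the arithmetic is simple enough to sketch. The example below assumes steady (linear) growth and a fixed capacity per tape or disc; the function name and all of the figures in the usage note are hypothetical.

```python
import math


def media_needed(current_tb: float, yearly_growth_tb: float, years: int,
                 media_capacity_tb: float, copies: int = 1) -> int:
    """Estimate how many pieces of media the archive will occupy.

    Assumes the archive grows by a constant amount each year and that
    every redundant copy is written to its own set of media.
    """
    total_tb = current_tb + yearly_growth_tb * years
    per_copy = math.ceil(total_tb / media_capacity_tb)
    return per_copy * copies
```

For instance, an archive of 10 TB that grows by 4 TB a year reaches 30 TB after five years; with 2.5 TB per tape and two copies of everything, `media_needed(10, 4, 5, 2.5, copies=2)` comes to 24 tapes to budget and find vault space for.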

9: Restoring to an isolated environment

As you develop your archive policy, you should stipulate how the data should be restored. My advice is to restore the data to an isolated environment whenever possible. I once saw a Fortune 500 company accidentally introduce a virus onto its file servers because it restored some infected archive files.

10: Online vs. offline storage

One last consideration is whether to store your archives online (on a dedicated archive server) or offline (on removable media). There are advantages and disadvantages to each method.

Storing data online keeps the data readily accessible. But the sheer volume of the archived data may make online retention impractical. Furthermore, data that is stored online may be vulnerable to theft, tampering, corruption, etc.

Offline storage enables you to store an unlimited amount of data. However, the data is not readily accessible, and it may prove to be difficult to restore the data should the need arise years from now.

About

Brien Posey is a seven-time Microsoft MVP. He has written thousands of articles and written or contributed to dozens of books on a variety of IT subjects.
