After my post on the myth of perpetual digital storage, I received a few e-mail messages from TechRepublic members who specialize in data management or digital archiving.
A few key points stood out, and I will elaborate on them today.
Populate your RAID array with hard disk drives from different batches
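Brant Bady, Manager of Electronic Archives at the Royal British Columbia Museum, offered some practical tips. In his e-mail, he confirms that RAID arrays are vulnerable to several disks failing at the same time, a risk that grows when every drive in the array comes from the same manufacturing batch, since such drives can share latent defects and age at a similar rate.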
This really does happen - I know from experience with a storage unit that I was "required" to use, despite warnings about this very problem. Two drives failed and it was not possible to recover the data from that unit; it all had to be pulled back from other media.
Dr Feli Galker of Chief Group, an Israeli company specializing in business continuity (BC), agrees. "We've been advising everybody for ages NOT to buy all RAID drives together."
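One way to act on this advice is to check, before an array goes into service, that no single lot supplies more drives than the array can afford to lose at once. The sketch below assumes you have recorded each drive's lot or batch number (from the drive label or purchase records); the function name and input format are hypothetical, and it treats a whole lot failing together as the worst case.

```python
from collections import Counter


def lots_exceeding_tolerance(drive_lots, fault_tolerance):
    """Return the lot IDs that supply more drives than the array can lose at once.

    drive_lots: one lot/batch identifier per drive in the array (hypothetical input)
    fault_tolerance: simultaneous drive failures the array survives
                     (2 for RAID 6, 1 for RAID 5)
    """
    counts = Counter(drive_lots)
    # If every drive from one lot failed together, the array would lose
    # counts[lot] drives; flag any lot where that exceeds the tolerance.
    return sorted(lot for lot, n in counts.items() if n > fault_tolerance)


# Eight-drive RAID 6 array where three drives came from lot "A1".
array_lots = ["A1", "A1", "A1", "B7", "B7", "C3", "C3", "D9"]
print(lots_exceeding_tolerance(array_lots, fault_tolerance=2))  # ['A1']
```

The check only tells you whether the mix is safe; in practice you get a good mix by buying drives at different times or from different suppliers.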
Guidelines for a storage system
For a production storage system, Brant recommends the following guidelines, which are the culmination of his experience at the Royal British Columbia Museum. He considers the design well suited to bit-level archiving, but welcomes suggestions for improvement. It may be overkill for some of you, but it's always good to have a solid reference design to start from.
- Start from a completely stand-alone system that's not connected to any external network, with servers and terminals in rooms that are locked or under physical control.
- Primary storage is magnetic disk: two RAID 6 arrays (with hot spares and SATA drives from different lots), which are then mirrored using Sun's ZFS. The mirror is regularly verified with "zpool scrub" to ensure that its contents have not been corrupted.
- Our second media copy is written with tar to pre-conditioned magnetic tape, then read back and verified against the MD5 values of the original files. This copy goes offsite to a records storage facility, albeit within the same city.
- Our third media copy is on 5.25-inch magneto-optical (MO) disks using the UDF 1.02 file system, again with each file read back and its MD5 value verified against the original. These disks will soon be stored in an entirely different geographic region, along with a spare MO drive or two.
- Each media copy contains an inventory of the MD5 values for its contents. Separately (in an RDBMS on a separate server), we store the MD5 checksum of each file on each media type, along with a record of each time it is verified.
- New records/files added to the system are held in a quarantine area for one month and virus-checked with up-to-date virus definitions.
- The server runs UNIX, making it less susceptible to viruses that typically target other operating systems.
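The read-back step in the second and third copies is the heart of this design. As a minimal sketch, assuming Python's standard tarfile and hashlib modules stand in for the command-line tar and MD5 tools Brant's site actually uses, the code below builds an MD5 inventory, writes the files plus that inventory into a tar archive, then reads every file back and compares digests. The function names and the MD5SUMS manifest name are hypothetical.

```python
import hashlib
import io
import tarfile
from pathlib import Path

CHUNK_SIZE = 1 << 20       # hash 1 MiB at a time so large files don't exhaust memory
MANIFEST_NAME = "MD5SUMS"  # hypothetical name for the inventory stored on the media


def md5_of_stream(stream):
    """Return the hex MD5 digest of a binary file-like object."""
    digest = hashlib.md5()
    for block in iter(lambda: stream.read(CHUNK_SIZE), b""):
        digest.update(block)
    return digest.hexdigest()


def build_inventory(root):
    """Map each regular file under root (as a relative POSIX path) to its MD5 digest."""
    root = Path(root)
    inventory = {}
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        with path.open("rb") as f:
            inventory[path.relative_to(root).as_posix()] = md5_of_stream(f)
    return inventory


def write_tar(root, archive_path, inventory):
    """Write the inventoried files plus an md5sum-style manifest into a tar archive."""
    root = Path(root)
    with tarfile.open(archive_path, "w") as tar:
        for rel in inventory:
            tar.add(str(root / rel), arcname=rel)
        # Store the inventory on the media itself, one "digest  path" line per file.
        manifest = "".join(f"{md5}  {rel}\n" for rel, md5 in inventory.items()).encode()
        info = tarfile.TarInfo(MANIFEST_NAME)
        info.size = len(manifest)
        tar.addfile(info, io.BytesIO(manifest))


def verify_tar(archive_path, inventory):
    """Read every file back from the archive; return paths that are missing, unexpected, or differ."""
    seen = set()
    problems = []
    with tarfile.open(archive_path, "r") as tar:
        for member in tar.getmembers():
            if not member.isfile() or member.name == MANIFEST_NAME:
                continue
            seen.add(member.name)
            if md5_of_stream(tar.extractfile(member)) != inventory.get(member.name):
                problems.append(member.name)
    problems.extend(set(inventory) - seen)  # inventoried files that never reached the archive
    return sorted(problems)
```

In use, you would call build_inventory() on the source directory, write_tar() to produce the copy, and treat any non-empty result from verify_tar() as a failed copy. The sketch writes to an ordinary file; writing to tape itself would normally go through the system's own tar and tape tools.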
Tracking for business continuity
If you read through the storage guidelines above, you will have noticed that the integrity of the digital data is never assumed. Instead, the MD5 hash of every single file is computed and stored separately, so the data can be independently verified whenever necessary.
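To make the separate-record idea concrete, here is a minimal sketch that uses SQLite (via Python's sqlite3 module) as a stand-in for the RDBMS Brant mentions. The table layout, column names, and media labels are assumptions for illustration, not his actual schema. Each file gets one stored MD5 per media copy, and every later check is appended to a history table rather than overwriting anything.

```python
import hashlib
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: one stored checksum per (file, media copy), plus a log of checks.
SCHEMA = """
CREATE TABLE IF NOT EXISTS checksum (
    path  TEXT NOT NULL,
    media TEXT NOT NULL,           -- e.g. 'disk', 'tape', 'mo'
    md5   TEXT NOT NULL,
    PRIMARY KEY (path, media)
);
CREATE TABLE IF NOT EXISTS verification (
    path        TEXT NOT NULL,
    media       TEXT NOT NULL,
    verified_at TEXT NOT NULL,     -- ISO 8601 timestamp in UTC
    ok          INTEGER NOT NULL,  -- 1 if the recomputed MD5 matched, else 0
    FOREIGN KEY (path, media) REFERENCES checksum (path, media)
);
"""


def open_catalog(db_path=":memory:"):
    """Open (or create) the checksum catalog; a real one would live on a separate server."""
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
    conn.executescript(SCHEMA)
    return conn


def record_checksum(conn, path, media, data):
    """Store the MD5 of a file's contents for one media copy, once, at ingest time."""
    digest = hashlib.md5(data).hexdigest()
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO checksum (path, media, md5) VALUES (?, ?, ?)",
            (path, media, digest),
        )
    return digest


def verify(conn, path, media, data):
    """Recompute the MD5, compare it with the stored value, and log the result."""
    row = conn.execute(
        "SELECT md5 FROM checksum WHERE path = ? AND media = ?", (path, media)
    ).fetchone()
    if row is None:
        raise KeyError(f"no stored checksum for {path!r} on {media!r}")
    ok = hashlib.md5(data).hexdigest() == row[0]
    with conn:
        conn.execute(
            "INSERT INTO verification (path, media, verified_at, ok) VALUES (?, ?, ?, ?)",
            (path, media, datetime.now(timezone.utc).isoformat(), int(ok)),
        )
    return ok


def history(conn, path, media):
    """Return every verification result for one file on one media copy, oldest first."""
    return conn.execute(
        "SELECT verified_at, ok FROM verification WHERE path = ? AND media = ? ORDER BY rowid",
        (path, media),
    ).fetchall()
```

For brevity the sketch hashes whole byte strings; for large files you would hash in chunks and pass in the resulting digest instead.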
BOS (Backup proxy Server) from Chief Group goes a step further, focusing on data restoration and treating backup as a means rather than the aim. The idea is that recovered data is immediately ready for use, instead of being a hodgepodge of disparate files.
According to its web site:
The software tracks deleted files and reports them, making the system administrator aware of any file deletion, whether intended, accidental, or malicious.
Further, BOS offers Deleted File Management, which keeps track of deleted files and allows them to be retrieved. BOS is in fact a toolkit for administrators and contains many more useful features.
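The description above doesn't say how BOS detects deletions, so the following is not its method. It is only a minimal sketch of one generic approach, assuming you keep periodic {path: MD5} snapshots like the inventories described earlier: compare two snapshots, and treat a vanished path whose contents reappear elsewhere as a move rather than a deletion. The function name and snapshot format are hypothetical.

```python
def report_deletions(before, after):
    """Compare two {path: md5} snapshots and classify paths that disappeared.

    Returns (deleted, moved): deleted lists paths whose contents no longer
    appear under any new path; moved maps old path -> new path when
    identical contents show up under a name that did not exist before.
    """
    # Index paths that are new in the later snapshot by their content digest.
    new_by_digest = {}
    for path, digest in after.items():
        if path not in before:
            new_by_digest.setdefault(digest, []).append(path)

    deleted, moved = [], {}
    for path in sorted(set(before) - set(after)):
        candidates = new_by_digest.get(before[path])
        if candidates:
            moved[path] = candidates.pop(0)  # each new path explains at most one vanished path
        else:
            deleted.append(path)
    return deleted, moved


before = {"a.doc": "11", "b.doc": "22", "c.doc": "33"}
after = {"a.doc": "11", "archive/c.doc": "33"}
print(report_deletions(before, after))  # (['b.doc'], {'c.doc': 'archive/c.doc'})
```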
BOS is free for home users; enterprises are charged according to data volume. You can submit a free download request at http://www.bos.co.il/150163
Note that while I am planning an evaluation of BOS, I have not yet used it personally. So my mention of BOS should not be construed as an endorsement; it's simply a suggestion.
Paul Mah is a writer and blogger who lives in Singapore, where he has worked for a number of years in various capacities within the IT industry. Paul enjoys tinkering with tech gadgets, smartphones, and networking devices.