Donovan Colbert explores his deepest fears about what could happen to all that data being backed up. Will imperfect methodologies have to do, or are we facing a data disaster?
At the TechRepublic Live 2011 event, attendees voted on topics for "Unconference Break Out Sessions," round-table discussions on a variety of subjects. I decided to attend an Unconference on "Is Tape Back-up Dead? - and B2D and Off-site Replication solutions". It was very informative, and TechRepublic contributor Rick Vanover brought an incredible depth of knowledge and insight to the table.
We discussed many of the pressing concerns on the minds of IT professionals weighing the benefits and liabilities of traditional tape versus backup-to-disk (B2D) and off-site replication solutions. Among the topics were the difficulty of managing tape, testing data restoration ("Backup isn't the problem, restoring is the problem."), the cost dilemma between tape and backup-to-disk, and the challenge of moving large amounts of data off-site and restoring it back to the data center.
The shelf-life of backed up data
After exhausting most of these discussions, Rick brought up the real Achilles' heel of B2D and off-site storage solutions. For long-term data archiving, disk backup is still prohibitively expensive for most organizations. Tape backup remains popular because it is the most affordable way to achieve long-term archiving of a company's historical data. Shortly thereafter, Rick commented that most organizations' archive solutions do not have a meaningful plan for restoring those tapes 10 years or more down the road. I agreed completely with his observations.
After returning from the conference, I met with one of my lead engineers, who is also my primary backup administrator. We talked through the various sessions of the TR Live event, eventually coming to the backup discussion. Because we work in the health-care industry, backup and long-term archival are constant themes in our day-to-day work. As I related the topics we covered in the Tape Backup breakout session, my engineer responded, "The real danger though is that most tapes don't even support the lifespan for data that most companies want to achieve in their archival policies."
He is right. It is hard to nail down the life expectancy of tape media, a fact that is further confused by the use of ambiguous terminology in the tape industry. Does the 25-30 year "shelf-life" and lifetime warranty of my LTO-4 tape cartridge apply only to the tape itself, or also to the data that is stored on it? Ask a half dozen industry experts, and you're liable to get a half dozen different answers to this question, and all of them are going to qualify the statement with conditions and exceptions that may shorten or extend the meaningful life of the data stored on a magnetic tape. Most frequently, you'll hear backup engineers claim that for practical purposes, data must be migrated frequently from older media to newer media, if only to ensure that the data is stored on a medium that is modern and can be restored from archive if necessary.
Maintaining backed-up data for archive in a digital format is complex and fraught with potential for oversight and error. It isn't practical to have a human verify each bit of data manually, but my experience has been that the perpetuation of "Garbage In/Garbage Out" corrupted data can render a backup unusable even when all automated systems claim the data is in pristine condition. The volume of data is so large that we simply must rely on automatic verification to ensure data integrity. That is a problem in itself.
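To make the automatic verification mentioned above concrete, here is a minimal Python sketch of one common approach: record a checksum for every file when the archive is written, then recompute and compare later. This is an illustration only, not a description of any particular backup product. The function names (sha256_of, build_manifest, verify_manifest) and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest.

    Hypothetical helper: run it when the archive copy is first written.
    """
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the files under root against a previously recorded manifest."""
    report: dict[str, list[str]] = {"ok": [], "corrupt": [], "missing": []}
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file():
            report["missing"].append(rel_path)    # file vanished from the archive
        elif sha256_of(target) != expected:
            report["corrupt"].append(rel_path)    # bytes changed since hashing
        else:
            report["ok"].append(rel_path)
    return report
```

A sketch like this assumes the manifest is kept somewhere other than the media it describes. It also shows the limit of automated checks: a matching digest proves only that the bytes have not changed since they were hashed. If the data was already garbage when the manifest was built, every check will pass, which is exactly the "Garbage In/Garbage Out" trap.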
The limits of data archiving
There have been many times in human history when disaster or war has destroyed priceless works of human knowledge, setting humanity back hundreds or thousands of years. The sacking of the library of Alexandria is among the great historical examples of a tragic loss of collected works of human knowledge. Since man first began recording knowledge, the destruction of that knowledge has been a great concern to mankind.
With the advent of the Internet, the amount of data that society creates has seen explosive growth, especially over the last 10 or 15 years. Google built a global business empire on being the best at cataloging that information and separating the wheat of the information explosion from the chaff - and there are still ongoing discussions about intellectual white-noise drowning out the truly valuable information as we continue to produce data at an accelerating rate. The Wikipedia article on "The Information Age" gives some impressive statistics that help conceptualize the explosive growth of information over the last few decades. According to Wikipedia, the amount of data that exists today is about 98 times what existed in 1986.
Much of that information exists solely in digital format.
Isn't that disturbing when we think about the limitations of current methodologies for backing up digital content? We may lose more information in the next 200 years due to bit-rot and bad backups than has been created in the entire 2,000 years preceding us. In fact, I think it is not just likely, but probably inevitable.
Somewhere today, someone may be creating a digital work of art in GIMP or Photoshop that has the potential to be the Mona Lisa of our era, a timeless work of creation that could exemplify our time and society. But the medium on which this work of art has been created is far more fragile than rice paper, let alone canvas and oil. Such a work will need to endure the perils of being stored electronically as a collection of bits, most likely on a very unstable magnetic medium. Unless it is constantly copied, the medium it exists on is likely to become obsolete and inaccessible, even in the unlikely case that the data contained on the device remains pristine and uncorrupted. If the value of such a creation is not immediately recognized, the chance that it will survive to ever be discovered is remote. The preservation of the best endeavors of human creativity and intellectualism relies increasingly on our ability to proactively identify and protect these things in the present and to continuously preserve them and their accessibility into the future. Despite this, we rush to digitize ever more information. Everything becomes stored in digital format: books, magazines, music, art and every other piece of human creation and knowledge.
If there are countless societies throughout the universe, I think this must be a peril that each society faces at some point, along an evolution of risks that must be met and overcome as the technical knowledge of a society continues to grow. The danger of losing our society's collected knowledge and creation via the imperfect means of digital storage seems to me to mark as critical a time in our history as the dawn of the nuclear age. I wonder how many worlds may exist where natural disaster or war has left the burnt-out skeleton of a society whose knowledge is rendered inaccessible because it was only stored in digital format?
Right now we are rapidly approaching an event horizon where the vast majority of human knowledge is solely digital. Should we reach that point, we will be flying without a safety net until we figure out a long-term, reliable, standard, and persistent way to protect and recover that information. As this relates to your business or IT shop, the key question to ask yourself is where your organization is along this timeline, and how you plan to cope with it. We're rushing headfirst to deliver paperless offices, but I don't think many of us have thought through the larger implications of that paperless future. The apocalyptic visions I offer above can affect your business on a smaller, localized scale. Many firms are already flying without that safety net, or waiting to see if the safety net they have in place is sound. The truth is that many won't know until they have to put these solutions to the test - and if they were wrong, it will be too late.
Do you have a solution that is practical and manageable for long-term archiving of your organization's digital data? What is it? Please share your thoughts and strategies in the forum.