Disaster Recovery

Facing a data apocalypse: The limitations of digital storage

Donovan Colbert explores his deepest fears about what could happen to all that data being backed up. Will imperfect methodologies have to do, or are we facing a data disaster?

At the TechRepublic Live 2011 event, attendees voted on topics for "Unconference Break Out Sessions" where attendees held round-table discussions on a variety of subjects. I decided to attend an Unconference on "Is Tape Back-up Dead? - and B2D and Off-site Replication solutions". It was very informative, and TechRepublic contributor Rick Vanover brought an incredible depth of knowledge and insight to the table.

We discussed many of the pressing concerns on the minds of IT professionals weighing the benefits and liabilities of traditional tape versus backup-to-disk and off-site replication solutions. Among the topics were the challenges of managing tape, testing data restoration ("Backup isn't the problem, restoring is the problem."), the cost dilemma between tape and backup-to-disk, and the difficulty of moving large amounts of data off-site and restoring it back to the data center.

The shelf-life of backed up data

After exhausting most of these discussions, Rick brought up the real Achilles' heel of B2D and off-site storage solutions. For long-term data archiving, disk backup is still prohibitively expensive for most organizations. Tape backup remains popular because it is the most affordable way to achieve long-term archiving of a company's historical data. Shortly thereafter, Rick commented that most organizations' archive solutions do not include a meaningful plan for restoring those tapes 10 years or more down the road. I agreed completely with his observations.

After returning from the conference, I met with one of my lead engineers, who is also my primary backup administrator. We discussed the various events and discussions of the TR Live event, eventually coming to the backup discussion. Because we work in the health-care industry, backup and long-term archival are constant themes in our day-to-day work. As I related the various topics we covered in the Tape Backup breakout session, my engineer responded, "The real danger though is that most tapes don't even support the lifespan for data that most companies want to achieve in their archival policies."

He is right. It is hard to nail down the life expectancy of tape media, a fact that is further confused by the use of ambiguous terminology in the tape industry. Does the 25-30 year "shelf-life" and lifetime warranty of my LTO-4 tape cartridge apply only to the tape itself, or also to the data that is stored on it? Ask a half dozen industry experts, and you're liable to get a half dozen different answers to this question, and all of them are going to qualify the statement with conditions and exceptions that may shorten or extend the meaningful life of the data stored on a magnetic tape. Most frequently, you'll hear backup engineers claim that for practical purposes, data must be migrated frequently from older media to newer media, if only to ensure that the data is stored on a medium that is modern and can be restored from archive if necessary.

Maintaining backed-up data for archive in a digital format is fraught with potential for oversight and error. It isn't practical to have a human verify each bit of data manually, but my experience has been that "Garbage In/Garbage Out" corruption, once perpetuated, can render a backup unusable even when all automated systems claim the data is in pristine condition. The volume of data is so large that we simply must rely on automatic verification to ensure data integrity. That is a problem in itself.
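
One common form of automatic verification is a checksum manifest: record a cryptographic digest of each file when the backup is taken, then recompute and compare later. The sketch below is a minimal illustration in Python, not any particular backup product's method; the function names (sha256_of, build_manifest, verify_manifest) are made up for this example. Note its limit, which is exactly the problem described above: a matching digest only shows that the bits have not changed since the manifest was written, so data that was already garbage when it was backed up will still verify as pristine.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a digest for every file under root, keyed by relative path."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or whose digest changed."""
    problems = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel_path)
    return problems
```

In practice the manifest would be stored separately from the archive it describes, since a manifest that rots along with the data proves nothing.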

The limits of data archiving

There have been countless times in human history when disaster or war has destroyed priceless works of human knowledge, setting humanity back hundreds or thousands of years. The sacking of the library of Alexandria is among the great historical examples of a tragic loss of collected works of human knowledge. Since man first began recording knowledge, the destruction of that knowledge has been a great concern to mankind.

With the advent of the Internet, the amount of data that society creates has seen explosive growth, especially over the last 10 or 15 years. Google built a global business empire on being the best at cataloging that information and separating the wheat of the information explosion from the chaff - and there are still ongoing discussions about intellectual white-noise drowning out the truly valuable information as we continue to produce data at an accelerating rate. The Wikipedia article on "The Information Age" gives some impressive statistics that help conceptualize the explosive growth of information over the last few decades. According to Wikipedia, the amount of data that exists today is about 98 times what existed in 1986.

Much of that information exists solely in digital format.

Isn't that disturbing when we think about the limitations of current methodologies for backing up digital content? We may lose more information in the next 200 years due to bit-rot and bad backups than was created in the entire 2000 years preceding us. In fact, I think it is not just likely, but probably inevitable. Somewhere today, someone may be creating a digital work of art in GIMP or Photoshop that has the potential to be the Mona Lisa of our era, a timeless work of creation that could exemplify our time and society. But the medium on which this work of art has been created is far more fragile than rice paper, let alone canvas and oil. Such a work will need to endure the perils of being stored electronically as a collection of bits, most likely on a very unstable magnetic medium. Unless it is constantly copied, the medium it exists on is likely to become obsolete and inaccessible, even in the unlikely case that the data contained on the device remains pristine and uncorrupted. If the value of such a creation is not immediately recognized, the chances that it will survive to ever be discovered are remote. The preservation of the best endeavors of human creativity and intellectualism relies increasingly on our ability to proactively identify and protect these things in the present and to continuously preserve them and their accessibility into the future. Despite this, we rush to digitize ever more information. Everything becomes stored in digital format: books, magazines, music, art and every other piece of human creation and knowledge.

If there are countless societies throughout the universe, I think this must be a peril that each society faces at some point, along an evolution of risks that must be met and overcome as the technical knowledge of a society continues to grow. The danger of losing our society's collected knowledge and creation via the imperfect means of digital storage seems to me to mark as critical a time in our history as the dawn of the nuclear age. I wonder how many worlds may exist where natural disaster or war has left the burnt-out skeleton of a society whose knowledge is rendered inaccessible because it was stored only in digital format?

Right now we are rapidly approaching an event-horizon where the vast majority of human knowledge is solely digital. Should we reach that point, we will be flying without a safety net until we figure out a long-term, reliable, standard, and persistent way to protect and recover that information. As this relates to your business or IT shop, the key question to ask yourself is where your organization is along this timeline, and how you plan to cope with it. We're rushing head-first in a drive to deliver paperless offices, but I don't think many of us have thought through the larger implications of that paperless future. The apocalyptic visions I offer above can affect your business on a smaller, localized scale. Many firms are already flying without that safety net, or waiting to see if the safety net they have in place is sound. The truth is that many won't know until they have to put these solutions to the test - and if they were wrong, it will be too late.

Do you have a solution that is practical and manageable for long-term archiving of your organization's digital data? What is it? Please share your thoughts and strategies in the forum.

About

Donovan Colbert has over 16 years of experience in the IT Industry. He's worked in help-desk, enterprise software support, systems administration and engineering, IT management, and is a regular contributor for TechRepublic. Currently, his profession...

55 comments
coldbrew

I worked for a company that kept backups, both monthly and yearly, on tape. At the time we had several different tape types: LTO2, VXA, and DAT. Eventually the drives were sold or broken. What I have learned is that LTO is tried and true. As long as the drive is good, you have a good chance of restoring data. I got fired when, after I had been requesting a new drive for 6 months, we were not able to read data from a critical server's backup.

JohnOfStony

We gave up tape backup in 2004 when our one tape drive failed and I then discovered that it was obsolete so we couldn't get a new one. I looked at replacing it with a 'modern' tape system but the variety of 'standards' and the rate at which tape backup technology was changing told me in no uncertain terms that tape was not worth the risk. Apart from the fragility of the tape itself and the way the drives rapidly become obsolete there is the time taken both to make a backup and to search and restore. We've been using hard disc backup ever since and on the infrequent occasions that we've needed to recover data, it has been quick and painless. As for the cost, we've found it's no dearer than tape taking everything into consideration, like the absence of an expensive tape drive and the man hours needed to find a particular file and restore it, not forgetting the need for regular maintenance of the tape drive. As a business, if the data's more than 10 years old we've finished with it. We haven't so far needed to recover any data that's on those obsolete tapes and I'm now sure we never will.

lshanahan

Government regulations and legal actions only compound the problem. I worked for a department of a state government that backed up 4TB every single night. They had a grandfather-father-son tape rotation that worked quite well for years. Then, due to a lawsuit or FOIA request (I forget which) that subpoenaed emails going back to the beginning of time before there even were computers, an order came down that backup tapes could no longer be reused. So in addition to the issues of media shelf life, there was now an issue of where to keep them, because the original storage facility wasn't nearly capable of handling the sheer number of tapes involved.
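
For readers unfamiliar with the term, a grandfather-father-son rotation keeps daily, weekly, and monthly backups for different lengths of time and reuses tapes as they expire. A rough sketch of the logic follows, in Python. The schedule (monthly on the 1st, weekly on Sundays) and the retention periods are assumptions chosen for illustration, not the scheme this department used, and the legal_hold flag is a hypothetical stand-in for the kind of order described above.

```python
from datetime import date

# Assumed retention periods in days; real policies vary widely.
RETENTION_DAYS = {"son": 7, "father": 35, "grandfather": 365}


def gfs_tier(day: date) -> str:
    """Classify a backup date under a simple, assumed GFS schedule."""
    if day.day == 1:
        return "grandfather"  # monthly backup, kept longest
    if day.weekday() == 6:  # Sunday (Monday is 0)
        return "father"  # weekly backup
    return "son"  # daily backup, reused soonest


def reusable(backup_day: date, today: date, legal_hold: bool = False) -> bool:
    """Return True if the tape holding this backup may be overwritten."""
    if legal_hold:
        return False  # a hold freezes every tape, so the pool only grows
    age_days = (today - backup_day).days
    return age_days > RETENTION_DAYS[gfs_tier(backup_day)]
```

With the hold in place, reusable never returns True, which is why the number of tapes outgrew the storage facility.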

dcolbert

And getting into the more fringe-metaphysical discussions surrounding this article. As we discussed, I stated that the ideal storage medium for long term, redundant, self-perpetuating and resilient data storage would be biological. Recovery becomes the challenge if you lose track of the technology used to extract the stored data. This sent us off onto a tangent about alien abduction (and data extraction through I/O ports - which goes a long way toward explaining why alien abductors are so fascinated with probing us). The other significant problem with storing data in a biological system becomes that we know that our own genetic code is subject to degradation and data corruption (we just call it mutation). Again, a constant effort to correct the data through occasional extraction, review and reinsertion would explain a lot about alien abduction theories. All fringe theories aside, though - I think that the most logical solution for persistent long term data storage is through biological computing, with the caveat that data recovery becomes the significant challenge with a system that advanced. Again, this ties into the suggestion that we could discover the ruins of an ancient race and be surrounded by their data and never even recognize it - let alone have the technology to recover it. At the very least, this would make a great SciFi story.

oldbaritone

As we look at the need for backups and the overwhelming amount of data captured every day, perhaps it's time to take a step back and say "What NEEDS to be archived beyond 10 years?" Government regulations have placed many requirements for data retention on businesses; but often there is a designated retention lifetime. One of the major functions of an archivist is to decide what things need to be saved, and what things should be disposed of. Is it really necessary to have perpetual fail-safe backups of every SPAM email the company received, forever? Interim developmental releases of products probably fall into the same category. OTOH, "final release" products probably should be included in the long-term backup strategy. Which data is worth maintaining in the long-term archive, and which data should be retained only for the statutory term? By separating out the chaff, the size of the perpetual backup could be reduced significantly. Make a true archive, not just a catch-all forever and ever.

froelicc

If we have data that we REALLY can't do without and / or think is at risk of being lost, we can always write it down on paper. Oh believe me, the irony of what I just said is not lost on me. I realize I just suggested going back to the storage method that the computer age promised to eliminate. Lucky for us that paperless society never emerged. So we can always just write important information down in.... Oh, what were those bound paper volumes called? Oh yes, BOOKS! If you take care of them they will last for a very long time with next to zero degradation. Magnetic fields have no effect on them, and if we lose all electricity, we just need to find someone who can still read. Just a thought...

Ron_007

There are no simple solutions. From my point of view it boils down to a simple cost / benefit decision. We as corporations and as societies have to decide what is vital to keep and what is not. Cost vs Benefit. Maintaining archival data is not cheap. If you use tape, even if you have already made the investment to keep the required hardware and software and controlled environment for tape storage, you also have to have a regular program of "rotating" the tapes. They cannot just be left sitting on the reels or they will get brittle and break when you read them 20 years later. That takes time and money to do. This article: https://www.computerworld.com/s/article/9218881/Start_up_to_release_stone_like_optical_disc_that_lasts_forever?taxonomyId=19&pageNumber=1 describes DVD type storage that may have a longer life. This may take care of the problem of tape and bit rot, but you still have the issue of maintaining hardware and software to read the disks and files. But there still remain 3 issues to be addressed. Unlike 4000 year old stone carvings and 400 year old books and paintings, which do not require technology to be read, our digital recordings require electricity and technology to be read. Our technology can be taken away by a man-made or natural EMP. Another issue is simply the quantity of data we are producing. While some people may decide that much of it is not important, many will decide that a significant portion of the data is worth keeping for a long time. Finally, ask any archeologist or historian. They will place as much value on "Granny's shopping list" from 4000 years ago as they do on the most advanced religious or scientific principle from the same period. One of the reasons the shuttles had to be retired was the 5 aging military spec hardened IBM 360 mainframe computers used to run each one of the shuttles. Parts are getting scarce. I recently read a humorous but true story. A guy worked for a company that was using old 1980's PDP-11 computers. He buys parts from computer recyclers and at auction. Recently, he was in a bidding "war" at an auction. He lost, but found out who the purchaser was. It was NASA. They needed the parts to keep computers running for the Voyager programs. The issue of aging archival data is important and pressing. Compare it to the Y2K bug, setting the time frame at the early 1990's. The problem is real, a few people recognize it, and it is not going away. Yes, I know from personal experience that Y2K was a real problem; I worked on real defects that would have caused our computer programs to fail or output incorrect data. I think that although the Y2K problem was blown out of proportion by "Y2K Chicken Littles", the fact that we "survived" it with so few real failures represents a large scale software project success, on the scale of writing software for the US and Russian Space Race in the 1960's and 1970's. I haven't heard of many (any) US space software failures that caused loss of life, although I concede that at least some of the unmanned test launch failures were at root software related.

CodeCurmudgeon

In August last year, things came to a grinding halt at my workplace: Our 35 terabyte SAN failed. The systems group worked on recovery 24/7, reloading some 25 terabytes of data from tapes for the better part of a week, when it became apparent that the repaired SAN was still faulty, and they had to reload it all again after the vendor completely replaced the SAN. Not too horrifying. Indeed the public might not have even noticed the outage if not for one thing: Somehow our DBA failed to see to it that the systems group was backing up a not terribly large, but entirely critical database which served as the index to our biggest and most critical system. It took another two weeks for the disk forensics folk to recover that database. Meanwhile as a developer, I did on-line training for upwards of two weeks 'cause all of the servers I used were off-line 'till the SAN was back up. Talk about keeping all your eggs in one basket. . .

DittoHeadStL

If oil & canvas, parchment, etc. have withstood the tests of time, and you think something is important enough to keep for the ages, then keep using those trusted media. Print it out, laminate it, protect it from light, etch it into stone, if necessary. Your efforts will be appreciated thousands of years hence. Too much work? Then I guess it wasn't that important, after all. I know -- I'm a fuddy-duddy.

J

The fragile nature of information has been a concern to me since 2006, and I have invested a lot of thought into it since then. The Egyptians preserved their information in stone, and a lot of it has lasted over 3,000 years. The irony is that theirs was the simplest system whereas ours is the most complex system. For example, there are a lot more prerequisites to attain knowledge from our system: electricity, a computer, an internet connection. For the Egyptians, the prerequisite was literacy. Many people believe that our worldwide society will collapse in this century, and when it happens, we could end up without electricity and thrust into another Dark Age.

Snak

For at least 20 years this topic has been debated, discussed and deliberated upon, and two things emerge from those conversations: One, as the article says, is the means by which to read old data on outdated media. However there is another threat and as far as I can tell, there is no possible way of preventing it, or dealing with it if (when) it happens. Every so often (and sadly not predictably) the Earth's magnetic field flips and magnetic North becomes magnetic South. Geology tells us this happens frequently, albeit apparently randomly (it cannot be random; there has to be a cause, but what that cause is we have no idea, and if we did, there's probably nothing we could do about it. We do not even know if it happens suddenly or over a few days/weeks/months/years, although it is suggested it's a sudden event, probably taking seconds). However it hasn't happened for a long time which surely means it's overdue. On the day that happens, wave bye-bye to your data, your electronics, your civilisation and, probably, most of your lives. You might think this apocalyptic doomsay has no relevance in this forum/discussion. I hope you can say the same tomorrow :). Perhaps the best solution would be to print out all of the data on the resin these new 3D printers use. If the sheets were thin enough and the data stored digitally in a standard format, the accumulated knowledge could withstand such a disaster. However, do encyclopaedias, not Facebook.

dogknees

The simple answer is to always bring your data with you to any new medium. When I build a new PC, I copy the entire contents of the hard drive(s) of the previous one to a folder on the new one. So, everything is live and is backed up and I have all my history.

thegreenwizard1

Why do you want to store all the garbage collected about you? The only reason we have to keep records is for tax purposes; beyond that.... How many photos have you lost in your life? Do you really want to keep your first school year's booklet? No, the majority of the data we keep now won't have any value in 5 years. And even medical data needs to be updated, so why worry?

sboverie

We have a lot of data that has accumulated digitally; some of it needs to be preserved and there is a lot that we would not want to share outside of friends and family. A good example is the SPAM I get daily; I don't read it, and other than as a weird intrusion it is not worth preserving. Another take: NASA has a lot of data from the Voyager missions that has not been examined. It is not digital and it is recorded on tape. The interesting parts like the flybys of planets have mostly been analyzed but there are thousands of hours of data that have not been analyzed. The tapes are analog and the equipment is specialized so that it would be hard to read the tapes with any other equipment. The tapes are still readable but the amount of time left to do so is limited. Is the unread data valuable? We can't know until it is read and even then it may be worth archiving for the future to examine with better equipment.

wizard57m-cnet

Just recently, in my home state, there was a criminal case where the jury requested a computer that had the hardware to read the old floppy disks that were being presented in evidence. I suppose they found one, I didn't volunteer mine! Without maintaining hardware indefinitely, what good does it do to have 20 year lifetime claims on the media? As for the next Mona Lisa being created in Photoshop...I don't think so! Yeah, we have multitudes of "would be artists" playing around with Gimp and Photoshop but hardly what one would classify as "art" worthy of a museum. Quantity does not always mean quality.

Neon Samurai

I've been thinking about this lately though it may be off topic. We have information recorded in stone four thousand years ago. Some cultures left behind near complete histories (re-learning lost languages notwithstanding). Four thousand years after our cultural end historians will have a few building names carved into stone; pretty much everything else goes the way of unprotected papyrus. Paper, magnetic tape, disks and platters all rot with time. We don't really have much recorded in a way that will exist to be discovered by historians a hundred years after we go.

Lazarus439

How do you handle off-site storage/backup for DR purposes? How do you address generational backups, i.e., how do you meet a request to restore a file as it was two days ago because it's been corrupted for two days but only just discovered to be so today?
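
One way to frame the generational question: if dated backup generations are kept, the restore target is the newest generation taken before the corruption began. A minimal Python sketch follows, assuming a list of snapshot timestamps sorted oldest first; the function name is hypothetical and not part of any backup tool.

```python
from bisect import bisect_left
from datetime import datetime
from typing import Optional, Sequence


def latest_snapshot_before(
    snapshots: Sequence[datetime], corrupted_at: datetime
) -> Optional[datetime]:
    """Return the newest snapshot taken strictly before corrupted_at.

    Assumes snapshots is sorted in ascending order. Returns None when
    every retained generation is at or after the corruption time.
    """
    index = bisect_left(snapshots, corrupted_at)
    return snapshots[index - 1] if index > 0 else None
```

The catch is the None case: if the rotation only reaches back one day and the corruption is two days old, there is no clean generation left to restore, which is why retention depth matters as much as backup frequency.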

dcolbert

Legislators create technology laws that seem to follow common sense principles to them - but are completely out of step with the practical limits of the technology they are dealing with. I suspect they think that if they *make* it a law, technology will be pressed to keep up and innovate solutions. Sometimes it might even work out this way - but I think more frequently what we get is a situation where IT employees are responsible for meeting regulations where there is no manufactured solution available. This is a huge problem in my mind, as it relates to many of the regulations in health-care - in particular considering that the wording of health-care regulations means that an individual can be held criminally and civilly liable for violations.

dcolbert

Someone above described the situation where an archaeologist values a papyrus shopping list or recipe as much as an ancient tome of astronomical study. In your example, if we do not archive SPAM, then there is a gap for historians and archaeologists later trying to figure out the granular details of what our society was like. From a historical perspective, the fact is that snake-oil remedies in the 18th and 19th century give us a very specific insight into those times. SPAM is the snake-oil advertisement of our era. I've seen it used twice now as a "perfect" example of what data deserves to be lost. In fact, though it is reviled, I think it is actually an example of the kind of data that *needs* to be preserved, and is the least likely to be. That is the problem with judging NOW what needs to be retained and what should be discarded. The trivial, the pulp, the disposable - may give later people the MOST insight into how our society really behaved, what our values were, how we thought and what our concerns were. That is *exactly* why this is such a big deal.

Lazarus439

I don't know a whole lot about paper, but I'm under the impression that "plain" paper itself isn't necessarily that long-lived. If you're printing for posterity, would you not need to use archival grade/quality paper? Also, ever had toner lift off a page that was stored in a plastic sleeve, or simply stick to the page on top of it? Likewise, does the el cheapo off-brand ink not fade over time? Or even the OEM stuff, for that matter?

dcolbert

Whenever someone comes to me, pulling their hair out, telling me they just lost 10 hours of unsaved work that must be done in 10 minutes and asking me if I can recover it, I always tell them... Save often, print hard-copies frequently. When we were driving back from Orlando, my daughter's friend decided to write a story on my Android tablet. At one point, she hit the home button. My wife went back to the app menu and was going to re-load the entire word processing app. I told her not to. Instead, I had her go into the "recent apps" menu and bring back up the current session that had shifted to the background. This was fortunate, as the story had not been saved. After I got her back into the story, I told her to make sure to save it frequently and made sure she knew how. All while driving 11,000 pounds of rolling metal down the highway at 70 MPH. Printing, unfortunately, was not an option. ;)

AnsuGisalas

and whatever this was made for would have to be in a vault anyway.

dcolbert

My organization is so intolerant of this kind of failure - and the time-frame you describe without my SAN would be crippling to us. The time to recovery with the solutions we have would cause an outrageous hardship for us, and the true costs associated with the redundancy and complexity required to thoroughly prevent such a situation from occurring are beyond our means. I'm totally aware of it, and my staff is aware of it, we've communicated it to the executive staff, but I suspect they don't understand the potential gravity of the situation. Instead, we move forward with our fingers crossed playing the odds. I know my organization is not alone in this regard. There are countless firms in countless industries playing this game (and countless firms selling snake-oil solutions that claim to prevent it from happening). At some point, this creates an actual economic barrier to entering business - because a smaller firm simply cannot compete with the resources available to a larger corporation in delivering fully robust solutions. It drives us to business consolidation (cloud and hosted based solutions leveraging an economy of scale). I mean, the implications of this discussion are *huge*, and they affect EVERYTHING about how IT exists today, and how it will progress tomorrow. Data backup and restoration becomes the foundation on which all of IT evolves. Data-backup and security, anyhow. These two things make it even more difficult for the small, underfunded startup or entrepreneurial effort to succeed - and that leads to a world that stifles innovation, creativity and competition. I hate reading stories of SAN failures. They're a potential critical hull breach for my organization. SAN failure is catastrophic failure - and I dread the 2AM call that one of our SANs has crapped out completely on us. When I was watching the effects of the Tsunami on Japan, as the reactors were melting down and they were describing the efforts of the nuclear engineers to contain the damage - I felt complete empathy for those people. I've lived through some business-catastrophic systems failures as a responding/owning engineer and worked endless shifts without relief trying to recover. It is the worst thing about working in IT, by far, when it happens. That is why this discussion at the TRLive event was so interesting to me. Oddly enough, I find that WHS is the BEST one-touch recovery solution I've ever encountered for protection of my personal data. Not the current version, but the previous one. I know that it still has liabilities (it is difficult to back IT up - *and* it *is* prone to corruption that can eat all of your backups) - but it is a step in the right direction. Quick, painless, intuitive, and with some effort, flawless recovery of your personal systems. I wish enterprise grade backup and recovery could be made this simple and bullet-proof.

maconrad

How do you print CAD models, interactive maps, audio and video files, etc, etc, etc?

dcolbert

We still do not have tools that are capable of moving the biggest blocks used to construct the Great Pyramids - and even the best theories on how the Egyptians achieved this are contested. Likewise, Machu Picchu contains perfectly interlocking stones in irregular shapes that modern tools would be incapable of cutting so precisely - and these stones were placed at altitudes, in rugged mountain terrain, without accessible paths, - where even our most powerful helicopters would be incapable of assisting us. These are, of course, in well documented societies with lots of preserved knowledge. When Europeans first arrived in the new world, they sent back claims of discovering cultures far more *advanced* than European cities. In general, though - it looks like a common theme is that these societies tended to "Keep It Simple, Stupid" in their approach to many problems. I think this ties perfectly into this discussion, where the challenge in Information Technology is to deliver robust, reliable, redundant systems with fail-safe preservation of data - without making the systems so complex they become unsupportable. Every time I remove a single point of failure, it seems that it increases the complexity of my architecture exponentially. I'm constantly aware of, and dissatisfied with, this fact. My intuition is that this may create as many problems as it solves. For a long time I've claimed that a badly engineered high availability system often becomes a high UNavailability solution. My own practical experience confirms this. But the alternative in simplicity doesn't deliver the goals I need to meet, either.

maconrad

Archivists have been thinking about these issues since the 1970's. See, for example, Thirty Years of Electronic Records. Edited by Bruce I. Ambacher. Lanham, Md.: Scarecrow Press, 2003. xix, 190 pp. Bit rot and storage media obsolescence are actually some of the "easier" issues that need to be addressed. One of the more "difficult" issues is the software dependence of much of the data. Since humans began keeping data digitally they have stored that data in tens of thousands of file formats that require particular software to make sense of the data. Keeping data in formats that can be read by current software/hardware platforms is orders of magnitude more complex than keeping the bits intact and on current media.

JohnOfStony

Memory cards don't store data magnetically so they'd be immune to any magnetic disturbance. What would be far worse would be a nuclear bomb and its electromagnetic pulse which would wipe everything, even the BIOS in your PCs. I do like your 3D printer idea but we'd still need something to read the data. I would go for straightforward alphanumerics and diagrams but reduced to a very small scale on a par with microfilm then all you'd need would be a bright light and a powerful magnifying glass. The only negative aspects of that approach would be the cubic miles needed to store the media and the retrieval system.

nustada

Anyone who works with sensitive equipment can tell you magnetic poles are shifting all the time, and it hasn't caused the end of the world.

dnox1978

- Historical reasons - To avoid reinventing the wheel again and again - To know what one once in the time-based decisions and why, for example, NASA went to the moon - Or to know why certain laws were created and how they changed in relation to the old laws - And to at least try to avoid committing the same mistakes again and again - Economic reasons. But if you only store garbage, congratulations, then at least this is not a problem for you.

belli_bettens

Have you been to a museum lately? It's not all about oil paintings and ancient sculptures anymore. You also have photo exhibitions, all most likely edited in Photoshop. And apparently it's considered art. Remember that many paintings that are 'famous' today were considered no more than rubbish when they were created hundreds of years ago. That's the thing about art :-) don't judge too soon...

Spitfire_Sysop

I think your view is limited by your experience. Real digital artists paint with digital paintbrushes. With the power of CTRL+Z they can make their work perfect. It is possible to make amazing artwork with the right equipment: http://www.wacom.com/

Alpha_Dog

If we used corundum for the disk material and a noble material for the medium, optical disks could last, but would cost in the tens of thousands for a DVD. A better choice is to store only what matters, for as long as it matters. We shouldn't archive juvenile jay-walking for 20+ years. Criminal records should be stored on media which lasts twice as long as the statute of limitations. If it's really worth keeping, print it and store it or etch it to steel.

AnsuGisalas

I guess it wouldn't be so difficult to make a read-only disc that doesn't rely on something that degrades... but how much capacity would it have? On another level, I was looking for a game data editor for Master of Magic a while ago... I remember that there was one which could change the terrain and resources of the map, but now I can't find it. Gone with the wind...

dcolbert

Generational backups are a great example. If I do nightly full backups of every change but do not retain historical check-points, the issue you describe is not just likely, it is inevitable. When you think of *that*, it makes the volume of data that is already huge exponentially more difficult to manage. Dedupe helps in a situation like this (where you only have to store the revision CHANGES and can create each checkpoint from a base document) - but it still means that you're storing multiple revisions of each document for archival - and for how long? 10 years from now you get an eDiscovery request for a *particular* revision of a document, and you can only provide the final version, where the pertinent edits have been conveniently removed. You're going to face fines. Just ask RJ Reynolds and Philip Morris. Backup is a nightmare.
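
As a rough illustration of "store only the revision CHANGES and create each checkpoint from a base document", here is a small sketch using Python's standard `difflib`. Strictly speaking this is line-level delta encoding rather than the block-level deduplication a backup product would do, and the document contents and function names are invented for the example.

```python
from difflib import SequenceMatcher


def make_delta(base, revised):
    """Describe `revised` as copies from `base` plus newly added lines."""
    delta = []
    matcher = SequenceMatcher(a=base, b=revised, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))         # reuse lines from base
        elif j2 > j1:
            delta.append(("add", revised[j1:j2]))  # store only new lines
        # pure deletions need no entry: the lines are simply not copied
    return delta


def apply_delta(base, delta):
    """Rebuild a revision from its predecessor and a stored delta."""
    out = []
    for op in delta:
        if op[0] == "copy":
            out.extend(base[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


def restore(base, deltas, checkpoint):
    """Replay the first `checkpoint` deltas on top of the base document."""
    doc = base
    for delta in deltas[:checkpoint]:
        doc = apply_delta(doc, delta)
    return doc


# Hypothetical document history: the middle revision is the one a
# later eDiscovery request might ask for.
v1 = ["Product is safe.", "Tested in 1998.", "No known risks."]
v2 = ["Product is safe.", "Tested in 1998.",
      "Risk noted in trial 7.", "No known risks."]
v3 = ["Product is safe.", "Tested in 1998.", "No known risks."]

deltas = [make_delta(v1, v2), make_delta(v2, v3)]

print(restore(v1, deltas, 1)[2])   # Risk noted in trial 7.
print(restore(v1, deltas, 2) == v3)  # True
```

The final version here is identical to the first, so an archive that kept only the latest copy would have no trace of the intermediate edit. Keeping the chain of deltas is what makes that middle checkpoint recoverable, at the cost of having to retain (and be able to read) every link in the chain.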

mvirard

I think it is a mistake to compare the plight of current archaeologists probing civilisations long gone with the future predicament of, say, 4th millennium archaeologists. The reason is the incredible difference in the amount of useful material available to the two groups of archaeologists. Even if we ditch all the spam (we can still keep "samples") the future archaeologists would still be able to reconstruct the entire phenomenon just from the many indirect references that we are likely to keep. We can also play safe and condense huge volumes of spam into statistics and significant samples, and get rid of the duplicates. Let's face it: typical mass spam is mailed to millions of recipients. Do we need to save millions of identical emails? I think you know the answer.
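
A minimal sketch of "condense into statistics and significant samples", assuming the duplicates are byte-for-byte identical. The message texts and the `condense` helper are made up for illustration.

```python
import hashlib
from collections import Counter


def condense(messages):
    """Keep one sample of each distinct message plus how often it occurred."""
    counts = Counter()
    samples = {}
    for body in messages:
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        counts[digest] += 1
        samples.setdefault(digest, body)  # first copy becomes the sample
    # Most frequent messages first.
    return [(samples[d], n) for d, n in counts.most_common()]


# Hypothetical mailbox: six messages, only two distinct bodies.
mailbox = (
    ["Cheap watches!"] * 3
    + ["Chain letter: forward to 10 friends"] * 2
    + ["Cheap watches!"]
)

for sample, count in condense(mailbox):
    print(count, sample)
# 4 Cheap watches!
# 2 Chain letter: forward to 10 friends
```

Real mass mailings usually vary slightly per recipient (names, tracking links), so exact hashing like this would undercount duplicates; some form of fuzzy matching would be needed in practice, and deciding what counts as "the same" message is itself an archival judgment call.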

dcolbert

You can trust hard media more than magnetic media. Thermal printing, some inkjet printing, and certain delicate papers may be problematic, but if you're using environmentally controlled storage (or just common sense), you can dramatically increase the lifespan of "hard" media. In healthcare, we recognize that practices have lost complete paper records for patients due to fire, flood or other natural disaster (or have simply lost the paper files). My argument is that in general, hard media is more durable than digital media, especially for long-term, environmentally controlled storage and archive.

dcolbert

Tried to download and read an Amiga "HAM" image in native format on a Win7 or OS X machine lately? Emulation to the rescue, I guess... I think in general most file formats, no matter how obscure, can be recovered today, and the fact that we've settled in on some standards that are pretty well universal helps - but I basically imply the same thing above when I say, "What if you need a WordStar file written on a Kaypro CP/M disk". At the very least, this is a good argument among FOSS proponents regarding the superiority of Open Source formats.

dcolbert

From the Wiki: "Memory wear: Another limitation is that flash memory has a finite number of program-erase cycles (typically written as P/E cycles). Most commercially available flash products are guaranteed to withstand around 100,000 P/E cycles, before the wear begins to deteriorate the integrity of the storage. Micron Technology and Sun Microsystems announced an SLC NAND flash memory chip rated for 1,000,000 P/E cycles on December 17, 2008. The guaranteed cycle count may apply only to block zero (as is the case with TSOP NAND devices), or to all blocks (as in NOR). This effect is partially offset in some chip firmware or file system drivers by counting the writes and dynamically remapping blocks in order to spread write operations between sectors; this technique is called wear leveling. Another approach is to perform write verification and remapping to spare sectors in case of write failure, a technique called Bad Block Management (BBM). For portable consumer devices, these wearout management techniques typically extend the life of the flash memory beyond the life of the device itself, and some data loss may be acceptable in these applications. For high reliability data storage, however, it is not advisable to use flash memory that would have to go through a large number of programming cycles. This limitation is meaningless for 'read-only' applications such as thin clients and routers, which are only programmed once or at most a few times during their lifetimes." Flash memory is non-volatile - but is still prone to failure and data corruption. People who store all of their pictures on their huge SD card in their camera are making a particularly risky gamble. In my experience, I trust magnetic memory more than flash memory.
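
The "counting the writes and dynamically remapping blocks" idea in that quote can be sketched in a few lines. This is a toy model, not how any particular controller or file system driver works: the `WearLeveledFlash` class, its four-block size and its always-pick-the-least-worn policy are assumptions made purely for illustration.

```python
class WearLeveledFlash:
    """Toy flash device that remaps each write to the least-worn free block."""

    def __init__(self, physical_blocks):
        self.erase_counts = [0] * physical_blocks  # wear per physical block
        self.data = [None] * physical_blocks
        self.mapping = {}                          # logical -> physical block

    def write(self, logical, payload):
        old = self.mapping.get(logical)
        if old is not None:
            self.data[old] = None                  # old copy becomes free
        in_use = set(self.mapping.values()) - {old}
        free = [p for p in range(len(self.data)) if p not in in_use]
        # Dynamic wear leveling: choose the free block erased the fewest times.
        target = min(free, key=lambda p: self.erase_counts[p])
        self.erase_counts[target] += 1
        self.data[target] = payload
        self.mapping[logical] = target

    def read(self, logical):
        return self.data[self.mapping[logical]]


flash = WearLeveledFlash(physical_blocks=4)

# Rewrite the same logical block 100 times, e.g. a frequently updated file.
for i in range(100):
    flash.write(0, i)

print(flash.read(0))        # 99
print(flash.erase_counts)   # [25, 25, 25, 25]
```

Without the remapping, all 100 program-erase cycles would land on a single physical block. The sketch ignores blocks holding data that never changes, which real controllers also have to rotate, and it does nothing about the bad-block handling the quote mentions.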

Snak

but not actually reversing.

JohnOfStony

dnox1978 wrote "To know what one once in the time-based decisions" What does this mean?

dcolbert

The fact that people consider certain mediums, at certain TIMES, unworthy of being called art is part of the problem. The fact that society may later change its mind is the other half of this equation. The fact is that digital media *is* more fragile than rice paper. I dabble in art - and I use mixed media. I'll create a sketch with pencil and paper - then alter and enhance it in significant ways that expand the richness of the original sketch, using digital tools. Imagine if all we had left today of the Mona Lisa was the original studies and sketches and the under-sketch before the paint was applied. We would have lost so much of the full experience of this painting. That is the risk with dismissing digital art as somehow less valuable than traditional media - and I think it is a serious issue that does exist at this moment in time. Widely - we do not consider a lot of digital creations "art" - but some of these things being created may become masterpieces down the road. Actually, I think it is inevitable that this will be the case. When moving picture shows first became accessible - many dismissed them as not being "art". Now there are widely recognized examples of cinematic artistry that are considered classics and masterpieces of human expression. The *medium* isn't what matters - the vision that is conveyed is what separates art from pulp. Right now, widely, I think society dismisses digital creation as completely pulp. I think 100 years from now, we might look back differently.

Lazarus439

That's the very large rub. NASA erased the tapes with the high resolution video of the first moon landing. In hindsight, this was clearly a colossal foul-up, but it wasn't even discovered for most of 30 years. Hindsight is easy; foresight nearly impossible...

HAL 9000

For preserving things long term, the answer is ceramics. The problem is that even if we can make something from these Plastic Ceramics that can survive many of the possible destructive influences of the future, how would anyone who comes later know how to read it? This type of discussion kind of reminds me of the Tribe of Children in Mad Max something 3, I think, but I'm not really sure and am not interested enough to look it up. They have the storage material but no idea of what it actually is or how it works. But out of that same movie, the compressor rings of the 747 engines would be the only thing to last longer than a few decades. Col

dcolbert

There are already societies that exist to try and preserve the code for classic video games. Many of these pieces of IP exist without ownership, because their companies have long since folded and disappeared themselves. Some titles are lost, or versions were lost, before people realized what was happening. Some of these titles were only realized to be important milestones long after the fact - so the precedent for the scenario I describe above already exists. It has happened, and I believe it is happening right now on a scale larger than we suspect. The question really becomes (and someone has already touched on it below) - what data *is* disposable and expendable? At some point, perhaps the question comes to, "Is this data worth the effort to preserve it?" On the one hand, we become data pack-rats, data-hoarders, holding onto every minuscule bit of data ever created on the off chance that some day in the future we may need it again. Is it REALLY necessary as a society to have working copies of every word processor for every PC ever made? We become so overwhelmed by the sheer volume of data that we cannot manage it. On the other hand, we treat data as disposable, creating it and discarding it in a constant cycle when it is no longer useful to us, and we lose something critical - a revolutionary theory written on a Kaypro CP/M system using an obscure, early version of WordStar that could save humanity from an early demise. Tough calls - I'm not certain what the answer is from the scope of society. My more pressing concern though, is how do I apply these questions to preserving the data in my own organization? This is something I deal with constantly. My organization has a hoarder mentality when it comes to retaining data. I've purchased 2 SANs in the 4 years I've been working here - the last one was 24TB usable - and we're expanding that to a full 56TB usable soon - and still I worry about storage and long term archival. 
Inevitably, the data that I dispose of is the data that someone desperately needs 4 weeks later after 7 years of neglect. As a personal example, when I first started working here, I inherited 2 banker's-boxes full of old paper files, manuals, and disks from the two previous IT managers. I held onto these boxes of flotsam for 2 years, before I finally went through and weeded out what seemed unnecessary. It wasn't 6 months later that my superior needed a disk, one that I was certain I had seen, and pretty sure I discarded as so outdated as to never be useful again. Tracking down a replacement was virtually impossible. These are the nightmare scenarios that trap us between being buried under a literal mountain of historical archives, or thinning what *is* critical from what is not - and who decides, on what criteria? Data retention becomes one of the most important aspects of working in the IT industry - and things are only going to get harder as information continues to explode.

Spitfire_Sysop

What we need is a metal disc encoded much like a vinyl record. It would be excruciatingly slow to produce and prohibitively expensive. The benefit would be that it would maintain its data until physically damaged.

dcolbert

Which examples of SPAM *do* we keep, if any? Is just incidental reference to the phenomenon ENOUGH, or should we preserve specific examples? Which ones? Canadian Viagra ads? Knock-off Designer Watches? Chain letters? Microsoft Tracking e-mail hoaxes? A few examples of each? Sure, everything you say here is reasonable and actually seems fairly self-evident. But our perspective is biased by being in the moment of time when it is happening. Looking back at our society 50 years, 100 years, 1000 years from now, the things that are most telling about the nature of our society might be pretty hard to pick out *today*.

013gonzo

Try to read WordPerfect for DOS documents. You will get a lot of garbage characters, but when you take your old 3.5" diskettes with WP 5.0 and install it: excellent, everything is OK! Dragan Sponza, Srbija

dcolbert

I could write my blogs in Spanish. They wouldn't be very good, and there would doubtless be lots of sentences that made perfect sense to me, that a native Spanish speaker would be completely puzzled by. In fact, sometimes I get done with a blog I've written in English, read it back, and find myself wondering, "what exactly was I trying to say in that traffic-accident of a run-on paragraph..." TechRepublic actually has a large foreign audience in countries where English is *not* the primary language. Most of our readers from these countries have a better working grasp of English than 98% of English speaking Americans have of any second language (myself included). To be fair, though, that 98% number is probably skewed by native Spanish speaking Americans who are also conversationally fluent in English. They've learned English because it makes them marketable and competitive with American workers. To me, this is a symptom of why we're not retaining jobs in the United States - and losing jobs to nations where employees will make tremendous extra effort to make themselves attractive as potential employees. I get frustrated and have posted similar responses to yours to forum responders, John. I'm not lecturing you on that - so much as exploring the reasons why we get posts from readers who don't have a complete mastery of English. The truth is, as someone who struggles with being semi-fluent in a second language, I have a lot of respect for anyone who tries. It isn't easy.

HAL 9000

Where they are digging up a Roman town/city/whatever, a shopping list written on the wall is considered one of the most important finds. What someone needed to buy each week would have been nothing to them, but 1,600 years later we find it interesting, and it certainly helps us understand how they lived at that time. ;) Col

dcolbert

In Mad Max Beyond Thunderdome, the tribe of children in the oasis has an LP record that the tribe shaman carries with him hanging from a bamboo contraption. It is magical and sacred, but the only way that they can access the data on the disk is spiritually, through the interpretation of the Shaman. They can't actually play the record back to hear what is on it. That is my analogy above to finding the shell of an advanced society and not even recognizing their storage medium, let alone being able to access it. In a widely metaphysical sense, it could be possible that this kind of data is accessible and preserved all around us even now from some previous advanced society that went before us - we simply can't recognize it for what it is. Atlantis theories, Alien Astronaut theories, lots of fringe science may *flirt* with these kinds of ideas. Perhaps the "great" ancient societies were actually dark ages after the fall of truly great societies, and anachronistic gadgets like the Baghdad Battery and the Antikythera mechanism were simply relics of that knowledge as it faded from human memory. To anyone interested in the philosophical aspects of this, I'd recommend reading A Canticle for Leibowitz - which deals broadly with this kind of topic.

dcolbert

Thanks for the link. Unfortunately, solutions like these are still beyond bleeding edge and outrageously expensive. We need solutions like this at commodity prices and infinitely scalable to effortlessly meet expanding storage consumption. 20 years ago 4MB of RAM and 40MB of HD on a machine was bleeding edge. In the past several years we went from 80GB hard drives to 3TB hard drives. How soon does this solution become a relic, something comparable to an 8" floppy or an MFM "Winchester" hard drive? Moore's law is a double edged sword.

Lazarus439

You also have to not lose the disk, and to maintain the technology to read the disk. There are a couple of highly illustrative examples of NASA losing, or coming very close to losing, a lot of data from the moon programs because they kept the tapes that stored it, but not the drives that could read the tapes. Indeed, NASA apparently has lost the tapes with high resolution video of the first moon landing. See http://en.wikipedia.org/wiki/Lunar_Orbiter_Image_Recovery_Project; http://www.washingtonpost.com/wp-dyn/content/article/2007/01/30/AR2007013002065.html