TechRepublic member dcolbert vents about his frustration and concern over the recent Ma.gnolia failure. Do you think the ma.gnolia collapse provides an important lesson in the "cloud revolution"?
I’ve been doing some research that led me to check out the back-story on the ma.gnolia.com failure. If you are unfamiliar, ma.gnolia was a Web-based bookmarking service that recently suffered a catastrophic, unrecoverable failure of its production database. The more I read about it, the more I see this as a shot across the bow for those who are rushing to embrace “the cloud.”
There are a number of shocking things about how the ma.gnolia collapse occurred. At http://corvusconsulting.ca/2009/02/ma-gnolias-bad-day/ you can find a blog entry titled “Ma.gnolia’s Bad Day,” written by Todd Sieling. There is a lot of focus on what a tragedy this failure was for Ma.gnolia and for the founders of the service. There seems to be a lot of remorse for how much effort and hope these founders placed into Ma.gnolia, only to have some “misbehaving hardware” vaporize all of that work in the blink of an eye — all the data “slip(ped) away into the unforgiving ether.” There is less than a paragraph dedicated to the idea, “I feel responsible for not having pushed for comprehensive recovery plans” and “riding on unchecked assumptions.”
Let’s stoke up the campfire, tune up the guitars, and have a round of Kumbaya, because this situation goes beyond “not having pushed for a comprehensive recovery plan.” From the data I am able to put together on the Web, Ma.gnolia was designed with an inherently flawed underlying systems architecture, and clearly had no reasonable backup and recovery methodology at all.
In the blog I linked to above, Todd alludes to the fact that the “anti-cloud” scored “1” with ma.gnolia’s collapse. What is ironic is that he misses that his own admissions are a bigger indictment of cloud-based technology than the ma.gnolia failure itself. They illustrate how people without the full body of necessary experience can “go into business” with a good idea, a little hard work, and no idea that they’re leaving port without enough life vests, or even any idea that life vests are necessary.
On the other hand, we, the “end-users” of these cloud-based services, have very little transparency into how these companies operate. I am certain that MySpace, Facebook, eBay, and other major online services or applications were at some point in the same place in their life cycle: a naïve, unadvertised “public beta/pseudo-production” state, a good idea that was not robustly executed, where the people behind the curtains were making it up as they went. I’m sure that at some point, with unthinkable disaster so far avoided, someone knowledgeable was called in, took a look at the potential disaster, and Had A Cow: “You’ve got to get this taken care of RIGHT now!”
What happened to ma.gnolia is that the disaster struck BEFORE they got to this point, but after they had grown big enough for their failure to register throughout the industry. However, no one cares because it was just social Web bookmarking. But it could just as easily have been ePHI (electronic protected health information), or something else equally important that had been blindly trusted to someone offering applications via the cloud.
I’m all for attacking the problem, not the people, but the problem here is that the people did not know what they didn’t know, but they rushed in anyhow, and even now, they’re not really admitting, or possibly even understanding, what caused this. Instead, they want to focus on “letting the negativity dissolve while embracing the positivity of the community that supports them” — or some other “Humane interaction between technology and humans” kind of sappy sentiment.
But let me try not to get caught up in that flashing-red anger that strobes when I think of how Todd’s personal philosophy has caused such a high-profile disaster for the tech industry. Instead, let me focus on this: in the future, there needs to be a way to “vet” startup companies, and there needs to be transparency into their design. They should have SLAs in place, and they should publish abstracts of their disaster recovery and backup methodology.
Todd’s blog doesn’t capture the root cause of the problem or provide any solutions. It all too conveniently avoids confronting almost any of the actual causes, or any potential ways to prevent similar situations in the future. That is perhaps the most troubling aspect of the ma.gnolia collapse: there is no clear indication that the parties responsible learned any real lessons. I’m not throwing stones, because I live in a glass house. I’ve had my share of epic failures as an IT engineer over the years. They’ve all also been epic learning opportunities for me.
Well-intentioned people executed great ideas in horribly flawed ways, and then were very gentle with each other when the inevitable disaster occurred. “How could you know that playing with fire was liable to burn you? Let’s have a virtual group hug and ignore those ‘griefers’ who are calling you mean names.” They’re so caught up in the rapture of the social aspects of the applications they’re making, and in the warm, fuzzy, ’70s-era “social revolution” buzz they get from this direction of Web design and application, that they’re missing the fundamental framework that lies underneath, which is critical and cannot be avoided, changed, or ignored.
I personally don’t have much confidence in the idea of entrusting my confidential, irreplaceable data to people who approach the design of Web-based “cloud” apps with this philosophical outlook. Perhaps it is my experience with big Fortune 500 IT organizations, but no one gives out hugs and consolations over Latte Grandes at the local Starbucks when a new Web app disintegrates along with tons of customer data at MCI or Intel. They give termination papers and a walk to the door escorted by company security.
So, ultimately, the ma.gnolia collapse is an important point in the “cloud revolution” that anyone involved in IT should study for a number of reasons. These are just my conclusions, and I would encourage you to go out, find what information you can, study up, and draw your own conclusions. I see the framework in pretty simple terms: a great idea, horribly executed, with little end-user control or ability to feel secure that those steering the ship are qualified to be on the bridge. It is a pattern that can be applied many times over across “the cloud.”
I’m interested in finding out if you agree or disagree, and if you can propose any solutions to make the cloud more robust, more secure, more trustworthy. Does it require self-monitoring and self-accountability, or will nothing short of legislation and regulation prevent things like this? Where is the balance? Let me know what you think.