Social Enterprise

Ma.gnolia points to potential storms in the cloud

TechRepublic member dcolbert vents about his frustration and concern over the recent Ma.gnolia failure. Do you think the ma.gnolia collapse provides an important lesson in the "cloud revolution"?
This post was written by TechRepublic member dcolbert.

I’ve been doing some research that led me to check out the back-story on the ma.gnolia.com failure. If you are unfamiliar, ma.gnolia was a Web-based bookmarking service that recently suffered a catastrophic, unrecoverable failure of its production database. The more I read about it, the more I see this as a shot across the bow for those who are rushing to embrace “the cloud.”

There are a number of shocking things about how the ma.gnolia collapse occurred. At http://corvusconsulting.ca/2009/02/ma-gnolias-bad-day/ you can find a blog entry titled “Ma.gnolia’s Bad Day,” written by Todd Sieling. There is a lot of focus on what a tragedy this failure was for Ma.gnolia and for the founders of the service. There seems to be a lot of remorse over how much effort and hope the founders placed in Ma.gnolia, only to have some “misbehaving hardware” vaporize all of that work in the blink of an eye -- all the data “slip(ped) away into the unforgiving ether.” There is less than a paragraph dedicated to the ideas “I feel responsible for not having pushed for comprehensive recovery plans” and “riding on unchecked assumptions.”

Let’s stoke up the campfire, tune up the guitars, and have a round of “Kumbaya,” because this situation goes beyond “not having pushed for a comprehensive recovery plan.” From the data I am able to put together on the Web, Ma.gnolia was built on inherently flawed underlying systems architecture -- and clearly had no reasonable backup and recovery methodology at all.

In the blog I linked to above, Todd alludes to the fact that the “anti-cloud” crowd scored a point with ma.gnolia’s collapse. What is ironic is that he misses how his own admissions are a bigger indictment of cloud-based technology than the ma.gnolia failure itself. They illustrate how people without the full body of necessary experience can “go into business” with a good idea and a little hard work, with no idea that they’re leaving port without enough life vests -- or even any idea that life vests are necessary.

On the other hand, we -- the “end users” of these cloud-based services -- have very little transparency into how these companies operate. At some point, MySpace, Facebook, eBay, and other major online services were probably at the same place in their life cycle -- in a naïve, unadvertised “public beta/pseudo-production” state, a good idea that was not robustly executed, where the people behind the curtains were making it up as they went. I’m sure that at some point, having so far avoided unthinkable disaster, someone knowledgeable was called in, took a look at the situation, and Had A Cow: “You’ve got to get this taken care of RIGHT now!”

What happened to ma.gnolia is that the disaster struck BEFORE they got to this point, but after they were big enough that their failure had an impact that registered throughout the industry. However, no one cares, because it was just social Web bookmarking. It could just as easily have been ePHI (electronic protected health information) medical data, or something else equally important, blindly entrusted to someone offering applications via the cloud.

I’m all for attacking the problem, not the people, but the problem here is that the people did not know what they didn’t know, but they rushed in anyhow, and even now, they’re not really admitting, or possibly even understanding, what caused this. Instead, they want to focus on “letting the negativity dissolve while embracing the positivity of the community that supports them” -- or some other “Humane interaction between technology and humans” kind of sappy sentiment.

But let me try not to get caught up in the flashing-red anger that strobes when I think of how Todd’s personal philosophy has caused such a high-profile disaster for the tech industry. Instead, let me focus on this: in the future, there needs to be a way to “vet” startup companies, and there needs to be transparency into their design. They should have SLAs in place, and they should publish abstracts of their disaster recovery and backup methodologies.

Todd’s blog doesn’t capture the root cause of the problem or provide any solutions. It far too conveniently avoids confronting the actual causes or offering ways to prevent similar situations in the future. That is perhaps the most troubling aspect of the ma.gnolia collapse -- there is no clear indication that any real lessons were learned by the parties responsible. I’m not throwing stones, because I live in a glass house. I’ve had my share of epic failures as an IT engineer over the years. They’ve all also been epic learning opportunities for me.

Well-intentioned people executed great ideas in horribly flawed ways, and then were very gentle with each other when the inevitable disaster occurred. “How could you know that playing with fire was liable to burn you? Let’s have a virtual group hug and ignore those ‘griefers’ who are calling you mean names.” They’re so caught up in the rapture of the social aspects of the applications they’re building, and the warm, fuzzy, ’70s-era “social revolution” buzz of this direction in Web design, that they’re missing the fundamental framework underneath -- a framework that is critical and cannot be avoided, changed, or ignored.

I personally don’t have a lot of confidence in the idea of trusting my confidential, irreplaceable data to people who approach the design of Web-based “cloud” apps with this philosophical outlook. Perhaps it is my experience with big Fortune 500 IT organizations, but no one gives out hugs and consolations over Grande lattes at the local Starbucks when a new Web app disintegrates along with tons of customer data at MCI or Intel. They give termination papers and a walk to the door escorted by company security.

So, ultimately, the ma.gnolia collapse is an important point in the “cloud revolution” that anyone involved in IT should study, for a number of reasons. These are just my conclusions; I would encourage you to go out, find what information you can, study up, and draw your own. I see the pattern in pretty simple terms -- great idea, horribly executed, with little end-user control or ability to feel secure that those steering the ship are qualified to be on the bridge -- as something that can be applied, exponentially, across “the cloud.”

I’m interested in finding out if you agree or disagree, and if you can propose any solutions to make the cloud more robust, more secure, more trustworthy. Does it require self-monitoring and self-accountability, or will nothing short of legislation and regulation prevent things like this? Where is the balance? Let me know what you think.

About

Sonja Thompson has worked for TechRepublic since October of 1999. She is currently a Senior Editor and the host of the Smartphones and Tablets blogs.

27 comments
dcolbert

Anyone aware of what is going on with Microsoft's Danger/Sidekick cloud-based app and data loss? http://www.appleinsider.com/articles/09/10/11/microsofts_danger_sidekick_data_loss_casts_dark_on_cloud_computing.html The Cloud is good for what it is good for - there are great purposes for the Cloud. But it isn't going to enable us all to use low-power, thin-client, Linux-based machines that have no local apps or file systems. The people who see a future like this... are completely missing the mark.

jck

Giving your computing and storage/retrieval functions over to a company to do for you is two things: 1) Simply another example of outsourcing that is going to fail miserably, because it will eventually end up costing more than having kept in-house staff and facilities to do it. 2) Like asking someone off the street that you don't know at all/that well to wipe your bum for you and make sure they feed your baby every 3 hours, 26 minutes, and 12 seconds. Some day, they will fail to do one of them. And if it's critical, then you will fail along with your cloud computing provider. I give the cloud 5-8 years before it dies. It's a fad. Your server on-site is always going to be more reliable than all the wires/cables of the Internet between you and the cloud provider. EOD...QED

dcolbert

You're either right, or we're dinosaurs thinking like relics. The business logic of this tears me in two directions. From one perspective, if you can outsource, offshore, or otherwise move out work, tasks, or other responsibilities that are *not* related to your core business - to someone who does those things solely as their core business - there should be a business benefit. Lots of companies would like to turn paper documents into digital documents and store and retrieve them efficiently through electronic means. This is easier said than done - and most businesses are not document storage and retrieval businesses at their core. Lots of businesses take this role *on* internally and try to implement solutions (and the struggle with document management goes back far before the digital revolution) - and they often fail or have very poor solutions implemented. So, from a business perspective, outsourcing to a document management company that does document management as its CORE business (and therefore has a built-up wealth of experience), and that has an economy of scale suited to doing JUST document management, means it should be able to fill that role more efficiently and less expensively than your Widget company can in-house. This is the MODEL argument for outsourcing, offshoring - and really, the emergence of the Cloud. Clearly there are a BAZILLION counter-arguments as well - many of them just as convincing. I'm usually one of the first into the pool with the counterpoint arguments against the long-term impact of overzealous outsourcing or offshoring. My guess, I suppose, is that balanced, moderate diversification is always the best strategy. You certainly shouldn't exclude opportunities that include outsourcing, offshoring, or Cloud-based services. But you shouldn't count on them solely to be the secret leverage that gives you a tremendous business advantage. The rules of successful business do not change - the collapse of the dot-com bubble should have illustrated that to all of us.

Osiyo53

So what's new? And why should anyone be surprised at such an event or occurrence? Similar things have happened before and will undoubtedly happen again, many times. Even if one finds an outfit that seems to have their ducks all lined up in a row and to be doing things right, that only holds for today, at the time you did your checks and investigations. A change in ownership, management, or staff tomorrow can make a drastic change in the situation. After all, you have no direct and firm control over that service provider. Thus, while I myself do use a service that allows me to put certain data and files "in a cloud", I maintain a complete backup of all said data and files, directory structure, etc., on a local hard drive that I also back up routinely. Despite any assurances made by whomever, I'm not about to believe that they're not subject to failures, oops, bad planning, poor execution of the plan, momentary brain fart, or whatever. BTDT, too many times.

rmagahiz

Ma.gnolia did offer users a way to export their bookmark data - I know, because I had a set about nine months old at the time of the crash. Ever since then, I've been periodically doing the same kind of thing with other projects of mine out in the cloud (blogs and social networking posts, mainly). But I'm just an individual, not a company making use of the service with actual fiduciary responsibilities to worry about. With small amounts of data, as we are talking about in the Ma.gnolia case, it's really not hard to keep your own backups regardless of any representations by the service provider. But if we were talking about many GB of data out in the cloud, this could start to become unwieldy and expensive, especially if it had been the kind of data that would require a point-in-time restore.
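For small-scale personal data, though, this kind of periodic export doesn't have to be fancy. Here is a minimal sketch of the idea in Python - the export URL, filename pattern, and retention count are hypothetical placeholders for illustration, not anything Ma.gnolia or any particular service actually provided:

# Periodically fetch an export of your cloud data and keep dated local copies.
# EXPORT_URL is a hypothetical placeholder; substitute whatever export or API
# endpoint your service actually offers.
import urllib.request
from datetime import date
from pathlib import Path

EXPORT_URL = "https://example.com/api/bookmarks/export"
BACKUP_DIR = Path.home() / "cloud-backups"
KEEP = 12  # number of dated snapshots to retain

def snapshot():
    BACKUP_DIR.mkdir(parents=True, exist_ok=True)
    target = BACKUP_DIR / f"bookmarks-{date.today().isoformat()}.xml"
    with urllib.request.urlopen(EXPORT_URL) as response:
        target.write_bytes(response.read())
    # Prune the oldest snapshots beyond the retention limit.
    for old in sorted(BACKUP_DIR.glob("bookmarks-*.xml"))[:-KEEP]:
        old.unlink()

if __name__ == "__main__":
    snapshot()

Run something like this from cron or Task Scheduler once a week and you have your own point-in-time history that doesn't depend on the provider's recovery plan.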

dcolbert

If I were determined to back up all of my Web-based and Web-created content, even with Facebook increasingly becoming a centralized repository where a majority of it eventually flows, it would be an incredible task to accomplish. In fact, I can guarantee that huge volumes of data I have created online since 1987 have simply disappeared into the ether - and some of it was really good stuff, too. I've probably written several large novels' worth of raw data during that period, 1987 to present. And there are various archiving services out there that preserve a GREAT deal of it, actually. Additionally, I'll admit, the stuff that is MOST important to me, I do have multiple local backups and copies of. But reasonably, given the way we are moving to cloud-based solutions, I think it is probably irresponsible to try to shift the responsibility for data protection to the typical casual user - at least not "completely". The service provider should retain the majority of responsibility and liability. I think some would argue that this will drive companies out of business, or stop them from entering the market. I'm not certain that this is a bad thing, considering the companies such a move would be liable *to* drive out of or away from doing business. But yeah, ma.gnolia is just an example, a canary in the coal mine. This time it wasn't anything that bad. Exactly. This time. Which is why it didn't get a LOT more attention than it did. I think that it is a great opportunity for those of us who control the direction of this industry to pause and contemplate where the hype of "The Cloud" may be leading us, and what some of the very real risks down that road may be.

Osiyo53

I trust them with NOTHING that's critical.

dcolbert

I tell our users, repeatedly, "Do not trust US, I.T., to back up your critical documents, and do not trust your technology to protect your critical documents. Make multiple copies, in multiple locations, with multiple revisions, and always have hard copies as well". They hate that - and they never do it, and they always lose important data anyhow. But I make a best effort to protect their data wherever, whenever, and however I can.
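To make the "multiple copies, in multiple locations, with multiple revisions" advice concrete, here is a rough Python sketch - the destination paths and the sample document are hypothetical placeholders, purely for illustration:

# Copy one critical document to several locations, keeping timestamped revisions.
# The destination paths below are placeholders (local folder, USB drive,
# network share); adjust them to whatever locations you actually have.
import shutil
from datetime import datetime
from pathlib import Path

DESTINATIONS = [
    Path.home() / "backups",
    Path("/media/usb-drive/backups"),
    Path("/mnt/network-share/backups"),
]

def back_up(document: Path) -> None:
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    revision = f"{document.stem}-{stamp}{document.suffix}"
    for destination in DESTINATIONS:
        try:
            destination.mkdir(parents=True, exist_ok=True)
            shutil.copy2(document, destination / revision)
        except OSError as err:
            # An unavailable location (unplugged drive, dropped share) should
            # not stop the copies to the remaining locations.
            print(f"Skipped {destination}: {err}")

if __name__ == "__main__":
    back_up(Path.home() / "Documents" / "critical-report.xlsx")

Hard copies, of course, still have to be printed by hand.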

tkwilson

Show me the budget and anyone can tell if it's missing. Quality continuity of processing affects design, development, testing, and integration, and will add 10-20% to costs. Unfortunately, it is almost always bolted on after the first release is out, because it is accepted as "something that can be done later". A risk that often gets successfully dodged, but ultimately always comes home to roost.

Tony Hopkinson

That backups are a good idea? I refuse to believe that. This was a business decision: someone estimated the cost of doing it properly versus the risk of not doing it, and they got it wrong. I'm all for not going mad on the redundancy front, but they appear to have had none. Stupidity should always be rewarded, in my book. As you say, it highlights a big issue with the cloud: it's all right getting an SLA, abstracts of processes, etc., but how do you verify them? Are you going to trust them, or hope that legal redress will cover you? Suing them because they put you out of business, or cost you a lot of business, isn't going to get it back. Especially if all your customers are suing 'you' for risking their assets with these muppets to save a buck. Risks multiply; theirs is yours - a critical point of failure covered by an assumption. To me that's no different to assuming your backup will restore, your cluster will fail over, or your three-year-old cable drawing is right. Using the cloud as storage, as opposed to connectivity, increases risk, and managing that could cost more than the short-term, unsustainable cost saving you may get from doing it. So why bother? That's not even touching security and legislative compliance, both of which are far thornier technical issues. I mean, let's face it: a kid burning a spare copy of the DVD he just bought just outperformed these guys.

dcolbert

That they knew, and intentionally cut corners, and simply lost on a gamble. But I don't believe it. I believe that "infrastructure" I.T. has been de-emphasized and devalued throughout the industry, whereas "development and design" I.T. has become the "rock-star" segment that gets the highest salaries, the best benefits, and the most attention. The problem is that, in my experience, developers are only concerned with BKMs that apply to *development* - very few of them understand the BKMs of kernel/system-level engineering. I'll admit, as a systems guy, I might be a little biased on this - and there is no doubt that I need input and assistance from the development team in order to design meaningful, robust data and disaster recovery solutions. But I think we can see that throughout corporate America, DBAs, developers, and programmers are the hot-ticket jobs - while companies try to figure out ways to offshore, outsource, automate, or downsize IT support, maintenance, and engineering roles. The latter are seen as the "janitorial services" of the modern I.T. workforce. The thing is, when you have the developers, programmers, and DB guys designing the entire enterprise, they do things that make sense from that perspective but that a true systems engineer/systems integrator would *never* allow to happen. And then you end up with a situation like this. The conflict that exists between these two groups in most large organizations is a good one. A decent IT engineering group should often block the development group based on security or engineering concerns. The constructive confrontation between groups makes for more robust solutions and deployments. Everything I've read about ma.gnolia indicates that this confrontation was not occurring there. It was a big group hug with no checks and balances - and it seems like they honestly just didn't even know enough to know that they were headed down a path toward destruction. And so, I'd rather believe that there was gross negligence going on than gross incompetence. The one doesn't seem as bad as the other. Greedy guys being cheap and evil and causing a failure - you've got someone to be upset at. People just being completely incompetent - that is kind of scary.

Tony Hopkinson

To build their infrastructure and advise them - that's their fault as well. Bet he/she was cheap. Unless we are talking about a very small organisation, like one idiot, there are too many points of failure for me to accept it wasn't policy. I've seen lots of employers go into crossed-fingers mode, and I've seen more than a few techs not get their backup strategy exactly right (in a mirror on occasion as well :p ). But nothing? No outage, no rollout failure, no hardware error, no config screw-up, no OS update, no hardware change, no oops from development until biblical catastrophe... Really hard to credit.

Tony Hopkinson

The ones who want to make a killing off providing it keep pushing it to myopic fools looking for a one-off bonus or promotion, though. To be fair, if I were working in the provider industry, and the ethical bypass surgery succeeded, I'd be telling you I was trustworthy, control was irrelevant, and you could get promoted by saving your company thousands as well. :(

dcolbert

I am not convinced that the push toward cloud-based, subscription-modeled computing will achieve the "reductions in cost" being promised. I think the underlying infrastructure, hardware, and personnel costs are static - someone has to pay for them. You're playing smoke and mirrors with costs in a cloud model. Or you're sharing support and infrastructure costs in much the same way that ISPs or airlines oversell capacity in their industries. We know how that works out THERE - why would we believe that this kind of service model would work out any differently in hosted computing solutions? Any reduction in cost is going to come at the expense of availability of service and support, or from sharing infrastructure (which means sharing capacity, bandwidth, throughput, and other metrics that affect QoS). Once you work your way through all of that, you get to issues of "control" and how much of it you give up in this model. Once you've become dependent on hosted services for mission-critical components of your business process, and you cannot easily move those back in-house or to another service provider without disrupting your business - then you're at the mercy of whatever rate schedule they want to put you on. I just don't see the wisdom of the cloud-based, subscription-hosted model of service delivery.

Tony Hopkinson

is the obvious logical outcome of the cloud service model. Has anyone seen any of the people pushing this idea promise you'll get more control, more options, better service? No, they are pushing cheap, and cheap eventually comes at the cost of reduced quality in things like backups and security, and reduced functionality.

dcolbert

By the looks of it, ma.gnolia was a very small organization in an informal network or community of developers, users, and "visionary innovators". And I'm sure there are some benefits to this kind of model, and that the creativity and atmosphere of such an environment is probably something special to be part of (especially if you enjoy that kind of sense of group belonging and identity). It seems a little cult-like to me - but if it works for people, it works. Intel seemed a little cult-like to me, too, to be honest. But the core of ma.gnolia itself looks to have been 3 to 5 people, none of whom seem to have been dedicated full-time to supporting the application. That is, they put up a web site and let it run. Most of them had "day jobs" and other gigs, if I'm reading it right. And that is the problem. I know a couple of guys in California who were running web-based operations like this. Just two guys, some hosted space, and some subscription-based applications and services. From the outside, going to their websites and signing up, it looks just like any other big, legitimate operation - but the fact is, it isn't. These guys are talented individuals and I trust they're doing their due diligence, but you simply can't tell if you just show up at a site. And therein lies the problem with the cloud. It is potentially far more disruptive to the success of cloud computing than any challenges "thin-client computing" has ever faced - and thin-client computing has been failing to make a major change in the industry since the early 90s. The alternative is to have a very narrow group of industry giants providing all of the successful cloud-based applications and services. That doesn't sound innovative, competitive, or exciting to me. It sounds like giving up MORE control to MORE giant corporations. You idealist *nix types may come to miss the good old days of the "WinTel duopoly".

MyopicOne

Not sure I agree with lumping the DBAs into the 'rock star' category after having been one for 14 years, but I do completely agree with your assertion that some staggering levels of incompetence occurred there. There is in fact a difference between a development DBA and a production DBA - and a good production DBA's native paranoia would never let a hardware failure crater a company. Or at least, none of the ones I know or trained would. You must always be able to get it back. I remember each and every time - 4, in fact - that one of my (up to 600) databases lost more than a minute or two of work, and exactly why. That I'm still a DBA says something about who was at fault in each case:
1) A vendor contractor misused his company's software utilities and blew up a development database, then spent two days trying to fix what was actually unrecoverable before reporting the problem.
2) An idiot coworker deleted Financial data and didn't tell anyone for a week.
3) The same idiot coworker forgot to tell me one of her other databases had gone to production AND deleted data from it. No, I couldn't get her fired...
4) I got overruled by the SAP Manager and Basis Admin on how to handle an index corruption error in an SAP database. The Basis Admin blew it up unrecoverably - fortunately not production.
I'd say there's an even chance that bean counters forced bad technical choices to be made - and I'm seeing a lot of bad choices made in Corporate America these days.

dcolbert

As I was writing my response, I knew DBA was a bad choice of "job title" - you're absolutely right, my beef isn't with DBAs, it is generally with DB Developers. There isn't always a lot of distinction in the industry, but their roles are miles apart, and often in different departments, in the largest of organizations.

santeewelding

What they did doesn't bear any relationship to the big, slow trainwreck going on all around us.

boxfiddler

I'm fighting against moving our data storage and tracking to 'the cloud' as I type. I don't particularly like the notion that - as with atmospheric clouds - weather is outside my influence, much less control. Thank you dcolbert. I'm forwarding this link to several people.

CG IT

In almost all big enterprises, there are redundant systems that make sure there is no single point of failure. While this will probably be the case for Google, MS, and others in the "Cloud Computing" genre, other startups will cut corners to cut costs and not build in redundancy. While I'm not a proponent of Cloud Computing, I can see many businesses jumping on the bandwagon because of the appearance of reduced costs. For some it might actually be a benefit. Like you, I have this negative gut reaction to storing personal documents and information in a public venue. Those guys were just plain foolish not to have built in redundancy and any type of Disaster Recovery.

biancaluna

I've seen many small fly-by-night IT companies go belly up in the past 20 years, and some really foolish decisions from organisations. I have seen data loss, business damage, and other disasters with non-cloud-based companies. I have worked for large Fortune 500 companies, and trust me, some of the WHAT THE moments I have had are frightening. Part of the assessment of any service provider should be their disaster recovery, exit strategy, redundancy, and more. Part of the assessment of any service provider should also be YOUR disaster recovery, exit strategy, redundancy, and more. The fact that your data, and therefore your business, is in the cloud should not and cannot excuse a business from having a BCP that manages those nasty What If scenarios. I suspect that this risk is present with any small, startup company. If you go for cheap, you get cheap. And that is a lesson I learnt many years ago whilst working on mega deals. I am not saying that the cloud does not have additional issues, but you are presenting it as a black-and-white scenario, and that is not realistic. Mea culpa to those organisations who ventured into deals with those companies and did not do their homework. There are many companies who do a terrible job at storing personal documents, IP, research data, and my personal information. I would actually rather trust some cloud companies than some banks, government departments, and other semi-government agencies I have had the pleasure of working for.

Tony Hopkinson

is how do you know the third party is doing the job properly. You can always review your 'own' policies and practices and re-evaluate the risks. But if you are using a cloud, you are assuming professionalism, and as we both know from a myriad of experiences, that's not a good assumption to make - even in companies that think they are doing the job properly.

dcolbert

I agree with the OP in this branch of the thread that I do draw it as black and white, when the risks are not *isolated* to the cloud. Face it, many organizations do not have the technology experience, or have people at the helm who make bad technology decisions that affect entire companies - and even Fortune 500 companies are not immune. I worked for one of the "big 3" telecoms at one point, and the member of their I.T. group in charge of tapes came back from vacation to find tapes strung completely around his office, because the data on them was useless. It was public humiliation, but it also meant that a data failure had occurred that they couldn't restore from. They weren't happy with him. So, yes, this kind of thing can happen. I think your counterpoint is excellent, Tony. The other points I would make are these: a failure of due diligence in your own company, as you point out, is your own accountability, not some external source that you trusted and that let you down. But beyond that, the scope or magnitude of impact that a cloud service can have is potentially so much broader. A very popular cloud service could affect tens of thousands, hundreds of thousands, millions of individuals, worldwide. It could affect multiple corporations, instead of just your own. The two facts combined - the relative inability to transparently see the internal policies and execution of the hosted cloud-based service firm, and the potential for a much broader magnitude of impact - make the cloud model very troubling, for me.

victor.gutzler

It might be necessary for the IT industry to set up an entity that licenses companies wanting to host cloud services. Standards would have to be defined for security, redundancy, and disaster recovery. There could even be levels of certification, where social networking services only need a "silver certificate" and medical records services require a "platinum certificate". But can we trust the industry to enforce itself, or do we have to legislate these standards through government? Who is going to seriously consider cloud services without some kind of credentialing?

dcolbert

There will probably be some catalyst issue, where there is a large, negligent loss of life or tremendous negative economic impact, before the Federal Government comes in and legislates regulation. I mean, all we really need to do is look back at any emerging industry over the last 100 to 150 years to see how this model generally works. Someone pointed out somewhere that the best self-regulating industry around is the one that makes pressurized tanks (water heaters and propane tanks). I don't know if I.T. has what it takes to rise to that level.

Tony Hopkinson

an initial unsustainable cost reduction leading to a bonus, a killing on share prices, or a promotion.