
Forget hackers: Complex networks doomed to Murphy's Law


There is an interesting story in the New York Times today, "Who Needs Hackers?" by John Schwartz. The article appraises network threats and recent high-profile outages, suggesting that while malicious hacking might be good fodder for disaster movies, the mundane problems inherent in enormously complex software and systems are more likely to trigger the massive failures of our nightmares:

"We don't need hackers to break the systems because they're falling apart by themselves," said Peter G. Neumann, an expert in computing risks and principal scientist at SRI International, a research institute in Menlo Park, Calif.

Steven M. Bellovin, a professor of computer science at Columbia University, said: "Most of the problems we have day to day have nothing to do with malice. Things break. Complex systems break in complex ways."

Air traffic control, customs and border security, communications, the electrical grid, voting machines, space shuttle launches: all of them can be taken out, or at the very least seriously interrupted, by a bug that cascades innocently through a network. Is there any way to avoid the spontaneous combustion of our complex networks?

The aforementioned Dr. Neumann thinks so. He is quoted in the article:

"If you design the thing right in the first place, you can make it reliable, secure, fault tolerant and human safe," he said. "The technology is there to do this right if anybody wanted to take the effort....

"We throw this together, shrink wrap it and throw it out there," he said. "There's no incentive to do it right, and that's pitiful."

Have we traveled too far down the road of complexity to turn it around as Dr. Neumann suggests? With everyone worried about terrorists, wars, global warming, and catastrophic natural disasters, it seems unlikely that many people will get on the "solve the design complexity" bandwagon. What do you think will happen? Will it take a major catastrophe to cause a revolution in software design and systems?


13 comments
hoblotron

Does it ever bother anyone that all of these huge computer companies are CONSTANTLY pumping out some new software that is better than the last version, or better than their competitors'? Have we as humans finally come to the point of being so impatient that we need an upgrade of something every few months? The problem with software today is that no one takes the time to finalize their product and make sure all the kinks are out and the proper security is in place. It's sad to say, but we as a people are not smart enough or fast enough to catch up to our own technology. We are pushing ourselves to a point with our technology that we are not ready for, or able to admit we are not ready for. Our systems will continue to crash themselves until we finally have enough patience to learn and perfect what we are doing. So dark is the ego of man!

lost in space...

Part of this issue may stem from the fact that the software development community has not, historically, been held to the same product quality and assurance standards as almost any other producer of commodities. As was stated, "...wrap it, ship it...". If this community were subject to the persistent scrutiny of product liability attorneys on a regular basis, I'm of the opinion that a great many things might change...

David.Jackson

The quote 'If you design the thing right in the first place, you can make it reliable, secure, fault tolerant and human safe' presumes that those responsible for the systems know what is 'right'. In my experience, those responsible don't know how to specify properly, or think through the implications of what they specify, and therefore keep asking for modifications that frequently trash the best systems analysis done at the start of the project!

catseverywhere

By all available evidence it appears to me voting machines are designed to be "hacked," as in stealing elections is their purpose....

CCrabtree

The problem really lies not only with developers but with their management. People are always pushing for the cheapest, fastest way of doing the task, so it often becomes a matter of simply upgrading piece by piece, or stacking everything on top of the old in a traditional "dump" pile. Take my workplace, for example. We've got all sorts of old machines, including DOS boxes running our lathing machines, which something as simple as a boot sector virus from a floppy can take out. That means downtime and replacement. Heck, I've had to make SATA hard drives compatible with DOS because we can't even pick up old enough hardware. If a person were given ample resources to redesign and deploy a system correctly, it could save a lot of time, effort, money, and DR/BC planning in the long run. But the concern is always the up-front cost, and building on top of the old is simply cheaper. Nobody wants to invest a large amount up front, regardless of the long-term savings. Management and government are too fixated on the short-term costs to realize the net benefit.

NGENeer

And I've seen situations where, no matter how carefully I (or someone else) specified the details in the design specs, the software engineer(s) were absolutely sure they "knew how to do it better." The result was typically a worthless product, so the project had to be started over. Of course, the second time around it was usually better, because we could rub their noses in why it didn't work their way, and why we had specified certain things they thought were inconsequential.

shardeth-15902278

They got rid of their failover WAN links and set their routers up with static routes. They had determined that during the previous year they suffered one failure due to a telco outage, and over a dozen incidents related to configuration glitches, routing problems, and the like. So by eliminating all the "high-availability" features, they made their network more... available. Downtime was reduced dramatically, as was cost. An interesting approach...
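The arithmetic behind that decision is easy to sketch. The numbers below are hypothetical; only the rough ratio of one telco outage to a dozen-plus configuration incidents comes from the comment above, and the repair times are assumed for illustration. The point is simply that a design with fewer "availability" features can still win on expected downtime:

```python
# Hypothetical back-of-the-envelope comparison of expected annual downtime.
# Incident counts echo the comment above (1 telco outage vs. ~12 config/routing
# incidents per year); the mean-time-to-repair figures are assumed, not sourced.

HOURS_TELCO_OUTAGE = 4.0      # assumed repair time for a carrier outage
HOURS_CONFIG_INCIDENT = 1.5   # assumed repair time for a config/routing glitch

# With failover links and dynamic routing: telco outages are masked by failover,
# but the extra moving parts cause the configuration-related incidents.
downtime_with_ha = 12 * HOURS_CONFIG_INCIDENT

# With static routes and no failover: the telco outage hits in full,
# but the configuration-related incidents largely disappear.
downtime_without_ha = 1 * HOURS_TELCO_OUTAGE

print(f"Expected downtime with HA features:   {downtime_with_ha:.1f} h/year")
print(f"Expected downtime with static routes: {downtime_without_ha:.1f} h/year")
```

Of course, the comparison flips if carrier outages are longer or more frequent than assumed here, which is presumably why this stays a judgment call rather than a rule.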

jmgarvin

You can plan the rollout over the course of time so that the up-front cost doesn't seem quite as daunting. Triaging can go a long way toward making that move a lot cheaper and a lot easier. Plus, you don't have to migrate overnight; you can move things over time.

dcolbert

High availability done poorly inevitably becomes high unavailability. I've worked for organizations that had poorly implemented HA solutions that caused more downtime than uptime. I don't think that throwing away redundancy and high availability is the *solution*, though - doing it *correctly* is.

dcolbert

A deployment delayed over time results in a spaghetti system of older and newer technology. The older technology creates holes in security and reliability that compromise your entire network. By the time you get around to deploying the last rounds of your "done right" enterprise, the earliest roll-outs are showing their age. The constant upgrade cycle is also a challenge, created artificially by the entire industry. If something works and does the job well, you should be able to standardize on it. But "enhancements, upgrades and fixes" drive an endless upgrade process.

I agree with the first poster. A deployment done right from the foundation results in a strong complex network. Hardware and software high availability and fault redundancy should be built into the foundation, not patched in as an afterthought after the first major and expensive outage. Management is short-sighted because managers are rewarded for quarterly performance and don't plan on being in the same position in 2 to 4 years anyhow. That sets up a scenario where there is no incentive to design the infrastructure correctly. In fact, the incentive is just the opposite: design it poorly, and 4 years down the road some new manager can get a bunch of brownie points for fixing the legacy mess you've left him.

shardeth-15902278

I imagine it probably depends somewhat on the requirements of the individual company, whether they should or shouldn't dump HA.

jmgarvin

Planning is key. Too many of us are just ready to deploy. If you delay deployment WITH THE PLAN IN PLACE, you are aware of what is coming and what needs to be done. Not to mention that you can't always start from the ground up. Sometimes you have to start in the middle and work your way out. I've seen too many deployments fail because of existing infrastructure that couldn't be changed AT THE TIME. Change management is a GOOD thing.