Everyone’s had their day ruined by a computer crash at one time or another – but why does this happen in the first place? Peter Cochrane examines the causes and offers a solution to flaky tech.

How come technology seems to fail at the most critical times? There you are making good progress toward an important deadline, overcoming all obstacles and having a winning day, when – crunch – the printer stops functioning. Better still, your PC crashes. And then at that very instant the boss appears to ask how things are going and will he be getting his report in less than an hour?

What gives? Is technology baiting us or is this really the norm?

The answer to this sometimes frustrating conundrum comes in two parts – the understandable and the distinctly quirky.

First, the understandable. When we get married, start a company or make any major life changes, we tend to re-equip. That is, we buy the TV, oven, fridge, washing machine, dryer and vacuum cleaner all at the same time.

Strange as it might seem, all of these white and brown goods are designed to broadly the same lifetime specification. The Mean Time To Failure (MTTF) is around five years and the Mean Time To Death (MTTD) is around eight years. Ergo multiple and near simultaneous failures are to be expected – they have actually been built in. In other words, if we buy 10 items at once, there is a pretty good chance that two or more will fail at a similar time.
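To put a rough number on that 'pretty good chance', here is a minimal simulation sketch. It rests on assumptions of mine, not on any manufacturer's data: each item's time to first failure is drawn independently from an exponential distribution with a mean of five years, and 'a similar time' means within three months of each other. The function name and its parameters are purely illustrative.

```python
import random


def chance_of_clustered_failures(n_items=10, mttf_years=5.0,
                                 window_years=0.25, trials=20000, seed=1):
    """Estimate the chance that at least two of n_items suffer their
    first failure within window_years of each other.

    Assumption: lifetimes are independent and exponentially distributed
    with mean mttf_years. Real appliances will not follow this exactly.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    hits = 0
    for _ in range(trials):
        # Draw one first-failure time per item, then sort them.
        times = sorted(rng.expovariate(1.0 / mttf_years)
                       for _ in range(n_items))
        # If any two neighbouring failure times are close, count a hit.
        if any(later - earlier <= window_years
               for earlier, later in zip(times, times[1:])):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(chance_of_clustered_failures())
```

Under these assumptions the estimate for 10 items comes out at roughly nine in ten, whereas for just two items it is only a few per cent. Change the window or the lifetime model and the figure moves, but the pattern of more kit bought together meaning more clustered failures should hold.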

This mechanism also applies to our automobiles, computers and other IT equipment – and everything else that is mass-produced. So buying a PC, printer, scanner and back-up drive all at once puts us in the same vulnerable position. Of course, the amount of use and abuse also influences the actual outturn of the MTTF and MTTD. Add to this the variability between manufacturers, suppliers and maintainers as well as that unpredictable commodity – software – and the stage is set.

Another factor: When is your car most likely to fail? The day after it has been in the repair shop. Once the repairman has been inside the box it is far more likely to become unreliable. This is true of everything you own. It is certainly true of any and all software upgrades and installs for computers. Hence the old adages – ‘leave well enough alone’ or ‘if it ain’t broke don’t fix it’.

Some generally unseen mechanisms can often lead to a cascade of technology failures as well. In all complex systems a single point of failure can trigger multiple failures and, conversely, multiple small failures can combine into a single dominant failure. For example, a network hub failure may mean the loss of an internet connection, printer and scanner all at once. Equally, opening one more application when almost all the RAM is full, the hard drive is severely fragmented and the mouse is dirty can cause a total system freeze.

And now for the quirky mechanisms – us. One of our first problems is recognising what is going wrong when tech failures occur and then diagnosing the cause. Very often we are not all that good at it. We tend to jump to the wrong conclusion and, especially when tired and stressed, make mistakes and compound problems through bad decisions and actions.

Add to all of this the fact there are a lot of us networked together, with different competence levels, all trying to achieve different objectives and you have a disaster in the making.

We have also run up our load of company, domestic and leisure activities to a point where almost everything is on a critical path. There is no slack – no room for error or failure. In a way we assume our technology will not let us down.

Why? Because much of our technology – heat, light, power, communication, transport – is reliable. Sure, IT is still flaky but it’s an awful lot better than it was 20 years ago and continues to improve. With our current mindset, any and all failures come at a critical time because everything we do is critical. We have no back-up, no standbys and no extra members of staff who can pick up the ball.

Is there a solution? Yes. But it means becoming less efficient and building in some slack. I don’t want to brag because what follows is a bit extravagant. But because a lot of people live in my home, effectively two families under one roof, I now have two washing machines, dryers, irons and kitchens – and four vacuum cleaners. Domestically I am in good shape – but I am not advocating this as a solution. A sharing agreement with a close friend or neighbour makes for a far more economic solution if it can be arranged.

On the IT front, if you want reliability you have to spend money on dual machines, hard drives, printers, scanners and everything else. And never upgrade software or install a new OS or application on all of your machines simultaneously. Do it sequentially, establishing stability a stage at a time.
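For anyone who scripts their upgrades, the 'one stage at a time' rule can be sketched in a few lines. This is only an illustration under my own assumptions: `staged_upgrade`, `upgrade` and `is_stable` are hypothetical names, and what counts as 'stable' (a week of normal use, say) is for you to define.

```python
def staged_upgrade(machines, upgrade, is_stable):
    """Upgrade machines one at a time, stopping at the first that
    does not prove stable.

    machines  -- names in the order to upgrade (least critical first)
    upgrade   -- hypothetical callable that upgrades one machine
    is_stable -- hypothetical callable returning True once the
                 upgraded machine has proved itself

    Returns (stable_machines, failed_machine); failed_machine is None
    when every machine was upgraded successfully.
    """
    stable = []
    for machine in machines:
        upgrade(machine)
        if not is_stable(machine):
            # Stop here: the remaining machines keep the old, working
            # setup, so there is still something to fall back on.
            return stable, machine
        stable.append(machine)
    return stable, None
```

The point of the design is simply that the untouched machines stay on the known-good configuration until the changed one has earned some trust.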

Overall my most effective investments have been in back-up hard drives, both internal and external, plus several no-break power supplies. If a power glitch or outage occurs, my systems keep running. This single measure has saved me much grief and paid for itself many times over. And it was the least expensive of all my precautions – just $100 or so for battery backup for my server, router, hubs, drives, PC and peripherals.

On one level I stand in awe that modern society and technology work at all, but on another I can see all the inefficiencies. In the end failure is endemic and part of the learning process. We just need to continuously minimise the overall impact. And believe me, IT is getting better.

Whoops, there goes another light bulb.

Written after my printer ran out of black ink, a light failed in my office and my ISP went down for half an hour. All extremely rare events but grouped in the same hour. Column completed within the next hour and despatched to silicon.com via my Wi-Fi link.