A newly released study from IT management software firm LogicMonitor has found that IT outages and downtime are plaguing the industry, with 96% of IT decision makers saying they’ve suffered at least one significant outage in the past three years.

The average organization surveyed in the study experienced 10 IT blackouts or brownouts over the past three years, leading to lost revenue, lost productivity, mitigation costs, brand damage, and more.

Eighty percent of respondents rated availability and performance as their top two concerns, but the number of avoidable outages points to a disconnect between goals and current practices.

Across all respondents, 51% of outages were considered avoidable, so why aren’t they being avoided?

The causes of outages and how to address them

Preventing outages requires knowing why they happen: If you don’t know which potential fires to fight, you can’t even get started. Each organization is different, but that doesn’t mean system outages take an infinite number of forms.

According to survey respondents, there are six major causes of downtime:

  1. Network failure,

  2. Usage spikes,

  3. Human error,

  4. Software malfunction,

  5. Hardware failure,

  6. Third-party vendor outages.

A few items here are clearly outside the control of an organization’s IT department: Third-party vendor failures and human error are both facts of life in the tech world. The rest of the items on the list, however, are well within IT’s power to prevent.

LogicMonitor does point out two major missed opportunities to avoid downtime:

  • Failing to notice when usage is trending upward toward critical limits (this includes network use and things like network drive storage capacity),

  • Failing to notice when critical hardware or software performance is trending steadily downward, indicating potential failure.
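Spotting either kind of trend doesn’t require anything exotic. As a rough sketch only, and not a description of how LogicMonitor’s products work, the Python below fits a straight line to evenly spaced readings and estimates how many intervals remain before a limit is reached. The function names, the sample numbers, and the 90% limit are all hypothetical, and real usage rarely grows in a perfectly straight line.

```python
from typing import Optional, Sequence


def slope(samples: Sequence[float]) -> float:
    """Least-squares slope of evenly spaced samples, in units per interval."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def intervals_until_limit(samples: Sequence[float], limit: float) -> Optional[float]:
    """Estimate intervals left before the trend reaches `limit`.

    Returns None when the readings are flat or falling.
    """
    rate = slope(samples)
    if rate <= 0:
        return None
    return max(0.0, (limit - samples[-1]) / rate)


# Hypothetical daily disk usage, in percent of capacity.
disk_usage = [50, 55, 60, 65, 70]
print(intervals_until_limit(disk_usage, limit=90))  # 4.0 days at this rate
```

The same `slope` function covers the second missed opportunity: a consistently negative slope on a performance metric, such as throughput, is the steady downward trend that may signal a coming failure.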

Conveniently enough, LogicMonitor does sell software specifically tailored to those needs, but purchasing its tools isn’t necessary in order to track important network statistics.

IT leaders should be sure that their management software is tracking essential hardware and software, monitoring all network traffic, and sending notifications to the responsible parties when an issue has been detected or if a critical limit is being approached.
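To make the notification piece concrete, here is a minimal sketch of a two-level threshold check: one level for when a limit is being approached and one for when it has been hit. Everything in it is an illustrative assumption, including the `Check` fields, the threshold values, and the team names; an actual monitoring tool would send the message by email, chat, or pager rather than return a string.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Check:
    name: str           # what is being monitored
    value: float        # latest reading
    warn_at: float      # reading at which the limit is being approached
    critical_at: float  # reading at which the limit has been reached
    owner: str          # responsible party to notify


def evaluate(check: Check) -> Optional[str]:
    """Return a notification message, or None if the reading is healthy."""
    if check.value >= check.critical_at:
        level = "CRITICAL"
    elif check.value >= check.warn_at:
        level = "WARNING"
    else:
        return None
    return f"[{level}] {check.name} at {check.value} -> notify {check.owner}"


def notifications(checks: List[Check]) -> List[str]:
    """Collect messages for every check that needs attention."""
    return [msg for msg in map(evaluate, checks) if msg is not None]


# Hypothetical readings, in percent.
checks = [
    Check("file-server disk", 85.0, warn_at=80.0, critical_at=95.0, owner="storage-team"),
    Check("core switch CPU", 40.0, warn_at=70.0, critical_at=90.0, owner="network-team"),
]
for message in notifications(checks):
    print(message)
```

Keeping the warning level separate from the critical one is what gives the responsible party time to act before an outage instead of after it.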

Don’t focus solely on prevention and ignore recovery, though: There are important lessons to be learned in conducting a post-mortem after an outage, and those lessons can be applied to future preventative measures as well.

No matter the cause, system downtime can get expensive quickly, so outage factors within IT’s control should be avoided at all costs. Sure, you can’t control everything, and you shouldn’t try to. Just be sure you’ve taken every step to avoid an outage you could have stopped.
