It has been a rough few days for anyone interacting with the state of Virginia following an IT outage that affected 26 state agencies. Can a storage area networking failure really cripple a state's IT systems?
The outage in Virginia's IT infrastructure, which is managed by Northrop Grumman, has led to a few statements from agencies. Notably, Virginia's Department of Motor Vehicles hasn't been able to process requests for licenses and ID cards. These systems are supposed to be up and running on Tuesday, six days after the outages started to appear.
Meanwhile, the Virginia Information Technologies Agency (VITA) said in a statement that teams have been working throughout the weekend to restore data. In a nutshell, the IT infrastructure of the state of Virginia was reportedly crushed by an EMC storage area network failure. The Richmond Times-Dispatch reports that several systems are still down. The same paper said that Northrop Grumman will have to pay a fine for the failure. And the real kicker is that the state recently revised its contract with Northrop Grumman and extended the deal for three years. The state paid an additional $236 million for better service from Northrop Grumman.
Needless to say, Virginia residents aren't pleased. We've received a few emails and calls, and the comments on the Richmond Times-Dispatch site are summed up by this one:
Highlights of the Revised Contract
Consolidates and strengthens Performance Level Standards with a 15% increase in penalties across the board if Northrop Grumman fails to perform on clearly identified and measured performance standards. - PAY-UP
Improves Incident Response teams to determine technology failures and expedite repair - FAILED
Institutes clear performance measurements for Northrop Grumman that agencies can easily track - FAILED
Adds new services to contract such as improved disaster recovery and enhanced security features - FAILED
Among the key parts of the VITA statement:
- Successful repair to the storage system hardware is complete, and all but three or possibly four agencies out of the 26 agency systems have been restored. Agencies continue to perform verification testing.
- Progress continues, but work is not yet complete for the three or four agencies that have some of the largest and most complex databases. These databases make the restoration process extremely time consuming. The unfortunate result is the agencies will not be able to process some customer transactions until additional testing and validation are complete.
- According to the manufacturer of the storage system (EMC), the events that led to the outage appear to be unprecedented. The manufacturer reports that the system and its underlying technology have an exemplary history of reliability, industry-leading data availability of more than 99.999% and no similar failure in one billion hours of run time.
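To put the "99.999%" figure in perspective, here is a rough back-of-the-envelope sketch of what five-nines availability allows in downtime per year, compared with a six-day outage. The assumptions are ours, not VITA's or EMC's: the figure is treated as an annual uptime percentage over a 365-day year, and the six days comes from the DMV timeline above, not from every affected agency. The function names are purely illustrative.

```python
# Back-of-the-envelope availability arithmetic (illustrative assumptions only).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_budget_minutes(availability_percent: float) -> float:
    """Minutes of downtime per year permitted at a given availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


def achieved_availability_percent(downtime_minutes: float) -> float:
    """Availability over one year, given total downtime in minutes."""
    return 100 * (1 - downtime_minutes / MINUTES_PER_YEAR)


# Five nines, as quoted in the VITA statement.
five_nines_budget = downtime_budget_minutes(99.999)

# Assumption: a service that was down for six full days (the DMV timeline).
six_day_outage = 6 * 24 * 60  # 8,640 minutes

print(f"Five-nines budget: {five_nines_budget:.2f} minutes per year")
print(f"Six-day outage:    {six_day_outage} minutes")
print(f"Availability for that year: "
      f"{achieved_availability_percent(six_day_outage):.2f}%")
```

Under those assumptions, five nines works out to a little over five minutes of downtime a year, while a six-day outage leaves a service at roughly 98.4% for the year. The vendor's number describes the storage system's track record in general, so the comparison is only a sense of scale, not a measure of any contractual service level.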
The official explanation for the outage leaves a bit to be desired and frankly doesn't pass the sniff test. The outage was blamed on the failure of two circuit boards installed and maintained by EMC.
Simply put, it's a bit disconcerting that two circuit boards can bring down a state's IT infrastructure for nearly a week. Talk about a lack of redundancy.
Among the things that don't add up in the Virginia IT outage:
- Why wouldn't these boards be replaced quickly?
- Why was there a single point of failure?
- According to the Washington Post, service was restored for 16 agencies, but 10 require "a lengthy restoration of data." Where was the disaster planning? After all, Northrop Grumman touted its disaster recovery for the state just two years ago.
- Where did the IT management fail?
We're told that Northrop Grumman knows about its IT management issues and is working on correcting the problems. Northrop Grumman was awarded a $2.3 billion IT services contract in 2005. And the company has touted some of the state's successes. Meanwhile, Northrop Grumman even relocated to Virginia. Hopefully, that proximity will lead to better IT management.