General discussion
June 27, 2006 at 4:45 pm #2194201
INCIDENT MANAGEMENT – essential reporting or more interference?
Locked · by benjamin.reid · about 16 years, 11 months ago
All Comments
June 27, 2006 at 5:31 pm #3110896
Stop calling the techos! – they’re on it!
by benjamin.reid · about 16 years, 11 months ago
In reply to INCIDENT MANAGEMENT – essential reporting or more interference?
We all know what it’s like: your production platform starts performing like a dog just three weeks after deployment, and no one yet knows the root cause. Your monitoring tools have highlighted some symptoms to two specific support teams. Those teams immediately snap into action and begin diagnosis, quickly ruling out some obvious possibilities. The team leads talk and come up with two or three possible actions that might quickly isolate the problem; one involves bringing down the application, the others mean bringing down the server. Now the fun starts.
First the help desk rings to report the problem, but some users have escalated through their manager instead. Others have just talked amongst themselves, or assumed the cause lies with the network and called the network team directly. Before you know it there’s a three-ring circus: phone calls are going everywhere and different stories are circulating back to your CTO and CIO. The service delivery manager is furious at finding out second-hand, and the problem is still there because the tech team has done nothing but field phone calls for the last 40 minutes.
Now obviously this is a slightly exaggerated, worst-case scenario, but you get the drift, right? The point is that sometimes an incident needs more than technical attention. Communicating the severity of the impact, the areas affected and who is leading the technical recovery is vital to stop the confusion from taking hold. Management, users and other tech teams may all have a stake in the recovery of the service, and if you want your tech teams left alone to work their magic you need someone who can get brief, concise updates from them and then shield them from any other distractions. Often this is the help desk, but increasingly it is the role of an incident coordinator (IC) or incident manager.
The IC accurately records events in a timeline and collects forensic evidence (logs, screen dumps, photos of damaged hardware, system dumps) for later analysis. The IC ensures that the correct teams are involved, that the right people are notified and that progress reports are distributed to all stakeholders. “Brilliant!” say the tech teams, “thanks for taking that on and making my life easier.”
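For anyone who wants to picture the record-keeping side, here is a rough sketch in Python of the kind of log an IC might keep: a timeline of events, the evidence attached to each one, and a short status update for stakeholders. All of the names here (`Incident`, `TimelineEntry`, `record`, `status_update`, `evidence_collected`) are made up for illustration and are not taken from any particular tool.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class TimelineEntry:
    """One event in the incident timeline (hypothetical structure)."""
    when: datetime
    who: str
    note: str
    evidence: list[str] = field(default_factory=list)  # e.g. log files, screen dumps


@dataclass
class Incident:
    """What the IC tracks for a single incident (hypothetical structure)."""
    summary: str
    severity: str
    areas_affected: list[str]
    technical_lead: str
    timeline: list[TimelineEntry] = field(default_factory=list)

    def record(self, who, note, evidence=None, when=None):
        """Add an event; defaults to the current UTC time if none is given."""
        entry = TimelineEntry(
            when=when or datetime.now(timezone.utc),
            who=who,
            note=note,
            evidence=list(evidence or []),
        )
        self.timeline.append(entry)
        # Keep the timeline in chronological order even if events are logged late.
        # Assumes all timestamps are timezone-aware.
        self.timeline.sort(key=lambda e: e.when)
        return entry

    def status_update(self):
        """Build a brief progress report suitable for stakeholders."""
        latest = self.timeline[-1].note if self.timeline else "No updates recorded yet."
        return (
            f"[{self.severity}] {self.summary}\n"
            f"Affected: {', '.join(self.areas_affected)}\n"
            f"Technical lead: {self.technical_lead}\n"
            f"Latest: {latest}"
        )

    def evidence_collected(self):
        """List every piece of evidence, in timeline order, for the later review."""
        return [item for entry in self.timeline for item in entry.evidence]
```

The point of keeping it this simple is that the tech teams only have to give the IC a one-line note and any evidence they have to hand; the IC turns that into the progress report, and the same timeline feeds the post-incident review.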
The thing is, it doesn’t always go that smoothly. If your IC does not have the respect of the tech teams and badgers them constantly for information, you have gained little. Likewise, if their communication skills are not up to scratch and the progress reports misrepresent the incident to executive management, you again have misinformation. If you have not carefully considered the best approach for your IC, with buy-in from the business owners, management, help desk and IT, then you might as well not bother.
With a well-planned approach your IC takes all the pressure off the tech teams and translates the problem resolution activities to management in language they can easily digest. The IC is then free to conduct post-incident reviews with concise information rather than relying on the memories of those involved.
I’m certainly interested to hear from others with a take on the value of incident coordination as a separate role rather than an aspect of the help desk.