Use wiki-based runbook automation to promote a collaborative capture of best practices

The technology analyst firm Gartner recently predicted that half of the companies in the US will have wikis in one form or another by the end of next year. Wikis are fast and dynamic and particularly useful in developing a consensus over a broad user group. They are optimized when members are encouraged to candidly present their best thoughts and recommendations in good faith, in the same way a good speaker encourages interaction from his audience.

Think about it. Does it ever seem that your organization is banging its head against the proverbial wall when it comes to recurring problem-repair and resolution processes? Are your level one and two staff underutilized while your level three people are overwhelmed? What if you were able to replace your aging knowledge base with a collaborative wiki that turns procedures into automated problem-solving tools useful at all three levels?

When you do a search in Wikipedia, you expect to find more than lines of narrative text. The links to source documents or other related information are the "on-steroids" effect of this resource being online, a giant leap forward from an ink-on-paper document. Enter a new kind of runbook that uses this same advantage to draw on your company's tribal knowledge and collective wisdom. The wiki approach puts the whole process into an "actionable" form, with embedded links that leverage existing tools, scripts, utilities and APIs to speed the diagnosis and resolution of your problems, making it so simple that even a caveman could do it.

So how does a wiki do all this? Jimmy Wales, founder of Wikipedia's online storehouse of open-source facts, calls it "harnessing the wisdom of the crowd" in a process of hashing it out with increasingly dependable data that ultimately gets you to the best possible version - and then keeps it current.

In its best form, the goal of runbook automation is not just to automate the existing runbooks, but to continually identify the things that should be automated and then improve on the process. Most current problem-remediation processes are hindered by a widening "usability gap": inconsistent use leads to obsolete procedures, which are then used even less consistently and become even less dependable, and so on. This problem is often addressed by assigning a dedicated knowledge-base team, but that becomes cost-prohibitive and the system ultimately fails.

With a wiki-based automated runbook, the goal is to promote a collaborative capture of best practices, typically from the level three IT operations and tools-engineering team. This information is made available to levels one and two to make them more effective in problem-solving, or at least to define the type of triage information to be captured prior to an escalation. Unlike other runbooks, an "actionable" runbook takes the wiki approach a step further by including links that run automated tasks on demand or orchestrate coordinated procedures when a recurring event occurs.
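To make the idea of an "actionable" runbook more concrete, here is a minimal sketch in Python. It is an illustration only, not the data model of any particular product: the names `Step`, `RunbookPage` and `check_disk` are hypothetical, and the automated action is a stand-in for whatever script or tool a level three engineer would actually attach to the page.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Step:
    """One line of a runbook page (hypothetical structure)."""
    text: str
    # Optional automated task linked from the step; None means manual.
    action: Optional[Callable[[Dict[str, str]], str]] = None


@dataclass
class RunbookPage:
    """A wiki-style page mixing written guidance and linked actions."""
    title: str
    steps: List[Step] = field(default_factory=list)

    def run(self, context: Dict[str, str]) -> List[str]:
        """Walk the steps, running linked actions with the alert context."""
        results = []
        for step in self.steps:
            if step.action is None:
                results.append(f"MANUAL: {step.text}")
            else:
                results.append(f"AUTO: {step.text} -> {step.action(context)}")
        return results


def check_disk(context: Dict[str, str]) -> str:
    # Stand-in for a real script; it only reports what it would do.
    return f"disk check requested on {context['host']}"


page = RunbookPage(
    "Disk space low",
    [
        Step("Confirm the alert is not a duplicate"),
        Step("Check disk usage", check_disk),
    ],
)

for line in page.run({"host": "web01"}):
    print(line)
```

The point of the sketch is that written guidance and automation live on the same page, so a level one operator follows the text and triggers the linked task without needing to know how the task is implemented.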

According to former Micromuse Chief Software Architect Dr. Duke Tantiprasut, "A lot of the users I've talked to say they were successful in leveraging Netcool to centralize fault management, however quite a number of them fail to capitalize on what this kind of solution really enables - the ability to standardize and streamline problem-remediation processes." The wiki-based Resolve Runbook Automation solution developed by Dr. Tantiprasut can be integrated seamlessly with IBM's Tivoli Netcool to include right-click functionality and a smooth transition between the two environments. Users can click from an event to the runbook display of the procedure they should follow to diagnose and resolve the problem. For alerts that do not already map directly to a problem, the initial wiki document may represent the start of a guided diagnostic or a decision tree (when this happened, I took this approach and got this result) of interconnected wiki documents. These can define redundant problem conditions and FAQs to be checked prior to an escalation, adding instructional value and bringing newer team members up to speed without requiring extensive training or experience.
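The event-to-runbook mapping and the guided diagnostic described above can be sketched roughly as follows. This is an assumption-laden illustration in Python, not the Resolve or Netcool API: `DiagnosticPage`, `RUNBOOKS`, `runbook_for` and `walk` are hypothetical names, and the event types and questions are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DiagnosticPage:
    """A wiki page that asks one question and links to follow-up pages."""
    question: str
    # Maps an operator's answer to the next page; empty for a leaf page.
    next_pages: Dict[str, "DiagnosticPage"] = field(default_factory=dict)


# Leaf pages: concrete instructions at the end of the decision tree.
restart = DiagnosticPage("Restart the service and confirm the alert clears")
escalate = DiagnosticPage("Capture triage notes and escalate to level three")

# Starting page for alerts that do not map directly to a known problem.
root = DiagnosticPage(
    "Is the service process running?",
    {"no": restart, "yes": escalate},
)

# Alerts that already map to a specific procedure (hypothetical event type).
RUNBOOKS: Dict[str, DiagnosticPage] = {
    "LinkDown": DiagnosticPage("Follow the interface link restoration procedure"),
}


def runbook_for(event_type: str) -> DiagnosticPage:
    """Return the mapped runbook, or the guided diagnostic as a fallback."""
    return RUNBOOKS.get(event_type, root)


def walk(page: DiagnosticPage, answers: List[str]) -> DiagnosticPage:
    """Follow the operator's answers through the linked pages."""
    for answer in answers:
        following = page.next_pages.get(answer)
        if following is None:
            break  # No link for this answer; stay on the current page.
        page = following
    return page


start = runbook_for("ProcessAlarm")  # unmapped, so the diagnostic starts
print(walk(start, ["no"]).question)
```

Each page here corresponds to one wiki document, and the links between pages record the "I took this approach and got this result" experience in a form newer team members can follow.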

What about existing "playbook" information that may not be event-based - how does this kind of tool integrate in a large environment? Logic-based playbooks don't require procedures to be fully programmable in order to apply them. In larger environments where playbooks are already established in Word documents or spreadsheets, a wiki-based RBA solution leverages existing procedures and increases their accessibility. By using contextual information in alerts and embedding directions directly in the runbook procedures, these static documents are transformed into actionable tools that can enhance the problem-solving effectiveness of the collective environment. In addition to capitalizing on established playbooks, wiki-based runbook automation also leverages current automation systems like Opsware, providing the contextual integration that creates a consistent procedural user interface for this and other similar tools. Operators don't need to be experts on Opsware as long as they understand that the wiki-link or "action-task" has been set up in such a way that clicking on it reconfigures the server to bring it back into compliance.
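One way to picture how alert context turns a static instruction into an "action-task" is a command template that is filled in from the alert's fields. The sketch below uses Python's standard `string.Template` and `shlex.quote`. The command name `remediate-server` and its flags are purely hypothetical placeholders, not an Opsware command, and the sketch only builds the command text rather than running anything.

```python
import shlex
from string import Template
from typing import Dict

# Hypothetical command line written once by a level three engineer.
ACTION_TEMPLATE = Template("remediate-server --host $host --policy $policy")


def build_action(alert: Dict[str, str]) -> str:
    """Fill the template from alert fields, shell-quoting each value."""
    quoted = {key: shlex.quote(str(value)) for key, value in alert.items()}
    # substitute() raises KeyError if the alert lacks a required field.
    return ACTION_TEMPLATE.substitute(quoted)


alert = {"host": "web01", "policy": "baseline"}
print(build_action(alert))
```

Quoting every value before substitution is a deliberate choice in this sketch: alert fields come from monitored systems, so they should not be trusted to form part of a command line unescaped.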

The primary users and direct beneficiaries of wiki-based RBA solutions are IT operations level one and two team members who use the system on a regular basis and level three tools-engineering team members who define the procedures and automations used to integrate with systems, applications and tools. The level one and two teams need to be fluent in problem-solving skills and be able to interpret the results of the executed actions, although they do not necessarily need specialized expertise or training on the specific tools the RBA solution is leveraging. This frees technicians up to solve more problems faster, and to triage more effectively without being experts in every tool and problem domain involved. By contrast, the level three team needs to have domain expertise in the problem area as well as scripting and programming knowledge, specifically in Groovy, a scripting language for the Java platform.

When problems are resolved more efficiently, benefits extend to the entire organization in terms of economics, customer satisfaction, better use of limited resources and overall synergy. This new breed of runbook automation solutions using wiki collaboration delivers all of these while taking advantage of the investments your organization has already made, for a quick return in both hard and soft dollars. To download a free white paper on "Increasing IT Operations Efficiency With Collaborative Runbooks and Automation," go to:

Jeff Cerny is the Director of Marketing at generationE Technologies, a professional services firm in the IT service management space, and also serves as marketing committee chairman for the board of directors at i.c. stars, a grassroots IT training organization in Chicago, Illinois.