Starting at the beginning: Measuring basic administration

Unifying your measurement models can result in a highly available system.

Over the years, a large number of clients have paid me to make "improvements" or "upgrades" to their IT systems and procedures. Most of these clients never reaped the benefits of their investment. Instead, fundamental problems in their administrative team and practices caused them to throw good money down a procedural hole. We could have easily avoided this problem by stepping back and addressing the core problem rather than forging ahead with neat technologies.

I learned this particular lesson when a client called me back two years after our initial engagement. My first job for him involved the design and deployment of a modular infrastructure and a new development process. He wanted me to come back as an independent consultant to review his progress. My client felt that something somewhere was going wrong despite his best efforts.

Going to look at one of my own designs two years later gave me the unique opportunity to examine how the design evolved over time. It also let me see where my own assumptions and lack of planning in areas I really should have predicted led my client into continuing problems.

On the infrastructure side, the client's IT team continued to add to the servers, ignoring the change management and architectural procedures agreed upon during the deployment. When I sifted through the debris, it seemed clear the changes were slowly degrading the network's stability. More importantly, customer requests sat for months even though the team had agreed to an SLA of one week for an initial review.

On the development side, the QA and architecture procedures had transformed into pro forma meetings where people passed paper back and forth to sign. No one questioned assumptions. Changes made to the code were rammed through so "the users would stop complaining."

Obviously, my client's teams had developed a large disconnect between the form of the process and its intention. Some, more out of frustration than belief, flirted with so-called rapid development methodologies, with catastrophic results (e.g., an untested piece of code shut off order entry for six business hours). Where could we start to unravel the puzzle?

Back to basics: What were we measuring?
My client and I decided to take a three-part approach to the problem. First, he would try to lead his team to somewhat better customer service as a bridging activity. Second, we would work together to build a measurement system capable of tracking the multiple factors creating the vector we call real performance. Third, we would build an action plan to implement the information gained from the new measurements. This last plan would include a feedback mechanism allowing my client to fine-tune it over time.

Since we both came from an infrastructure background, I turned my attention to the measurements for infrastructure administration first. Sorting through the available measurements, I discovered the following:
  • The UNIX and Microsoft teams worked under two different sets of measurements.
  • The security team had a separate set of measurements from the other two core teams, although their work directly impacted that of both teams.
  • None of the teams were measured on customer satisfaction. The only group measured on customer satisfaction was, in fact, the help desk. Consequently, as a review of its records revealed, every customer complaint landed on the help desk's shoulders.
  • While most of the teams took benchmarks of their systems, none were held accountable for the results.
  • The measurements included meeting "personal goals" that simply restated the IT organization's goals. There was little or no sense of progression or accomplishment.

In and of itself, this was not terribly surprising. Over the years, I had seen this exact pattern at dozens of clients, some of them quite bright and progressive. It was embarrassing, though. My clients pay me to know better—not to repeat the mistakes of their competitors.

What did we need to measure?
Given our measurements above, we could easily predict the current outcome. So the question was, how do we build a system of measurements leading to a different equilibrium? Should we go all the way over to pure "leadership" measurements? Discard everything in favor of measuring budgetary compliance and ROI?

Rather than take a radical approach, my client and I agreed that, since business process is a vector of forces, we should try to measure the activities acting to create that vector. For IT administration, we chose four measurements: budget, capacity, growth, and stability.

Our budget measurement included both return on investment (ROI) and an analysis of maintenance cost over time. ROI became the measurement for new projects; those that could not meet the threshold received one negative marker. Maintenance cost over time became our measurement for determining whether a system required further investigation. If a system displayed a need for greater and greater investment (either hardware or time), the client could request more detailed information to determine whether it was truly degrading.
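To make the two budget measures concrete, here is a minimal Python sketch. It is an illustration only, not the tooling we used with the client. The 15 percent ROI threshold, the three-period window, and the function names are all assumptions made for the example; the engagement's actual threshold and review intervals are not given here.

```python
# Illustrative sketch only: threshold, window, and names are hypothetical.

ROI_THRESHOLD = 0.15  # assumed value; the real threshold is not stated


def roi(gain: float, cost: float) -> float:
    """Simple return on investment: net gain divided by cost."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (gain - cost) / cost


def roi_markers(gain: float, cost: float, threshold: float = ROI_THRESHOLD) -> int:
    """Return 1 (one negative marker) if a new project misses the threshold, else 0."""
    return 1 if roi(gain, cost) < threshold else 0


def needs_investigation(costs: list[float], periods: int = 3) -> bool:
    """Flag a system whose maintenance cost rose in each of the last `periods` intervals.

    `costs` holds one maintenance figure (hardware plus time) per review period,
    oldest first.
    """
    if len(costs) < periods + 1:
        return False  # not enough history to call it a trend
    recent = costs[-(periods + 1):]
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))
```

A system flagged by `needs_investigation` is only a candidate for the more detailed review; a steadily rising cost might reflect planned expansion rather than degradation.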

We measured capacity on three levels. First, could current services be expanded without the investment of considerable amounts of cash and time? Second, could the existing team support the upgraded services based on the predicted expansion in administration time? Third, did the system adapt to user needs within the parameters established by the SLA?

We used the growth metrics to delve into the team building side of the equation. The technical ability of any IT system to meet the client's needs is limited by the ability of the team to provide those services. Unfortunately, we cannot dictate intellectual or creative growth. Instead, we established our goals for the system and farmed them out to interested parties. For example, if a UNIX admin wanted to explore the esoteric joys of a security audit, we allowed him to do so with suitable supervision. We measured growth both by project completion and by goal establishment/resolution.

Our measurement of stability rolled together both technical and customer service metrics. We declared that it wasn't enough to just have a system with massive uptime. All of the systems involved needed to work together, creating a highly available system for the end users. If a system was unavailable during business hours, everyone involved (from administrators to programmers) took a hit.

We also unified the UNIX and Microsoft teams' measurement models. Doing so forced them to at least talk to one another every once in a while. They became even more fervent about it when they realized we were quite serious about the idea of making a mark against everyone when a system went down.
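The shared-accountability rule for stability can be sketched in a few lines of Python. This is a rough illustration under stated assumptions: business hours of 9:00 to 17:00, Monday through Friday, timestamps without time zones, and the names `business_minutes` and `shared_marks` are all hypothetical choices for the example, not details from the client's system.

```python
from datetime import datetime, time, timedelta

# Assumed business hours; the actual hours would come from the SLA.
BUSINESS_START = time(9, 0)
BUSINESS_END = time(17, 0)


def business_minutes(start: datetime, end: datetime) -> int:
    """Minutes of the outage [start, end) that fall within Mon-Fri business hours."""
    total = 0
    day = start.date()
    while day <= end.date():
        if day.weekday() < 5:  # Monday=0 .. Friday=4
            window_start = datetime.combine(day, BUSINESS_START)
            window_end = datetime.combine(day, BUSINESS_END)
            overlap_start = max(start, window_start)
            overlap_end = min(end, window_end)
            if overlap_end > overlap_start:
                total += int((overlap_end - overlap_start).total_seconds() // 60)
        day += timedelta(days=1)
    return total


def shared_marks(outages, staff):
    """Give every listed person one mark per outage that touched business hours."""
    marks = {person: 0 for person in staff}
    for start, end in outages:
        if business_minutes(start, end) > 0:
            for person in staff:
                marks[person] += 1
    return marks
```

For example, an outage from 16:00 to 18:00 on a Monday counts 60 business minutes and marks everyone on the list, while a Saturday outage counts none. The point of the design is that the mark lands on the whole group, not on whichever team owned the failing box.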

Step by step: Applying the same method to the other systems
This was obviously only the first step. Despite our desire to pull things into separate worlds, IT remains a highly interconnected enterprise. We spent months building new measurement models for the other parts of the system. I'll delve into those other models in later articles.

If you built a true multidimensional measurement model for IT administration, what measures would you use? Why would you choose those measurements for your specific environment and not others?
