In my last article, “Starting at the beginning: Measuring basic administration,” I talked briefly about how a client called me back to help him with the aftereffects of a life-cycle project that one of my companies had executed. My client proposed a short-term, leadership-based solution, while I took up the challenge of creating a sustainable medium-to-long-term solution. We started by analyzing and addressing fundamental problems with the measurement of the infrastructure team.

In order to address the problems, my client and I decided to use broad rather than highly focused measurement categories. Each category contains specific performance metrics. We chose to draw measurement categories from several management disciplines rather than focus on a single kind of measurement. This allowed us to build a measurement system supporting a wide range of possible goals, rather than the narrowly focused goals inherent in a single methodology.

The first pass at measurements and metrics for infrastructure resulted in some generic, high-level data we could apply to almost all infrastructure functions. Although we recognized the need to make more refined metrics for each function, we decided to go with the generic measurements as a medium-term solution. This choice freed us up to move on to the next step.

For our next step, we wanted to address the second part of the IT tripod: development. Since we both came from infrastructure, neither of us felt overly comfortable addressing this topic. Still, we had managed more than a few developers in our time. Setting aside the traditional animosity, confusion, and misunderstandings between the two groups, we moved forward.

Development’s “current state”
Sifting through the chaos of development’s documentation took three weeks of dedicated labor. Going back to compare actual vs. documented results from various activities consumed another week. What emerged would hardly surprise anyone who has worked in the industry for more than a few years.

Development and infrastructure readily blamed one another for stability problems. Development claimed that infrastructure moved too slowly to address customer needs and typically failed to transport new code into the system correctly. Meanwhile, infrastructure blamed development for the constant instability plaguing the customer-facing systems. The support structure, including the help desk and second-tier support, wanted to fire both teams and replace them with the “dedicated people who kept the system running.”

From development’s point of view, they worked long hours at the thankless task of building information tools for uncaring clients. They fought complex, useless processes in an effort to churn out “good code.” This code, in theory, greatly enhanced the business’s ability to respond to changing conditions.

From the customers’ point of view, the IT team (as a whole) was unresponsive, refused to talk with them about what they needed, and generally discarded whatever information they did provide. The customers mostly ignored the waves of new functions pouring out of the development team in favor of their own spreadsheets and e-mail-routed documents.

Closer interviews with some of our biggest “customer culprits” revealed two things. One, the customers simply did not want to keep up with the constant changes to the core systems. They used their own system because it displayed considerably less volatility. Two, the customers no longer felt beholden to the IT staff to execute information technology tasks. They were developing more than just modest proficiency with their extensive desktop tools, including desktop databases.

This disparity between the two views made perfect sense when I looked at the measurements applied to the development department. Although they had documentation detailing processes capable of addressing the customers’ problems, we measured them on only two things: function output and business problems addressed per year.

Selecting measurements
These measurements contributed heavily to the problem. In effect, they encouraged the developers to produce a large number of changes whether they did anything useful or not. If developers produced more functions for the system than they did the previous year, they were marked as being “improved” and received a bonus.

The metric for the second criterion, business problems addressed, led down a similar path. Development marked a business problem “resolved” if they completed all of the functions proposed to address it during the design phase. They had no tools other than rumor to check whether the functions were in use, or whether the problem had actually disappeared.

My client and I decided to scrap the original measures. In their place, we came up with four key measurement areas:

  • Budget
  • Capacity
  • Functionality
  • Quality

These four provided us with the governing measurements; the metrics we assigned to each one would determine how we influenced the team’s behavior. The team received a grade as a whole based on their performance in all four areas. Each team member would, later on, be graded in a similar fashion.

Measuring budget (the team’s ability to meet its business objectives without exceeding its anticipated total budget) proved more difficult than we expected. Unlike infrastructure, where we can point to servers and wire, development works primarily as a creative endeavor producing an ephemeral product. To address this, we took the current budget as the benchmark and declared that if the other three measures were acceptable, the budget was met. If the team exceeded budget while the other three measures fell, something was wrong. If the team exceeded budget while the other three measures increased, the business needed to decide whether the increased level of service was appropriate for its current strategy.
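The budget rule can be written down as a small decision function. The following is a minimal Python sketch, not a tool from the original engagement. It assumes that the “other three measures acceptable” rule applies when spending stays at or under the benchmark. It also summarizes how capacity, functionality, and quality moved as a hypothetical `others_trend` value (-1 fell, 0 flat, +1 increased). The names `assess_budget` and `BudgetVerdict` are illustrative, and the `REVIEW` outcome covers the cases the rule does not address.

```python
from enum import Enum


class BudgetVerdict(Enum):
    MET = "budget met"
    PROBLEM = "something is wrong"
    BUSINESS_DECISION = "business decides whether the extra service fits strategy"
    REVIEW = "not covered by the rule; review by hand"


def assess_budget(spend: float, benchmark: float,
                  others_acceptable: bool, others_trend: int) -> BudgetVerdict:
    """Apply the budget rule (hypothetical encoding).

    spend, benchmark: actual spend and the current-budget benchmark, same units.
    others_acceptable: True if capacity, functionality, and quality are all acceptable.
    others_trend: assumed summary of how those three measures moved versus the
                  previous period: -1 fell, 0 flat, +1 increased.
    """
    if spend <= benchmark:
        # Within the benchmark: budget counts as met if the other three are acceptable.
        return BudgetVerdict.MET if others_acceptable else BudgetVerdict.REVIEW
    if others_trend < 0:
        # Over budget while the other measures fell.
        return BudgetVerdict.PROBLEM
    if others_trend > 0:
        # Over budget while the other measures rose: a strategy question for the business.
        return BudgetVerdict.BUSINESS_DECISION
    # Over budget with flat measures: the rule is silent, so flag it for a person.
    return BudgetVerdict.REVIEW
```

For example, `assess_budget(120.0, 100.0, True, 1)` lands in `BUSINESS_DECISION`: the team spent more, the service got better, and only the business can say whether that trade is worth it.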

The metrics for capacity (originally called growth potential) took their cue from the team’s original personal growth goals. Recognizing that programming is a creative endeavor, we established certification and training goals for each team member. These goals tied directly to the projected needs of the company; products and capabilities we needed received the highest marks. However, if the team built the capacity to work with technologies outside of the list, they still received some credit. After all, no one can predict the future. If their “guess” about what would prove useful turned out to be correct, they received a second mark, higher than the mark given for meeting a projected technology goal.
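The capacity marks follow a simple ordering: partial credit for off-list skills, a higher mark for skills on the projected-needs list, and a still higher second mark when an off-list guess pays off. Below is a minimal Python sketch of that ordering. The point values, the `SkillGoal` record, and the `capacity_score` function are all hypothetical; the text specifies only the relative order of the marks, not their size.

```python
from dataclasses import dataclass

# Hypothetical point values. Only the ordering comes from the text:
# off-list credit < on-list mark < second mark for a correct guess.
ON_LIST_MARK = 3
OFF_LIST_MARK = 1
CORRECT_GUESS_MARK = 4


@dataclass
class SkillGoal:
    technology: str
    achieved: bool               # certification or training goal met
    proved_useful: bool = False  # later judged useful; only matters off-list


def capacity_score(goals: list[SkillGoal], projected_needs: set[str]) -> int:
    """Sum capacity marks for achieved goals (illustrative scoring only)."""
    score = 0
    for goal in goals:
        if not goal.achieved:
            continue
        if goal.technology in projected_needs:
            score += ON_LIST_MARK  # capability the company projected it needed
        else:
            score += OFF_LIST_MARK  # some credit for building capacity off the list
            if goal.proved_useful:
                score += CORRECT_GUESS_MARK  # second, higher mark for a correct guess
    return score
```

With projected needs `{"tech_a", "tech_b"}`, a team that certifies in `tech_a`, misses `tech_b`, and picks up off-list `tech_c` (later useful) and `tech_d` (not) would score 3 + 1 + 4 + 1 = 9 under these assumed values.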

Functionality’s metrics caused considerable dissension in the ranks. To receive high marks for functionality, the development team had to produce IT products the customers actually used. We measured use on three levels: reduction in the time needed to perform the business task, the length of time users continued to use the function, and the usage pattern. In our measurements, we noticed that when users bypassed most functions in favor of their own tools, they still entered data into the system in bursts, despite the periodic nature of the work being performed. If a function was used periodically rather than in bursts (and matched a periodic business system), we noted the behavior.
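One way to tell bursty data entry from periodic use is to look at the gaps between entries. The Python sketch below swaps in a standard technique the text does not name: the coefficient of variation (standard deviation divided by mean) of the inter-entry gaps. Evenly spaced entries give a value near 0, while clustered entries followed by long silences push it well above 1. The function name `classify_usage` and both thresholds are assumptions chosen for illustration, not calibrated values.

```python
from statistics import mean, pstdev


def classify_usage(entry_times: list[float],
                   periodic_cv: float = 0.5,
                   bursty_cv: float = 1.0) -> str:
    """Label a function's data-entry pattern from entry timestamps.

    entry_times: timestamps in any consistent unit (for example, hours).
    periodic_cv, bursty_cv: hypothetical thresholds on the coefficient of
    variation of the gaps between consecutive entries.
    """
    if len(entry_times) < 3:
        return "insufficient data"  # need at least two gaps to judge spacing
    times = sorted(entry_times)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    avg_gap = mean(gaps)
    if avg_gap == 0:
        return "bursty"  # every entry landed at the same instant
    cv = pstdev(gaps) / avg_gap
    if cv <= periodic_cv:
        return "periodic"
    if cv >= bursty_cv:
        return "bursty"
    return "mixed"
```

Daily entries such as `[0, 24, 48, 72, 96]` come out `"periodic"`, while a cluster of entries followed by a week of silence and another cluster comes out `"bursty"`. In this sketch, a bursty label on a function whose underlying work is periodic would match the warning sign described above.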

Quality proved contentious as well. The developers, naturally, produced perfect code every time. Our metrics penalized them if, during review, the QA team found any of the following: a failure in a related function caused by the newly implemented code; a failure in the function that the implemented code provided; or a failure in the system as a whole related to the new function. They received boosts to their scores for flaws found during the QA process before the code went live.
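The quality metric combines penalties for three kinds of failures found during review with a boost for each flaw caught in QA before go-live. Here is a minimal Python sketch of that structure. The penalty weights, the per-occurrence counting, the finding labels, and the `quality_score` name are all hypothetical; the text names the failure categories but gives no numbers.

```python
from collections import Counter

# Hypothetical weights for the three failure types found during review.
PENALTIES = {
    "related_function_failure": -3,  # new code broke a related function
    "own_function_failure": -2,      # the new function itself failed
    "system_failure": -5,            # system-wide failure tied to the new function
}
PRE_LIVE_BOOST = 1  # hypothetical credit per flaw caught in QA before go-live


def quality_score(review_findings: list[str], pre_live_flaws_found: int) -> int:
    """Score one release: penalties per review finding, boosts for pre-live catches."""
    counts = Counter(review_findings)
    unknown = set(counts) - set(PENALTIES)
    if unknown:
        raise ValueError(f"unknown finding types: {sorted(unknown)}")
    penalty = sum(PENALTIES[kind] * n for kind, n in counts.items())
    return penalty + PRE_LIVE_BOOST * pre_live_flaws_found
```

Under these assumed weights, a release with one system failure and two failures in its own function, but three flaws caught before go-live, scores -5 - 4 + 3 = -6. Rewarding pre-live catches gives developers a reason to welcome QA findings rather than argue with them.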

Moving forward
This data gave us the beginnings of a possible measurement method for development. Obviously, we customized it to meet specific problems we encountered. Were there other measurements we could have selected? Would you have selected the same metrics and measurements in the same situation?

With sets of medium-term metrics in place for both infrastructure and development, my client asked me to turn my attention to the third part of the tripod. Then he wanted me to go back to refine the measurements for specific elements of the three. Those efforts are a tale for another time.