Many IT organizations focus mainly on customer service but neglect to examine their own information needs for managing IT. When I took on the role of IT manager for a software company, I found the reporting system lacking. After an initial needs assessment, I outlined the goals a successful reporting system would have to meet. Among the goals were:
- Discuss and set quality-of-service standards with internal customers.
- Discuss and review quality of service with suppliers such as Internet and frame relay vendors.
- Set appropriate priorities and resources.
- Incorporate results into individual and department performance plans.
- Report results and direction to management and the rest of the company.
- Measure the benefits of recent projects and expenditures.
- Ensure that backups are completing successfully.
Three key areas needed to be examined individually and then pulled together:
- IT infrastructure uptime alerting, monitoring, and performance tracking over time
- Support call management over time
- Resource backup over time
The key term here is “over time.” In my previous experience as the information guru at a bank, I became aware that point-in-time information has only limited value. I’ll focus here on how I approached designing and building a reporting system for the IT infrastructure. In future articles, I’ll cover how I tackled support calls and the backing up of resources, along with their associated reports and analysis.
What needs to be tracked?
Every organization is different, but in my enterprise, the IT infrastructure needed a multidimensional management model with four dimensions: connection types, services, locations, and time. I needed to design the information database so that it weighted each outage by its impact on the company and therefore reflected uptime and availability more accurately. I also needed to take into account the linkages between each service and the connections it depends on.
Let’s look at measuring uptime for e-mail as an example. Multiple components go into uptime: the server, the Internet, the internal networks, remote access like VPN, alternatives such as Outlook Web Access, and the users’ computers. So how do you report that e-mail was up 98 percent of the time? The key here is weighting the components. If the e-mail server is down, the entire service is down, affecting the entire company. However, if VPN is down, then e-mail is down only for those users who access e-mail by VPN.
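A minimal Python sketch of this component weighting follows. It assumes outages do not overlap in time, and the function name, the user shares, and the outage hours are all hypothetical values chosen for illustration, not figures from the original system.

```python
def service_availability(component_outages, period_hours):
    """Estimate availability of one service as a fraction between 0 and 1.

    component_outages maps a component name to a tuple of
    (hours_down, share_of_users_affected).
    Assumption: outages do not overlap; overlapping outages would be
    double-counted by this simple sum.
    """
    if period_hours <= 0:
        raise ValueError("period_hours must be positive")
    # Each outage costs its duration scaled by the share of users it affects.
    lost_hours = sum(hours * share for hours, share in component_outages.values())
    return 1.0 - lost_hours / period_hours


# Hypothetical 30-day month (720 hours) for the e-mail service.
email_outages = {
    "mail_server": (2.0, 1.0),  # server down: every user is affected
    "vpn": (10.0, 0.15),        # VPN down: only an assumed 15% of users
}
email_availability = service_availability(email_outages, period_hours=720)
print(f"{email_availability:.2%}")
```

With these made-up numbers, the 10-hour VPN outage counts for only 1.5 weighted hours, so e-mail availability comes out at roughly 99.5 percent rather than the 98.3 percent an unweighted sum of outage hours would suggest.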
The next trick is to weight downtime by the average number of connections in order to get a true picture of the overall company impact. If the connection to a sales office is down, the impact on the company is the time down weighted by the average daily connections to that office. For instance, 50 salespeople might have connections, but only 10 might be in the office on a given day. So the right weighting would be 10 out of the total average daily connections for the entire company.
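The connection weighting can be sketched the same way. In this illustrative Python example, the location names, the company-wide total of 200 average daily connections, and the 4-hour outage are assumptions; only the figure of 10 average daily connections for the sales office comes from the example above.

```python
def company_availability(outage_hours, avg_daily_connections, period_hours):
    """Return company-wide availability as a fraction between 0 and 1.

    outage_hours maps a location to its hours of downtime in the period.
    avg_daily_connections maps every location to its average daily connections.
    """
    total_connections = sum(avg_daily_connections.values())
    if total_connections <= 0 or period_hours <= 0:
        raise ValueError("connections and period_hours must be positive")
    # Weight each location's downtime by its share of average daily connections.
    weighted_down_hours = sum(
        hours * avg_daily_connections[location] / total_connections
        for location, hours in outage_hours.items()
    )
    return 1.0 - weighted_down_hours / period_hours


# Hypothetical company: 200 average daily connections in total.
connections = {"headquarters": 190, "sales_office": 10}
outages = {"sales_office": 4.0}  # sales office link down for 4 hours
availability = company_availability(outages, connections, period_hours=720)
print(f"{availability:.2%}")
```

Here the 4-hour outage is weighted by 10/200, so it costs the company the equivalent of 0.2 hours of full downtime over the 720-hour month.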
The report format
I usually begin these projects with several high-level spreadsheet prototypes. For the final system, I prefer dynamic reporting, either on the Web or in Excel pivot tables, backed by a Microsoft SQL Server database. That combination provides an easy way to summarize the information, incorporate time, and drill into the detail, and it gives me the analysis tools built into Excel.
The abbreviated example shown in Figure A should provide good ideas for building your information database. In this case, I had several detailed pivot table reports on various tabs of the master Excel workbook. I then developed formulas that accessed the pivot tables to extract the information needed and applied my own higher-level linkage formulas. Since the information was summarized from a more detailed Excel pivot table, I could easily examine details and trends for any dimension. For example, I could easily look at WAN by month, by day, or by hour.
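For readers without the workbook, the idea of summarizing one set of detail records at different time grains can be sketched in plain Python. This is only an illustration of the pivot-style rollup, not the Excel or SQL Server implementation described above; the record layout, the sample data, and the simplification of assigning each outage to the bucket in which it started are all assumptions.

```python
from collections import defaultdict
from datetime import datetime

# strftime patterns that define each time grain's bucket label.
GRAIN_FORMATS = {
    "month": "%Y-%m",
    "day": "%Y-%m-%d",
    "hour": "%Y-%m-%d %H:00",
}


def downtime_by(records, grain):
    """Sum minutes of downtime per (connection_type, time bucket).

    records is an iterable of (started_at, connection_type, minutes_down).
    Simplification: an outage is counted entirely in the bucket where it began.
    """
    fmt = GRAIN_FORMATS[grain]
    totals = defaultdict(float)
    for started_at, connection_type, minutes_down in records:
        totals[(connection_type, started_at.strftime(fmt))] += minutes_down
    return dict(totals)


# Hypothetical outage log.
sample_records = [
    (datetime(2003, 3, 4, 9, 15), "WAN", 30.0),
    (datetime(2003, 3, 4, 14, 0), "WAN", 15.0),
    (datetime(2003, 3, 20, 9, 0), "WAN", 5.0),
]
by_month = downtime_by(sample_records, "month")
by_day = downtime_by(sample_records, "day")
by_hour = downtime_by(sample_records, "hour")
```

The same detail rows answer all three questions; only the bucket label changes, which is what makes a detail-level store more useful than pre-summarized totals.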
There’s not enough space here to examine all the methods of reporting by time, detail, service, and connection. However, in later articles, I’ll show some of the analysis I was able to do.