Jay Rollins continues his account of a help desk overhaul. This week he talks about testing the basic and advanced functionality of the help desk software.
Last time, we talked about how to choose the best software for your help desk. This week we pick up after testing Numara Software's FootPrints and ManageEngine's ServiceDesk Plus Enterprise Edition.
After installing each application and testing basic functionality, we found that both systems function as described. FootPrints has a much better flow and user interface, but the ServiceDesk system is still very usable. We have not had to contact Numara Software for support (we've really had no issues on the install), but we have had to call ManageEngine. We actually contacted support around midnight EST and reached a live person who was very helpful. It was a challenge to understand accents, but we managed to get through it, and we had our first-call resolution. The call was not really about trouble with the system, but about how to do something that was not well documented.
The basic functionality checked out fine for both systems during our test. The real test is the advanced functionality. In preparation for the advanced test, we did the following:
1. Listed all of our systems and looked at all of the moving parts and critical services. An example would be our ERP clinical system's requirement for access to a database server, or another system's local disk requirements. We also assigned each system a criticality rating so we could rank which systems were the most critical to the company. Impact to the company was determined by revenue impact, operations impact, and the availability of backup systems or processes.
2. Reviewed all of the alerts that we currently have. This was a helpful exercise because we found some useless alerts that really weren't doing anything.
3. Did an initial gap analysis on what alerts we need vs. the ones we have. This got interesting as well. When it is just the IT guys in the room, we start rattling off all the alerts that we want. That started to get out of hand, so we separated out the alerts that were needed to tell us that something might be wrong. The other alerts were more informational and would be used in the course of troubleshooting problems. Once we classified these, things went a little more smoothly. Additionally, we looked at what alerts would initiate proactive maintenance. An example of this would be an alert at 80% disk space usage on local server drives. When a drive hits 80%, we need to initiate work to either add a disk, free disk space up, buy a new server, or offload to a SAN where it can be better managed.
4. Defined service levels. We went down the path of defining what dictated a Level 1 issue through Level 4. Here are the definitions we came up with:
a.) Level 1: Prevents more than 25% of the company from operating critical systems. Response time: 15 minutes, 24x7x365 with on-call resources and all hands on deck once notified. The entire team drops what they are doing to focus on this issue. To track activity on this issue, we will need a trigger field. The default state of the ticket is "in queue," and the options in the pull-down field are "assigned to technician" and "Active." Once the ticket is set to "Active," it means that this is the ticket the technician is currently working on. Having this trigger helps in a couple of ways: We can better set expectations with the customer, and we can verify through reporting that inbound tickets are being given the appropriate priority.
b.) Level 2: Affects an entire satellite location (there are nearly 60) or is a performance issue (vs. a total outage) on a critical application. Response time is within one hour (noted with "Active" as described above), 24x7x365 with on-call resources. Help will be called as needed, but the initial technician is the issue owner.
c.) Level 3: Affects an individual user with a critical work stoppage (i.e., the user cannot perform their job). Response time is within four hours, 8x5, and is denoted by the ticket being assigned to a technician as described above. The on-call resource can address the issue at their discretion; otherwise, the issue waits for the next business day. (Note: This is based upon current IT staff. A larger staff may have longer hours or operate on weekends.)
d.) Level 4: Affects individual units and focuses on the "nice-to-haves" or mini projects like testing a new Microsoft Outlook plug-in. Response time is next business day, 8x5, and is denoted (or the clock starts ticking on the SLA) once the ticket is assigned to a technician.
5. Defined escalation procedures and communication strategy as follows:
a.) Level 1: All CXX are notified immediately via e-mail or phone call, and an update is sent every hour until the issue is resolved. Updates will go out with the latest information added to the ticket in the system so the team doesn't have to take an inordinate amount of time communicating.
b.) Level 2: Escalation to all IT management, divisional vice president, and satellite office GM immediately with hourly status updates. Send notification to CXX if problem has not been resolved within two hours and communicate status updates every hour to all parties.
c.) Level 3: Escalation to IT managers if the issue has not been resolved within the four-hour SLA. Escalation to CIO if the issue has not been resolved within eight hours. Communication includes an e-mail denoting that the ticket has been assigned (or added to a technician's personal queue). Another communication will go out when the SLA has been missed. Otherwise, an e-mail with a survey will be sent when the ticket has been closed.
d.) Level 4: Escalation to IT management if the issue has not been addressed within 48 hours. Escalation to CIO if the issue has not been resolved within 72 hours. Communication would consist of an expectation-setting e-mail stating when the technician would be able to get to the mini-project and a follow-up e-mail with an estimate as to how long it would take to complete. The ticket will be closed with a conclusion and project summary.
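As a rough illustration, the service levels and escalation rules above could be written down as data before trying them in either product. The Python below is only a sketch under stated assumptions, not how FootPrints or ServiceDesk is configured: the names classify, recipients, and ESCALATIONS are hypothetical, elapsed time is treated as plain clock hours (the 8x5 business calendar for Levels 3 and 4 is ignored), and the hourly status updates are left out.

```python
# Hypothetical sketch of the Level 1-4 rules; all names are illustrative only.

def classify(pct_company_blocked, site_down, critical_app_degraded, user_work_stopped):
    """Map an incident's impact to a service level, checking the worst case first."""
    if pct_company_blocked > 25:             # Level 1: more than 25% of the company
        return 1
    if site_down or critical_app_degraded:   # Level 2: whole satellite site or degraded critical app
        return 2
    if user_work_stopped:                    # Level 3: one user with a critical work stoppage
        return 3
    return 4                                 # Level 4: nice-to-haves and mini projects


# (hours the ticket has been open, who gets notified); 0 means "immediately".
# Assumption: Level 4's "addressed within 48 hours" is treated like any other
# elapsed-time threshold.
ESCALATIONS = {
    1: [(0, "CXX")],
    2: [(0, "IT management"), (0, "divisional vice president"),
        (0, "satellite office GM"), (2, "CXX")],
    3: [(4, "IT managers"), (8, "CIO")],
    4: [(48, "IT management"), (72, "CIO")],
}


def recipients(level, hours_open):
    """Return everyone who should have been notified for an unresolved ticket."""
    return [who for threshold, who in ESCALATIONS[level] if hours_open >= threshold]


if __name__ == "__main__":
    # Example: a satellite office outage that is still open after 2.5 hours.
    level = classify(0, site_down=True, critical_app_degraded=False, user_work_stopped=False)
    print(level, recipients(level, 2.5))
```

Writing the rules out this way makes it easier to check whether a help desk product's business-rule engine can express each threshold.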
The frequency of communication may seem a little too much, but my experience is that you cannot communicate enough. Even if they ignore the updates, every stakeholder has a warm and fuzzy feeling with the knowledge that their IT staff is on the problem as if it were their own. This goes a long way in building trust and getting across that we know what we are doing and that the business is in good hands.
Next week we see if the systems can handle these business rules, plus a few additional changes. Spoiler alert: We chose ServiceDesk. We felt the weaknesses in the product were well worth putting up with given the price difference between it and FootPrints. I guess that remains to be seen, but I'll let you know how it goes.