Service-Level Management (SLM) is exploding into the application arena to meet new end user availability and response time needs. Although emerging solutions are addressing client/server systems, as well as Web/Internet and Enterprise Resource Planning (ERP) applications, the lack of support for unsexy, in-house applications may well be the deal-breaker for large organizations’ SLM tool and methodology purchases.

Downtime is getting unacceptably expensive as businesses increasingly depend on IT services for mission-critical applications.
This article originally appeared in the September issue of Wiesner Publishing’s Software Magazine and appears on TechRepublic under a special arrangement with the publisher.
In this new environment, organizations must define IT availability in terms of applications, rather than resources, and use language that both IT and business users can understand. In the past, “IT’s [assurance of] 98 percent network availability [offered small comfort] to a sales guy who couldn’t book orders,” comments Rich L. Ptak, vice president of systems and applications management at Hurwitz Group, Framingham, MA. “It didn’t mean the application was running, or the response time was good enough for the salesman,” he adds.

Kris Brittain, senior product line manager for SLM strategy at Tivoli, Austin, TX, agrees that SLM [has been] “a lot of air, not traction.” By contrast, today’s business service-level agreements (SLAs) between IT and the line of business define “what [customers] should expect from IT without hiccups.” A subset could be operational, she adds.

SLAs tied to users
Current SLAs are tied to applications in the end-user experience, says Hurwitz’ Ptak. With their focus on the user, rather than the information being collected, SLAs aim at linking the end user’s business process experience with what is happening in the IT organization.

“The SLA is a common bridge between the end user and the IT department, couched in language both understand,” says Martin Haworth, a divisional manager for Hewlett-Packard’s OpenView, Roseville, CA. However, while users care most about response time, he notes that it has historically been ignored in service-level agreements.

That is rapidly changing, as organizations demand end-user response time measurement from their suppliers, for client/server as well as mainframe applications. For example, when New England Financial of Boston relocated its customer service center from a private fiber to a remote connection, call service customers were most concerned about response time and reliability. Therefore, John Corbett, vice president of the company’s network technical services, required a tool that provided response time monitoring at the client/server level.

Earl Newsome, vice president and CIO of Owens-Illinois, Toledo, OH, likewise sought “a system to allow measurement of end-user response time [as a] critical component of user satisfaction,” when the glass/plastics manufacturer underwent a complex migration from legacy to client/server systems. Although the legacy systems had long provided sub-second response time, he notes that client/server performance problems are just now rearing their ugly head. Newsome believes that monitoring all elements of the response time component offers the biggest bang for the buck in the client/server world of scarce resources.

Application viewpoint
Indeed, the application viewpoint offers the best perspective into a company’s mosaic of connections, any one of which could slow the user down, says P.K. Karnick, vice president of Aptia, an SLM firm in San Jose, CA. This is no news to end-user organizations.

It’s a very complex environment, says Frank Slootman, vice president of the EcoSystems division of Compuware, Farmington Hills, MI, and organizations need to do root cause analysis when users have service problems, such as taking too long to check inventory. When IT organizations were more infrastructure-oriented, notes Slootman, service problems resulted in finger-pointing and time wasted passing the buck before they found the domain responsible, be it the server, the network, or the connections. Now, however, as IT organizations change from infrastructure providers to service organizations, they are looking at the application level to determine what’s consuming the system. Compuware was a pioneer in application availability and SLM.

The application-level strategy seems to be working. For example, after New England Financial changed to a remote connection for its customer service center, a pension customer complained about slow response time. When the company deployed the VitalSuite product line from INS, it graphically illustrated that 70 percent of the response time was being used at the client end, not at the network or the server. VitalSuite displays a detailed picture of where the time in the application is spent, by segment, including the desktop workstation, network, servers, gateways, and even the mainframe. The application developers could see this and remediated the problem from their end.
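The per-segment picture described above comes down to simple arithmetic: divide the time spent in each segment by the total. The sketch below is illustrative only. The segment names and timings are invented sample values, not New England Financial’s data, and the function name is hypothetical rather than part of VitalSuite.

```python
def segment_shares(segment_seconds):
    """Return each segment's share of total response time, as a percentage."""
    total = sum(segment_seconds.values())
    if total <= 0:
        raise ValueError("total response time must be positive")
    return {name: round(100 * secs / total, 1)
            for name, secs in segment_seconds.items()}

# Invented sample timings (seconds) for one transaction, by segment.
sample = {"client": 7.0, "network": 1.5, "server": 1.0, "mainframe": 0.5}

shares = segment_shares(sample)
slowest = max(shares, key=shares.get)  # the segment that dominates the delay
print(shares, "slowest segment:", slowest)
```

With a breakdown like this in hand, the team responsible for the dominant segment can be identified without passing the problem from domain to domain.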

Four approaches
To capture the end user’s experience for SLM, organizations must collect application response time metrics. The GartnerGroup, Stamford, CT, has defined four ways of measuring end-to-end response time: code instrumentation, network X-ray tools, capture/playback tools, and client capture.

By instrumenting the source code in applications, organizations can define the exact start and end of business transactions, capturing the total round-trip response times. This was the approach taken by Hewlett-Packard and Tivoli with their Application Response Measurement (ARM) Application Programming Interface (API) initiative.

However, although the approach is insightful in capturing how end users see business transactions, it is also highly invasive, costly, and difficult, requiring modifications to the application source code, as well as maintenance of the modifications. And, despite the promise, only 3 percent to 5 percent of ERP applications have been ARMed, or instrumented.
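To make the idea concrete, here is a minimal sketch of what instrumentation looks like in principle. It is not the ARM API; the `business_transaction` helper, the `measurements` list, and the `check_inventory` function are all hypothetical names chosen for illustration. It also hints at why the approach is invasive: every transaction of interest has to be wrapped in the application’s own source code.

```python
import time
from contextlib import contextmanager

# Hypothetical in-memory store of (transaction name, elapsed seconds).
measurements = []

@contextmanager
def business_transaction(name, clock=time.perf_counter):
    """Mark the exact start and end of one business transaction."""
    start = clock()
    try:
        yield
    finally:
        # Recorded even if the transaction raises an exception.
        measurements.append((name, clock() - start))

def check_inventory(part_number):
    # Stand-in for real application logic; the sleep simulates work.
    with business_transaction("check_inventory"):
        time.sleep(0.01)
        return {"part": part_number, "on_hand": 12}

result = check_inventory("A-100")
print(measurements)
```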

A second collection approach is via X-ray tools, or network sniffers. An example is Sniffer Network Analyzer from Network Associates, Menlo Park, CA. Sniffers use probes spread out in strategic locations across the network to read the packet headers, and calculate response times as seen from that probe point. Although noninvasive, this approach does not address the application layer. Because it does not see transactions in user terms, it doesn’t capture response time from the end-user perspective.
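A rough sketch of the probe-point calculation follows, assuming the probe has already reduced each packet header to a timestamp, a flow identifier, and a direction. That record layout, the function name, and the sample data are assumptions made for illustration, not any vendor’s format. Note that the result is keyed by network flow, not by business transaction, which is exactly the limitation described above.

```python
def probe_response_times(packets):
    """Pair each request with the next response on the same flow.

    packets: iterable of (timestamp_ms, flow_id, direction) tuples,
    where direction is "request" or "response".
    """
    pending = {}   # flow_id -> timestamp of the outstanding request
    results = []
    for timestamp_ms, flow_id, direction in packets:
        if direction == "request":
            pending.setdefault(flow_id, timestamp_ms)
        elif direction == "response" and flow_id in pending:
            results.append((flow_id, timestamp_ms - pending.pop(flow_id)))
    return results

# Invented sample of header records seen at one probe point.
observed = [
    (0, "10.0.0.5:4021", "request"),
    (20, "10.0.0.9:4310", "request"),
    (250, "10.0.0.5:4021", "response"),
    (520, "10.0.0.9:4310", "response"),
]
flow_times = probe_response_times(observed)
print(flow_times)
```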

Capture/playback tools use synthetic transactions, simulating user keystrokes, and measuring the response times of these “virtual” users. While simulated transactions have a role in testing the applications’ potential performance, they do not measure the actual end user’s response time experience. Examples are CAPBAK from Software Research, San Francisco, and AutoTester from AutoTester Inc., Dallas.
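In outline, a synthetic transaction is a scripted “virtual” user that is replayed and timed. The sketch below assumes a hypothetical `fake_order_entry` function standing in for the replayed keystrokes; it does not reflect how CAPBAK or AutoTester work internally.

```python
import statistics
import time

def run_synthetic(transaction, repetitions=5, clock=time.perf_counter):
    """Replay a scripted transaction and time each run."""
    timings = []
    for _ in range(repetitions):
        start = clock()
        transaction()
        timings.append(clock() - start)
    return timings

def fake_order_entry():
    # Stand-in for replayed keystrokes against a real application.
    time.sleep(0.005)

timings = run_synthetic(fake_order_entry, repetitions=3)
summary = {
    "runs": len(timings),
    "median": statistics.median(timings),
    "worst": max(timings),
}
print(summary)
```

The numbers describe the script’s experience, not that of any actual end user, which is why such tools are better suited to testing than to SLA reporting.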

Client capture is the fourth and most promising approach to measuring response time from the user’s perspective. Here, intelligent agents sit at the user’s desktop and monitor end users’ transactions to capture their response times. Client capture technology can complement network and systems management solutions.

Perhaps not fitting neatly into the Gartner categories, but with products installed in over 1,200 sites worldwide, is Programart Corp., Cambridge, MA, which concentrates on the application performance management market. The Strobe tool reports on the resources used by an application; the APM Power tool analyzes performance data and offers suggestions to improve performance. The company’s products are based on IBM’s OS/390, which is experiencing a rebirth as an e-business platform, thus helping Programart’s business, says Amy Bethke, director of product marketing. The company plans to support Windows NT and AIX next year, she adds.

From here to maturity
Despite the fact that service-level management and agreements are moving in the right direction, the SLA market is not mature, according to Steve Waldbusser, chief strategist at the INS software division. Instead, he says, “SLAs are ahead of the software that’s monitoring them.” Indeed, 47 percent of the network professionals surveyed by INS in 1999 said that difficulty in measuring SLAs was a significant barrier to implementing or improving SLM.

Until recently, SLA contracts could not be monitored by software, but that situation is changing. Vendors are starting to automate the monitoring process and are trying to keep pace with the moving target of customers’ changing needs.

For example, when New England Financial first considered INS’ VitalSuite software, Corbett says the initial version didn’t report what the company needed to see. Instead, there was a difference between the vendor’s and the end user’s definition of response time. The vendor measured response time from the time that the end user hit the enter key until the last screen queued up. While the start time was the same for the end user, the user considered the end time to be when the first screen printed, and he or she could actually do something. Since the response time differed considerably according to the two perspectives, the vendor redesigned the measurement to accommodate the user’s definition.
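The gap between the two definitions is easy to see once the events are written down. In this sketch the class name, field names, and timestamps are all invented for illustration: both measurements start when the user presses Enter, but they stop at different events.

```python
from dataclasses import dataclass

@dataclass
class TransactionTrace:
    """Timestamps, in seconds, of the key events in one transaction."""
    enter_pressed: float
    first_screen: float   # the user can start working here
    last_screen: float    # the final screen has been queued

    def vendor_response_time(self):
        # Original vendor definition: Enter key to last screen.
        return self.last_screen - self.enter_pressed

    def user_response_time(self):
        # End user's definition: Enter key to first usable screen.
        return self.first_screen - self.enter_pressed

# Invented sample trace.
trace = TransactionTrace(enter_pressed=0.0, first_screen=4.0, last_screen=9.5)
print(trace.vendor_response_time(), trace.user_response_time())
```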

Furthermore, New England Financial is striving to be proactive. So, says Corbett, in the future the company will be using VitalHealth, the VitalSuite piece that links into the help desk. Then, if the desktop agent sees the transaction exceeding a 40-second SLA threshold, for example, it will send an alarm to the VitalHealth console sitting at the help desk, for real-time diagnosis and resolution.
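The threshold check itself can be sketched in a few lines. Only the 40-second figure comes from the example above; the function, the alarm format, and the list standing in for the help desk console are hypothetical and do not represent VitalHealth’s actual interface.

```python
SLA_THRESHOLD_SECONDS = 40.0  # the example threshold cited in the article

def check_transaction(name, elapsed_seconds, send_alarm,
                      threshold=SLA_THRESHOLD_SECONDS):
    """Raise an alarm if a transaction exceeds the SLA threshold."""
    if elapsed_seconds > threshold:
        send_alarm({
            "transaction": name,
            "elapsed": elapsed_seconds,
            "threshold": threshold,
        })
        return True
    return False

# A plain list stands in for the help desk console (hypothetical).
help_desk_console = []
check_transaction("policy_lookup", 12.3, help_desk_console.append)
check_transaction("policy_lookup", 47.8, help_desk_console.append)
print(help_desk_console)
```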

In addition, Corbett notes that New England Financial’s application that services external customers was written in-house. INS’ Waldbusser says that VitalSuite recently added support for the previously unrecognized area of in-house-developed applications. While not sexy, these applications are vitally important to organizations, providing order taking, logistics, interfacing with customers, and basic company operations.

INS sees the New England Financial case as the tip of the iceberg, which is ushering in a new generation of customers who require the monitoring of in-house applications.

Toward a Business Dashboard
There has recently been an explosion in focused SLM tools and methodologies, says Hurwitz’ Ptak. While companies such as Hewlett-Packard, Tivoli, and Computer Associates offer generic, enterprise-focused solutions, smaller start-up firms are beginning to provide comprehensive, integrated, but more cost-effective solutions. Once scattered and unfocused in strategy, they are homing in on organizations’ requirements more carefully, making themselves attractive partners.

Ptak predicts that SLM will move toward a focus on the business process, whereby organizations will abstract the state of the business processes that run their companies. In turn, the available data and its abstraction will consolidate into a dashboard reporting system. As organizations move toward a business dashboard, the data will be just a given. Ptak adds that since solution providers are rapidly becoming sophisticated in making data available, it is already happening, and more quickly than expected.

Electronic business is driving these SLM trends. With IT now the main entryway to the business, the organization can’t afford to look sloppy.

IT has long been an overhead operation, adds Ptak. Now it’s on center stage as a business driver. Therefore, SLAs, SLM, and Quality of Service (QoS) have all become critical to the successful delivery of IT services.

Janet Butler has 20-plus years of experience as an IS journalist and editor, specializing in application development, systems management, IS planning, and help desk issues.
