I was glad to see that Scott Lowe has raised an important issue that many technical folks do not seem to think about much, if at all. Coincidentally, I happen to have published two articles on the subject in the Journal of Capacity Management. The first, in 1983, had the title "The User's View of Computer System Reliability" and the second, in 1985, was on "The User's View of Computer System Availability." The articles make exactly the same point as Scott does, namely, that metrics on system up-time do not fully reflect meaningful availability to business users.
While my articles are likely no longer available in libraries, some of the concepts from them are summarized on pages 56-59 of my book "Outsourcing Information Security" (Artech House, 2004). The book section also draws from a source on contract negotiation, since definitions of what constitutes unavailability, from the user perspective, are key in drafting effective service level agreements, either internally or with third parties.
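To make the gap between up-time and user-perceived availability concrete, here is a rough sketch in Python. It is not taken from the articles or the book. The `Outage` record, the Monday-to-Friday 08:00-18:00 business window, and both function names are assumptions made up for illustration, and the sketch further assumes that outages do not overlap and fall within a single week.

```python
from dataclasses import dataclass

# Hypothetical business window: Monday to Friday, 08:00 to 18:00.
BUSINESS_START = 8
BUSINESS_END = 18
BUSINESS_DAYS = 5
HOURS_PER_WEEK = 7 * 24


@dataclass
class Outage:
    start_hour: float      # hours since Monday 00:00
    duration_hours: float


def business_overlap(outage: Outage) -> float:
    """Hours of this outage that fall inside the business window."""
    end = outage.start_hour + outage.duration_hours
    total = 0.0
    for day in range(BUSINESS_DAYS):
        window_start = day * 24 + BUSINESS_START
        window_end = day * 24 + BUSINESS_END
        overlap = min(end, window_end) - max(outage.start_hour, window_start)
        total += max(0.0, overlap)
    return total


def uptime_availability(outages: list[Outage]) -> float:
    """Availability measured against the whole week (the 'up-time' view)."""
    down = sum(o.duration_hours for o in outages)
    return 1 - down / HOURS_PER_WEEK


def user_availability(outages: list[Outage]) -> float:
    """Availability measured only against hours when users need the system."""
    business_hours = BUSINESS_DAYS * (BUSINESS_END - BUSINESS_START)
    down = sum(business_overlap(o) for o in outages)
    return 1 - down / business_hours


# A two-hour outage on Tuesday at 10:00 costs far more of the business week
# than of the calendar week.
tuesday_morning = [Outage(start_hour=24 + 10, duration_hours=2)]
print(f"up-time view: {uptime_availability(tuesday_morning):.4f}")
print(f"user view:    {user_availability(tuesday_morning):.4f}")
```

Under these assumptions the same outage looks much worse from the user's side, which is why the definition of unavailability written into a service level agreement matters so much.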
Too many people ping servers and claim that they are taking care of "availability". You need to at least keep track of CPU, memory, network, disk I/O and disk space. These simple indicators are almost always enough to diagnose the health of any server.
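As a minimal sketch of what checking more than a ping could look like, the Python below compares a snapshot of those five indicators against limits. The metric names, the threshold values, and the `check_health` function are all hypothetical and chosen only for illustration. Only the disk-space figure is actually collected here, using the standard library's `shutil.disk_usage`; the other values would have to come from whatever monitoring agent you already run.

```python
import shutil

# Hypothetical limits, for illustration only. Tune them per server.
THRESHOLDS = {
    "cpu_percent": 90.0,
    "memory_percent": 90.0,
    "network_errors_per_s": 10.0,
    "disk_io_wait_percent": 30.0,
    "disk_used_percent": 85.0,
}


def disk_used_percent(path: str = "/") -> float:
    """Percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def check_health(snapshot: dict, thresholds: dict = THRESHOLDS) -> list:
    """Return a list of problems; an empty list means every indicator is OK."""
    problems = []
    for name, limit in thresholds.items():
        if name not in snapshot:
            # A missing reading is itself worth reporting.
            problems.append(f"{name}: no data")
        elif snapshot[name] > limit:
            problems.append(f"{name}: {snapshot[name]} exceeds {limit}")
    return problems


# Example snapshot with made-up numbers: the server answers pings,
# but memory is nearly exhausted.
snapshot = {
    "cpu_percent": 35.0,
    "memory_percent": 97.0,
    "network_errors_per_s": 0.0,
    "disk_io_wait_percent": 4.0,
    "disk_used_percent": 60.0,
}
print(check_health(snapshot))
```

A server in this state would pass a ping check while users are already seeing slow or failed requests.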