Jamie Lerner, President, CITTIO, Inc.
Businesses rely on computers, networks, software, and
databases to compete effectively. All these systems must remain healthy for a
business to operate efficiently. In today’s IT environment, organizations often
use computing devices from multiple vendors to meet their requirements. Should
any of these resources fail unexpectedly, the negative impact can be severe.
A conservative Gartner estimate puts the average cost of downtime for a
computer network at $42,000 per hour. Gartner also estimates that companies
typically experience a total of 87 hours of downtime per year. A company that
experiences more than 175 hours per year could save as much as $3.6 million
annually by successfully implementing monitoring technology to reduce downtime
to that typical level.
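The savings figure is simple arithmetic: a company at 175 hours of downtime that reaches the typical 87 hours avoids 88 hours, each costing about $42,000. The minimal Python sketch below assumes savings scale linearly with avoided hours; the function name `annual_savings` is illustrative only.

```python
# Inputs are the Gartner figures quoted above.
COST_PER_HOUR = 42_000       # USD per hour of network downtime
TYPICAL_DOWNTIME_HOURS = 87  # typical annual downtime, in hours

def annual_savings(current_hours, target_hours=TYPICAL_DOWNTIME_HOURS,
                   cost_per_hour=COST_PER_HOUR):
    """Savings from reducing annual downtime to target_hours (linear cost model)."""
    avoided_hours = max(current_hours - target_hours, 0)
    return avoided_hours * cost_per_hour

print(annual_savings(175))  # → 3696000 (88 avoided hours x $42,000)
```

A company already at or below the typical level gets no savings under this model, which is why the function clamps avoided hours at zero.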
With the increased complexity and quantity of computing
equipment and software, monitoring the health of these systems can no longer be
performed manually. Instead, monitoring software must run continuously,
performing tests that ensure all computers, network devices, and software
components are working properly.
Gartner notes that when critical
servers and networks crash, businesses pay dearly in terms of productivity,
damaged reputation, and financial performance. According to USA Today, U.S.
companies lost an estimated $100 billion from network outages in 1999 alone.
The Standish Group warns that, for an average company, a minute of downtime
for a mission-critical application costs $10,000. For large companies, the
price can be millions of dollars a minute.
When failures occur, minimizing downtime is crucial to
limiting business impact. If a corporate Web site “available globally 24 hours a
day, 7 days a week” goes down, the company loses a valuable avenue for sales,
contacts, marketing efforts, and business development. These losses are often
difficult to quantify.
System failures can sever important lines of corporate
communication. Frequent failures cause organizations to lose confidence in
these highly effective business tools, reducing the return on investment in
them.
IT organizations charged with keeping systems operational 24×7 have the
following requirements:
- Monitoring technology that helps keep critical systems up and running
  around the clock.
- Systems that are rapidly implemented and easily maintained. IT
  organizations have neither the time nor the resources for lengthy
  installations or complex maintenance.
- Application, system, and database level monitoring that provides early
  indications of system trouble, as well as key real-time data and historic
  performance data.
- Tools that identify problems and are intelligent enough to solve them.
- Integration of multi-vendor solutions for monitoring, maintenance, and
  management into one central dashboard.
- A high-level view of the overall health of the network, coupled with the
  ability to drill down into specific data.
- A simple, flexible licensing model. Complex per-probe or per-module
  licensing models are riddled with hidden costs; their multiple components
  make them difficult to install, and their total cost of ownership over the
  product’s lifecycle is impossible to predict.
The problem with traditional monitoring solutions
To reduce or eliminate the disruptions caused by computing
outages, major vendors such as Hewlett-Packard, IBM, and Computer Associates
have built monitoring solutions. Network and systems management (NSM) software
accounts for a large slice of IT budgets. In 2004 alone, companies spent $7.1
billion on such products.
These products are not only expensive but also tend to be
difficult to install, administer, and maintain. Although they have been
available for many years, their high cost and complexity have led to the
following outcomes:
- Some companies have not deployed formal monitoring technology at all.
- Some companies have either failed in or abandoned the attempt to deploy them.
- Other companies have deployed low-end monitoring systems, sacrificing vital
  functionality in exchange for a partial solution.
The features IT organizations need
The following attributes are critical to a highly functional NSM solution:
Java and Internet-based architecture
Software should be written in Java and designed for a
Web-based environment. Web-based systems with zero-client architectures need
only very simple software distribution and upgrade mechanisms, because the
technology resides on a single server. In addition, the system should be
securely accessible and administrable from any location without additional
client-side software.
Most traditional system monitoring products pre-date the
Internet and are essentially client-server based systems with limited Web-based
reporting capabilities. These systems require upgrades and patches on the
central server and the client.
Simple, intuitive Web-based user interface
System administrators need fast, easy access to system
functionality without requiring lengthy training. Ideally, the user interface
should allow operators and administrators to get up to speed in less than a
day, by using familiar user interface paradigms such as tree controls, tabs,
graphs, and tabular data.
Automated installation and configuration
Traditional applications can take three to nine months to
install and configure. They often require significant consulting services,
increasing total cost of ownership. Automation technology discovers servers,
networking equipment, and software applications, collects performance
statistics, and applies thresholds, which reduces installation time from months
to days. It automates cumbersome, repetitive configuration and maintenance
tasks, relying on defaults or templates designed to meet more than 90 percent
of an organization’s needs.
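At its simplest, discovery can probe hosts for well-known service ports and attach a default template to each service found. The Python sketch below assumes plain TCP connect probes; the port map, template names, and template values are hypothetical, and a production tool would likely also query SNMP or WBEM rather than rely on ports alone.

```python
import socket

# Hypothetical map of well-known TCP ports to service names.
WELL_KNOWN_PORTS = {22: "ssh", 25: "smtp", 80: "http", 443: "https", 1521: "oracle"}

# Hypothetical default templates; services without one get GENERIC_TEMPLATE.
DEFAULT_TEMPLATES = {
    "http": {"poll_seconds": 60, "latency_ms_warn": 500},
    "oracle": {"poll_seconds": 300, "tablespace_pct_warn": 90},
}
GENERIC_TEMPLATE = {"poll_seconds": 300}

def port_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False

def discover(hosts, ports=WELL_KNOWN_PORTS):
    """Probe every host/port pair and attach a default template to each service found."""
    inventory = {}
    for host in hosts:
        services = {}
        for port, service in ports.items():
            if port_open(host, port):
                template = DEFAULT_TEMPLATES.get(service, GENERIC_TEMPLATE)
                services[service] = {"port": port, **template}
        inventory[host] = services
    return inventory
```

For example, `discover(["192.0.2.10"])` (a documentation-range placeholder address) would return a per-host dictionary of detected services with their default polling settings, so that no one has to configure each service by hand.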
Industry standards
Industry standards such as J2EE, SNMP, WBEM, and JDBC
provide for easy integration with other technologies and lower overall support
and maintenance costs. By leveraging industry standards, an engineering team
can react to industry changes more rapidly and leverage engineering investments
for a more cost-effective solution.
No proprietary heavy agents
Most traditional system monitoring vendors provide a heavy
agent that must be distributed to production systems. These agents often
consume significant network bandwidth during communication to the management
station and significant resources on each monitored server. In addition, every
system on the network must be upgraded when the product is patched or upgraded.
A far more innovative approach is to use the built-in Simple
Network Management Protocol (SNMP) agent that comes with most systems
rather than requiring a proprietary agent. Using the User Datagram Protocol
(UDP) to communicate with agents consumes very little network bandwidth. In
addition,
when the operating system is upgraded and patched, the SNMP agent is also
patched and upgraded by the system vendor, simplifying overall maintenance of
the monitoring system.
Zero MIB Compile SNMP architecture
Vendors implement different SNMP management information
bases (MIBs), which define the performance statistics and other variables a
device exposes. Typical systems require users to compile the MIBs, select the
variables to monitor, build graphs, and set thresholds. This process alone can
take months, because a single vendor may have more than 500,000 MIB variables.
NSM automation technology determines the SNMP capabilities
of every node and applies a data collection template. Based on this template,
the monitoring software automatically collects recommended SNMP statistics,
builds historic trend graphs, and applies predefined, recommended thresholds.
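Template selection reduces to matching a node's discovered capabilities against templates ordered from most to least specific, then checking samples against the chosen template's thresholds. In this Python sketch the capabilities are assumed to be already known as a set of MIB object names (a real system would obtain them with SNMP walks), and the template names and threshold values are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Template:
    """Hypothetical data collection template."""
    name: str
    required: frozenset                             # MIB objects a node must expose
    thresholds: dict = field(default_factory=dict)  # MIB object -> alert level

# Ordered from most to least specific; the first full match wins.
TEMPLATES = [
    Template("host-resources",
             frozenset({"sysUpTime", "hrProcessorLoad", "hrStorageUsed"}),
             {"hrProcessorLoad": 90}),
    Template("interface-counters", frozenset({"sysUpTime", "ifInOctets", "ifOutOctets"})),
    Template("generic-snmp", frozenset({"sysUpTime"})),
]

def select_template(supported):
    """Return the first template whose required objects the node supports, else None."""
    supported = set(supported)
    for template in TEMPLATES:
        if template.required <= supported:
            return template
    return None

def breached(template, sample):
    """List the MIB objects whose sampled value exceeds the template threshold."""
    return sorted(obj for obj, limit in template.thresholds.items()
                  if sample.get(obj, 0) > limit)

node = {"sysUpTime", "ifInOctets", "ifOutOctets", "hrProcessorLoad", "hrStorageUsed"}
t = select_template(node)
print(t.name, breached(t, {"hrProcessorLoad": 97}))  # → host-resources ['hrProcessorLoad']
```

Ordering templates by specificity means a server that also exposes interface counters still receives the richer host template, while a bare device falls back to the generic one.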
Synthetic transactions
Many lower-end products monitor via Internet Control Message
Protocol (ICMP) alone. If a server responds to a ping, the software marks its
HTTP service as operational. Unfortunately, ICMP monitoring cannot confirm that
the service returns the expected response. A better approach is to determine
that a service is running and then perform a full synthetic transaction to
ensure the application is responding appropriately. This approach exercises the
underlying software by running a synthetic (simulated) transaction and
measuring its latency. An ideal
solution also allows system administrators to write custom pollers
in a variety of supported languages to build synthetic transactions for
in-house developed applications.
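The difference between a ping and a synthetic transaction can be sketched with Python's standard library: the check below requests a page, verifies that the body contains expected text, and records latency. The function name, URL, and expected text are placeholders for illustration; a poller for an in-house application might instead log in, submit a form, or run a query.

```python
import http.client
import time
import urllib.request

def synthetic_http_check(url: str, expected_text: str, timeout: float = 5.0):
    """Run one synthetic HTTP transaction.

    Returns (ok, latency_seconds). ok is True only if the request succeeds with
    status 200 and the response body contains expected_text, which an ICMP
    ping cannot confirm.
    """
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            ok = resp.status == 200 and expected_text in body
    except (OSError, http.client.HTTPException):  # URLError, HTTPError, timeouts
        ok = False
    latency = time.perf_counter() - start
    return ok, latency
```

A call such as `synthetic_http_check("http://intranet.example.com/orders/health", "status: OK")` (hypothetical URL) returns both results at once, so the same poller can feed alerting and historic latency graphs.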
Pre-integrated, bundled architecture
Many NSM products are stand-alone applications that require additional
license fees (and additional training, configuration, maintenance, and tuning)
for operating systems, databases, reporting packages, and notification
software.
A comprehensive, bundled solution has a full application
stack including operating system, Web server, Java server, and embedded
database, so that no additional products must be purchased, configured, or
installed.
Portal architecture
Portal architecture enables IT organizations to assemble
favorite tools and applications into a single dashboard. This framework
provides a common security architecture and a common “look and feel” across a
mixed bag of applications.
An operating model based on real-world experience
An effective NSM operating model is the result of lessons
learned while running large-scale commercial data centers. It should include:
- Schedules: should be part of the standard configuration so that engineers
  are notified only when they are on call.
- Contact set grouping: allows routing of messages to the appropriate team
  members. For example, Oracle notifications are sent to DBAs, while network
  outage messages are sent to network engineers.
- Asset Manager: shortens downtime caused by the inability to locate or access
  a device. Asset management lets administrators store key non-technical
  information about a device’s location, access requirements, and vendor.
- Standard Operating Procedures (SOPs) and the Document Manager: allow
  operators to attach instructions for fixing problems to network events. For
  example, if the table space of an Oracle database is full, a DBA should be
  able to link the notification with instructions for extending the table
  space.
- Automated response: enables system administrators to build standard
  responses to frequently encountered problems. For example, if a service such
  as HTTP goes down, an automated response can quickly and easily restart it.
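Schedules, contact grouping, SOPs, and automated responses can be combined in a single event handler. The Python sketch below is a hypothetical illustration: the routing table, group names, SOP text, response names, and the same-day on-call window are all assumptions, and the handler only reports which action to take rather than executing it.

```python
from dataclasses import dataclass
from datetime import datetime, time

@dataclass
class Engineer:
    name: str
    group: str            # contact group, e.g. "dba" or "network"
    on_call_start: time   # start of a daily on-call window
    on_call_end: time     # end of the window (same day; overnight shifts not handled)

    def is_on_call(self, now: datetime) -> bool:
        return self.on_call_start <= now.time() < self.on_call_end

# Hypothetical routing table: event category -> contact group.
ROUTES = {"oracle": "dba", "network": "network", "http": "sysadmin"}

# Hypothetical SOPs attached to specific event types.
SOPS = {"oracle.tablespace_full": "Instructions for extending the tablespace"}

# Hypothetical automated responses; the handler only names the action.
AUTO_RESPONSES = {"http.down": "restart-http-service"}

def handle_event(event_type: str, engineers: list, now: datetime) -> dict:
    """Decide who to notify, which SOP to attach, and which automated response to run."""
    category = event_type.split(".", 1)[0]
    group = ROUTES.get(category)
    recipients = [e.name for e in engineers
                  if e.group == group and e.is_on_call(now)]
    return {
        "notify": recipients,
        "sop": SOPS.get(event_type),
        "auto_response": AUTO_RESPONSES.get(event_type),
    }

staff = [
    Engineer("ana", "dba", time(8, 0), time(18, 0)),
    Engineer("raj", "dba", time(18, 0), time(23, 59)),
    Engineer("lee", "network", time(0, 0), time(23, 59)),
]
print(handle_event("oracle.tablespace_full", staff, datetime(2024, 1, 15, 10, 30)))
# → {'notify': ['ana'], 'sop': 'Instructions for extending the tablespace', 'auto_response': None}
```

Keeping routing, SOPs, and responses in plain lookup tables means operators can change who is paged or what runs without touching the handler logic.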
The right solution
Network and systems management solutions should overcome the
cost and complexity concerns that have kept organizations from implementing
them or that have caused them to abandon their efforts. The right solution,
such as CITTIO’s WatchTower, offers:
- Lower overall investment compared with traditional solutions
- A basis in industry standards
- An operating model built on data center management experience
- Elimination of heavy agents
- A portal architecture that supports personalization, rapid adoption of new
  technologies, and robust security
- Management and monitoring tools built on Internet-enabled technology
- A single interface for comprehensive 24×7 system control