Right, last week we looked at IBM Director, and I wasn’t very impressed. Now let’s take a look at Nagios and see if it’s any better. Nagios is an open-source host, network, and service monitoring system, used by many large corporations and even some government agencies! The very honestly named propaganda page will fill you in on where and how Nagios is being used.

Nagios is a very flexible system with masses of potential; here is the official word on what it can do for you:

  • Monitoring of network services (SMTP, POP3, HTTP, NNTP, PING, etc.)
  • Monitoring of host resources (processor load, disk and memory usage, running processes, log files, etc.)
  • Monitoring of environmental factors such as temperature
  • Simple plugin design that allows users to easily develop their own host and service checks
  • Ability to define network host hierarchy, allowing detection of and distinction between hosts that are down and those that are unreachable
  • Contact notifications when service or host problems occur and get resolved (via email, pager, or other user-defined method)
  • Optional escalation of host and service notifications to different contact groups
  • Ability to define event handlers to be run during service or host events for proactive problem resolution
  • Support for implementing redundant and distributed monitoring servers
  • External command interface that allows on-the-fly modifications to be made to the monitoring and notification behaviour through the use of event handlers, the web interface, and third-party applications
  • Retention of host and service status across program restarts
  • Scheduled downtime for suppressing host and service notifications during periods of planned outages
  • Ability to acknowledge problems via the web interface
  • Web interface for viewing current network status, notification and problem history, log file, etc.
  • Simple authorization scheme that allows you to restrict what users can see and do from the web interface
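Scheduled downtime, acknowledgements and other on-the-fly changes from the list above all go through the external command interface: each command is a single line, `[timestamp] COMMAND_NAME;arg1;arg2;...`, written to a named pipe (the command file) that Nagios reads. Here is a minimal Python sketch that formats a `SCHEDULE_SVC_DOWNTIME` command. It assumes a source install with the command file at `/usr/local/nagios/var/rw/nagios.cmd` (check the `command_file` setting in your own `nagios.cfg`), and the host and service names used are hypothetical placeholders.

```python
import time

# Assumed location of the Nagios external command file (a named pipe);
# the real path is whatever command_file is set to in nagios.cfg.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"


def format_command(name, args, timestamp=None):
    """Build one external command line: [timestamp] NAME;arg1;arg2;..."""
    if timestamp is None:
        timestamp = int(time.time())
    fields = [name] + [str(a) for a in args]
    return f"[{timestamp}] " + ";".join(fields) + "\n"


def schedule_service_downtime(host, service, start, end, author, comment,
                              fixed=True, duration=0, timestamp=None):
    """Format a SCHEDULE_SVC_DOWNTIME command for a planned outage.

    start/end are Unix timestamps; duration (seconds) only matters for
    flexible (non-fixed) downtime.
    """
    args = [host, service, start, end,
            1 if fixed else 0,
            0,  # trigger_id: 0 means not triggered by another downtime entry
            duration, author, comment]
    return format_command("SCHEDULE_SVC_DOWNTIME", args, timestamp)


def submit(line, command_file=COMMAND_FILE):
    """Write a formatted command line to the Nagios command file."""
    with open(command_file, "w") as pipe:
        pipe.write(line)


if __name__ == "__main__":
    now = int(time.time())
    # Hypothetical host and service: one hour of fixed downtime starting now.
    line = schedule_service_downtime("mailserver", "SMTP", now, now + 3600,
                                     "admin", "Planned patching", timestamp=now)
    print(line, end="")
    # submit(line)  # only on the monitoring server, while Nagios is running
```

Opening the pipe for writing waits until Nagios has it open for reading, so `submit()` is only useful on the monitoring server itself while Nagios is running.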

Monitoring of host resources and network services is exactly what I’m looking for; add instant notification, and we are really cooking. Knowing that a problem exists ‘before’ the helpdesk phones go crazy with users complaining is a major benefit. Most of the time, issues can be resolved without many users noticing at all!

Support is good, with online documentation, FAQs, mailing lists, and forums.

So, installation and configuration: how was it? I currently have one server with Nagios installed, which will be the main monitoring station, but for now it is only monitoring itself. We use SUSE Linux Enterprise Server, and Nagios is in the applications repository, so it can be installed with the YaST management tool. However, I preferred to compile and install Nagios from source so that I would have a better understanding of what is actually going on. The install was uneventful, with everything working as described in the installation documents.

Configuring service monitoring wasn’t too difficult; I chose disk space, CPU load, and memory usage. These checks come in the form of plug-ins, which are basically small scripts that are executed on the host and report information back to Nagios. Nagios can also use SNMP to gather information; this will be very useful for monitoring server health, especially combined with HP’s Integrated Lights-Out (iLO) and other devices that support SNMP.
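To give a feel for how simple the plug-in design is: a plug-in prints one status line and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN), optionally adding performance data after a `|`. The Python sketch below is a hypothetical disk-space check written to that convention; the 80%/90% thresholds, the percentage-used logic and the `check_disk`/`main` names are my own assumptions, not the behaviour of the stock `check_disk` plug-in that ships with Nagios.

```python
#!/usr/bin/env python3
"""Hypothetical disk-space plug-in reporting percentage of space used."""
import shutil

# Standard Nagios plug-in exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(used_pct, warn, crit):
    """Map a usage percentage onto a plug-in state (higher is worse)."""
    if used_pct >= crit:
        return CRITICAL
    if used_pct >= warn:
        return WARNING
    return OK


def check_disk(path="/", warn=80.0, crit=90.0):
    """Return (exit_code, status_line) for the filesystem holding `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - cannot stat {path}: {exc}"
    if usage.total == 0:
        return UNKNOWN, f"DISK UNKNOWN - {path} reports zero total size"
    used_pct = 100.0 * usage.used / usage.total
    state = evaluate(used_pct, warn, crit)
    # Text before '|' is what the web interface shows; after it is perf data
    # in the form label=value[UOM];warn;crit;min;max.
    line = (f"DISK {LABELS[state]} - {path} {used_pct:.1f}% used"
            f" | used_pct={used_pct:.1f}%;{warn};{crit};0;100")
    return state, line


def main(argv):
    """Parse [path] [warn_pct] [crit_pct], print the status line, return the exit code."""
    path = argv[0] if len(argv) > 0 else "/"
    warn = float(argv[1]) if len(argv) > 1 else 80.0
    crit = float(argv[2]) if len(argv) > 2 else 90.0
    code, text = check_disk(path, warn, crit)
    print(text)
    return code
```

Saved as an executable script, it would finish with `sys.exit(main(sys.argv[1:]))` (after `import sys`) so that Nagios sees the exit code; Nagios reads the state from that code, not from the text, which is why the exit status matters more than the wording of the line.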

All things considered, I am very happy with Nagios. It’s free, fully featured, and well supported. The installation went without a hitch, which is more than can be said for IBM Director. Setup and configuration will be time-consuming, with a steep learning curve (my SNMP skills need developing), but it is well worth the effort and should quickly repay the time invested.

I hope this proved useful. I would love to hear from anyone else using Nagios.