Nagios monitoring with NRPE allows better tracking of remote systems

The NRPE plugin allows you to track exactly what is going on with remote servers, both from an external view and an internal view, including disk usage, CPU spiking, and memory issues.

Nagios is a really great host and service monitoring system with a lot of flexibility and power. It is not the easiest system to set up, but with a little patience, determination, and past tips on TechRepublic, the task is much less daunting.

One addition to Nagios that has not been covered previously is using NRPE to monitor services on remote systems that are not typically exposed on the network. For instance, it is easy enough to monitor whether or not HTTP or SMTP services are available by checking on them remotely, but how do you determine whether you are running out of disk space, or if the load average has spiked? These things cannot be easily determined without having local access to the system. One way to accomplish this is with the check_by_ssh command that has been looked at previously, but an even better way to do so is with the Nagios Remote Plugin Executor (NRPE) daemon.

NRPE runs checks on a system remote from the central Nagios server and hands back the results, so Nagios can treat them as if the checks had been run locally. In essence, Nagios talks to NRPE, asks it to run a specific check, waits for the response, and logs it along with everything else it watches. These are checks that could only be run locally: checking the number of users, load average, disk space usage, available memory, whether the local system can query DNS, and so on. While NRPE's function is very similar to the check_by_ssh plugin, the overhead is much smaller, making it faster and more efficient.

To begin, you will need the NRPE daemon and the local Nagios plugins installed on the remote server. On Red Hat Enterprise Linux 5, NRPE and the Nagios plugins are available via EPEL or RPMForge. Via EPEL, you would install NRPE and a few plugins using:

# yum install nrpe nagios-common nagios-plugins nagios-plugins-{disk,dns,users,load,procs}

This installs NRPE and enough Nagios plugins to at least get started. The main NRPE configuration file is /etc/nagios/nrpe.cfg, and this is where you can determine which checks NRPE will execute, and from which hosts these checks will be permitted. Also be sure that these checks run as a special user -- either an 'nrpe' user or 'nagios' user. With the EPEL packages, NRPE is pre-configured to run as the user 'nrpe', and that user is created upon package install.

One way to lock down which hosts can access NRPE is to change the allowed_hosts option. While NRPE does do some access control, and this is a valid way of specifying allowed hosts, perhaps a better way would be to configure the firewall to only allow a specific IP address to connect to the port that NRPE is listening on.
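As a rough illustration of the firewall approach, the sketch below prints a pair of iptables rules rather than applying them, so they can be reviewed first. The Nagios server address 192.168.100.10 and the helper name nrpe_firewall_rules are assumptions made for this example, and inserting at the top of the INPUT chain is only one possible placement; adjust it to fit your existing rule set.

```shell
#!/bin/sh
# Assumed values for illustration: replace with your own Nagios server address.
NAGIOS_SERVER="${NAGIOS_SERVER:-192.168.100.10}"
NRPE_PORT="${NRPE_PORT:-5666}"

# Hypothetical helper: print (do not run) rules that accept NRPE traffic
# from one source address and drop it from everywhere else.
# Usage: nrpe_firewall_rules SOURCE_IP PORT
nrpe_firewall_rules() {
    echo "iptables -I INPUT 1 -p tcp -s $1 --dport $2 -j ACCEPT"
    echo "iptables -I INPUT 2 -p tcp --dport $2 -j DROP"
}

nrpe_firewall_rules "$NAGIOS_SERVER" "$NRPE_PORT"
```

Once the printed rules look right, they can be run as root on the remote system. Setting allowed_hosts in nrpe.cfg to the same address gives a second layer of protection in case the firewall rules are ever flushed.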

At the end of the file are the various configured checks. These are the only checks that NRPE will perform; if the central Nagios monitor requests a check that is not listed here, it will not execute it. For instance:

command[check_users]=/usr/lib/nagios/plugins/check_users -w 5 -c 10
command[check_load]=/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20

This will create two commands that NRPE will respond to: check_users and check_load. The name noted in square brackets is the name of the command that Nagios must call via the check_nrpe plugin.
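Further checks follow the same pattern. As a sketch, the lines below would add a disk check and a process-count check using the check_disk and check_procs plugins installed earlier. The command names check_root_disk and check_total_procs and all of the thresholds are illustrative choices, not values from any particular setup, and the plugin path assumes the same location as the examples above.

```
# Warn when less than 20% of / is free, go critical below 10% (illustrative thresholds)
command[check_root_disk]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /
# Warn above 150 running processes, go critical above 200 (illustrative thresholds)
command[check_total_procs]=/usr/lib/nagios/plugins/check_procs -w 150 -c 200
```

NRPE reads its configuration at startup, so the daemon needs to be restarted after commands are added or changed.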

Once NRPE is configured, you can start the NRPE service to have it begin listening for requests:

# chkconfig nrpe on; service nrpe start

Once it is started, run a test on the Nagios server, to make sure it can talk to the remote NRPE daemon:

$ /usr/lib64/nagios/plugins/check_nrpe -H 192.168.100.12 -c check_load
OK - load average: 0.00, 0.00, 0.00|load1=0.000;15.000;30.000;0; load5=0.000;10.000;25.000;0; load15=0.000;5.000;20.000;0;

This calls the check_nrpe plugin and tells it to connect to the host 192.168.100.12 and run the check_load command, which is defined in the NRPE configuration file. If the check_nrpe command returns a string like the above, NRPE is running and you can integrate check_nrpe into your existing Nagios configuration to start examining local services on the remote servers. If not, double-check that the firewall is allowing access to port 5666 (the default) on the remote system, that the Nagios server's address is permitted by allowed_hosts, and that the command you are calling is defined in nrpe.cfg on the remote system.
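When testing by hand, the exit status is as informative as the text. Nagios plugins report their state through the exit code (0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN), and check_nrpe passes the remote plugin's status through. The small helper below, whose name nagios_state is made up for this example, translates a status code into the state Nagios would record.

```shell
#!/bin/sh
# Hypothetical helper: map a plugin exit status to the Nagios state name.
# Usage: nagios_state EXIT_CODE
nagios_state() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        *) echo "UNKNOWN" ;;   # 3, and anything unexpected
    esac
}

# Example use after a manual test (host and path as in the article):
#   /usr/lib64/nagios/plugins/check_nrpe -H 192.168.100.12 -c check_load
#   nagios_state $?
```

An UNKNOWN result from a manual test usually points at the connection or the configuration rather than at the check itself.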

As an example, you might define a "check_nrpe_load" command in the Nagios server's commands.cfg, which will be used to check the load on remote NRPE daemons:

# 'check_nrpe_load' command definition
define command {
        command_name    check_nrpe_load
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c "check_load"
}

and a corresponding service in services.cfg:

define service {
        use                             nrpe-service
        hostgroup_name                  nrpe-services
        service_description             Current Load
        check_command                   check_nrpe_load
}

And then define the hostgroup "nrpe-services" for those hosts that have NRPE installed, via hostgroups.cfg (the servers "server1", "dns", and "server2" in the following example):

define hostgroup {
        hostgroup_name  nrpe-services
        alias           Nagios via NRPE
        members         server1,dns,server2
}
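Defining one Nagios command per remote check works, but it gets repetitive as the number of checks grows. A common alternative, sketched here, is a single generic command that takes the remote command name as an argument. The command name check_nrpe and the "Current Users" service are choices made for this example; the service reuses the nrpe-service template and nrpe-services hostgroup from above, and calls the check_users command defined in nrpe.cfg earlier.

```
# Generic NRPE command: the remote command name is passed in as $ARG1$
define command {
        command_name    check_nrpe
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}

# The text after "!" becomes $ARG1$, so this runs check_users on the remote host
define service {
        use                             nrpe-service
        hostgroup_name                  nrpe-services
        service_description             Current Users
        check_command                   check_nrpe!check_users
}
```

With this in place, adding a new remote check only needs a new command[] line in nrpe.cfg on the remote host and a new service definition on the Nagios server.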

NRPE is a great way to get additional information that tends to be protected from outside viewing, in a way that is easy for Nagios to consume and track. It does require a few extra steps, and it would be a good idea to use iptables to restrict access to the NRPE port (5666, by default) to only authorized IPs.

With NRPE running, you can easily watch for disk usage, CPU spiking, memory issues, and other things that you would not be able to see without it. This gives you a means of tracking exactly what is going on with remote servers, both from an external view and an internal view. And because Nagios is so versatile and you can easily write your own plugins, there really is very little you can't monitor with Nagios, even on remote servers.
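To give a sense of how little a custom plugin requires, here is a minimal sketch of a memory check written as a shell function. The name check_mem_pct, the output wording, and the thresholds are all assumptions for illustration and are not part of the nagios-plugins package; the only real requirements are one line of output and an exit status of 0, 1, 2, or 3.

```shell
#!/bin/sh
# Hypothetical plugin logic: report state based on the percentage of free memory.
# Usage: check_mem_pct FREE_KB TOTAL_KB WARN_PCT CRIT_PCT
check_mem_pct() {
    pct=$(( $1 * 100 / $2 ))
    # Text after "|" is performance data: label=value;warn;crit
    perf="free_pct=${pct}%;$3;$4"
    if [ "$pct" -le "$4" ]; then
        echo "MEMORY CRITICAL - ${pct}% free|${perf}"
        return 2
    elif [ "$pct" -le "$3" ]; then
        echo "MEMORY WARNING - ${pct}% free|${perf}"
        return 1
    fi
    echo "MEMORY OK - ${pct}% free|${perf}"
    return 0
}

# In a real plugin script, the final lines might look like this:
#   free=$(awk '/^MemFree:/ {print $2}' /proc/meminfo)
#   total=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
#   check_mem_pct "$free" "$total" 20 10
#   exit $?
```

Saved as an executable script alongside the other plugins and given its own command[] line in nrpe.cfg, a check like this could be called through check_nrpe in the same way as check_load.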

About

Vincent Danen works on the Red Hat Security Response Team and lives in Canada. He has been writing about and developing on Linux for over 10 years and is a veteran Mac user.

4 comments
gracedman

Thank you, Vincent. We are big Nagios fans but eventually chose OpenNMS for our primary monitoring system but it was truly neck and neck with great admiration for both. I'd be curious to hear how others have fared who have evaluated both. We were as much concerned with trending and autoremediation as monitoring. We also wanted to standardize on SNMP rather than introduce NRPE. All of that can be done with Nagios and, believe it or not, the Nagios configuration is much easier and more flexible. However, OpenNMS seemed to edge it out on SNMP support, auto-discovery (for better or worse!), scaling to very large infrastructures, and lower network overhead (though much higher management server overhead). But, as I said, OpenNMS edged out Nagios by just a nose. Where have others come out in similar real world comparisons? - John

daboochmeister

Vincent, do you know if the protocol Nagios uses to talk to the NRPE is SSL/TLS encrypted, and/or permits any authentication controls? Or, can the NRPE plugin be configured to only allow access from localhost, and an ssh tunnel be used between the Nagios server and the monitored server? (allowing for ssh-standard authentication approaches, e.g., certificates)

shahbaz.ali

Hi John, any idea how to configure OpenNMS for VMware ESXi? We have enabled SNMP in ESXi and set its community details, but we still do not get its updates in OpenNMS. If anyone has a clue, please share. Best regards, Shahbaz szafar_66@hotmail.com

vdanen

It doesn't look like NRPE does SSL or TLS on its own. I could be wrong, but looking at my NRPE configuration file, I don't see any hints about SSL/TLS. Authentication controls are there in a limited scope (and by that I mean you can define what addresses can talk to it). But, NRPE will only hand out information you want it to. I don't know if you think that load averages or disk usage are super-sensitive, and I suppose it depends on what you want to accomplish.

I believe a previous tip discussed using the check_by_ssh plugin which will use SSH keys, etc. That will give you all the encryption and authentication you might want. Here it is: http://blogs.techrepublic.com.com/opensource/?p=321

There is also another mechanism called NRDP which looks like you can run on a site over HTTPS, which would presumably allow for SSL and probably basic authentication for access. I've not looked at it yet so I don't know how it compares, although I plan to look at it in the future (would probably be nice for those who host a web site but don't have the ability to install new services or ssh into it). http://exchange.nagios.org/directory/Addons/Passive-Checks/NRDP-%252D-Nagios-Remote-Data-Processor/details

Hope that helps.