For net admins, network downtime means more than just a problem with the network; it means calls from users, constant queries from the boss, and lost company time and money. But without constant babysitting, keeping a network running is a daunting task. To keep constant watch over your network, I've developed a Perl script that allows you to automate network monitoring and alerts you when there's a problem.
How it works
What we will be looking at is a simple Perl script that monitors the state of both your internal and external network connections and alerts the administrator when a problem arises. Periodically, using cron or another scheduler, the script will ping an external address. If ping returns an exit code of 0, meaning the site was reached, the script terminates and life carries on as usual. If it returns an exit code of 1, which means no reply was received, it tries to ping a secondary IP address. If that also fails, the script will stop and start the network on the specified interface (be it eth0, eth1, or eth2, etc.), and will then attempt both ping tests again. If both tests fail a second time, an e-mail is sent to a specified e-mail address.
There are a few things to keep in mind here. First, the script needs to be run by cron or another such scheduler, which means you can have it run anywhere from every five minutes (probably excessive) to once a day (probably too conservative). A good schedule might be to have it run every 30 to 60 minutes. The total running time of the script should be no more than two minutes, and that's only if both ping tests fail.
Second, the script alerts an administrator by sending an e-mail. Because the script monitors network connectivity, you'll want it to e-mail an actively monitored local account. Mail addressed to an account elsewhere on the LAN or the Internet may not get through: if the alert is being sent at all, the network is down and could not be restarted locally. Such a message won't be delivered until the network is available again, at which point it is no longer needed.
Note that you can also change the alert to whatever method you prefer. For instance, instead of e-mail alerts, you may want to use the wall command to write a message to the terminal of every user logged in to the machine, alerting them that the network has failed.
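As a rough illustration of the wall alternative, the sketch below builds the shell command that would take the place of the mail call. The helper name wall_command and the message text are made up for this example and are not part of the original script.

```perl
use strict;
use warnings;

# Placeholder alert text (an assumption, not from the original script).
my $msgbody = 'Network connectivity lost; restarting the interface did not help.';

# Build a shell command that pipes the message to wall, which writes it
# to the terminals of the users logged in to this machine.
sub wall_command {
    my ($message) = @_;
    $message =~ s/'/'\\''/g;    # escape single quotes for the shell
    return sprintf(q{echo '%s' | wall}, $message);
}

# In the monitoring script this would replace the mail call:
#   system(wall_command($msgbody));
```

Building the command in a separate routine keeps the quoting in one place and makes it easy to print the command for inspection before wiring it into the script.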
Some may ask, "Why not reboot the system?" The simple answer here is "because unattended reboots are a really bad idea." If you run this script on your workstation, do you really want to have the system shut down, without warning, while you're editing a spreadsheet or some other important document? Probably not. To err on the side of caution, it simply alerts the admin or owner that the network needs attention.
Keeping these things in mind, take a look at the script itself.
Break it down
The first few lines of the script define the variables you will be using. Specifically, $ping1 is the first IP address to test, and $ping2 is the second address. The variable $admin is the e-mail address that alerts are sent to; in this case, it is the local user admin. The $netdev variable is the Ethernet device the ping tests will be sent through and that will be restarted should an outage occur. If you have more than one network card on the system, you can use this to tell the script which card you are interested in testing. Although the routing table normally decides which device a packet leaves through (e.g., a ping to a LAN address may go through eth0, while a ping to an external address may go through eth1), you must tell the script which Ethernet device you want to restart to try to bring the network back up. Finally, the $msgbody and $msgsub variables contain the strings that will be used for the e-mail message body and subject.
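A minimal sketch of what such a variable block might look like follows. The variable names, the admin account, and eth0 come from the description above; the two IP addresses are reserved documentation addresses and the message strings are placeholders, so replace all of them with values that suit your network.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Addresses to test. These are documentation-range placeholders and will
# not answer a ping; substitute hosts you expect to be reachable.
my $ping1   = '192.0.2.1';       # first address to test
my $ping2   = '198.51.100.1';    # second address, tried if the first fails

my $admin   = 'admin';           # local account that receives the alert
my $netdev  = 'eth0';            # interface to ping through and restart

# Placeholder subject and body for the alert e-mail.
my $msgsub  = 'Network down';
my $msgbody = 'Both ping tests failed, even after restarting the interface.';
```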
Next, you must define a subroutine called pingtest. Because you could be performing a ping test up to four times, it makes sense to use a subroutine to make the code easier to read and easier to debug. This routine simply takes the IP address as an input variable and assigns it to the variable $ping. It then performs the ping command itself with the system() function, in which you use sprintf() to correctly format the command so you can tell the ping command what Ethernet device and IP address to use.
The $retcode variable contains the exit code of the ping command. After system() returns, Perl's built-in variable $? holds the wait status of the command that just ran; the exit code sits in the upper byte, so you shift $? right by eight bits to obtain it. Finally, the subroutine returns the exit code.
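The subroutine described here could be sketched roughly as follows. The names pingtest, $ping, $retcode, and $netdev come from the text; the helpers ping_command and exit_code, the -c 3 packet count, and the output redirection are assumptions added to keep the example small and easy to check.

```perl
use strict;
use warnings;

my $netdev = 'eth0';    # interface to ping through (example value)

# Format the ping command line for one address.
# -I selects the interface; -c 3 (an assumed value) sends three packets.
sub ping_command {
    my ($ping) = @_;
    return sprintf('ping -c 3 -I %s %s >/dev/null 2>&1', $netdev, $ping);
}

# Extract a command's exit code from the wait status that system()
# leaves in $?: the exit code is stored in the upper byte.
sub exit_code {
    my ($status) = @_;
    return $status >> 8;
}

# Ping one address and return ping's exit code (0 means a reply arrived).
sub pingtest {
    my ($ping) = @_;
    system(ping_command($ping));
    my $retcode = exit_code($?);
    return $retcode;
}
```

A call such as pingtest('192.0.2.1') would then return 0 when a reply arrives and a non-zero code otherwise. Note that if ping cannot be started at all, $? is -1 and the shifted value is neither 0 nor 1.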
Starting the test
Next, you actually start the tests. Assign to the variable $test1 the return value of running the pingtest subroutine with the input value of the $ping1 variable. This means your subroutine attempts to ping the IP address specified by $ping1, and the $test1 variable contains the exit code of the ping command, either a 0 for success or 1 for failure.
Next, test to see if $test1 is equal to 1, which means it failed. If it is, you then call the pingtest subroutine with the IP address stored in the $ping2 variable, while assigning the exit code to $test2. If $test2 also equals 1, you attempt to shut down and restart the network device specified in $netdev by calling the ifdown and ifup scripts. These are both native Linux scripts for bringing a networking interface up or down.
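The restart step might look something like the sketch below. The routine names restart_commands and restart_network are invented for this example, and the /sbin paths are an assumption; check where ifdown and ifup live on your distribution.

```perl
use strict;
use warnings;

# Return the two commands that cycle an interface, in the order they
# should run. The /sbin locations are assumed, not taken from the script.
sub restart_commands {
    my ($netdev) = @_;
    return ( [ '/sbin/ifdown', $netdev ], [ '/sbin/ifup', $netdev ] );
}

# Bring the interface down and back up. This needs root privileges.
sub restart_network {
    my ($netdev) = @_;
    for my $cmd ( restart_commands($netdev) ) {
        system(@$cmd);    # list form: no shell is involved
    }
    return;
}

# In the script, the call would sit inside the double failure check:
#   restart_network($netdev) if $test1 == 1 && $test2 == 1;
```

Full paths are used because cron typically runs jobs with a minimal PATH that may not include /sbin.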
Once you've done this, run the pingtests again, but use $rtest1 and $rtest2 as the variables to hold the exit codes returned by ping. If both of these tests fail, you again issue the system() command, but this time call /bin/mail to e-mail a message to the user specified in $admin, with your defined message body and subject in $msgbody and $msgsub. If you get to this point, exit your script with an exit code of 1, so any other scripts you write around this one know that your tests failed and the network is down. Otherwise, exit with a code of 0, meaning the network is fine.
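Putting the pieces together, the decision flow described in this section can be sketched as below. The variable names follow the text, but passing the ping test, the restart step, and the command runner in as code references is an assumption made for this example, so the logic can be tried out without touching a real network. The message strings are placeholders.

```perl
use strict;
use warnings;

my $admin   = 'admin';                                  # local account
my $msgsub  = 'Network down';                           # placeholder subject
my $msgbody = 'Ping tests still fail after restart.';   # placeholder body

# Shell command that mails the alert; mail reads the body from stdin.
sub mail_command {
    return sprintf(q{echo "%s" | /bin/mail -s "%s" %s},
                   $msgbody, $msgsub, $admin);
}

# Return the exit code the script should finish with: 0 if the network
# is reachable, 1 if it is still down after a restart.
#   $pingtest->($ip)  returns ping's exit code for one address
#   $restart->()      cycles the network interface
#   $run->($command)  executes a shell command, e.g. via system()
sub check_network {
    my ($ping1, $ping2, $pingtest, $restart, $run) = @_;

    my $test1 = $pingtest->($ping1);
    return 0 unless $test1 == 1;      # first address answered

    my $test2 = $pingtest->($ping2);
    return 0 unless $test2 == 1;      # second address answered

    # Both addresses failed: restart the interface and test again.
    $restart->();
    my $rtest1 = $pingtest->($ping1);
    my $rtest2 = $pingtest->($ping2);

    if ($rtest1 == 1 && $rtest2 == 1) {
        $run->(mail_command());       # still down, so alert the admin
        return 1;
    }
    return 0;
}
```

In a real script, the last line would call exit with the value check_network returns, passing in the pingtest subroutine, a routine that runs ifdown and ifup, and a small wrapper around system().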
As you can see, the script is extremely simple to both write and use, and it does an effective job of monitoring the network. It tries to restart the network using the relatively generic ifup/ifdown scripts found on almost every Linux distribution. If you're using something other than Linux, modify those commands to suit your OS.
Remember to run the script from a scheduler as the root user, because normal users cannot restart the network and the restart step would fail with errors. Running it from root's crontab every 30 to 60 minutes is a reasonable schedule. To add it to root's crontab, run crontab -e as root and add the following line:
30 * * * * /path/to/script
where /path/to/script is the full path to the script, including its filename. This entry runs the script once an hour, at 30 minutes past the hour.
If you want to use this on a system where you don't have root access, you can modify the ping command and remove the -I parameter that specifies the interface. You can also remove the attempts to restart the network and the subsequent ping tests. You would still benefit from having two different ping tests executed periodically.
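A pared-down variant along those lines might look like this sketch. The routine names are invented for the example, the -c 3 count is an assumed value, and the alert is passed in as a code reference so you can plug in mail, wall, or anything else.

```perl
use strict;
use warnings;

# Ping command without -I, so no particular interface is requested.
sub ping_command_plain {
    my ($ping) = @_;
    return sprintf('ping -c 3 %s >/dev/null 2>&1', $ping);
}

# Ping one address and return ping's exit code.
sub pingtest_plain {
    my ($ping) = @_;
    system(ping_command_plain($ping));
    return $? >> 8;
}

# Two ping tests and no restart: alert only if both addresses fail.
#   $pingtest->($ip) returns ping's exit code; $alert->() sends the alert.
sub check_unprivileged {
    my ($ping1, $ping2, $pingtest, $alert) = @_;
    return 0 unless $pingtest->($ping1) == 1;
    return 0 unless $pingtest->($ping2) == 1;
    $alert->();
    return 1;
}
```

The script would then end by calling exit with the result of check_unprivileged, passing \&pingtest_plain as the test routine.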
Once again, simple scripts save the day. Dabbling in Perl, Bash, Python, or any other command-line-driven scripting language can save you a lot of time and effort in eliminating or shortening tedious tasks.
Vincent Danen works on the Red Hat Security Response Team and lives in Canada. He has been writing about and developing on Linux for over 10 years and is a veteran Mac user.