In my introductory article, "Big Brother is watching your network," I explained how to install and configure this powerful, open source network- and server-monitoring tool. After completing the instructions in that article, you should have a working Big Brother system that fully monitors both a Linux server and a Windows 2000 server.
Big Brother also offers two features that can cut your response and troubleshooting time when network troubles crop up: e-mail alerts and an easy-to-read Web display. Alerting is what makes a monitoring system worth the time to set up and administer: when the system alerts the administrator as soon as a problem arises, troubleshooting can start sooner. Likewise, a Web display of network status that is easy and fast to navigate helps you find problems more quickly, especially on sites with a lot of equipment to monitor. Here, I will explain how to take advantage of Big Brother's e-mail alert and Web display features.
It’s 3 A.M., and you’re sleeping soundly when, suddenly, your pager goes off! You can blame Big Brother for some sleepless nights, but that’s just the software doing its job. Armed with a reliable alerting system, Big Brother allows no network problem to go unnoticed.
For the examples in this article, I will assume you've completed the steps in the previous article and have Big Brother up and running. Note that I am running the software on a Red Hat Linux 7.1 server. Please make the necessary adjustments for your Linux version.
Big Brother can send out a notification in many ways, and the type of notification can depend on the error condition. For example, at 3 A.M., a systems administrator may want to be paged if a major portion of the network has gone down. On the other hand, if the error is something less serious, such as an HTTP instance on a clustered server having trouble, the admin might prefer a simple e-mail notification sent to his or her inbox.
The types of notifications available with Big Brother include e-mail alerts, pager alerts, and SMS notifications. In this article, I will explain how to set up e-mail alerts.
A few notes
In my previous article, I was monitoring the system on which Big Brother was running as part of the example. I will continue to use this example in this article.
If you recall from my previous article, I used the bb-hosts file in /usr/local/bb/etc to configure which hosts to watch. To set up the notification and alerting feature, Big Brother uses a file named bbwarnsetup.cfg, which is also located in /usr/local/bb/etc.
To start, rename bbwarnsetup.cfg to bbwarnsetup-orig.cfg, create a new bbwarnsetup.cfg file, and add the following line:
You need to put this line in so that you'll be notified if Big Brother cannot read the system messages, which would indicate a system problem.
A few minutes after I added this line, the following e-mail message notified me that the messages file couldn't be read:
From bb Sat Jan 12 12:19:45 2002
Date: Sat, 12 Jan 2002 12:19:45 -0500
Subject: !BB - 1307610! localhost.msgs - 400192168065129
 localhost.msgs red Sat Jan 12 12:19:44 EST 2002 Urgent message file problems reported
&red /var/log/messages is unreadable
Please see: http://localhost.localdomain/bb/html/localhost.msgs.html
Let me explain the format of the bbwarnsetup file. Each line in the bbwarnsetup file takes seven parameters (listed in Table 1), with each parameter separated by a semicolon (;).
|Parameter number||Parameter name||Description|
|1||Host||This defines which host to watch. The parameter must match an entry in the bb-hosts file, because that file dictates which hosts and services are being watched. An asterisk (*) can be used as a wild card to watch all hosts.|
|2||Exhosts||This indicates which hosts to exclude. It's useful in the event that a wild card was used as the host in the first parameter and a host must be excluded.|
|3||Services||This tells which service to watch. Again, an asterisk can be used as a wild card to monitor all services.|
|4||Exservices||This parameter indicates which services to exclude.|
|5||Day||This indicates which day to use the rule (0 – 6, Sunday through Saturday). An asterisk indicates all days.|
|6||Time||This tells what time interval to use the rule (0000 – 2359, midnight to 11:59 P.M.). An asterisk indicates all times.|
|7||Recipients||Here, you can indicate an e-mail address, numeric pager number, or a number of other options that can be defined by the user. Big Brother is very extensible.|
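The seven-field layout in Table 1 can be sketched in a few lines of Python. This is only an illustration of the format, not code that ships with Big Brother: the WarnRule and parse_rule names are made up for this example, and the sample rule string is an illustrative one written to fit the layout.

```python
from typing import NamedTuple


class WarnRule(NamedTuple):
    """One notification rule, with fields named after Table 1."""
    host: str
    exhosts: str
    services: str
    exservices: str
    day: str
    time: str
    recipients: str


def parse_rule(line: str) -> WarnRule:
    """Split one rule line into its seven semicolon-separated fields."""
    fields = [field.strip() for field in line.strip().split(";")]
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
    return WarnRule(*fields)


# Illustrative rule only, written to fit the Table 1 layout.
rule = parse_rule("192.168.65.129;;msgs;;*;*;root@localhost")
print(rule.host)        # 192.168.65.129
print(rule.services)    # msgs
print(rule.recipients)  # root@localhost
```

Empty fields, such as the two exclusion parameters here, simply come back as empty strings.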
Read literally, the rule that was added to the bbwarnsetup.cfg file earlier says, “Notify root at localhost if the messages file on 192.168.65.129 indicates an error condition.” But how would you write the rule if you want to be notified of a problem with messages only during working hours, assuming working hours are Monday through Friday from 9 A.M. to 5 P.M.? You would use this rule:
192.168.65.129;;msgs;;1 2 3 4 5;0900-1700;root@localhost slowe@localhost
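To make the day and time fields of this rule concrete, here is a rough Python sketch of how such a rule could be evaluated. It is not how Big Brother itself is implemented; the rule_is_active function is a made-up name, and I am assuming that both ends of the time range are inclusive.

```python
from datetime import datetime


def rule_is_active(day_field: str, time_field: str, when: datetime) -> bool:
    """Return True if `when` falls inside the rule's day and time fields."""
    # Big Brother numbers days 0-6, Sunday through Saturday.
    # Python's weekday() uses Monday=0, so shift it by one.
    bb_day = (when.weekday() + 1) % 7
    if day_field != "*" and str(bb_day) not in day_field.split():
        return False
    if time_field == "*":
        return True
    start, end = time_field.split("-")
    now = when.strftime("%H%M")
    # Assumption: both ends of the range are inclusive.
    return start <= now <= end


rule = "192.168.65.129;;msgs;;1 2 3 4 5;0900-1700;root@localhost slowe@localhost"
fields = rule.split(";")
day_field, time_field = fields[4], fields[5]

print(rule_is_active(day_field, time_field, datetime(2002, 1, 12, 12, 19)))  # Saturday: False
print(rule_is_active(day_field, time_field, datetime(2002, 1, 14, 10, 0)))   # Monday 10:00: True
print(rule_is_active(day_field, time_field, datetime(2002, 1, 14, 3, 0)))    # Monday 03:00: False
```

Because the times are zero-padded four-digit strings, a plain string comparison is enough to test the range.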
The numbers 1 2 3 4 5 are the days of the week in the form Big Brother requires (Monday through Friday), and 0900-1700 defines the time range. Notice that two e-mail addresses are listed, separated by a space. Big Brother is very flexible!
The next example rule watches all of the hosts defined in the bb-hosts file, but only their http instances:
Syntax is extremely important in these rules. To ensure that the Big Brother rules you create are valid, the designers have included a script called bbchkwarnrules.sh that checks the syntax of the rules. It also resides in the /usr/local/bb/etc directory.
To use the script, simply type /usr/local/bb/etc/bbchkwarnrules.sh at the command line after editing the bbwarnrules file. The desired result is no output at all: the script returns you straight to the command prompt, which indicates that it found nothing wrong.
Troubleshooting an invalid rule
Take a look at the following rule:
At first glance, it doesn’t look too bad, but when you run bbchkwarnrules, notice what happens:
[bb@localhost etc]$ ./bbchkwarnrules.sh
invalid rule: <*; ;http; ;*; ;*;root@localhost>
The script reports an invalid rule. When you look closely, you can see an extra semicolon between the two asterisks (*) in the day and time fields. Be forewarned, however: bbchkwarnrules catches only the glaring errors. For example, replace http with ytew, which stands for absolutely nothing that I know of:
Then look at the output of bbchkwarnrules:
[bb@localhost etc]$ ./bbchkwarnrules.sh
No output means no errors were detected, even though we replaced http with ytew as the service to watch. Bbchkwarnrules is designed only to make sure that the number of parameters is seven, nothing more and nothing less.
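The behavior described above, a check that counts fields and nothing else, can be imitated in a short Python sketch. This is not the bbchkwarnrules script itself; check_rule and unknown_services are made-up names, the list of known services is hypothetical, the ytew rule is reconstructed for illustration, and I am assuming that multiple services in one rule are separated by spaces.

```python
def check_rule(line: str) -> str:
    """Return a complaint for a malformed rule, or "" if nothing looks wrong."""
    # Like the check described in the article, this only counts fields.
    if len(line.split(";")) != 7:
        return f"invalid rule: <{line}>"
    return ""


def unknown_services(line: str, known: set) -> list:
    """List services named in the rule that are not in `known`."""
    # Assumption: several services would be separated by spaces.
    services = line.split(";")[2].split()
    return [s for s in services if s != "*" and s not in known]


# Hypothetical list of the services you actually monitor.
KNOWN = {"http", "msgs", "ssh"}

print(check_rule("*; ;http; ;*; ;*;root@localhost"))
# invalid rule: <*; ;http; ;*; ;*;root@localhost>

# Reconstructed for illustration: seven fields, but a meaningless service.
typo_rule = "*;;ytew;;*;*;root@localhost"
print(repr(check_rule(typo_rule)))         # ''
print(unknown_services(typo_rule, KNOWN))  # ['ytew']
```

A second pass like unknown_services is one way to catch the typos that a pure field count lets through.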
Avoiding too much notification
If you want a surefire way to fill up your inbox with notifications, try this rule:
It says to notify at all times on all configured hosts on all services. You might want to avoid rules like this one.
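If you keep many rules, a small script can flag the ones that are this broad before they flood your inbox. The following Python sketch is only an illustration: is_catch_all is a made-up name, and the sample rule is reconstructed from the description above rather than taken from a real configuration.

```python
def is_catch_all(line: str) -> bool:
    """Return True if a rule matches every host, service, day, and time."""
    fields = [field.strip() for field in line.split(";")]
    if len(fields) != 7:
        return False  # malformed rules are a different problem
    host, exhosts, services, exservices, day, time, _recipients = fields
    everything = (host, services, day, time) == ("*", "*", "*", "*")
    # An exclusion in field 2 or 4 narrows the rule, so it no longer counts.
    return everything and not exhosts and not exservices


# Reconstructed from the description: all hosts, all services, all times.
print(is_catch_all("*;;*;;*;*;root@localhost"))     # True
print(is_catch_all("*;;http;;*;*;root@localhost"))  # False
```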
For additional help
There is so much more to notifications than I can possibly put into one article. If you find yourself in trouble with a particular rule, try taking a look at the archives of the Big Brother mailing list.
Setting up the display host for even faster troubleshooting
An easily read display is important when troubleshooting a network issue. In my previous Big Brother article, I explained how to get items on the Big Brother Web page to show their individual status. Big Brother can do much more than that, however.
There aren't too many systems on the screen shown in Figure A, but they are all of different types: Some are UNIX servers, some are Windows servers, and some are network devices.
|Here, you see an unformatted Big Brother status screen.|
Let me pare this list down a little by introducing some organization to it. Big Brother allows items on the display to be grouped using HTML tables and allows sub pages to be used for sets of devices. All of these options are configured in the bb-hosts file, along with the hosts to be monitored. The best way to see how this works is via an example.
Here are the contents of my /usr/local/bb/etc/bb-hosts file. Please note that I cheated and repeated the same entry several times to get a display with significant data.
group Linux Servers
192.168.65.129 localhost !msgs
172.16.1.88 apple # BBPAGER BBNET BBDISPLAY ssh
group Windows Servers
group Network Equipment
Notice the group statements in the file. Each one tells Big Brother to gather the entries directly below it into a table. Figure B shows the output from this file.
|Notice that each category has its own header and that the display is easier to read.|
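The way group statements collect the entries below them can be sketched in Python. This only mimics the grouping idea and is not Big Brother's own parser; parse_groups is a made-up name, and in this sketch any host listed before the first group line is simply ignored.

```python
from typing import Dict, List


def parse_groups(text: str) -> Dict[str, List[str]]:
    """Map each group title to the host names listed beneath it."""
    groups: Dict[str, List[str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("group "):
            current = line[len("group "):].strip()
            groups[current] = []
        elif current is not None:
            parts = line.split()
            if len(parts) >= 2:
                groups[current].append(parts[1])  # second column is the host name
    return groups


BB_HOSTS = """\
group Linux Servers
192.168.65.129 localhost !msgs
172.16.1.88 apple # BBPAGER BBNET BBDISPLAY ssh
group Windows Servers
group Network Equipment
"""

for title, hosts in parse_groups(BB_HOSTS).items():
    print(title, hosts)
# Linux Servers ['localhost', 'apple']
# Windows Servers []
# Network Equipment []
```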
The output can also be put onto a separate page using the Page directive, which uses this format:
page pagename description
The pagename parameter is the name of the HTML document that will be created. For example, if you place the directive "page network Network Equipment" in bb-hosts, a file named network.html is created, and the devices listed below the directive appear on it. Only a link to this page will appear on the main Big Brother page.
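As a rough illustration of how a page directive breaks down, here is a Python sketch that splits one into its parts and derives the HTML file name described above. The parse_page_directive function is a made-up name, not part of Big Brother.

```python
from typing import Tuple


def parse_page_directive(line: str) -> Tuple[str, str, str]:
    """Split 'page pagename description' into name, description, and file name."""
    parts = line.strip().split(None, 2)
    if len(parts) != 3 or parts[0] != "page":
        raise ValueError(f"not a page directive: {line!r}")
    _keyword, name, description = parts
    # The article says the page name becomes the name of the HTML document.
    return name, description, f"{name}.html"


name, description, filename = parse_page_directive("page network Network Equipment")
print(name)         # network
print(description)  # Network Equipment
print(filename)     # network.html
```

Splitting at most twice keeps a multiword description, such as Network Equipment, in one piece.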
Consider the following example bb-hosts file:
page linux Linux Server
192.168.65.129 localhost !msgs
172.16.1.88 apple # BBPAGER BBNET BBDISPLAY ssh http://172.16.2.88 !msgs
page windows Windows servers
page network Network Equipment
Notice the page directives that now replace the former group directives. Figure C shows the output of the main Big Brother HTML page after you’ve customized it.
|This is the detail for a customized Linux page.|
Notice that the Linux servers are red and that Windows and Network are green. Clicking on one of the status indicators will bring up the detail for that section.
Organization and alerting are the two most important aspects of any monitoring system. Without some sort of alert system and an organized display to show the status of your network, you will respond more slowly, and your network will stay down longer. Using the e-mail alerts and configuring the Web display for Big Brother will help you develop a more cohesive and responsive troubleshooting system.