Gkrellm displays graphically, in real time, everything you may want to know about the health of your Linux computer. Gkrellm also has a companion that very few people seem to know or use, even though it has been around for a while. This other piece of software, called gkrellmd, is a daemon that collects the same statistics as its more popular, screen-oriented relative, but does not draw them on a screen. Instead, gkrellmd transmits all those data in plain text format over a TCP connection.
The original and most common use of gkrellmd is to let you monitor remote computers with gkrellm. This is how C. Carey, for example, keeps an eye on his TiVo. The basic configuration is very simple. Install gkrellmd on the remote machine and start it with a command similar to this:
nohup gkrellmd -u 3 -P 19150 -m 2 -a 127.0.0.1 -a 12.13.14.15 &
Then tell a copy of gkrellm running on your computer to fetch all its data, over the network, from that daemon:
gkrellm -f -s 127.0.0.1 -P 19151 &
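A detailed explanation of these two commands, complete with how to encrypt all the traffic with SSH, is in the same place where I first learned them: William Stearns’ gkrellm page. Note that the client above connects to port 19151 on the local host while the daemon listens on port 19150, which presumably relies on the SSH port forwarding described there; for a direct connection, the client would need the remote machine's address and the daemon's own port.
Here, instead, I’ll show you how to take the next step. When I realized what gkrellmd does, I immediately thought that it could be used to delegate to a script the task of continuously monitoring some systems and taking action when a particular parameter crosses a threshold. Letting gkrellmd do all the low-level number crunching spares you from rewriting code that does the same thing.
What really interested me was that this approach to automatic monitoring would be useful not only for servers, but also on local computers. A script that pops up a warning window when, for example, there are at least N unread email messages, or when any other combination of events has just occurred, may be much more efficient and less distracting than staring at the gkrellm GUI all the time.
I asked gkrellm developer Bill Wilson for details about the format of the gkrellmd text data stream, and what follows (thanks, Bill, for your quick answer!) is a summary of what I learned. When it starts, gkrellmd first transmits some configuration information and initial data (this is an edited excerpt):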
I have asked gkrellm developer Bill Wilson details about the format of the gkrellmd text data stream, and this (thanks Bill for your quick answer!) is a synthesis of what I learned. When it’s started, gkrellmd first transmits some configuration information and initial data (this is an edited excerpt):
Starting GKrellM Daemon 2.3.4
<gkrellmd_setup>
<version>
gkrellmd 2.3.4
<decimal_point>
.
<sensors_setup>
0 "k8temp@c3/temp1" 0 12779522 1 0.0000 0.0000 "NONE" "temp1" 0
....
<cpu_setup>
n_cpus 2
...
After that, gkrellmd keeps sending a continuous stream in which each parameter name, enclosed in <angle brackets>, is followed by one or more lines containing the cumulative totals for that parameter:
<cpu>
0 3049980 3288 631108 16598679
<disk>
sda 4393239552 6047793152
...
The five numbers after <cpu> are CPU number, user_time, nice_time, sys_time, and idle_time. The <disk> section, instead, provides the bytes read and written for each disk name. The complete specification of these fields is in the attached file, but the snippet above is enough to move on to the real question: how do you write a script that understands this stuff and takes some action when necessary? I modified the script Bill sent me to explain the general procedure:
1 #!/usr/bin/perl
2
3 use strict;
4 use Net::Telnet; # Perl module available for all distros
5 my $done = 0;
6 my $IS_CPU = 0;
7
8 # IP address or host name of the computer to be monitored
9 my $hostname = "127.0.0.1";
10
11 # Open the gkrellmd server at hostname and port 19150.
12 my $gkrellmd = Net::Telnet->new(Timeout => 10, Errmode => 'die');
13 $gkrellmd->open(Host => $hostname, Port => 19150);
14
15 # Print id string to gkrellmd server, so it knows you're listening
16 # and starts transmitting data.
17 $gkrellmd->print('gkrellm 2.3.5');
18
19 # Read and use lines of data from gkrellmd server.
20 while ($done == 0) {    # $done is never set: runs until interrupted
21     my $line = $gkrellmd->getline;
22     $IS_CPU = 1 if ($line =~ m/^<cpu>$/);
23     if (($IS_CPU == 1) && ($line =~ m/ /)) {
24         # we are looking at CPU data
25         my @CPU_DATA = split (/\s+/, $line);
26         print "CPU Warning: $CPU_DATA[1]\n" if ($CPU_DATA[1] > 3000000);
27         $IS_CPU = 0;
28     }
29 }
30 exit;
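The script uses the Net::Telnet Perl module (available as a native binary package on most distributions) to connect to gkrellmd and activate it (lines 11-17). Line 22 recognizes when CPU data are coming, and lines 25-26 split the next line and test its second number, which is the user time. Whenever that number is above 3000000, a warning is printed. Keep in mind that this number is a cumulative total, so once it passes the threshold the warning fires on every sample; a real check would compare successive samples instead.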
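Since the CPU counters are cumulative, a more useful test looks at how much each counter grew between two samples. What follows is only a minimal sketch of that idea, not code from gkrellm or from Bill: the helper name cpu_busy_percent is made up for illustration, it assumes the five-field <cpu> layout described earlier, and the second sample line is invented.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of gkrellm): given two successive data
# lines from a <cpu> section, return the percentage of non-idle time
# between them. Assumed field order, as described in the article:
#   cpu_number user_time nice_time sys_time idle_time
sub cpu_busy_percent {
    my ($previous_line, $current_line) = @_;

    # Drop the leading CPU number and keep the four cumulative counters.
    my (undef, @before) = split ' ', $previous_line;
    my (undef, @after)  = split ' ', $current_line;

    # Increase of each counter since the previous sample.
    my @delta = map { $after[$_] - $before[$_] } 0 .. 3;

    my $total = 0;
    $total += $_ for @delta;
    return 0 if $total <= 0;    # nothing elapsed, or the counters were reset

    my $idle = $delta[3];
    return 100 * ($total - $idle) / $total;
}

# The first line comes from the excerpt above; the second one is made up.
my $busy = cpu_busy_percent(
    "0 3049980 3288 631108 16598679",
    "0 3050080 3288 631128 16598759",
);
printf "CPU 0 busy: %.1f%%\n", $busy;    # prints "CPU 0 busy: 60.0%"
```

In the monitoring loop, one way to use this would be to remember the last <cpu> line seen for each CPU number and pass it in together with every new one, warning only when the returned percentage stays high.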
Of course, as is, the script above is only useful to illustrate the concept, but it can very easily be extended to do whatever you want (send an email or even reboot the system) and to act on any combination of events signaled by the output of gkrellmd. It would also be trivial to use gkrellmd and the script to generate custom charts of system conditions over any period of time, if that’s what you need. Just tell the script to print all the numbers from gkrellmd, or only the ones you need, to a file in a format usable by Gnuplot or similar utilities, and you’re done. Try playing with gkrellmd and let me know!
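As a possible starting point for the charting idea, here is a small sketch that turns captured gkrellmd lines into whitespace-separated rows with a leading timestamp, a layout that Gnuplot can read as columns. The function name section_rows is hypothetical, and the only thing assumed about the stream is what the excerpt shows: a tag line in angle brackets followed by data lines.

```perl
use strict;
use warnings;

# Hypothetical helper: return one row per data line found in the wanted
# section, each row being "timestamp field field ...".
# $wanted is the section name without the angle brackets, e.g. "disk".
sub section_rows {
    my ($wanted, $timestamp, @lines) = @_;
    my $section = '';
    my @rows;

    for my $line (@lines) {
        chomp $line;
        if ($line =~ /^<([^>]+)>$/) {    # a tag line starts a new section
            $section = $1;
            next;
        }
        next unless $section eq $wanted;  # ignore the other sections
        my @fields = split ' ', $line;
        push @rows, join(' ', $timestamp, @fields);
    }
    return @rows;
}

# Lines taken from the excerpt above, stamped with the current time.
my @sample = (
    "<cpu>\n",
    "0 3049980 3288 631108 16598679\n",
    "<disk>\n",
    "sda 4393239552 6047793152\n",
);
print "$_\n" for section_rows('disk', time(), @sample);
```

Each disk row then has the timestamp in column 1, the disk name in column 2, and the cumulative bytes read and written in columns 3 and 4. Redirecting the output to a file and running this periodically would build up the data set; because the totals are cumulative, plotting the difference between consecutive rows is probably more informative than plotting the raw values.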