I like infrastructure monitoring tools and have tested quite a few different products. One product that I am currently testing is the latest version of a product that I used to run at a previous employer: Paessler PRTG. The latest version is PRTG 9, and it boasts some cool new features over older versions but also carries with it a serious limitation.
Bear in mind that I’m running PRTG 9 in a pretty basic state right now. I’m simply testing the product, so I haven’t carefully categorized and pruned the items that are being monitored.
A PRTG primer
In PRTG-land, sensors are created to monitor individual performance elements. A sensor is the most basic monitoring item. In the PRTG FAQ, the company “define[s] one (1) sensor as any particular, individual monitoring entity.” A single sensor might be responsible for watching available disk space on the drives of a server while a second sensor might be responsible for watching the disk queue length. PRTG does not operate on the notion of monitored devices or IPs. Instead, you purchase sensor licenses and you can monitor as deeply or as high level as you like as long as you remain within the licensed sensor count.
The company indicates that any reasonable desktop computer should be able to easily monitor 1,000 or more sensors. Also from the PRTG FAQ: “SNMP V1/V2, PING, PORT, and HTTP are the recommended sensor types for scenarios with thousands of sensors. With these technologies up to 20.000 sensors are possible” in a single PRTG installation.
Object hierarchy
I’ve mentioned that the basic monitoring unit is a sensor, but there are higher-level groupings that contain these sensors. Immediately above sensors, you are at the device level. All of the sensors related to a single device fall into this hierarchy level.
Above that is a group. Groups are used purely for organizational purposes, and a single group can include many devices. You can also nest groups to make it easier to navigate your monitoring hierarchy.
Next up, you’re at the probe level, which is included inside the root group. You can have many probes inside your single root group. A probe is the “platform on which the monitoring takes place. All objects configured below a probe will be monitored via that probe.”
Again, from the PRTG FAQ, here is a look at the object hierarchy (Figure A).
Figure A
The PRTG object hierarchy
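To make the hierarchy concrete, here is a minimal Python sketch of the levels described above: root group, probes, groups (which can nest), devices, and sensors. This is only an illustration of the structure and is not PRTG's own data model or API. The class names, the example devices, and the 1,000-sensor license size are all assumptions I made up for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sensor:
    name: str  # one individual monitoring entity, e.g. free disk space


@dataclass
class Device:
    name: str
    sensors: List[Sensor] = field(default_factory=list)


@dataclass
class Group:
    name: str
    devices: List[Device] = field(default_factory=list)
    subgroups: List["Group"] = field(default_factory=list)  # groups can nest


@dataclass
class Probe:
    name: str
    groups: List[Group] = field(default_factory=list)


@dataclass
class RootGroup:
    probes: List[Probe] = field(default_factory=list)


def count_group_sensors(group: Group) -> int:
    """Count sensors on a group's devices plus those in any nested groups."""
    own = sum(len(device.sensors) for device in group.devices)
    nested = sum(count_group_sensors(sub) for sub in group.subgroups)
    return own + nested


def count_sensors(root: RootGroup) -> int:
    """Count every sensor under every probe in the root group."""
    return sum(count_group_sensors(g) for p in root.probes for g in p.groups)


# Hypothetical example tree (names are invented for illustration).
server = Device("file-server", [Sensor("Free disk space"), Sensor("Disk queue length")])
switch = Device("core-switch", [Sensor("Port 1 traffic")])
network = Group("Network", [switch])
datacenter = Group("Datacenter", [server], [network])
root = RootGroup([Probe("Local probe", [datacenter])])

total = count_sensors(root)
licensed_sensors = 1000  # hypothetical license size
print(total, total <= licensed_sensors)
```

Because licensing is per sensor rather than per device, the only number that matters in this sketch is the total at the bottom of the tree.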
A look at PRTG in action
My goal in the previous section was not to delve deeply into PRTG but to provide you with some context about what you’ll be looking at in the rest of this article. Again, the installation you’re looking at is for “play” only for now.
In Figure B, you’ll see a high-level look at the monitored environment. Currently, I’m showing everything: error, warning, and good status sensors. By deselecting the appropriate checkboxes at the top of the screen, I can more easily drill down into problem areas. For example, if I deselect everything except the red box, I will be shown just the sensors that are in an error state. This is one view that I really like in PRTG.
In Figure B, you’ll also notice that each monitored device is on a single line with sensors in various states to the right. Rather than show you every sensor, PRTG just tells you that, for example, 11 sensors are in a green state and highlights the ones that are problems.
I also want to note that I have not yet changed any of the default thresholds for PRTG’s monitors, so much more shows up as yellow or red than it would if I were to put PRTG into production.
Before you continue, take a peek at the upper right-hand corner of the screen. You’ll see that 24 sensors are in an error state, 44 are in a warning state, 1,001 are green, and 79 sensors are currently paused. I’ll explain a little later why 79 sensors are paused.
Figure B
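The status summary and the checkbox filtering described above boil down to a tally by state. Here is a small Python sketch of that idea using the counts from my test installation. The state labels and the visible_states helper are my own names for illustration, not anything PRTG exposes.

```python
from collections import Counter

# Counts taken from the summary in the upper right-hand corner of Figure B.
# The state labels are illustrative names, not PRTG identifiers.
status_counts = Counter({"error": 24, "warning": 44, "ok": 1001, "paused": 79})


def visible_states(selected):
    """Return only the states whose checkbox is still selected."""
    return {state: n for state, n in status_counts.items() if state in selected}


total = sum(status_counts.values())
problems = visible_states({"error", "warning"})

print(total)     # all sensors in the installation
print(problems)  # what remains after deselecting "ok" and "paused"
```

With these numbers, 68 of the 1,148 sensors need attention, which is why filtering down to the red and yellow boxes is so useful.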
In Figure C, I’ve drilled down to a network device, which is a core router/switch. Here, I’m shown all of the sensors that are available for that device; there are 123. I’m primarily interested in bandwidth utilization here and have moved my primary item of interest – our Internet connection – to the top of the list so that I can see that first. The switch port is named “NetEnforcer switch – Inside Interface” on the switch.
Figure C
Once I click on the NetEnforcer interface sensor, I’m drilling a bit deeper into the statistics, as you can see in Figure D. Here, I’m getting detailed information about that port’s current and historical status. Currently, that port is in an OK state and utilization is just above 72 Mbps. At the right-hand side of the window, you can see some other graphs. The top graph shows real-time live data, while the graphs below are less granular but show you trends.
Figure D
Figure E is a big picture view of the Live Data tab at the top of the port-monitoring window. This gives me a great overview of what’s going on. As you can see, during the monitored period, we’ve used as much as 94 Mbps of Internet bandwidth at any one time and dropped to a minimum of around 64 Mbps. Our connection to the Internet is 100 Mbps.
Figure E
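To put those numbers in perspective against our 100 Mbps connection, here is a quick Python sketch of the utilization arithmetic. The function names are mine, and the minimum is the approximate figure of around 64 Mbps that I read off the graph.

```python
LINK_CAPACITY_MBPS = 100.0  # our Internet connection


def utilization_pct(mbps, capacity=LINK_CAPACITY_MBPS):
    """Share of the link in use, as a percentage."""
    return 100.0 * mbps / capacity


def headroom_mbps(mbps, capacity=LINK_CAPACITY_MBPS):
    """Bandwidth still available at the given usage level."""
    return capacity - mbps


# Peak and (approximate) minimum from the Live Data view in Figure E.
for label, mbps in [("peak", 94.0), ("minimum", 64.0)]:
    print(f"{label}: {utilization_pct(mbps):.0f}% used, "
          f"{headroom_mbps(mbps):.0f} Mbps free")
```

At the peak we had only about 6 Mbps to spare, which is the kind of detail that matters when planning a bandwidth upgrade.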
The two-day view of Internet traffic shown in Figure F shows you the ebb and flow of our Internet usage. We peaked at just over 96 Mbps and dropped to as low as around 3 Mbps in the wee hours of the morning. This kind of graph identifies traffic patterns that can help us in planning.
The blue portion of the graph shows outgoing traffic, which is much, much lower than incoming.
Figure F
PRTG is much more than just a traffic monitor, though. The product has the ability to deeply monitor enterprise-level services such as Exchange and provide metrics that help administrators take action when necessary. In Figure G, you can see that average message delivery time is currently 3,179 ms on our Exchange system. That seems a bit high, so I’ll independently verify that PRTG is reporting accurate information and, if it is, take action.
At the top of this window, note also that you’re quickly shown the number of sensors in various states on this device. One sensor is in an alarm state while a second is in a warning state.
Figure G
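Since I haven’t tuned any thresholds yet, it helps to see how much the limits themselves decide whether a reading shows up as red, yellow, or green. The Python sketch below classifies the 3,179 ms delivery time from Figure G against two sets of limits. Both sets are hypothetical values I picked for illustration. They are not PRTG’s defaults, and the classify function is not part of PRTG.

```python
def classify(value_ms, warning_ms, error_ms):
    """Map a reading to a state using upper warning and error limits."""
    if value_ms >= error_ms:
        return "error"
    if value_ms >= warning_ms:
        return "warning"
    return "ok"


delivery_time_ms = 3179  # average message delivery time from Figure G

# Hypothetical tight limits: the reading lands in the error state.
print(classify(delivery_time_ms, warning_ms=1000, error_ms=3000))

# Hypothetical looser limits: the same reading is considered fine.
print(classify(delivery_time_ms, warning_ms=5000, error_ms=10000))
```

The same reading moves from error to OK purely because the limits changed, so I expect far fewer red and yellow sensors once the thresholds match our environment.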
Cons
PRTG is not perfect. I do like the tool, though. Although the interface at first was confusing, after a few days of use, I liked it. And, the product is not outrageously expensive.
I have noticed some sensors that report really crazy stuff. Upon further investigation, I’ve discovered that some sensors simply get bad information back from source systems or some sensors just don’t display accurate information. That said, the sensors that I’ve seen do this are far from critical and it hasn’t been common.
Perhaps the biggest shortcoming is PRTG 9’s current inability to monitor vSphere 5 and vCenter 5 systems. With most of our environment running in VMware (the hosts are all 4.1, but vCenter is at version 5 due to a need to implement a VMware View 5 pilot), having no insight into VMware is a non-starter. As of October 6, Paessler indicated that there was no ETA for the monitoring capability. This, unfortunately, is a huge red mark for what seems to be an otherwise fine product.
Summary
Once vSphere/vCenter 5 support is included in PRTG 9, I believe that the product will be a great investment for many organizations. Although it’s not perfect, the kind of oversight monitoring that I need is done very well in PRTG 9, providing easily identified cues that indicate where action is required.