A sample app to monitor performance counters and send alerts


This blog post is also available as a TechRepublic download, which includes a sample project containing all the code needed to get this utility up and running.

There are many instances where you may want to use the information gathered from a performance counter. From memory usage to ASP.NET page recompilations, performance counters expose a wealth of information. The problem comes when you want to use that information in a practical manner. A performance counter graphing application is included with Windows, but it has limited functionality. It would be nice to have an application that can monitor a performance counter and send e-mail alerts when certain conditions are met.

For example, the system I work on uses Microsoft Message Queuing extensively. We needed a way to monitor the message queues to make sure that messages were flowing through them properly. It just so happens that each message queue publishes its current message count to a performance counter. We could use these performance counters to monitor the queues, but we had no notification services to alert us to issues during nights, weekends, or any other time when someone wasn't actively looking at the performance counter.

Here is how to implement a utility that can monitor multiple performance counters and send an e-mail alert when something isn't working as it should. For instance, this utility can be configured to alert you if the performance counter's value is above 500 and doesn't decrease for two minutes.

Configuration

This utility is configured by editing an XML file called Logic.xml, which controls which counters are monitored and what conditions cause an alert to be sent. An example is included in the sample application that comes with the download version of this document. There are a few key sections of this file that you will want to pay close attention to:

  • MonitorEntry -- This section indicates that a specific performance counter should be monitored. Here you will instruct the utility on how to access the specific counter you wish to monitor.
  • Alert -- Each MonitorEntry can have one or more Alerts. Alerts are really the heart of this utility and are used to determine when an e-mail should be sent.
  • Constraint -- Each Alert will contain one or more constraints. These constraints are the rules that the Alert will use to determine if an e-mail should be sent.
  • Recipients -- Each Alert can be set up to send to a different e-mail address (or multiple addresses).
  • EmailSettings -- Within this node you will define how the utility will communicate with your e-mail server.

If you take a look at the sample Logic.xml file (Listing A), I think you will find that the configuration for this utility is easy to understand.

Listing A

    <?xml version="1.0" encoding="utf-8" ?>
    <Logic>
      <Monitor>
        <MonitorEntry PollTime="10000" CategoryName="MSMQ Queue" CounterName="Messages in Queue" InstanceName="dev-appheartbeat" MachineName="dev-app">
          <Alerts>
            <Alert Title="Alert 1" Message="Messages have been stagnant for two seconds.">
              <Constraint Check="LastValue" GreaterThan="0" />
              <Constraint Check="UnchangedSeconds" GreaterThan="2" />
              <Recipients>
                <Recipient>zs_box@hotmail.com</Recipient>
              </Recipients>
            </Alert>
            <Alert Title="Alert 2" Message="There are more than 500 messages in the queue!">
              <Constraint Check="LastValue" GreaterThan="500" />
              <Recipients>
                <Recipient>zs_box@hotmail.com</Recipient>
              </Recipients>
            </Alert>
          </Alerts>
        </MonitorEntry>
      </Monitor>
      <EmailSettings>
        <FromAddress>someone@somedomain.com</FromAddress>
        <SMTPServer>127.0.0.1</SMTPServer>
      </EmailSettings>
    </Logic>
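
The loader used by the sample project is not shown in this post. As a rough sketch of how a file shaped like Listing A could be read, here is one way to walk the MonitorEntry, Alert, Constraint, and Recipient nodes with LINQ to XML. The LogicReader class and its Describe method are hypothetical names made up for this illustration; only the element and attribute names come from Listing A.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of the sample project. It only shows how
// the nodes in Listing A relate to each other.
public static class LogicReader
{
    // Returns one line of text per Alert found in the given Logic.xml content.
    public static List<string> Describe(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        var lines = new List<string>();

        foreach (XElement entry in doc.Descendants("MonitorEntry"))
        {
            string counter = (string)entry.Attribute("CounterName");
            int pollMs = (int)entry.Attribute("PollTime");

            foreach (XElement alert in entry.Descendants("Alert"))
            {
                int constraints = alert.Elements("Constraint").Count();
                string recipients = string.Join(", ",
                    alert.Descendants("Recipient").Select(r => r.Value));

                lines.Add(string.Format(
                    "{0} (every {1} ms): {2}, {3} constraint(s), to {4}",
                    counter, pollMs, (string)alert.Attribute("Title"),
                    constraints, recipients));
            }
        }
        return lines;
    }
}
```

Calling LogicReader.Describe(File.ReadAllText("Logic.xml")) on the sample file would produce one line for each of the two alerts, which is a quick way to confirm that the file says what you think it says before the utility starts polling.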

Constraints

As mentioned above, each Alert has a set of constraints to go along with it. These constraints will evaluate to either true or false, and for an Alert to be sent, all constraints for that particular alert must return true.

For instance, the sample application includes these constraints in the Logic.xml file for the first Alert:

<Constraint Check="LastValue" GreaterThan="0"/>
<Constraint Check="UnchangedSeconds" GreaterThan="2"/>

You can read that as:

Only send the alert if the performance counter has a value of greater than zero, and that value hasn't changed for over two seconds.

If either of those constraints returns false, then the alert will not be sent.

Constraints can be broken down into two parts: the value to examine, represented by the "ConstraintCheck", and the comparison to apply, represented by the "ConstraintType":

  1. ConstraintCheck tells the constraint which value to examine. The following are valid ConstraintCheck values:
    • LastValue - The last value of the performance counter
    • UnchangedSeconds - The number of seconds the performance counter has remained unchanged.
    • IncreasedSeconds - The number of seconds the value of the performance counter has continued to increase.
    • DecreasedSeconds - The number of seconds the value of the performance counter has continued to decrease.
  2. ConstraintType determines how to compare the value selected by the ConstraintCheck to the target value you supply. The following are valid ConstraintType values, and they are pretty self-explanatory:
    • GreaterThan
    • LessThan
    • NotEqualTo
    • EqualTo
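
To make the two parts concrete, here is a minimal sketch of what a constraint type might look like in C#. The enum and member names follow the terms used above and the Check call that appears later in Listing C, but the bodies, the constructor, and the Target property are assumptions made for illustration, not the sample project's actual code.

```csharp
using System;

// Names mirror the article's terms; the implementation is an assumption.
public enum ConstraintCheck { LastValue, UnchangedSeconds, IncreasedSeconds, DecreasedSeconds }
public enum ConstraintType { GreaterThan, LessThan, NotEqualTo, EqualTo }

public class Constraint
{
    public ConstraintCheck ConstraintCheck { get; private set; }
    public ConstraintType ConstraintType { get; private set; }

    // Hypothetical name for the target value supplied in Logic.xml.
    public long Target { get; private set; }

    public Constraint(ConstraintCheck check, ConstraintType type, long target)
    {
        ConstraintCheck = check;
        ConstraintType = type;
        Target = target;
    }

    // Compares a value (already selected according to ConstraintCheck)
    // against the target using the configured comparison.
    public bool Check(long value)
    {
        switch (ConstraintType)
        {
            case ConstraintType.GreaterThan: return value > Target;
            case ConstraintType.LessThan:    return value < Target;
            case ConstraintType.NotEqualTo:  return value != Target;
            case ConstraintType.EqualTo:     return value == Target;
            default: throw new InvalidOperationException("Unknown constraint type.");
        }
    }
}
```

Under this sketch, a line such as Check="LastValue" GreaterThan="500" would map to new Constraint(ConstraintCheck.LastValue, ConstraintType.GreaterThan, 500), with the attribute name supplying the ConstraintType and the attribute value supplying the target.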

This can all be a little confusing, so I think some examples are in order. Let's assume that I need to be notified whenever the performance counter exceeds 1,000. I would create this constraint:

<Constraint Check="LastValue" GreaterThan="1000"/>

Now let's complicate this and say that I only want to be alerted when the counter is over 1,000 and it has not changed for over a minute:

<Constraint Check="LastValue" GreaterThan="1000"/>
<Constraint Check="UnchangedSeconds" GreaterThan="60"/>

That's obviously a little more complicated, but should be fairly readable and understandable. You can continue to stack constraints on top of each other in this fashion until you have the rule that you need.

Testing the constraints

Each MonitorEntry will run at an interval defined by the "PollTime" attribute in the MonitorEntry node of the Logic.xml file. Every time a MonitorEntry runs it pulls the current value from the specified performance counter and stores that value. It then uses that value to determine if the counter has increased, decreased, or stayed the same in comparison to the previous run. The code in Listing B shows this logic:

Listing B

    private void timer_Elapsed(object sender, ElapsedEventArgs e)
    {
        // Read the counter and the clock once so that every comparison
        // in this pass sees the same sample.
        long currentValue = Counter.RawValue;
        DateTime now = DateTime.Now;

        if(currentValue != this.LastValue)
            lastChanged = now;
        if(currentValue > this.LastValue)
            this.lastIncrease = now;
        if(currentValue < this.LastValue)
            this.lastDecrease = now;

        // Time since the last decrease is how long the value has been rising
        // (or holding); time since the last increase is how long it has been
        // falling (or holding).
        IncreasedSpan = now - lastDecrease;
        DecreasedSpan = now - lastIncrease;
        UnchangedSpan = now - lastChanged;

        this.LastValue = currentValue;

        SignalAlerts();

        if(this.OnPolled != null)
            this.OnPolled(this);
    }

This code is in the MonitorEntry.cs file of the sample application. Up until the call to SignalAlerts, the code is simply recording the counter's data and determining if it has changed. SignalAlerts is the function that actually tests the constraints to see if an alert should be sent. This is done by calling the CheckConstraints function of the Alert class for each alert within the MonitorEntry. This code is shown in Listing C.
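
The way the three spans behave is easier to see with the timer and the real counter taken out of the picture. The class below is a hypothetical stand-in (CounterHistory and Record are names invented here) that applies the same bookkeeping as Listing B to values and timestamps passed in by the caller. How the sample project initializes its timestamps is not shown in this post, so starting all three clocks at the first sample is an assumption.

```csharp
using System;

// Hypothetical stand-in for the bookkeeping in Listing B. The timestamp is
// passed in so the behavior can be exercised without a timer or a counter.
public class CounterHistory
{
    private DateTime lastChanged, lastIncrease, lastDecrease;
    private bool started;

    public long LastValue { get; private set; }
    public TimeSpan UnchangedSpan { get; private set; }
    public TimeSpan IncreasedSpan { get; private set; }
    public TimeSpan DecreasedSpan { get; private set; }

    public void Record(long value, DateTime now)
    {
        if (!started)
        {
            // Assumption: the first sample starts every clock at the same moment.
            lastChanged = lastIncrease = lastDecrease = now;
            started = true;
        }
        else
        {
            if (value != LastValue) lastChanged = now;
            if (value > LastValue) lastIncrease = now;
            if (value < LastValue) lastDecrease = now;
        }

        // Same convention as Listing B: time since the last decrease counts
        // toward IncreasedSpan, time since the last increase toward DecreasedSpan.
        IncreasedSpan = now - lastDecrease;
        DecreasedSpan = now - lastIncrease;
        UnchangedSpan = now - lastChanged;
        LastValue = value;
    }
}
```

One consequence of this convention is worth keeping in mind when writing constraints: a value that rises and then holds steady keeps accumulating IncreasedSeconds, because nothing has decreased. If you only care about a counter that is actively climbing, pairing IncreasedSeconds with a small UnchangedSeconds limit is one way to express that.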

Listing C

    // Every constraint must pass; with no constraints the alert never fires.
    bool allMet = false;
    long checkValue = 0;

    for(int i = 0; i < Constraints.Count; i++)
    {
        Constraint constraint = (Constraint)Constraints[i];

        // Pick the value this constraint examines.
        if(constraint.ConstraintCheck == ConstraintCheck.LastValue)
            checkValue = entry.LastValue;
        else if(constraint.ConstraintCheck == ConstraintCheck.UnchangedSeconds)
            checkValue = Convert.ToInt64(Math.Round(entry.UnchangedSpan.TotalSeconds, 0));
        else if(constraint.ConstraintCheck == ConstraintCheck.IncreasedSeconds)
            checkValue = Convert.ToInt64(Math.Round(entry.IncreasedSpan.TotalSeconds, 0));
        else if(constraint.ConstraintCheck == ConstraintCheck.DecreasedSeconds)
            checkValue = Convert.ToInt64(Math.Round(entry.DecreasedSpan.TotalSeconds, 0));

        if(constraint.Check(checkValue))
            allMet = true;
        else
        {
            // One failed constraint is enough to suppress the alert.
            allMet = false;
            break;
        }
    }

    if(allMet && this.OnConstraintsMet != null)
        this.OnConstraintsMet(this);

As you can see, this code goes through each constraint for the alert and determines whether the constraint is met. If it is, the code moves on to the next constraint; if one fails, the loop stops. When every constraint is met, the OnConstraintsMet event is raised, and an e-mail is sent to whoever is set up as the Recipient for the alert.
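
The e-mail code itself is not shown in this post. As a sketch of what the handler for OnConstraintsMet might do, assuming the System.Net.Mail classes, the helper below builds a message from the values configured in Logic.xml and hands it to the SMTP server. AlertMailer, Build, and Send are hypothetical names, and the sketch assumes the default SMTP port with no authentication.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Mail;

// Hypothetical helper; the sample project's own mail code may differ.
public static class AlertMailer
{
    // Builds the message from the FromAddress, Recipient, Title, and
    // Message values configured in Logic.xml.
    public static MailMessage Build(string fromAddress, IEnumerable<string> recipients,
                                    string title, string message)
    {
        var mail = new MailMessage();
        mail.From = new MailAddress(fromAddress);
        foreach (string recipient in recipients)
            mail.To.Add(recipient);
        mail.Subject = title;
        mail.Body = message;
        return mail;
    }

    // Sends through the server named in the SMTPServer node.
    // Assumes the default port and no authentication.
    public static void Send(string smtpServer, MailMessage mail)
    {
        using (var client = new SmtpClient(smtpServer))
        {
            client.Send(mail);
        }
    }
}
```

Keeping Build separate from Send makes it possible to check the subject, body, and recipient list without a mail server on hand.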

Download the sample project

While the complex parts of the utility are shown in this document, the sample project contains all the code needed to get this utility up and running. I encourage you to download the project and modify it to suit your needs. An interesting note is that performance counters can be read remotely, so you could easily monitor your production applications without having the utility directly installed on your production servers.

3 comments
Justin James

Personally, for big systems that require monitoring, I think SNMP alerting is the way to go, with email as a secondary option. Most shops now have SNMP monitoring on the hardware anyway, and people who receive those critical alerts around the clock, so why not make use of that existing infrastructure, or at least give the option? J.Ja

zs_box

SNMP can certainly be used - the code provided could easily be modified for that need. There are many cases, though, when email is the best option, especially with the advent of text messaging on cell phones and the ability to send a text by sending an email.

Justin James

I agree, email can be great. What I like about SNMP is that:

  • You can have an external vendor/service manage it for you
  • It lets the end user set up alerts the way they want
  • Most organizations above a certain size already have the tools and policies in place to watch it

Thanks for the article though! I agree that SNMP would indeed be pretty easy to drop in there. J.Ja
