Most people seem to accept the idea that computers aren't perfect and that, occasionally, system problems do happen. Even so, it never fails to amaze me how upset users get when they are unable to access their e-mail. Given this fact, I greatly prefer to avoid e-mail outages.
The only problem with avoiding e-mail outages is that an out-of-the-box Exchange configuration does little to alert you to an impending disaster. Sure, there might be a few obscure messages in the server's Application log, but Exchange isn't going to send you an e-mail message telling you that the server is going to crash tomorrow unless you take some sort of action. This means that if you want to know whether the server is about to have problems, you will have to keep an eye on the server yourself.
Personally, I don't have the time to be constantly checking up on my servers, and I'm betting that you don't have that sort of free time either. Therefore, my purpose in writing this article is to explain how you can automate the server monitoring process.
Before I begin
Before I get started, I want to point out that the techniques that I am about to show you are not the best method of monitoring your Exchange Server. The best way to monitor your Exchange Server is to use Microsoft Operations Manager 2005 (MOM 2005). MOM 2005 monitors thousands of key system points related to both Windows and Exchange Server. If MOM detects a problem, it can sometimes take corrective action on its own. If MOM is unable to take corrective action, it is designed to contact an administrator and provide information about the problem. The whole idea is that MOM 2005 allows you to take care of small problems before they have the chance to turn into big problems.
The downside to MOM 2005 is that it tends to get expensive. A MOM 2005 license costs $729 per processor. For optimal performance in an Exchange Server environment, you will also need a dedicated MOM database server. This server will need its own MOM licenses and a SQL Server 2000 license. These licenses are in addition to the MOM licenses required for the Exchange Servers that you are monitoring.
This article is going to detail how to monitor Exchange on a budget, taking advantage of the built-in monitoring tools Microsoft has already included in Exchange.
Exchange's built-in monitoring
A lot of people don't realize that Exchange has monitoring capabilities that are built in. You can monitor the Exchange Server services and the system's resources directly through the Exchange System Manager. Whenever a monitored resource hits a critical state, an event is written to the Event log.
To access Exchange-level monitoring, open the Exchange System Manager and navigate through the console tree to your organization | Administrative Groups | your administrative group | Servers | the server that you want to monitor (you will have to configure monitoring separately for each server). Now right-click on the server and select the Properties command from the resulting shortcut menu. When you do, System Manager will display the server's properties sheet.
As you might expect, server monitoring is done through the Monitoring tab. By default, Exchange is set to monitor the Default Microsoft Exchange Services. Any time that any of these services stop, the monitor generates a critical state error and writes it to the event log.
If you click the Add button, you will see that Exchange allows you to monitor many other things as well, as shown in Figure A. Your options include available virtual memory, CPU utilization, free disk space, and SMTP queue growth. All of these monitoring options are fairly easy to configure. For example, select CPU Utilization and click OK. You will now see the CPU Utilization Thresholds dialog box, as shown in Figure B.
Figure A: Exchange Server has a number of built-in monitoring options.
Figure B: The CPU Utilization Thresholds dialog box allows you to specify what you consider to be an acceptable level of CPU utilization.
The first option that you will have to set is the duration. Setting the duration is important because it's perfectly normal for the CPU usage to occasionally spike to 100%. Since a brief spike to 100% doesn't indicate a problem, you don't want Exchange to treat the event as one. On the other hand, if the CPU usage shot up to 100% and stayed there for the next three hours, that would be a huge problem.
As you can see, the amount of time that the CPU stays at or above your threshold value is as important as the value itself. This is where the duration option comes into play. The duration option allows you to tell Exchange how many minutes the CPU must be at or above the levels that you specify before an event is triggered.
After you fill in the duration, you must fill in a warning threshold and a critical threshold value. These values are entered as percentages. A CPU is usually considered to be running well if it averages below 80% utilization. Therefore, I would recommend setting the warning threshold somewhere around 85% and the critical threshold somewhere around 90%.
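To make the interaction between the duration and the two thresholds concrete, here is a minimal Python sketch of the idea. It is only an illustration of the logic described above, not Exchange's actual implementation: the function name `classify_cpu` and the assumption of one CPU sample per minute are hypothetical, and the 85% and 90% defaults are simply the values suggested above.

```python
from typing import Sequence


def classify_cpu(samples: Sequence[float], duration: int,
                 warning: float = 85.0, critical: float = 90.0) -> str:
    """Return 'critical', 'warning', or 'ok' for a series of CPU readings.

    samples  -- CPU utilization percentages, assumed to be one per minute
    duration -- how many consecutive minutes the CPU must stay at or above
                a threshold before that state is reported
    """
    if duration <= 0:
        raise ValueError("duration must be positive")
    if len(samples) < duration:
        return "ok"  # not enough history to say the load is sustained
    window = samples[-duration:]  # the most recent `duration` minutes
    if all(s >= critical for s in window):
        return "critical"
    if all(s >= warning for s in window):
        return "warning"
    return "ok"


print(classify_cpu([40, 100, 45, 50, 42], duration=5))   # ok (brief spike)
print(classify_cpu([86, 88, 91, 87, 89], duration=5))    # warning
print(classify_cpu([95, 97, 100, 99, 96], duration=5))   # critical
```

The first call shows why the duration matters: a single reading of 100% is ignored because the other readings in the window are below both thresholds.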
Click OK and Exchange will begin to monitor CPU utilization. You aren't done yet, though, because you must still configure the action that occurs when a warning or a critical state is reached. Although you must configure monitoring individually for each server, configuring Exchange's reaction to a warning or to a critical state is a global action, performed at the organizational level.
To configure Exchange Server's response, navigate through the Exchange System Manager to your organization | Tools | Monitoring and Status | Notifications. Right click on the Notifications container and select the New | E-Mail Notification commands from the resulting shortcut menu. When you do, you will see a properties sheet that you can use to configure e-mail notifications.
The trick to configuring notifications successfully is to understand that although notifications are global in nature, you might have to create multiple notifications. For example, a single notification cannot handle both warnings and critical states. If you want to be notified in both warning and critical states, you will have to create at least two notifications.
Filling in the properties sheet is just a matter of specifying a few key pieces of information. You will want to begin by entering the name of the server that is acting as a monitoring server. Next, specify which server or connector is being monitored. Your choices will be this server, all servers, all servers in the routing group, all connectors, and all connectors in the routing group. You also have the option of entering a custom list of servers or connectors.
At this point, you must select the state that the notification will respond to (critical or warning). Finally, enter the e-mail address that you want the notifications to be sent to and the name of the mail server that you want to use to send the notifications out. One important thing to know is that the server you select must permit anonymous mail relay or else the notifications will never be sent.
As for the alert itself, the subject and body of the e-mail message are automatically filled in for you, as shown in Figure C. However, you can customize the message as necessary.
Figure C: This is the configuration screen for sending out notifications.
Now that I have shown you how to monitor CPU utilization and how to generate a notification when your specified threshold value is crossed, you might want to go back and set up a few more notifications. I won't bore you with the specific details of how to configure additional alerts because doing so is very similar to configuring CPU monitoring. Instead, I would rather use the remainder of this article to demonstrate an additional monitoring technique.
Using Windows' Performance Monitor
The monitoring technique that I just showed you is nice because it is integrated into Exchange, it is easy to configure, and it produces clear, easy-to-understand notifications. The downside to this type of monitoring, though, is that you are limited to monitoring the basics. For example, the technique that I just showed you has no way of monitoring your MTA queues to see if they are keeping up with the demand being placed on them. Fortunately, there is another method that you can use to monitor Exchange's performance, but this method tends to be a little more difficult to configure.
This particular trick relies on the Windows Performance Monitor. When you install Microsoft Exchange Server, the installation process adds 21 Exchange related performance objects, each with its own set of counters. I'll admit that most of these counters are fairly obscure and have little practical use for day to day monitoring, but there are a few useful ones.
For example, the Work Queue Length counter under the MSExchangeMTA performance object will tell you how many messages have yet to be processed by the MTA. This is a great counter to monitor, because if the MTA starts backing up, it means that the MTA is malfunctioning and messages are not being delivered, that someone is performing a denial of service attack against you, or that someone is using your server to send spam. In any case, it's a problem that needs your attention.
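If you would like to spot-check this counter from a script, one possible approach is to call the typeperf command-line tool that ships with Windows and parse its CSV output. The Python sketch below is a rough illustration under several assumptions: the counter path is built from the object and counter names mentioned above, the server name MAIL01 is hypothetical, and the `SAMPLE` text is an illustrative stand-in for typeperf output rather than a capture from a real server.

```python
import csv
import io
import subprocess

# Assumed counter path, built from the object and counter names above.
COUNTER = r"\MSExchangeMTA\Work Queue Length"


def parse_typeperf(output: str) -> float:
    """Return the most recent numeric sample from typeperf's CSV output."""
    rows = [row for row in csv.reader(io.StringIO(output)) if len(row) == 2]
    for _timestamp, value in reversed(rows):
        try:
            return float(value)
        except ValueError:
            continue  # header row or a status line, not a sample
    raise ValueError("no numeric sample found in typeperf output")


def read_queue_length() -> float:
    """Collect one sample of the counter (Windows only)."""
    result = subprocess.run(["typeperf", COUNTER, "-sc", "1"],
                            capture_output=True, text=True, check=True)
    return parse_typeperf(result.stdout)


# Illustrative output only; MAIL01 is a hypothetical server name.
SAMPLE = r'''
"(PDH-CSV 4.0)","\\MAIL01\MSExchangeMTA\Work Queue Length"
"03/14/2005 10:15:02.123","12.000000"
Exiting, please wait...
The command completed successfully.
'''

print(parse_typeperf(SAMPLE))  # 12.0
```

Only the parsing step runs here; `read_queue_length` is left uncalled so that the sketch can be tried on a machine without Exchange installed.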
OK, so we all know that the Performance Monitor will display the current state of any counters that you specify, but how do you get it to notify you when a condition exists that needs your attention? Notifications can be accomplished, but you will have to jump through a few hoops to get there. Let's begin by opening the Performance Monitor and removing all existing counters and then adding in the Work Queue Length counter that I just discussed.
After you have added the Work Queue Length counter to the Performance Monitor, expand the Performance Logs and Alerts container found on the left side of the screen. Next, right click on the Alerts container and select the New Alert Settings command from the resulting shortcut menu. When you do, you will be prompted to enter a name for the alert that you are creating. For this example, we will call the new alert MTA.
Next, click the Add button and add in the Work Queue Length counter. You will now have to specify an alert value. This value will really vary depending on what's normal for your own organization. I have seen some organizations in which it is normal to have 50 items in queue at a time. In my own organization though, having ten items in queue would probably indicate a major problem. Just pick a value that's appropriate for your organization and enter it into the Limit field. Make sure that the Alert When Value Is field is set to Over.
The next section of the dialog box, shown in Figure D, is the Sample Data section. By default, Windows is configured to sample data every five seconds. For what we are doing, I think that five seconds is way too often to sample data. Think about it for a moment. If the MTA queues were to back up, an alert would be issued every five seconds until you fix the problem. That could potentially flood you with e-mail and also has the potential to make the MTA problem worse.
Figure D: You can create alerts based on performance monitor counters.
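A quick bit of arithmetic shows how fast the alerts pile up at the default setting. The short Python sketch below assumes the worst case, in which every sample exceeds the limit and triggers an alert. The intervals other than five seconds are arbitrary examples for comparison, not recommended values.

```python
def alerts_per_hour(interval_seconds: int) -> int:
    """Worst case: every sample exceeds the limit and triggers an alert."""
    return 3600 // interval_seconds


# Five seconds is the default; the other intervals are arbitrary examples.
for label, seconds in [("5 seconds", 5), ("1 minute", 60),
                       ("5 minutes", 300), ("15 minutes", 900)]:
    print(f"sample every {label}: up to {alerts_per_hour(seconds)} alerts per hour")
```

At the default interval, that works out to as many as 720 alerts per hour, compared with 12 per hour at a five minute interval.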
After you have filled in the dialog box's General tab, it's time to fill in the Action tab. The Action tab tells the server what course of action to take if an alert is issued. By default, the only course of action that is taken is to log an entry to the application event log. There is an option to send a network message, but I prefer not to use this option because doing so means that you are depending on the recipient running the Messenger service. Many people disable the Messenger service for security reasons, so there is a good chance that the recipient may never get the message. Instead, I prefer to use the Run This Program option.
The nice thing about the Run This Program option is that it allows you to create a script that not only notifies you of the problem, but that also takes corrective action. For example, if I were going to write a script that dealt with the MTA queues backing up, I would probably begin by using the net stop and net start commands to stop and restart the Microsoft Exchange MTA Stacks service. After that I would want to be notified by e-mail that the server was having problems.
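As a rough illustration of that idea, here is a minimal Python sketch of such a script. It is not a tested production script: the relay host, the e-mail addresses, and the server name MAIL01 are hypothetical placeholders, and the service is referred to by the display name used above. The sketch defaults to a dry run, so nothing is stopped, started, or sent unless you pass `dry_run=False`.

```python
import smtplib
import subprocess
from email.message import EmailMessage

SERVICE = "Microsoft Exchange MTA Stacks"  # service display name used above
SMTP_HOST = "relay.example.com"            # hypothetical SMTP relay
SENDER = "monitor@example.com"             # hypothetical addresses
RECIPIENT = "admin@example.com"


def restart_commands(service: str = SERVICE) -> list[list[str]]:
    """Build the net stop / net start commands for the service."""
    return [["net", "stop", service], ["net", "start", service]]


def build_message(server: str, results: list[tuple[str, str]]) -> EmailMessage:
    """Compose the notification that reports what the script did."""
    msg = EmailMessage()
    msg["From"] = SENDER
    msg["To"] = RECIPIENT
    msg["Subject"] = f"MTA queue alert on {server}"
    lines = [f"{command}: {status}" for command, status in results]
    msg.set_content("The MTA alert fired on this server.\n" + "\n".join(lines))
    return msg


def handle_alert(server: str, dry_run: bool = True) -> EmailMessage:
    """Restart the service, then notify. With dry_run, only build the message."""
    results = []
    for cmd in restart_commands():
        if dry_run:
            status = "skipped (dry run)"
        else:
            status = f"exit code {subprocess.run(cmd).returncode}"
        results.append((" ".join(cmd), status))
    msg = build_message(server, results)
    if not dry_run:
        with smtplib.SMTP(SMTP_HOST) as smtp:
            smtp.send_message(msg)
    return msg


message = handle_alert("MAIL01")  # MAIL01 is a hypothetical server name
print(message["Subject"])
print(message.get_content())
```

Pointing `SMTP_HOST` at a relay other than the Exchange server being monitored keeps the notification from depending on the very service that is having trouble.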
There are ways to access Exchange Server from a command line and send an e-mail from a script. Keep in mind though that if your MTA queues are jammed up or if the server is having other types of problems, the message might never be delivered. For this reason, I prefer to use a third party utility for scripting an e-mail message. Such a utility will work whether Exchange is functioning or not. One popular utility for doing so is Febooti Command Line E-Mail. A typical Febooti command looks something like this:
Don't wait for problems to appear
Exchange Server has never been easy to manage. However, by taking advantage of the built-in monitoring tools and some elementary scripting, you can be alerted to minor problems before they become major issues.