Microsoft Operations Manager (MOM) collects a lot of data about your network. Agents, consolidators, and the Database Access Servers (DAS) all work together to funnel data into MOM’s central database. Although MOM’s main purpose is to give you the means to monitor systems and respond to events, much of the processing that takes place is automatic, based on your settings. Rules, alerts, responses, and reports in MOM all work together to keep your network running smoothly.
Although MOM’s processes are mostly automatic, you should be aware of how MOM works. You need to know how providers, events, alerts and rules all work together so you can tailor MOM to your needs. In this Daily Drill Down, I’ll explain MOM’s filters, rules, and other elements in detail. I’ll also show you some of the monitoring and reporting tools available in MOM.
Event collection and processing
MOM is really all about collecting and responding to network events. Providers collect information from a variety of sources. Here are some common providers MOM uses:
- Application log providers collect data from application log files and messages such as those from IIS log files, the Internet Locator Service, SQL Server trace log files, UNIX syslog files, and generic ASCII log files.
- Time event providers generate events at specified times. For example, you might configure MOM to generate an event at 8:00 A.M. to page an administrator as a reminder to verify that an overnight backup has completed properly.
- Windows Management Instrumentation (WMI) event providers collect information such as service status (started, stopped, etc.), SNMP traps, and other data. WMI numeric data providers collect performance counter data.
- Generic providers collect data from MOM itself, such as agent status and script-generated events.
The events can come from many sources. For example, agents can collect events from Windows 2000 and .NET event logs. This can help you keep track of Active Directory errors and changes, events in services such as DNS and RRAS, and other events that these platforms generate and store in their event logs.
In addition to tracking events that do happen, MOM uses timed events to track events that don’t happen. A processing rule can monitor for a specific event at a given time or within a specified period. If the event doesn’t occur, it’s a missing event, and the rule can generate a response as needed. For example, if a scheduled backup doesn’t start, the missing event can ultimately notify an administrator.
Processing rules let you specify how MOM should collect data and what it should do with that data once collected. There are several types of event processing rules:
- Collection rules specify the events to be collected and their source, but don’t provide an alerting or response capability.
- Consolidation rules let you group similar events on a target computer into a single event.
- Missing event rules let you configure MOM to generate an event or some other response when an expected event fails to occur.
- Filtering rules let you weed out events that aren’t useful or significant.
- Event rules specify the processing that should occur for specific events not covered by other rules.
- Alert processing rules define how MOM responds to individual alerts or groups of alerts.
- Performance processing rules determine how MOM processes performance data.
- Measuring rules define numeric data to be collected from specific targets.
- Threshold rules define data to be collected when performance counters pass a specified threshold value.
As an example of how MOM’s rules can help you keep track of things, consider a worrisome backup. Is it completing every night or not? You can configure a missing event rule to generate an alert if the backup doesn’t start when scheduled. At that point, the matching alert processing rule pages an administrator to go kick the backup system.
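To make the missing event idea concrete, here is a minimal Python sketch of the logic only. It is not MOM code and doesn't use any MOM interface: the function name, the event tuples, the source names, and the 15-minute window are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def is_event_missing(events, expected_source, window_start, window_length):
    """Return True when no event from expected_source falls inside the window."""
    window_end = window_start + window_length
    for source, timestamp in events:
        if source == expected_source and window_start <= timestamp < window_end:
            return False  # the expected event arrived in time
    return True  # nothing matched, so this is a missing event


# Hypothetical collected events as (source, timestamp) pairs.
events = [
    ("dns", datetime(2002, 3, 1, 1, 55)),
    ("backup-start", datetime(2002, 3, 1, 2, 3)),
]

# Assumed window: each job should report within 15 minutes of 2:00 A.M.
window_start = datetime(2002, 3, 1, 2, 0)
window_length = timedelta(minutes=15)

for source in ("backup-start", "tape-verify"):
    if is_event_missing(events, source, window_start, window_length):
        print(f"ALERT: expected event '{source}' did not occur")
```

In this sample data the backup reported at 2:03 A.M., so only the made-up tape-verify source raises an alert. In MOM itself, a matching alert processing rule would then decide who gets paged.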
MOM alerts you to events
MOM’s alerts tell you what you need to know about the events that occur across the enterprise every day—sometimes every minute.
Rules can generate alerts for specific events. Each alert has a severity attribute to help you categorize the alert and respond accordingly. Some of the possible severity values include service unavailable, security breach, critical error, error, warning, information, and success. MOM identifies the severity level of an alert using a specific icon in the console, making it easy, for example, to quickly spot security breaches or critical errors.
Alerts include quite a bit of information in addition to the severity attribute, such as the computer that generated the alert, Knowledge Base information, custom fields that you define, and information about the alert’s resolution.
MOM makes it easy to track your response to the alerts it generates. You use the resolution state to track the status of your response to the alert. MOM defines several default resolution states, all self-explanatory:
- New
- Acknowledged
- Level 1: Assigned to helpdesk or local support
- Level 2: Assigned to subject matter expert
- Level 3: Requires scheduled maintenance
- Level 4: Assigned to external group or vendor
- Resolved
You can also create your own custom resolution states according to your own practices or policies, or even modify the default resolution states, except for New and Resolved.
Each resolution state has a service level agreement time. The service level agreement time specifies the maximum amount of time the alert can remain in a given state before some kind of action is needed. For example, you might configure the New state to require an acknowledgement within 15 minutes of the alert, or limit the Acknowledged state to a maximum of 10 minutes. This helps ensure that every alert receives some type of action within a known amount of time.
If an alert remains in a given state past its service level agreement time, a service level exception occurs. You can use the MOM consoles to monitor alerts as well as service level exceptions to help you keep on top of ongoing issues. You can change the alert resolution state through the console. Scripts can also change the alert state.
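The service level exception check boils down to comparing the time an alert has spent in its current state against that state's limit. The Python sketch below illustrates the comparison; it is not how MOM implements it. The dictionary and function names are hypothetical, and the 15- and 10-minute limits simply reuse the example values mentioned above.

```python
from datetime import datetime, timedelta

# Assumed service level agreement times per resolution state.
SLA_TIMES = {
    "New": timedelta(minutes=15),
    "Acknowledged": timedelta(minutes=10),
}


def has_service_level_exception(state, entered_state_at, now):
    """Return True when an alert has stayed in a state longer than its SLA time."""
    limit = SLA_TIMES.get(state)
    if limit is None:
        return False  # no SLA time configured for this state in the sketch
    return now - entered_state_at > limit


now = datetime(2002, 3, 1, 9, 20)

# An alert that has been New for 20 minutes is past its 15-minute limit.
overdue = has_service_level_exception("New", datetime(2002, 3, 1, 9, 0), now)

# An alert acknowledged 5 minutes ago is still within its 10-minute limit.
on_time = has_service_level_exception("Acknowledged", datetime(2002, 3, 1, 9, 15), now)
```

A console view of service level exceptions is effectively the list of alerts for which a check like this returns true.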
Alerts also have a resolution history that helps you track the progress of each issue. History data is added automatically by rules, scripts, and other automated responses. You can’t change any of the history data that’s added automatically, but you can add your own historical data. This is an important tool for entering status notes, resolution steps, and other information to help your administrative team track and resolve problems.
Processing rules can have associated Knowledge Base entries. This data, which comes from the Microsoft Knowledge Base, provides targeted information about the causes of, and potential solutions to, specific events. You can view the Knowledge Base data when you access alerts through the management consoles.
In addition to the canned Knowledge Base information, you can create your own entries. For example, assume you’ve learned the most efficient way to resolve a recurring problem and which administrators have the skills to address it. Adding that information to the company knowledge base can save time when the problem returns.
Storm warning
Depending on the event that generates an alert and the amount of time you’ve specified for the service level agreement on an alert state, it’s possible for a problem to generate multiple, identical alerts, creating an event storm. This can flood the database and overwhelm administrators with unnecessary alerts.
To prevent event storms, MOM allows you to specify properties for alert suppression when you create a processing rule for an event. You specify the fields, events, or threshold values that need to match for MOM to consider a particular event a duplicate. When alerts match the suppression criteria, the consolidator combines the duplicate alerts into a single alert. The consolidator also keeps track of the number of alerts combined. It only combines alerts from a single computer, so if the same stream of events occurs on two computers, the result is two separate accumulated events—one for each computer. Consolidation lets you view and work with a single alert in the management consoles.
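The suppression behavior described above can be sketched in a few lines of Python. This is an illustration of the idea rather than the consolidator's actual logic: the alert dictionaries, the field names, and the repeat_count key are all assumptions. The one detail taken from the description is that the computer name is always part of the match, so alerts are never combined across machines.

```python
def suppress_duplicates(alerts, match_fields):
    """Combine alerts whose computer and match_fields values are identical."""
    combined = {}
    for alert in alerts:
        # The computer is always part of the key, so each machine keeps its own alert.
        key = (alert["computer"],) + tuple(alert[field] for field in match_fields)
        if key in combined:
            combined[key]["repeat_count"] += 1  # count the duplicate, don't store it
        else:
            combined[key] = dict(alert, repeat_count=1)
    return list(combined.values())


# Hypothetical event storm: the same alert fires on two computers.
storm = (
    [{"computer": "SRV1", "source": "disk", "event_id": 9}] * 3
    + [{"computer": "SRV2", "source": "disk", "event_id": 9}] * 2
)

for alert in suppress_duplicates(storm, ["source", "event_id"]):
    print(alert["computer"], alert["repeat_count"])
```

Five raw alerts collapse into two, one per computer, each carrying the number of alerts it represents.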
MOM also allows alerts to be forwarded from one configuration group to another. This helps you distribute management information. For example, a local configuration group would send alerts to local administrators while also forwarding those alerts to a central administrative team keeping tabs on the entire enterprise.
The local configuration groups are called zone configuration groups. Alerts are forwarded to a central group called the master configuration group. The zone groups don’t forward everything to the master group but instead forward only alerts and the events associated with them.
Forwarded alerts are not tied to the original alerts in terms of management. The consolidators in the zone configuration group forward the alerts as defined by their processing rules and then apply the duplicate alert suppression. This ensures that all alerts that fit the criteria for forwarding are actually forwarded. The alert suppression rules in the master configuration group then combine the forwarded alerts on that end.
The independence between groups means that changes you make to an alert in the zone group (such as resolution state or assignment) don’t apply in the master group. Likewise, changes you make to the alert in the master group don’t apply to the same alert in the zone group. This independence offers flexibility for processing and management and allows for different responses to the alert at different levels. For example, you might have an alert run a script at the zone group to start the resolution process and have it generate an e-mail notification at the master group.
A measured response
Although many events don’t require any type of response in the average enterprise, many do require some action. In MOM’s terms, a response is an action initiated by a processing rule match. MOM has several options. For example, you might need to have MOM send a notification when a particular processing rule match occurs. MOM supports SMTP and Microsoft Exchange for e-mail notification, and you can specify the information you want included in the message, such as alert severity, computer name, resolution state, and so on. Paging can also be part of MOM’s response, but you must use a paging service that supports SMTP messages. MOM simply sends a message to the SMTP paging address. Duplicate alert suppression helps keep MOM from flooding the mail server or pager when an event storm occurs.
In cases where SMTP won’t do the trick, you can use third-party software like Blat or WinMail to generate e-mail or pager notifications. In this scenario, MOM issues the command line necessary to send the message through the third-party program, so the software you choose for messaging needs to support command-line operations.
MOM uses notification groups to simplify notification handling. A notification group defines operators and their work schedules. When you add an operator to a group, you can specify notification for e-mail, paging, and external command notification. For each of these three you specify the days and hours the operator is available to receive each type of notification. MOM sends the notification to operators based on their scheduled times (which means MOM won’t get you out of bed to kick a stalled backup server). You can create as many groups as you need, and administrators can be members of multiple groups.
Notification groups also let you send notifications to specific operators to address specific problems. For example, you might use a security notification group to send notifications to security experts, a backup notification group to address backup problems, a SQL group to forward notifications to SQL gurus, and so on. Setup creates several default notification groups to fit specific services, such as Web Administrator, Database Administrator, Security Specialist, and so on.
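Schedule-based delivery in a notification group amounts to filtering operators by day and hour before sending anything. Here is a rough Python sketch under stated assumptions: the operator names, the dictionary layout, and the schedules are invented for illustration, and MOM stores this information in its own way.

```python
from datetime import datetime, time

# Hypothetical notification group. Days use Python's weekday numbering (Monday is 0).
backup_group = [
    {"name": "weekday_operator", "days": {0, 1, 2, 3, 4},
     "start": time(8, 0), "end": time(17, 0)},
    {"name": "weekend_operator", "days": {5, 6},
     "start": time(0, 0), "end": time(23, 59)},
]


def available_operators(group, when):
    """Return the names of operators scheduled to receive notifications at 'when'."""
    return [
        operator["name"]
        for operator in group
        if when.weekday() in operator["days"]
        and operator["start"] <= when.time() <= operator["end"]
    ]


monday_morning = available_operators(backup_group, datetime(2002, 3, 4, 10, 0))
saturday_night = available_operators(backup_group, datetime(2002, 3, 2, 3, 0))
monday_night = available_operators(backup_group, datetime(2002, 3, 4, 3, 0))
```

With these sample schedules, nobody is notified at 3:00 A.M. on a Monday, which is the "won't get you out of bed" behavior. A real group would keep a separate schedule for each notification type (e-mail, paging, and external command).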
You might need to run a script or batch file in response to an event. These can run on either the agent or the consolidator. MOM provides its own interface for creating scripts, or you can use any of several standard scripting languages, including VBScript, JavaScript, Perl, and Active REXX. Management pack modules generally contain scripts for specific applications or events. MOM stores scripts in the database and distributes them through consolidators.
Another response you might want to take is to set a state variable. For example, you might have a processing rule increment a variable each time a logon attempt fails. Then, create a scheduled event that runs a script to check the state of the variable. If the variable is higher than a specified level, the script can generate a notification to the security team.
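The failed-logon example can be sketched as a counter plus a periodic check. The Python below only illustrates the pattern: the state dictionary, the function names, and the threshold of 5 are assumptions, and resetting the counter after a notification is a design choice of this sketch rather than something MOM is known to do.

```python
FAILED_LOGON_THRESHOLD = 5  # assumed level; pick whatever fits your policy


def record_failed_logon(state):
    """Response to a failed-logon event: increment the state variable."""
    state["failed_logons"] += 1


def scheduled_check(state, threshold):
    """Scheduled script: return a notification message if the count is too high."""
    count = state["failed_logons"]
    if count > threshold:
        state["failed_logons"] = 0  # assumption: start counting afresh after notifying
        return f"Security notification: {count} failed logon attempts"
    return None


state = {"failed_logons": 0}
for _ in range(6):
    record_failed_logon(state)

message = scheduled_check(state, FAILED_LOGON_THRESHOLD)
```

Because the notification comes from the scheduled check rather than from each failed logon, a burst of failures produces one message instead of one per attempt.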
Finally, MOM can generate SNMP traps in response to events. This is useful for integrating third-party monitoring applications within your MOM infrastructure. Both agents and consolidators can generate traps. Generating the traps at the agent helps you more easily track the source, while generating them at the consolidator can reduce the number of traps. Whichever option or combination you take, the SNMP service must be installed on the computer generating the traps.
Making use of the alerts and events
Collecting all of this information wouldn’t help much if you had no way to view or analyze it. Fortunately, MOM does as good a job of reporting information as it does collecting it.
MOM includes a run-time version of Microsoft Access along with over 100 preconfigured reports for viewing and analyzing the collected data for specific applications and purposes. You can view, save, and print the reports with the run-time version. You’ll need the full version of Microsoft Access to customize the existing reports and create new ones. MOM gives you a handful of options for generating and publishing reports. MOM makes reports available through both the Administrator and Web consoles. It can also publish reports in HTML for viewing with a Web browser.
Although you can install the reporting components on the server(s) where MOM is installed, it’s a good idea to install the reporting components on a separate computer for performance reasons. This way, MOM’s overhead won’t affect report generation, and vice versa.
Who takes better care of you than MOM?