Why Uptime isn't always accurate

Microsoft provides Uptime to help you track your server's reliability. But have you noticed that Uptime sometimes produces the wrong answers or displays errors? In this Daily Drill Down, John Sheesley shows you why Uptime may not always be accurate.

Microsoft's Uptime utility was created to help you track the reliability of your Windows NT and Windows 2000 servers, but you may notice that it sometimes doesn’t produce accurate results. Don’t let these mistakes cloud your judgment of the utility. In this Daily Drill Down, I’ll explain why Uptime isn’t always accurate and tell you what to do about it.

What's Uptime?
In case you’ve never heard of Uptime, it’s a utility that Microsoft introduced with Service Pack 4 for Windows NT. It checks the server’s event logs and determines the amount of time the server has been available. For more information about Uptime, see the Daily Feature “Measure your server's reliability with Uptime.”

Uptime's accuracy issues
Treat Uptime’s figures as an estimate of your server’s reliability, not as a precise measurement. There are several reasons why Uptime may not be accurate:
  • There are problems with event logs.
  • You tried to run it without proper rights.
  • You’ve disabled the system heartbeat.
  • You’ve disabled Dr. Watson.
  • You’ve disabled Blue Screen event logging.
  • Your server may be a member of a cluster.
  • Your server may not be running Windows NT Service Pack 4 or later.
  • Your network may be suffering failures.
  • The time may not be accurate on your server.

Problems with event logs
Uptime bases its calculations on the event logs on your server. If, for some reason, the event logs have become corrupted, Uptime won’t be able to calculate the total amount of time your server has been running. Another common reason the event logs may lack the information Uptime needs is the Event Viewer’s Log Size feature.

To prevent the logs from eventually consuming a server’s entire hard drive, Microsoft allows you to limit the size of the server’s log files. You can limit a log by its file size or by the age of its events. If a log grows beyond a certain size or contains information beyond a certain age, Windows deletes the oldest information first.

When Uptime checks the logs, it naturally finds only the information still contained in the logs. If Windows has expired and deleted information from a log, Uptime can’t use that information to calculate times. So Uptime’s results are accurate only for the period that the surviving log entries cover.
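To see why trimmed logs matter, here is a minimal Python sketch. It is not how Uptime works internally; the event tuples, the availability function, and the sample dates are all hypothetical. It only illustrates that an availability figure can cover no more than the period the surviving records span.

```python
from datetime import datetime, timedelta

def availability(events, now):
    """Return (percent_up, window) from ('boot' | 'shutdown', time) records.

    Hypothetical model: the measured window starts at the oldest surviving
    record, so anything that happened before it is invisible.
    """
    events = sorted(events, key=lambda e: e[1])
    start = events[0][1]
    down = timedelta(0)
    last_shutdown = None
    for kind, when in events:
        if kind == "shutdown":
            last_shutdown = when
        elif kind == "boot" and last_shutdown is not None:
            down += when - last_shutdown  # outage ends at the next boot
            last_shutdown = None
    window = now - start
    return 100.0 * (1 - down / window), window

now = datetime(2001, 3, 1)
full_log = [
    ("boot", datetime(2001, 1, 1)),
    ("shutdown", datetime(2001, 1, 10)),
    ("boot", datetime(2001, 1, 12)),         # two-day outage
    ("shutdown", datetime(2001, 2, 20, 0)),
    ("boot", datetime(2001, 2, 20, 1)),      # one-hour outage
]
trimmed_log = full_log[3:]  # pretend the older records were overwritten

full_pct, full_window = availability(full_log, now)
trimmed_pct, trimmed_window = availability(trimmed_log, now)
print(f"full log:    {full_pct:.2f}% over {full_window.days} days")
print(f"trimmed log: {trimmed_pct:.2f}% over {trimmed_window.days} days")
```

In this made-up example, the trimmed log covers a much shorter window and never sees the two-day outage, so the figure it yields looks better than the server’s real history.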

However, you can modify how Windows deals with log file size in Event Viewer. Click Start | Programs | Administrative Tools | Event Viewer. When Event Viewer starts, right-click the System Log and select Properties. You’ll then see the System Log Properties window appear. In the Log Size box, you’ll see several options that control how Windows deals with log files.

To make sure Uptime finds the most accurate information, select Do Not Overwrite Events (Clear Log Manually). This prevents Windows from automatically overwriting old events. Click OK to save your changes and close the System Log Properties screen. Bring up the Properties pages for the other available logs in Event Viewer and make the same changes in the Log Size box.

From then on, when you run Uptime, it will have all the information it needs to run. However, with this setting a log that reaches its maximum size stops recording new events until you clear it, so you’ll need to clear the logs manually after running Uptime. To do so, bring up each of the Properties pages for the logs, as mentioned above, and click Clear Log.

Know your rights
Uptime works best when you log in as Administrator or as a user with Administrator rights. If your user ID doesn’t have Administrator rights, Uptime can’t access the logs it needs to base its calculations on. The easiest way to solve the problem is to log off and log back on as a user with proper rights.

Make sure you have a heartbeat
When Microsoft added the Uptime utility in Service Pack 4 for Windows NT, it also added the system heartbeat. The heartbeat is a timestamp that Windows places in the system registry about every five minutes. Uptime uses this timestamp to help calculate system availability.

The heartbeat increases Uptime’s accuracy because of the frequency of writes to the registry. Should an application crash or a Blue Screen of Death cause your server to crash, Uptime can go back to the last system heartbeat to get an accurate time reading. Without the heartbeat, if logs become corrupt or events get deleted due to log-size restrictions, Uptime has nothing else to fall back on and will generate inaccurate results. A symptom of a missing heartbeat is an error message indicating that event logs do not contain sufficient information to calculate system availability.
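The following Python sketch shows why a timestamp written about every five minutes is useful after a crash. It is an illustration only, not Uptime’s actual algorithm; the downtime_bounds function and the sample times are hypothetical. The idea is that the crash must have happened somewhere between the last heartbeat and the moment the next one was due, which narrows the possible downtime to a five-minute range.

```python
from datetime import datetime, timedelta

# "About every five minutes" comes from the article; treated here as exact.
HEARTBEAT_INTERVAL = timedelta(minutes=5)

def downtime_bounds(last_heartbeat, next_boot, interval=HEARTBEAT_INTERVAL):
    """Return (shortest, longest) possible downtime after an unclean stop.

    The server was alive at last_heartbeat and dead before the next
    heartbeat was due, so the crash falls inside that one interval.
    """
    latest_crash = min(last_heartbeat + interval, next_boot)
    shortest = next_boot - latest_crash
    longest = next_boot - last_heartbeat
    return shortest, longest

last_beat = datetime(2001, 5, 4, 2, 15)   # hypothetical last heartbeat
rebooted = datetime(2001, 5, 4, 2, 45)    # hypothetical next boot record
shortest, longest = downtime_bounds(last_beat, rebooted)
print(f"down for between {shortest} and {longest}")
```

Without any heartbeat, the only lower bound on the crash time would be the last event that happened to reach the log, which could be hours old.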

The system heartbeat is normally enabled by default on Windows NT Service Pack 4 servers and Windows 2000 servers. You can check the heartbeat on Windows NT servers that run Service Pack 4 or later and on Windows 2000 machines by typing uptime /heartbeat and pressing [Enter] at the command line on your server. Uptime will display a message reminding you what the heartbeat does, along with a warning that because the heartbeat causes frequent writes to the system’s hard drive, it shouldn’t be used on systems that employ power management.

Uptime will also display a message that tells you the heartbeat’s current status, either enabled or disabled. Read carefully to see what the current status is. Following the status statement, Uptime will ask if you want to change the heartbeat’s status. So, if the heartbeat is enabled, Uptime will ask if you want to disable it. Make sure you have the heartbeat enabled. If you change the heartbeat’s status, the change won’t take effect until you reboot your server.

Calling Dr. Watson
Uptime can help you track the number of application crashes that have occurred on your server. You can display application crashes by typing uptime /a and pressing [Enter]. However, for Uptime to properly report application failures, you must have Dr. Watson enabled. Dr. Watson is enabled by default, but it’s possible that a previous administrator has turned it off.

To see if Dr. Watson is enabled, you’ll have to take a trip to the registry, but do so only if you suspect application crashes and Uptime reports none. Click Start | Run and enter regedt32 in the Run dialog box. Click OK to start the Registry Editor. When it appears, navigate in the left pane to the HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\AeDebug key. When you find the key, check the Auto value and the Debugger value. The Auto value should be set to 1, and the Debugger value should be set to drwtsn32 -p %ld -e %ld -g. If you don’t find these values, add them. If you’ve installed a different debugger, you’ll need to change the values back. Be very careful when editing the system registry. If you make a mistake, you may render your server unbootable.
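If you’d rather inspect those two values without opening a registry editor, a read-only check is one option. The Python sketch below is an assumption-laden illustration: it presumes a Python interpreter is available on the machine, the function names are hypothetical, and the expected Debugger string is the one given above, typed with plain hyphens. The read_aedebug function uses the standard winreg module, which exists only on Windows, and it never writes to the registry.

```python
EXPECTED_AUTO = "1"
EXPECTED_DEBUGGER = "drwtsn32 -p %ld -e %ld -g"
AEDEBUG_PATH = r"SOFTWARE\Microsoft\Windows NT\CurrentVersion\AeDebug"

def aedebug_problems(values):
    """Return a list of problems found in a dict of AeDebug values."""
    problems = []
    auto = values.get("Auto")
    if auto is None:
        problems.append("Auto value is missing")
    elif str(auto).strip() != EXPECTED_AUTO:
        problems.append(f"Auto is {auto!r}, expected {EXPECTED_AUTO!r}")
    debugger = values.get("Debugger")
    if debugger is None:
        problems.append("Debugger value is missing")
    elif " ".join(str(debugger).split()).lower() != EXPECTED_DEBUGGER:
        # Compare case-insensitively and ignore runs of extra spaces.
        problems.append(f"Debugger is {debugger!r}, expected {EXPECTED_DEBUGGER!r}")
    return problems

def read_aedebug():
    """Read Auto and Debugger from the local registry (Windows only)."""
    import winreg  # standard library, but present only on Windows
    found = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, AEDEBUG_PATH) as key:
        for name in ("Auto", "Debugger"):
            try:
                found[name] = winreg.QueryValueEx(key, name)[0]
            except FileNotFoundError:
                pass  # value not present; aedebug_problems reports it
    return found
```

On a Windows machine, aedebug_problems(read_aedebug()) returns an empty list when both values match. Anything else is a prompt to look at the key by hand.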

Logging the Blue Screen of Death
Using the uptime /s command, you can view the total number of Blue Screens of Death your server has encountered. However, for Uptime to report on Blue Screens, you must have Blue Screen logging enabled on your server. By default, Blue Screen logging is enabled on both Windows 2000 and Windows NT servers.

If Uptime reports no Blue Screens, but you suspect your server is having them, you should make sure that Blue Screen logging is still configured on your server. On a Windows NT server, right-click My Computer and select Properties. When the System Properties window appears, click the Startup/Shutdown tab. Make sure the Write An Event To The System Log check box is selected.

On a Windows 2000 server, right-click My Computer and select Properties. When the System Properties window appears, click the Advanced tab. When the Advanced screen appears, click Startup And Recovery. Make sure the Write An Event To The System Log check box is selected.

Watch out for clusters
If your server is a member of a cluster, Uptime will display an error. The current version of Uptime doesn’t support clustered systems. Until Microsoft produces a version of Uptime that supports clustered servers, you’re out of luck.

Check your OS version
Microsoft designed Uptime to work with Windows NT 4.0 Service Pack 4 or later. If you’re still running Windows NT 3.x, don’t expect Uptime to produce the proper results. Likewise, if some of your Windows NT 4.0 servers are still only running Service Pack 3 or earlier, you can’t rely on the results. This is especially true when Uptime checks the server’s heartbeat, because the heartbeat wasn’t included on early versions of NT and pre-SP4 NT 4.0 Service Packs. If you’re running an unsupported version of Windows NT, then you have three choices: Don’t get upset if Uptime is wrong, don’t use Uptime at all, or upgrade your server.

Network failures
Uptime can only report the availability of your server’s operating system. You can’t use it to report the availability of applications, such as databases, that may be running on your server. Likewise, Uptime can’t report network failures, such as cable breaks or hub outages.

So if your users complain that the server is frequently unavailable, but Uptime says that it’s up 99.9 percent of the time, the fault may not lie with Uptime or your server. You may be experiencing network issues.

Time synchronization issues
By itself, Uptime doesn’t do anything but check your server’s event logs. The event logs are time-stamped from your server’s clock. If your server’s system clock drifts or changes time for any reason, events written to the event logs will be mistimed.

When Uptime checks the logs, it trusts that the information in the logs is accurate and does the math accordingly to produce its results. If your server’s clock is sufficiently off, Uptime’s results will be off, too. For example, on one of our test machines, Uptime reported a Mean Time Between Reboots of -1,543.20 days. In essence, Uptime was saying that, on average, when the server was taken down, it would restart a little over four years before it finished shutting down. Clearly, there’s a time-clock problem on the server. Either that or Dr. Who borrowed the server.
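A negative mean is easy to reproduce on paper. The Python sketch below assumes the figure is a simple average of the gaps between consecutive boot records, taken in the order they were written; I don’t know that Uptime computes it exactly this way, and the function name and sample dates are hypothetical. It does not try to reproduce the number from our test machine.

```python
from datetime import datetime, timedelta

def mean_days_between_reboots(boot_times):
    """Average gap, in days, between consecutive boot records in log order."""
    gaps = [later - earlier for earlier, later in zip(boot_times, boot_times[1:])]
    return sum(gaps, timedelta(0)) / len(gaps) / timedelta(days=1)

# A healthy log: every boot record is stamped later than the one before it.
healthy = [datetime(2001, 1, 1), datetime(2001, 2, 1), datetime(2001, 3, 1)]

# Same server, but the clock was wrong (set to 1996) at the third boot.
skewed = [datetime(2001, 1, 1), datetime(2001, 2, 1), datetime(1996, 6, 1)]

print(mean_days_between_reboots(healthy))
print(mean_days_between_reboots(skewed))
```

One badly stamped record is enough: its huge negative gap swamps the normal ones and drags the whole average below zero.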

While your server may not display that much of a time difference, the time may be off enough on your server to effectively make Uptime useless. To avoid this problem, you should ensure that your server has the proper time. For more information about resolving time issues on your server, see the Daily Drill Downs “Keeping time on your NT network” and “Keeping time on your Windows 2000 server.”

Caveat administrator
Uptime can be a very useful utility that provides important information about your server’s reliability. However, like all software, Uptime isn’t perfect. Knowing the causes of Uptime’s inaccuracies can help you judge whether to trust its results.
