Despite their reputation for being delicate, modern hard drives are remarkably reliable devices. Nevertheless, because they rely on moving components, they are vulnerable to mechanical failure and therefore more likely to fail than solid-state devices. Since hard drives often contain critical data that has not yet been backed up, it would be very useful to be able to predict hard drive failures before they occur. If you received an alert that a hard drive was about to fail, you might be able to prevent data loss and minimize downtime. SMART (Self-Monitoring, Analysis, and Reporting Technology) is a feature of most modern hard drives that attempts to do just that.
SMART operation and use
SMART is a nonproprietary standard that allows for predictive failure analysis of hard disk drives. It attempts to detect problems that worsen over time and might eventually lead to complete drive failure. SMART is implemented with a SMART-compliant hard drive plus either firmware included in your system's BIOS or software provided by a third-party utility. The hard drive monitors certain mechanical and electronic characteristics (attributes) and compares their values to the drive manufacturer's base (threshold) values to determine whether the drive is beginning to fail or is currently failing. The drive then makes its status and attribute information available to the BIOS or software that polls it. The software polls every drive in the system at a set time interval. If a problem is detected (that is, if an attribute value falls to or below its threshold value), a SMART error occurs and an alert is added to the status information. When the BIOS or software polls the drive and detects the SMART error, it notifies the user that a problem has been detected. The user or system administrator can then take precautions to prevent data loss.
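The threshold comparison described above can be sketched in a few lines of Python. This is a minimal sketch, not any manufacturer's firmware logic: it assumes each attribute is reported as a normalized value and a threshold on the same scale, where a higher value is healthier, and it treats an attribute as failing once its value falls to or below its threshold. The `Attribute` class and the `failing_attributes` and `drive_status` functions are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    """One SMART attribute as reported by a drive (hypothetical structure)."""
    attr_id: int    # attribute number, e.g. 5
    value: int      # current normalized value; higher means healthier
    threshold: int  # manufacturer's threshold, stored in drive firmware


def failing_attributes(attributes: list[Attribute]) -> list[Attribute]:
    """Return the attributes whose value has fallen to or below the threshold."""
    return [a for a in attributes if a.value <= a.threshold]


def drive_status(attributes: list[Attribute]) -> str:
    """Summarize a drive as 'OK' or 'SMART error' from its attributes."""
    return "SMART error" if failing_attributes(attributes) else "OK"


if __name__ == "__main__":
    readings = [
        Attribute(attr_id=1, value=100, threshold=50),  # well above threshold
        Attribute(attr_id=5, value=36, threshold=36),   # has reached its threshold
    ]
    print(drive_status(readings))  # prints "SMART error"
```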
The drive manufacturer determines which attributes will be monitored on a particular drive model based on that model's characteristics and on previously reported failures of the technology used in the drive. Some well-known attributes tend to be monitored on many manufacturers' drives (including head flying height, error rates, spin-up time, retries, throughput, and temperature), but other attributes are considered proprietary because they represent technology specific to that drive. The threshold values are derived from empirical analysis of previously reported failures. They are stored in the drive firmware and cannot be changed by the user.
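Because some attribute numbers are shared across vendors while others are proprietary, monitoring software typically keeps a table of known attribute names and labels everything else as vendor-specific. The sketch below uses a small, illustrative subset of commonly used attribute numbers (following conventions such as those used by smartmontools); vendors can and do assign different meanings, so the table is an assumption, not an authoritative list.

```python
# Illustrative subset of widely used SMART attribute numbers.
# These meanings are conventions, not guarantees: a vendor may use an ID differently.
KNOWN_ATTRIBUTES: dict[int, str] = {
    1: "Raw Read Error Rate",
    2: "Throughput Performance",
    3: "Spin-Up Time",
    5: "Reallocated Sectors Count",
    7: "Seek Error Rate",
    10: "Spin Retry Count",
    194: "Temperature",
}


def attribute_name(attr_id: int) -> str:
    """Return a readable name, or mark the ID as unknown (likely vendor-specific)."""
    return KNOWN_ATTRIBUTES.get(attr_id, f"Unknown (vendor-specific) attribute {attr_id}")


if __name__ == "__main__":
    for attr_id in (3, 194, 231):
        print(attr_id, attribute_name(attr_id))
```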
If a failure is reported by SMART, it usually means that drive failure is imminent. However, this is not always the case. Other problems related to the hard drive controller, IDE cable, or environmental conditions may be at fault. Nevertheless, a backup of critical files should be performed immediately after an error. You should try not to power down the system before performing the backup in case the hard drive fails totally at bootup. After a backup, you can proceed to analyze the problem in further detail. You may have to contact the drive manufacturer if a proprietary attribute has caused the alert. Unless you must return the drive to the manufacturer, you should continue to use it until it fails.
Enabling SMART through the system BIOS
Your system does not need SMART support in its BIOS to use SMART. However, if your BIOS supports SMART, it can be a simple way to monitor hard drives without adding any extra software. The main limitation of enabling SMART through the BIOS is that this method supports only IDE drives; if you need SCSI support, you will have to use a software utility. The BIOS method also provides only limited details about a problem and is not as configurable as a software solution.
If your system BIOS supports SMART, you may see a message indicating HDD SMART Capability during system startup. If your system does not display a message, you can look for a SMART setting in your BIOS Setup. You can try looking for the setting in the BIOS Features Setup, Advanced Chipset Setup, Advanced CMOS Setup, or any other area that deals with IDE channels or peripherals. Once you find it, you should set it to Enabled to use the feature. If you are going to be using a SMART utility instead, you should make sure that this feature is disabled before you attempt to use the software.
Windows SMART programs
Many programs let you take advantage of the SMART feature on your drives. They range from relatively simple diagnostic utilities to elaborate enterprise solutions (such as IBM's SMART Reaction, which is part of its Universal Manageability Services package). Programs designed specifically to monitor SMART drives often share several features: the ability to vary the software's polling rate, to notify an administrator via e-mail or over the network when a SMART alert is issued, and to launch a program or script when an alert is detected. That last feature could be used to back up drive files automatically or to initiate a series of preventive steps.
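The e-mail and launch-a-script features described above can be sketched with Python's standard library. This is a minimal sketch, not the behavior of any product named here: the sender, recipient, and SMTP host are site-specific placeholders, and `run_alert_action` simply launches whatever command you pass it (for example, a hypothetical backup script).

```python
import smtplib
import subprocess
import sys
from email.message import EmailMessage


def build_alert_email(drive: str, detail: str, sender: str, recipient: str) -> EmailMessage:
    """Compose an alert e-mail for an administrator."""
    msg = EmailMessage()
    msg["Subject"] = f"SMART alert on {drive}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(f"A SMART alert was detected on {drive}:\n{detail}\n")
    return msg


def send_alert_email(msg: EmailMessage, smtp_host: str) -> None:
    """Send the alert through an SMTP relay (the host is site-specific)."""
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)


def run_alert_action(command: list[str]) -> int:
    """Launch a user-supplied program or script and return its exit code."""
    completed = subprocess.run(command, check=False)
    return completed.returncode


if __name__ == "__main__":
    msg = build_alert_email("/dev/hda", "Attribute 5 reached its threshold",
                            "monitor@example.com", "admin@example.com")
    print(msg["Subject"])
    # send_alert_email(msg, "smtp.example.com")  # placeholder host; needs a real relay
    # Stand-in for a real backup script: just prints a message.
    run_alert_action([sys.executable, "-c", "print('a backup job would start here')"])
```

Keeping message construction separate from sending makes it easy to test the alert text without a mail server.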
Some relatively inexpensive Windows programs for SMART monitoring are EZ-S.M.A.R.T. from StorageSoft, SMART-ER from Apricorn, and SmartMonitor-Plus from SystemSoft. There are also hard drive diagnostic packages, such as Ontrack's Data Advisor, and various utilities provided by hard drive manufacturers. The diagnostic packages are not intended to monitor your drives constantly; instead, they are used to gather information when you are troubleshooting hard drive problems. Before you purchase a SMART program or diagnostic package, check with your drive manufacturer, as some versions of these programs can be downloaded for free or are bundled with hardware. IBM, for example, makes versions of EZ-S.M.A.R.T. (shown in Figure A) and Data Advisor available for download to its customers.
|This is IBM's version of the EZ-S.M.A.R.T. program.|
EZ-S.M.A.R.T. can be used with both IDE and SCSI disk drives, supports e-mail notification, and is very easy to use. SmartMonitor-Plus supports only IDE drives but includes built-in file backup to another local or network drive as well as e-mail notification. SMART-ER provides an offline diagnostic test in addition to standard SMART monitoring and can log all errors. While these programs should be more than sufficient for most users and administrators, more sophisticated programs are available for monitoring drives in an enterprise environment. SANtools' SMART Disk Monitor (SMARTmon) is one such program.
Unlike the other programs mentioned, SMARTmon is designed specifically for technology professionals, so it provides many more capabilities and configuration options. It is especially useful for monitoring SCSI devices: SMARTmon can monitor serial SCSI devices (such as Fibre Channel and SSA drives), parallel SCSI devices, and IDE/ATAPI devices. (Fibre Channel is a high-speed data transfer technology used mainly to connect computers to storage devices. SSA [Serial Storage Architecture], as the name implies, is a serial transfer technology developed by IBM for connecting disk drives.)
SMARTmon’s main window, shown in Figure B, is divided into three main sections. The main area is a device and controllers box, where all of the monitored drives and their controllers are listed. On the right is the Status Polling section, where you can change the polling interval of the program and set up the e-mail notification feature. At the bottom is an event window, where polling results are displayed. If you select a drive in the device and controllers box and then press the Show Disk Or Adapter Details And Perform Advanced Functions button, SMARTmon can provide extensive details about the selected drive in a Device Info box. The details vary depending on whether the selected device is an IDE/ATAPI or SCSI device.
|SANtools' SMARTmon is shown monitoring two IDE disks.|
Figure C shows the IDE Drive Information Box for an IBM IDE hard drive. Along with the technical drive information, it shows the SMART Threshold And Status Attribute Values section, which lists each monitored attribute's number, name (if known), and current, worst, and threshold values. All the data is color-coded so you can easily distinguish good, degrading, and failing attributes. The Drive Information Box for a SCSI device contains information about the device and the features it supports. It also gives you access to the SCSI Mode Page Editor and Caching Parameters sections; by changing values in these two sections, you can tune your SCSI drive for optimum performance.
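SMARTmon's exact color-coding rules are not described here, so the following is only a hypothetical sketch of how a monitor might sort an attribute into good, degrading, or failing from its current, worst, and threshold values. The `margin` parameter, and the rule that a worst value that has ever reached the threshold counts as degrading, are assumptions of this sketch rather than SMARTmon's documented behavior.

```python
from enum import Enum


class Health(Enum):
    GOOD = "good"
    DEGRADING = "degrading"
    FAILING = "failing"


def classify(current: int, worst: int, threshold: int, margin: int = 10) -> Health:
    """Classify one attribute from its current, worst, and threshold values.

    `margin` is a hypothetical tuning knob: a current value within `margin`
    points of the threshold is flagged as degrading.
    """
    if current <= threshold:
        return Health.FAILING  # at or below threshold right now
    if worst <= threshold or current - threshold <= margin:
        return Health.DEGRADING  # hit the threshold in the past, or close to it now
    return Health.GOOD


if __name__ == "__main__":
    print(classify(current=100, worst=100, threshold=50).value)  # prints "good"
```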
|SMARTmon's IDE Drive Information Box showing values for standard and proprietary attributes|
When SMARTmon detects an error, it displays a pop-up warning that summarizes the problem. For SCSI drives, the warning includes the SCSI sense codes that provide further information about the SMART error. SMARTmon lets you configure rules that determine which errors display a warning and which send alerts. This is useful if you want certain errors to trigger a pop-up warning but not an e-mail. Overall, SMART Disk Monitor is a comprehensive and powerful program that does much more than issue basic SMART alerts.
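A rule table of the kind described above might look like the following hypothetical sketch. The error categories, the `AlertRule` structure, and the fallback of showing only a pop-up for errors without a matching rule are all assumptions for illustration, not SMARTmon's configuration format.

```python
from dataclasses import dataclass


@dataclass
class AlertRule:
    """Hypothetical rule: which actions to take for one kind of error."""
    error_kind: str
    show_popup: bool = True
    send_email: bool = False


def actions_for(error_kind: str, rules: list[AlertRule]) -> set[str]:
    """Return the actions ('popup', 'email') configured for an error kind.

    Errors without a matching rule fall back to a pop-up only.
    """
    for rule in rules:
        if rule.error_kind == error_kind:
            actions: set[str] = set()
            if rule.show_popup:
                actions.add("popup")
            if rule.send_email:
                actions.add("email")
            return actions
    return {"popup"}


if __name__ == "__main__":
    rules = [
        AlertRule("temperature", show_popup=True, send_email=False),
        AlertRule("threshold_exceeded", show_popup=True, send_email=True),
    ]
    for kind in ("temperature", "threshold_exceeded", "other"):
        print(kind, sorted(actions_for(kind, rules)))
```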
SMART on Linux
Currently, there are two main SMART programs for Linux. The first is UCSC SMART Suite, whose latest beta version (2.0 Beta 2) supports both IDE and SCSI drives. The suite consists of two programs. The first is a command-line utility named smartctl that checks the SMART status of a hard drive and produces very descriptive output. The second is a daemon (service) named smartd that polls the specified devices every 30 minutes and notifies the user if an error is detected. The main caveat about using this suite is that work on it seems to be at a standstill.
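The polling behavior of a daemon like smartd can be approximated as follows. This is a minimal sketch, not smartd's code: it assumes a caller-supplied `check` function (hypothetical) that returns whether a device is currently healthy, and, as a design choice of this sketch, it notifies once when a device first reports a problem instead of repeating the alert every 30 minutes.

```python
import time
from typing import Callable, Optional

POLL_INTERVAL_SECONDS = 30 * 60  # the 30-minute interval described above


def poll_forever(
    devices: list[str],
    check: Callable[[str], bool],
    notify: Callable[[str], None],
    interval: float = POLL_INTERVAL_SECONDS,
    max_cycles: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll each device and notify once when it first reports a problem.

    `check(device)` returns True when the device is healthy; it stands in for
    whatever actually queries the drive. `max_cycles` and `sleep` exist so the
    loop can be tested without waiting; leave them at their defaults to run
    indefinitely.
    """
    already_reported: set[str] = set()
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        for device in devices:
            healthy = check(device)
            if not healthy and device not in already_reported:
                notify(f"SMART problem detected on {device}")
                already_reported.add(device)
            elif healthy:
                already_reported.discard(device)  # re-arm if the device recovers
        cycles += 1
        if max_cycles is None or cycles < max_cycles:
            sleep(interval)
```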
The other program is ide-smart 1.4 (shown in Figure D). The ide-smart program can be used to check the immediate SMART status of a drive, or it can be used with its offline test option. It can also be added as a crontab entry to check drives periodically and send mail when an error is detected.
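Most cron implementations mail any output a job produces to the crontab's owner (or to the `MAILTO` address), so a periodic check only needs to stay silent when all is well. The sketch below follows that pattern; how the per-drive status is obtained is left as a hypothetical placeholder (for example, by running ide-smart and interpreting its result), and the function names are illustrative.

```python
import sys


def build_report(statuses: dict[str, bool]) -> str:
    """Return a report listing failing drives, or an empty string if all pass."""
    failing = sorted(dev for dev, healthy in statuses.items() if not healthy)
    if not failing:
        return ""
    lines = ["SMART check found problems:"]
    lines += [f"  {dev}: SMART status indicates failure" for dev in failing]
    return "\n".join(lines)


def main(statuses: dict[str, bool]) -> int:
    """Print only when something is wrong, so cron sends mail only then."""
    report = build_report(statuses)
    if report:
        print(report)
        return 1
    return 0


if __name__ == "__main__":
    # Hypothetical placeholder: a real script would query each drive here.
    sys.exit(main({"/dev/hda": True, "/dev/hdb": True}))
```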
While these programs are not nearly as feature-rich as their Windows brethren, they are quite informative and useful. The only drawback I see is that neither is updated often: UCSC SMART Suite seems to be at a standstill, and ide-smart is updated only occasionally. Both programs are available through freshmeat.net.
|This shows how ide-smart displays attribute values.|
SMART can be a very valuable tool for preventing data loss and keeping systems running smoothly. Programs that monitor SMART devices are easy to install, use few system resources, and provide easy-to-understand alerts. More sophisticated SMART utilities, such as SMARTmon, give administrators more opportunities to fine-tune their hard drive monitoring. This makes them ideal tools for alerting end users and administrators to drive problems that could lead to a drive failure nightmare.