Given enough time, all hard disks will fail. Thankfully, most drives support a feature that can give users an early warning before the bitter end. Here’s how support pros can introduce S.M.A.R.T. reporting to their clients.

—————————————————————————————————————

“What do you mean, my hard drive failed? Hard drives can do that?”

Sadly, I encounter this kind of ignorance a lot as a support pro. Lots of people simply don’t understand that their computer’s storage medium has a limited life span. Maybe extremely durable older machines have influenced their expectations, or maybe no one has ever taken the time to explain to them how the disk mechanism works. Either way, it stinks to have to disappoint users by telling them the machine they rely on has gone belly up.

These days, many consumers relate to their computers as if they were appliances, and I think this contributes to their surprise when a machine’s components begin to fail. When explaining the situation to one of my clients, I usually try to liken computers to automobiles; both are complicated mechanical and electronic systems that can have individual components wear to the point of failure through normal use. Part of the routine maintenance procedures of both cars and computers is making sure worn parts are replaced before they fail completely and catastrophically.

There are a couple of obvious ways to tell when a car is due for maintenance, and most consumers are familiar with their car’s dashboard warning lights and the concept of the 20,000-mile checkup. Lately, I’ve been trying to teach my users how to tell when their computer needs maintenance, and I’ve been using the dashboard-light and mileage metaphors to explain the value of using software to check the S.M.A.R.T. data reported by their hard disks. Experienced support pros will be aware of the Self-Monitoring, Analysis, and Reporting Technology built into modern drives, but this information will likely be new to most of your clients.

Make sure your users understand that S.M.A.R.T. isn’t all-knowing. Because it relies on performance data the drive has accumulated over time, it can anticipate only failures caused by gradual wear. It can’t predict accidental damage, such as a dropped laptop or a power surge. S.M.A.R.T. status is a lot like the Check Engine light in most cars: just because it isn’t lit now doesn’t mean something can’t go wrong tomorrow. And just as a lit Check Engine light means you should be planning a visit to your mechanic, a S.M.A.R.T. error means you should make sure your data backups are sound and that a new hard drive is on the way.

Once the concepts of drive wear and S.M.A.R.T. are explained, most users won’t have trouble understanding what’s going on. If you decide to give your clients the resources to monitor the health of their own disks, there are a ton of software tools to choose from. This table at Wikipedia offers a nice feature comparison to get you started. Of the tools in that matrix, I have firsthand experience with HDD Health, SMARTReporter, and smartmontools. All are quality third-party packages that I have no problem recommending. Your OS of choice may also include built-in utilities for monitoring disk health.
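For clients comfortable with a script, here is a minimal sketch of the "dashboard light" check built on smartmontools. It assumes `smartctl` is installed and on the PATH (it usually needs root or administrator rights) and that its `-H` health report contains one of the two verdict lines matched below. That wording is based on typical smartctl output and may vary by version or drive type, so verify it against your own drives. The function names are hypothetical, and the check reads only the drive's overall verdict, not individual attributes.

```python
import subprocess
import sys
from typing import Optional

# Assumed verdict prefixes in `smartctl -H` output (check against your version):
# ATA/NVMe drives typically print the first form, SCSI/SAS drives the second.
HEALTH_PREFIXES = (
    "SMART overall-health self-assessment test result:",
    "SMART Health Status:",
)


def parse_health(smartctl_output: str) -> Optional[str]:
    """Return the verdict text after a known prefix, or None if none is found."""
    for raw_line in smartctl_output.splitlines():
        line = raw_line.strip()
        for prefix in HEALTH_PREFIXES:
            if line.startswith(prefix):
                return line[len(prefix):].strip()
    return None


def drive_looks_healthy(smartctl_output: str) -> bool:
    """Treat only an explicit PASSED/OK verdict as healthy; anything else needs a look."""
    return parse_health(smartctl_output) in ("PASSED", "OK")


def check_drive(device: str) -> bool:
    """Run `smartctl -H` on a device path such as /dev/sda and judge the verdict."""
    result = subprocess.run(
        ["smartctl", "-H", device],
        capture_output=True,
        text=True,
        check=False,  # smartctl signals problems through nonzero exit status bits
    )
    return drive_looks_healthy(result.stdout)


if __name__ == "__main__":
    # Usage: python check_smart.py /dev/sda /dev/sdb
    for dev in sys.argv[1:]:
        if check_drive(dev):
            print(f"{dev}: OK")
        else:
            print(f"{dev}: check your backups and plan a replacement")
```

Treating a missing or unrecognized verdict as "not healthy" is deliberate: like a Check Engine light, it is better to prompt a closer look than to stay silent when the report can't be read.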