If your data center makes use of Linux machines, one of the administrative tasks you’ll want to undertake is regularly checking the health of the SSDs used on those machines. Why? Because, even though solid state drives can dramatically outlast rotating platter drives, they do have a finite lifespan. The last thing you want to do is fall victim to that particular end of days. How do you check the health of those drives? As with everything in Linux, there are options. Although a GUI solution exists (GNOME Disks), I highly recommend going with a command line tool for this task. Why? Most of the time, your Linux servers won’t include a GUI; with the command line, you can simply secure shell into your remote Linux server and run your tests from the terminal.
The tool in question is smartctl. With this command, you can get a quick glimpse of your SSD health. Of course, how much mileage you get from the command will depend upon what make/model of SSD you employ. Unfortunately, the S.M.A.R.T. tools aren’t always up to date with every SSD. Because of this, you cannot always be certain of the number of times your SSD’s chips have been written to. Even with that in mind, you can get a good estimate of the wear and tear on your drives.
Let’s install and use smartctl.
I will be demonstrating on Ubuntu (Ubuntu 17.10, to be exact). The required package is found in the standard repositories of most distributions, so adjust the installation command to fit your particular distribution of choice.
The smartctl utility is a part of the smartmontools package. This can be installed with a single command:
sudo apt install smartmontools
Do note, the above command will also install libgsasl7, libkyotocabinet16v5, libmailutils5, libntlm0, mailutils, mailutils-common, and postfix.
Once the package is installed, you’re ready to go.
To use the smartctl tool, the first thing you will want to do is gather information about the drive, which is done via the command:
sudo smartctl -i /dev/sdX
Where sdX is the name of the drive to be tested.
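If you’re not sure which device names belong to SSDs, one way to narrow things down is the ROTA (rotational) column of lsblk. What follows is a minimal sketch, not a definitive method: list_ssds is a hypothetical helper name, and a ROTA value of 0 only means the kernel considers the device non-rotational (virtual disks and USB sticks can report 0 as well), so confirm each candidate with smartctl -i.

```shell
# list_ssds (hypothetical helper): reads `lsblk -d -n -o NAME,ROTA,TYPE`
# output on stdin and prints the /dev path of every whole disk that is
# flagged as non-rotational (ROTA = 0).
list_ssds() {
    awk '$3 == "disk" && $2 == "0" { print "/dev/" $1 }'
}

# Usage:
#   lsblk -d -n -o NAME,ROTA,TYPE | list_ssds
```

Each path it prints can then stand in for /dev/sdX in the commands in this article.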
The above command will print out the details associated with your drive (Figure A).
As you can see, the drive in question is in the smartctl database, so information should be up to date.
Let’s run a short test on the drive. These tests will actually wind up giving you the most accurate data on your drive (so it’s important to make use of these included tools). Issue the command:
sudo smartctl -t short -a /dev/sdX
This will immediately report some bits of information (Figure B).
I recommend you run a short and a long test weekly (or monthly) on your drives. To run a long test, the command is:
sudo smartctl -t long -a /dev/sdX
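Because these tests are meant to be run on a schedule, it can help to wrap the command in a small function that a cron job (or a similar scheduler) calls. Here’s a rough sketch under a few assumptions: run_selftests is a hypothetical function name, the SMARTCTL variable is my own addition to allow a dry run, and the device names in the usage comment are placeholders.

```shell
# Command used to start the tests. Override it for a dry run, e.g.
# SMARTCTL="echo smartctl" (a convenience of this sketch, not a smartctl feature).
SMARTCTL=${SMARTCTL:-smartctl}

# run_selftests (hypothetical helper): start a self-test of the given type
# ("short" or "long") on every device listed after it.
run_selftests() {
    test_type=$1
    shift
    for dev in "$@"; do
        # $SMARTCTL is left unquoted on purpose so that a dry-run value
        # such as "echo smartctl" splits into a command plus an argument.
        $SMARTCTL -t "$test_type" "$dev"
    done
}

# Usage (as root):
#   run_selftests short /dev/sda /dev/sdb
#   run_selftests long  /dev/sda /dev/sdb
```

The smartmontools package also includes the smartd daemon, which can schedule self-tests on its own; check its man page before settling on an approach.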
One of the first things you should see is the result of the SMART overall-health self-assessment test. That should say PASSED. If not, you know, right away, there’s something wrong with your SSD.
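If you’d rather have a script check that line for you, something like the following could work. It’s only a sketch: health_passed is a hypothetical helper name, and it assumes the ATA-style wording of the result line; other drive types may phrase their health status differently.

```shell
# health_passed (hypothetical helper): reads `smartctl -H` output on stdin
# and succeeds only if the overall-health result line says PASSED.
health_passed() {
    grep -q 'SMART overall-health self-assessment test result: PASSED'
}

# Usage:
#   sudo smartctl -H /dev/sdX | health_passed || echo "Check /dev/sdX"
```

The -H option prints only the health summary, which keeps the output short when you’re checking many drives.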
The short test will examine the following:
- Electrical Properties: The controller tests its own electronics, which is different for each manufacturer.
- Mechanical Properties: Servos and positioning mechanisms are tested (also specific to each manufacturer). This applies to rotating drives; an SSD has no moving parts to test.
- Read/Verify: A certain area of the disk will be read to verify certain data (the size and position of the region read is unique to each manufacturer).
The long test runs everything included with the short test, while adding:
- No time restriction.
- In the Read/Verify segment, the entire disk is checked (as opposed to just a section).
The short test takes approximately two minutes to complete, whereas the long test will require between 20 and 60 minutes (depending upon your hardware). To view the results of the test, issue the command sudo smartctl -a /dev/sdX (where sdX is the name of the drive tested).
The command will print out the results of the test, as well as all of the information you need to verify the health of your SSD (Figure C).
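If all you want is the outcome of the most recent self-test, smartctl -l selftest prints just the self-test log. The sketch below checks the newest entry; latest_selftest_ok is a hypothetical helper name, and it assumes the ATA log layout in which the newest entry starts with "# 1" and a clean run reads "Completed without error". Verify that against your own drive’s output.

```shell
# latest_selftest_ok (hypothetical helper): reads `smartctl -l selftest`
# output on stdin and succeeds only if the newest log entry (the line
# numbered "# 1") reports "Completed without error".
latest_selftest_ok() {
    awk '/^# *1 / { found = 1; ok = ($0 ~ /Completed without error/); exit }
         END { exit !(found && ok) }'
}

# Usage:
#   sudo smartctl -l selftest /dev/sdX | latest_selftest_ok && echo "Last self-test OK"
```

It fails when the log has no entries at all, which is the safer default on a drive that has never been tested.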
Beyond the Self-test log, there are two values in the output to be examined:
- Power_On_Hours: How many hours the drive has been powered on. Each make/model of drive has a recommended “shelf life” of hours it can be used. Most modern SSDs have fairly incredible lifespans, so chances are you’re not going to bump into the end of life. If you’re using an older drive, this can be an issue.
- Wear_Leveling_Count: The remaining endurance of the drive as a percentage (starting from 100 and decreasing linearly as the drive is written to).
It is important to look at the VALUE and WORST columns. As you can see, my Samsung SSD is currently at 99 for Wear_Leveling_Count, which indicates a very healthy drive.
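To pull those two numbers out without scanning the whole table by eye, you could filter the attribute list with awk. This is a sketch that assumes the usual ten-column layout of smartctl -A (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE); smart_attr_value and smart_attr_raw are hypothetical helper names.

```shell
# smart_attr_value (hypothetical helper): reads `smartctl -A` output on stdin
# and prints the normalized VALUE column (4th field) of the named attribute,
# with leading zeros dropped.
smart_attr_value() {
    awk -v name="$1" '$2 == name { print $4 + 0 }'
}

# smart_attr_raw (hypothetical helper): prints the RAW_VALUE column
# (10th field) of the named attribute, unchanged.
smart_attr_raw() {
    awk -v name="$1" '$2 == name { print $10 }'
}

# Usage:
#   sudo smartctl -A /dev/sdX | smart_attr_value Wear_Leveling_Count
#   sudo smartctl -A /dev/sdX | smart_attr_raw Power_On_Hours
```

For Power_On_Hours the raw value is the one holding the hour count, while for Wear_Leveling_Count the normalized value is the percentage described above.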
One thing to keep in mind is that different manufacturers will report different data with smartctl. For example, I have older Intel and Kingston SSDs attached to the same machine. Both of these drives report similar (and more comprehensive) data. However, neither reports the Wear_Leveling_Count. Why? These are both older drives and do not report ID 177 (Wear_Leveling_Count). Instead, your best bet is to run both the short and long tests and verify the health of your drives via those reports.
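A script that checks several drives needs to cope with that missing attribute. Here is one hedged way to handle it: wear_summary is a hypothetical helper name, and the fallback message is simply my wording of the advice above.

```shell
# wear_summary (hypothetical helper): reads `smartctl -A` output on stdin.
# Prints the normalized Wear_Leveling_Count if the drive reports it,
# otherwise a reminder to rely on the self-test results instead.
wear_summary() {
    awk '$2 == "Wear_Leveling_Count" { print "Wear_Leveling_Count: " ($4 + 0); found = 1 }
         END { if (!found) print "Wear_Leveling_Count not reported; rely on the self-test results" }'
}

# Usage:
#   sudo smartctl -A /dev/sdX | wear_summary
```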
The obvious caveats
There are actually two caveats with smartctl. First off, it’s easy to misinterpret the reported data. Because of this, it’s important that you know the make and model of the drive you are testing. Once you have that information in hand, you can research any anomalies in the reported data. Second, it is crucial to make use of the testing tools. Although you can run a command like smartctl -A /dev/sdX, you don’t get the added benefit of the testing results. Make sure to regularly run the short and long tests, to get the most up-to-date information you can about your SSDs.