Smartmontools: Check & Monitor Hard Disk Health on Linux

SMART (Self-Monitoring, Analysis and Reporting Technology) is built into modern hard drives and lets a drive detect and report conditions that may indicate impending failure. Smartmontools is a free software package, available for multiple platforms, that uses a drive's SMART attributes to query its state. With smartmontools, an administrator or tech-savvy user can be warned of a likely drive failure early enough to make backups before the drive becomes critical.

Installation

On Debian or Ubuntu systems, smartmontools is available via the default repositories.

sudo apt-get install smartmontools

On Fedora (older releases use yum in place of dnf):

sudo dnf install smartmontools

Installing the smartmontools package delivers two programs to your system: smartctl, which is used interactively, and smartd, which, as the name suggests, is a daemon designed to run in the background.
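A quick way to confirm that both programs are reachable is a small shell helper like the sketch below. The function name missing_tools is made up for this example. It only checks the current PATH, and on some systems smartctl and smartd live in /usr/sbin, which an unprivileged user's PATH may not include.

```shell
# Report which of the named programs cannot be found on the current PATH.
# Prints one "missing: NAME" line per absent program and returns non-zero
# if anything is missing. (missing_tools is a hypothetical helper name.)
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            rc=1
        fi
    done
    return "$rc"
}

# Usage:
#   missing_tools smartctl smartd
```

If the helper prints nothing, both programs were found.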

Smartctl

Smartctl requires root permissions, so it must be run by the root user or through sudo. It operates on an entire hard drive (not on partitions), so the drive's device file is given as the final argument. This article uses “/dev/sda” as the device file; be sure to replace it with your own drive’s file.

To get information about a drive, use the -i option.

sudo smartctl -i /dev/sda

The output lists identifying information about the drive. Look for the two “SMART support is” lines, which report whether SMART support is available and whether it is enabled. In the example run for this article, both were reported as available and enabled. If SMART support is available but not enabled, it can be turned on with the following command:

sudo smartctl -s on /dev/sda
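The check-then-enable step can be scripted. The sketch below assumes an ATA drive whose smartctl -i output contains a line starting with "SMART support is:" followed by "Enabled"; other device types may word this differently. The function names are invented for this example.

```shell
# Read `smartctl -i` output on stdin and succeed only if it contains the
# line reporting that SMART support is enabled (ATA-style output assumed).
smart_is_enabled() {
    grep -q '^SMART support is:[[:space:]]*Enabled'
}

# Turn SMART on for a drive only when the information output says it is
# not already enabled. "$1" is the drive's device file, e.g. /dev/sda.
ensure_smart_enabled() {
    if smartctl -i "$1" | smart_is_enabled; then
        echo "SMART already enabled on $1"
    else
        smartctl -s on "$1"
    fi
}

# Usage (as root):
#   ensure_smart_enabled /dev/sda
```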

To check the device’s health, use the -H option:

sudo smartctl -H /dev/sda

If the output for the above isn’t PASSED, the hard drive has either failed or is predicting its own failure. Back up your data immediately.
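In scripts it is easier to look at smartctl's exit status than to parse the PASSED text. The status is a bit mask, and the sketch below decodes it using the meanings given in the EXIT STATUS section of the smartctl man page (summarized here, not quoted). The function name explain_smartctl_status is an assumption made for this example.

```shell
# Translate a smartctl exit status (a bit mask) into readable messages.
# Bit meanings are paraphrased from the smartctl man page.
explain_smartctl_status() {
    code=$1
    if [ "$code" -eq 0 ]; then
        echo "no problems reported"
        return 0
    fi
    bit=0
    for meaning in \
        "command line did not parse" \
        "device open failed" \
        "a SMART command failed or a checksum error occurred" \
        "SMART status check returned DISK FAILING" \
        "prefail attributes at or below threshold" \
        "attributes were at or below threshold in the past" \
        "device error log contains errors" \
        "self-test log contains errors"
    do
        # Test whether this bit is set in the exit status.
        if [ $(( (code >> bit) & 1 )) -eq 1 ]; then
            echo "bit $bit: $meaning"
        fi
        bit=$((bit + 1))
    done
}

# Usage (as root):
#   smartctl -H /dev/sda; explain_smartctl_status $?
```

A status with bit 3 set corresponds to the failing health check described above.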

To view the SMART capabilities of the drive, use the -c switch:

sudo smartctl -c /dev/sda

In the example output for this article, the drive supports self-tests, and the short and extended self-tests are estimated to take 2 minutes and 95 minutes respectively. To run the short test, use the -t short option; use -t long for the extended (and more thorough) test.

sudo smartctl -t short /dev/sda
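To know how long to wait before checking results, the estimated durations can be pulled out of the capabilities output. This sketch assumes the ATA-style layout in which each "... self-test routine" line is followed by a "recommended polling time" line with the minutes in parentheses. The function name polling_times is made up for the example.

```shell
# Read `smartctl -c` output on stdin and print the recommended polling
# time in minutes for each self-test routine, one per line ("Short: 2").
polling_times() {
    awk '
        # Remember which routine the next polling-time line belongs to.
        /self-test routine/ { name = $1 }
        /recommended polling time/ {
            gsub(/[^0-9]/, "", $0)   # keep only the digits (the minutes)
            print name ": " $0
        }
    '
}

# Usage (as root):
#   smartctl -c /dev/sda | polling_times
```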

The test runs in the background, enabling you to perform other tasks while it runs. To check the results of the test, run the following command:

sudo smartctl -l selftest /dev/sda

This shows the results of the last twenty self-tests, but it doesn’t give any indication of a test that is currently running. Run the extended test as well:

sudo smartctl -t long /dev/sda

If either test fails, back up your data immediately.
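Since the tests run in the background, a script can start one and wait for it to finish before printing the log. The sketch below polls smartctl -c and assumes the ATA-style status text "Self-test routine in progress" while a test is running; other device types may report progress differently. Both function names are invented for this example.

```shell
# Read `smartctl -c` output on stdin; succeed if it reports a self-test
# that is still running (ATA-style status text assumed).
selftest_in_progress() {
    grep -q 'Self-test routine in progress'
}

# Start a self-test and poll until the drive no longer reports one
# running, then print the self-test log.
# Arguments: device file, test kind (short or long), poll interval in
# seconds (defaults to 60).
run_selftest_and_wait() {
    dev=$1
    kind=$2
    interval=${3:-60}
    smartctl -t "$kind" "$dev" || return 1
    while smartctl -c "$dev" | selftest_in_progress; do
        sleep "$interval"
    done
    smartctl -l selftest "$dev"
}

# Usage (as root):
#   run_selftest_and_wait /dev/sda short 30
```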

Smartd

While smartctl is a great tool, it only helps if it is run regularly and frequently. Smartd is a daemon designed to run in the background and periodically request SMART diagnostics from selected hard drives. This way, the administrator can be notified as soon as an error is reported or a test fails, and can take appropriate action.

The configuration file is normally located at “/etc/smartd.conf”. Open this file, find the line that begins with “DEVICESCAN”, and comment it out by adding “#” at the start. Then explicitly list the drives to be monitored by adding a line like the following for every drive:

/dev/sda -a -o on -S on -s (S/../.././02|L/../../6/03) -m root -M exec /usr/share/smartmontools/smartd-runner

The options in this line mean the following:

/dev/sda: The hard drive device file
-a: Enables a set of common monitoring directives (health status, attribute changes, and error and self-test log checks). You almost certainly want to use it.
-d sat (optional, not used in the line above): On my system, smartctl correctly guesses that I have a serial ATA drive, but smartd does not. If you had to add a -d TYPE parameter to the smartctl commands, you’ll almost certainly have to do the same here. If you didn’t, try leaving it out initially. You can add it later if smartd fails to start.
-o on: Enables SMART Automatic offline testing
-S on: Enables SMART autosave
-s (S/../.././02|L/../../6/03): Run both short (S/../.././02) and long (L/../../6/03) self-tests at scheduled times. This sample schedules a short test at 2:00 A.M. daily and a long test every Saturday at 3:00 A.M.
-m root: Send a mail to the address specified (root here). Multiple addresses can be given, separated by commas. Note that this requires a working mail setup on the system.
-M exec /usr/share/smartmontools/smartd-runner: This modifies the behaviour of the -m flag. On Debian and Ubuntu systems, smartd-runner executes other actions in addition to the mail (-m) option.
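The -s schedule is a regular expression matched against strings of the form T/MM/DD/d/HH, where T is the test type and d is the day of the week (1 for Monday through 7 for Sunday). The sketch below approximates that matching with grep -E so a schedule can be tried out before it goes into smartd.conf. It is not smartd's own code, and the names SCHEDULE and tests_due are made up for this example.

```shell
# The schedule regular expression from the example smartd.conf line.
SCHEDULE='(S/../.././02|L/../../6/03)'

# Print the test types whose schedule matches the given "MM/DD/d/HH"
# stamp. Types tried: L (long), S (short), C (conveyance), O (offline).
# This approximates smartd's matching with grep -E.
tests_due() {
    stamp=$1
    for kind in L S C O; do
        if printf '%s/%s\n' "$kind" "$stamp" | grep -Eq "$SCHEDULE"; then
            printf '%s\n' "$kind"
        fi
    done
}

# Usage: which tests would be due at the current hour?
#   tests_due "$(date +%m/%d/%u/%H)"
```

For instance, a stamp with hour 02 on any day matches only the short test, while day 6 at hour 03 matches only the long test.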

For more information, check out the smartd.conf man pages.

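After configuring smartd, we have to ensure that it starts at boot. To do this, open the configuration file “/etc/default/smartmontools” and uncomment the line #start_smartd=yes (remove the #). This applies to releases that use SysV-style init scripts; on systemd-based systems the daemon is typically enabled and started with systemctl instead. You can then start smartd by running: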

sudo /etc/init.d/smartmontools start

The diagnostics are logged to syslog, and errors trigger an email alert. To test that email delivery works, add -M test to the drive's line in “/etc/smartd.conf” and restart smartd:

sudo /etc/init.d/smartmontools restart

This sends a test notification when smartd starts.

Conclusion

It is surprisingly hard to estimate the lifespan of hard drives (a very good article is available on this). Using the SMART capabilities of your hard drive, with smartmontools, can provide vital hours for a data migration before the drive experiences catastrophic failure. While there really is no substitute for a good backup plan, smartmontools can help alert a system owner/admin to possible failure.