What can your hard disk tell you? Be S.M.A.R.T.


S.M.A.R.T. is a self-monitoring capability of hard disks that provides many interesting details about the health of the drive and about its usage in general. Several utilities can read it, but smartmontools is multiplatform and quite low level, exposing all the details.

The hard disk keeps several monitoring attributes that are updated over its lifetime and stored in persistent memory, so that they survive when the disk is powered off. The list and description of these attributes is also available on Wikipedia.
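
A minimal sketch of reading these attributes from Python through smartmontools, assuming `smartctl` is installed and prints its usual `smartctl -A` attribute table. The function names, the device path and the sample table are illustrative assumptions: only the raw values in the sample are the numbers from my own disk quoted in this post, while the other columns are placeholders.

```python
import subprocess


def parse_smartctl_attributes(text):
    """Parse the attribute table printed by `smartctl -A` into a dict."""
    attributes = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the header row starts with "ID#" and is skipped by this check.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        raw = fields[9]
        attributes[fields[1]] = {
            "id": int(fields[0]),
            "type": fields[6],  # "Pre-fail" or "Old_age"
            "raw": int(raw) if raw.isdigit() else raw,
        }
    return attributes


def read_attributes(device):
    """Run smartctl on a device such as /dev/sda (usually needs root)."""
    # smartctl uses its exit status as a bitmask, so a non-zero status
    # is not treated as an error here.
    result = subprocess.run(["smartctl", "-A", device],
                            capture_output=True, text=True)
    return parse_smartctl_attributes(result.stdout)


# Illustrative sample: VALUE/WORST/THRESH/FLAG columns are placeholders,
# only the RAW_VALUE column uses the numbers from this post.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       2231
  5 Reallocated_Sector_Ct   0x0033   100   100   000    Pre-fail  Always       -       1
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       3211
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       2148
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       37120
"""

attrs = parse_smartctl_attributes(SAMPLE)
for name, info in attrs.items():
    print(f"{name:24} {info['type']:9} {info['raw']}")
```

On a real machine, something like `read_attributes("/dev/sda")` would replace the sample text; the device name depends on the system.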

Some of these attributes deal with the usage or aging of the drive, while others are related to error conditions. The ones related to aging can be understood from the mechanics of the hard drive. The drive is first powered up and then the platters start spinning. Then the heads are moved out of the parking position and do their work. Sometimes, depending on the operating system and the usage, the heads go back to the parking position or the disk stops spinning.

These are the indicators, with numbers taken from my computer, which has been used almost daily for a year and a half:

  • Power On Hours: 3211 hours, about 134 days.
  • Power Cycle Count: the number of times the drive was powered on: 2148, which means sessions of about 1.5 hours on average.
  • Start Stop Count: the number of times the disk started rotating: 2231, a bit higher than the power cycles, meaning that the disk usually keeps spinning during a session.
  • Load Cycle Count: the number of times the heads moved from the parking position onto the disk: 37120, really a lot. It seems that a notebook disk has a lifetime of about 500k load cycles.
While writing this post the counters increased by 3 Power Cycles, 2 Start/Stops and 30 Load Cycles.
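
The averages above are simple ratios of the raw counters. As a rough sketch, using the numbers from my disk and the 500k load cycle figure, which is only the approximate lifetime mentioned above and not a specification:

```python
# Raw counters from my disk
power_on_hours = 3211
power_cycles = 2148
start_stops = 2231
load_cycles = 37120

# Assumed lifetime for a notebook disk, as quoted in the text
assumed_load_cycle_lifetime = 500_000

days_powered = power_on_hours / 24
hours_per_session = power_on_hours / power_cycles
load_cycles_per_hour = load_cycles / power_on_hours
lifetime_used = load_cycles / assumed_load_cycle_lifetime

print(f"powered on for {days_powered:.1f} days")            # 133.8
print(f"average session: {hours_per_session:.2f} hours")    # 1.49
print(f"load cycles per hour: {load_cycles_per_hour:.2f}")  # 11.56
print(f"load cycle lifetime used: {lifetime_used:.1%}")     # 7.4%
```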

An example of a pre-fail indicator is the reallocation of sectors. When a sector is found to be damaged, the hard disk replaces it with another one taken from a specially reserved area, without informing the operating system. In my case this value is sadly 1. If the hard disk has a free-fall sensor, there are also counters for errors induced by falls or vibrations.

Unfortunately many attributes are only partially specified by S.M.A.R.T., meaning that the reported value depends a lot on the manufacturer. Read Error Rate is one of these: on many drives any raw value above zero is a bad sign, but since manufacturers encode the raw value differently it has to be read with care.

Let's look at a notebook hard drive that is behaving strangely and slowly. Only two usage attributes are available in this case:
  • Power On Hours: 529079. This is expressed in minutes, because a quarter of an hour later it read 529094. That means about 367 days.
  • Power Cycle Count: 2313. Compared with mine, this means longer sessions of 3.8 hours on average.
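
The unit of Power On Hours on this drive can be checked by comparing two readings taken a known time apart. A small sketch of that reasoning with the two readings above; the variable names are mine:

```python
# Two readings of "Power On Hours" taken about a quarter of an hour apart
first_reading = 529079
second_reading = 529094
elapsed_minutes = 15

# If the counter advances by one per minute, it is counting minutes
ticks_per_minute = (second_reading - first_reading) / elapsed_minutes
print(ticks_per_minute)  # 1.0

power_cycles = 2313
hours = first_reading / 60
days = hours / 24
hours_per_session = hours / power_cycles

print(f"{hours:.0f} hours, {days:.0f} days")             # 8818 hours, 367 days
print(f"average session: {hours_per_session:.1f} hours")  # 3.8
```
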
In this case the pre-fail measures are all bad: "Raw Read Error Rate" (151000), "Reallocated Events" (51625!) and, even worse, "Offline Uncorrectable" (above 1 million).

Something is telling me that it is time for a new hard drive. I am already doing a low-level clone with Clonezilla.

Update: Clonezilla is amazing. With reasonable compression it created the disk image at about 1.7GB/min, and it is now restoring it at 3.0GB/min using a USB2 portable hard disk. The only problem is that I could not use the exFAT partition on the USB disk, because it is not yet supported by Linux due to licensing issues, so I had to use the HFS+ partition...

Update: after 5 months (about 150 days) ... Power On Hours reached 4430 (+1219, about 51 days), Power Cycle Count 3068 (+920, that is sessions of 1.3 hours), Start Stop Count 3156 (+925) and Load Cycle Count 52064 (+14944). The worst thing is the number of reallocated sectors, which moved from 1 to 16!
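
The increments in this update are just the differences between the two snapshots of the counters. A short sketch of that bookkeeping, with dictionary keys chosen by me for readability:

```python
# Counters at the time of the original post and five months later
before = {
    "Power On Hours": 3211,
    "Power Cycle Count": 2148,
    "Start Stop Count": 2231,
    "Load Cycle Count": 37120,
    "Reallocated Sectors": 1,
}
after = {
    "Power On Hours": 4430,
    "Power Cycle Count": 3068,
    "Start Stop Count": 3156,
    "Load Cycle Count": 52064,
    "Reallocated Sectors": 16,
}

delta = {name: after[name] - before[name] for name in before}
for name, change in delta.items():
    print(f"{name}: {after[name]} (+{change})")

extra_days = delta["Power On Hours"] / 24
hours_per_session = delta["Power On Hours"] / delta["Power Cycle Count"]
print(f"{extra_days:.1f} more days powered on")           # 50.8
print(f"average session: {hours_per_session:.1f} hours")  # 1.3
```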
