Hi, I was subscribed to an age-old bug related to incorrect hard disk shutdown, and I came across another related bug with hard disk power management on laptops. Today I am going to talk about these two bugs and about your hard disk's SMART parameters.
So before discussing those bugs, let's find your SMART parameters and understand what they could mean.
Windows users can download a tool called SpeedFan from here.
Ubuntu users should install smartmontools:
sudo apt-get install smartmontools
Let's first look at some important parameters related to these bugs. To list them, run:
sudo smartctl -A /dev/sda
For example, my output is:
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 100 253 006 Pre-fail Always - 0
3 Spin_Up_Time 0x0003 095 094 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 099 099 020 Old_age Always - 1158
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 38
7 Seek_Error_Rate 0x000f 083 060 030 Pre-fail Always - 209545802
9 Power_On_Hours 0x0032 084 084 000 Old_age Always - 14469
10 Spin_Retry_Count 0x0013 100 100 034 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 099 099 020 Old_age Always - 1131
187 Unknown_Attribute 0x0032 001 001 000 Old_age Always - 6318
189 Unknown_Attribute 0x003a 100 100 000 Old_age Always - 0
190 Temperature_Celsius 0x0022 068 045 045 Old_age Always In_the_past 69275156512
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 391
193 Load_Cycle_Count 0x0032 001 001 000 Old_age Always - 498605
194 Temperature_Celsius 0x0022 032 055 000 Old_age Always - 32 (Lifetime Min/Max 0/16)
195 Hardware_ECC_Recovered 0x001a 067 057 000 Old_age Always - 234359261
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 087 000 Old_age Always - 935
200 Multi_Zone_Error_Rate 0x0000 100 253 000 Old_age Offline - 0
202 TA_Increase_Count 0x0032 100 253 000 Old_age Always - 0
Now the four important parameters are:
Power_On_Hours = Number of hours the hard disk has been powered on.
Power_Cycle_Count = Number of times the hard disk has been powered on.
Power-Off_Retract_Count = Number of times the drive was powered off in an emergency, called an Emergency Unload.
Load_Cycle_Count = Number of times the heads have been parked and loaded again. This number is highly affected by your power management policies. For example, overly aggressive power management might park the heads and spin the disk down far too often.
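If you only care about these four rows, a small filter saves some squinting. This is just a sketch: the function name smart_key_attrs is my own, and it assumes the usual smartctl -A table layout, where the attribute name is the 2nd column and the raw value is the 10th.

```shell
# Print "NAME RAW_VALUE" for the four attributes discussed above.
# Reads `smartctl -A` style text on stdin. Assumes the attribute name is
# in column 2 and the raw value in column 10 of the attribute table.
smart_key_attrs() {
    awk '$2 ~ /^(Power_On_Hours|Power_Cycle_Count|Power-Off_Retract_Count|Load_Cycle_Count)$/ { print $2, $10 }'
}

# Typical use (needs root and a real drive, so it is left commented out):
# sudo smartctl -A /dev/sda | smart_key_attrs
```

Some drives report extra text after the raw value; the filter only picks up the first field of it.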
Let's analyze my numbers:
Power_On_Hours : 14469 : The hard disk has been on for 14,469 hours =~ 603 days. [I bought my laptop in Jan '06, i.e. around 650 days ago. Hence my laptop has been on most of the time. That's correct, as I don't switch it off unless it runs out of battery. :D]
Power_Cycle_Count : 1131 : Number of times the drive was powered on. Indicative of system restarts. Hence, going by my numbers, I reboot roughly every 13 hours on average (1131 times in 14469 hours).
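The arithmetic behind these two numbers can be rechecked with a couple of throwaway helpers. The function names days_on and hours_per_cycle are made up for this post; the inputs are the raw values from my table above.

```shell
# days_on HOURS: convert power-on hours to days (one decimal place)
days_on() {
    awk -v h="$1" 'BEGIN { printf "%.1f\n", h / 24 }'
}

# hours_per_cycle HOURS CYCLES: average power-on hours per power cycle
hours_per_cycle() {
    awk -v h="$1" -v c="$2" 'BEGIN { printf "%.1f\n", h / c }'
}

days_on 14469              # prints 602.9
hours_per_cycle 14469 1131 # prints 12.8
```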
On to the possible hard disk killer bugs now.
1. Related to Power-Off_Retract_Count. Bug tracker: here. My count = 391.
This is possibly a kernel bug. On my machine, I could reproduce it with both “Shutdown” and “Suspend”. However, if I restarted, the count did not change. That implies my hard disk was incorrectly powered off 391 times. Of course, a number of these would have been legitimate. But this count didn’t increase on shutdowns in Windows XP.
As suggested in the bug tracker here,
Emergency unload is intended to be invoked in rare situations. Because this operation is inherently uncontrolled, it is more mechanically stressful than a normal unload. A single emergency unload operation is more stressful than 100 normal unloads. Use of emergency unload reduces the start/stop life of the drive at a rate at least 100 times faster than that of normal unload and may damage the drive.
2. Related to Load_Cycle_Count. Bug tracker: here. My count = 498605.
The maximum safe value for this count, according to my Seagate data sheet, is 600,000, so my count is already 83% of the maximum. Too close, I should say? This averages around 34 per power-on hour in my case (498605 cycles in 14469 hours). My hard disk has a 5 year warranty, and I have nearly used up the safe permissible count in 1.75 years.
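Here is a rough way to redo this budget calculation for your own drive. The helper names are invented for illustration, the 600,000 limit is the one from my data sheet (yours may differ), and the rate uses Power_On_Hours as the denominator, which is an assumption about what "per hour" should mean.

```shell
# pct_of_limit COUNT LIMIT: how much of the rated load cycles is used up
pct_of_limit() {
    awk -v c="$1" -v l="$2" 'BEGIN { printf "%.1f\n", 100 * c / l }'
}

# cycles_per_hour COUNT HOURS: average load cycles per power-on hour
cycles_per_hour() {
    awk -v c="$1" -v h="$2" 'BEGIN { printf "%.1f\n", c / h }'
}

# hours_left COUNT LIMIT HOURS: whole power-on hours until LIMIT is hit,
# assuming the drive keeps cycling at its average rate so far
hours_left() {
    awk -v c="$1" -v l="$2" -v h="$3" 'BEGIN { printf "%d\n", int((l - c) * h / c) }'
}

pct_of_limit 498605 600000     # prints 83.1
cycles_per_hour 498605 14469   # prints 34.5
hours_left 498605 600000 14469 # prints 2942
```

The limit on a data sheet is a rating, not a cliff, so treat the last number as a warning sign rather than a countdown.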
Turns out that Ubuntu does this (parking and spinning down to save power) too aggressively if “Laptop mode” is enabled (ENABLE_LAPTOP_MODE in /etc/default/acpi-support). Part of the bug is that Ubuntu somehow also enables this aggressive power management on desktop hard disks. Anyway, there is a way to tame the aggressiveness. However, as most people agreed, when Laptop Mode is disabled Ubuntu just uses whatever hard disk power management policy is already in place. Laptop manufacturers set these values in the BIOS or the hard disk firmware(?).
A temporary solution could be (as root):
hdparm -B 255 /dev/sda
The number here governs the aggressiveness of power management. It can be 1 to 255: lower values are more aggressive, and 255 turns APM off completely. However, this might be unsafe for laptops, as the hard disk might never park. (Dunno?) So I chose a value of 200, which keeps my rate pretty much under control. But this setting is lost once you reboot. To make it stick, you could place this command in /etc/rc.local.
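To make the value ranges a bit more concrete, here is a tiny lookup based on my reading of the hdparm man page for -B: 1 to 127 permit spin-down, 128 to 254 do not, and 255 disables APM if the drive supports that. The function name apm_level_meaning is hypothetical, and the function does not touch the drive at all.

```shell
# apm_level_meaning LEVEL: describe what an `hdparm -B LEVEL` value means,
# following the ranges given in the hdparm man page. Does not run hdparm.
apm_level_meaning() {
    case "$1" in
        ''|*[!0-9]*) echo "invalid"; return 1 ;;   # empty or non-numeric
    esac
    if [ "$1" -ge 1 ] && [ "$1" -le 127 ]; then
        echo "APM on, spin-down permitted"
    elif [ "$1" -ge 128 ] && [ "$1" -le 254 ]; then
        echo "APM on, spin-down not permitted"
    elif [ "$1" -eq 255 ]; then
        echo "APM disabled"
    else
        echo "invalid"; return 1
    fi
}

apm_level_meaning 200   # prints: APM on, spin-down not permitted
apm_level_meaning 255   # prints: APM disabled
```

So my value of 200 sits in the range that still uses APM but does not allow the disk to spin down.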
The prettier solution, which I finally followed, is here.
Related Links :
http://ubuntuforums.org/showthread.php?t=566072 – https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.17/+bug/67810/
http://ubuntuforums.org/showthread.php?t=591564 – https://bugs.launchpad.net/ubuntu/+source/acpi-support/+bug/59695
My hard disk, Model ST9100824A, data sheet
Summary: Windows has an advantage over Linux in handling hardware because hardware manufacturers provide Microsoft with official hardware specs and recommendations. The Linux community, however, has to learn these tricks on its own when manufacturers don't cooperate as they should (e.g. ATI drivers).
Still, the Linux community is maturing fast. I get more battery life using Powertop on Linux than I get on XP. And I am sure Ubuntu handles my hardware better than Windows in most situations. So I'll let the penguins dance on my laptop.
iPod Tip of the Day
People often store their song collection in a high-bitrate format. This not only uses a lot of space but also reduces the battery life of the iPod. A 15 MB song is as good as three 5 MB songs in terms of battery consumption. In the long run, battery life is severely affected. Anyway, the iPod does not have the best sound output, and you won't notice any improvement in audio quality above 192 kbps VBR (=~ 160 kbps CBR).