Smartmontools - Monitoring hard disk health and temperature alerts on Linux | ContentParty ...


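Linux does not have many monitoring tools that are both convenient and graphical; most monitoring is done through SNMP or with the help of other tools. I mostly run single-disk machines, so disk health is something I have to check on now and then, and to be honest I had never paid any attention to disk temperature on Linux, which is pretty bad... Now that I know there is a convenient tool, of course I have to try it. Look after the disk a bit more and it lasts a bit longer!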

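On Linux, "Smartmontools" can simply be installed with yum. When I went to install it, I found the package was already on my system; never having used it was a small waste. Let's see which files it ships besides the documentation: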

$ rpm -ql smartmontools
/etc/rc.d/init.d/smartd
/etc/smartd.conf
/etc/sysconfig/smartmontools
/usr/sbin/smartctl
/usr/sbin/smartd

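So this tool is the "smartd" service that has been sitting unused on my system all along ^^a. I had never turned it on, because I only enable the services that really have to run and skip everything else.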
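Since the package ships both `/etc/smartd.conf` and an init script, turning the service on with a temperature alert could look roughly like the sketch below. The helper name `smartd_temp_line`, the device `/dev/sdb`, the 45/55 degree limits and the `root` mail address are all assumptions made for illustration; `-a`, `-W` and `-m` are smartd.conf directives, but check `man smartd.conf` for the version you have installed.

```shell
# Hypothetical helper: print a smartd.conf line that monitors one disk,
# logs at INFO degrees Celsius and warns (and mails) at CRIT degrees.
smartd_temp_line() {
    dev=$1 info=$2 crit=$3 mailto=${4:-root}
    # -a: monitor all SMART properties
    # -W DIFF,INFO,CRIT: temperature limits (DIFF=0 turns off change reports)
    # -m: address that receives warning mail
    printf '%s -a -W 0,%s,%s -m %s\n' "$dev" "$info" "$crit" "$mailto"
}

smartd_temp_line /dev/sdb 45 55
# prints: /dev/sdb -a -W 0,45,55 -m root

# As root, the printed line would go into /etc/smartd.conf, followed by:
#   chkconfig smartd on
#   service smartd start
```

If the file already contains a `DEVICESCAN` line, the new entry should go before it, since smartd ignores whatever follows `DEVICESCAN`.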

$ smartctl -a /dev/sdb3
smartctl version 5.38 [i686-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     ST31000333AS
Serial Number:    9TE10MBY
Firmware Version: CC1H
User Capacity:    1,000,203,804,160 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Tue Feb 8 14:25:49 2011 CST
SMART support is: Available - device has SMART capability.
SMART support is: Disabled (SMART is not enabled on the drive)

SMART Disabled. Use option -s with argument 'on' to enable it. (the service has not been started)
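The hint in that last message can be scripted. Here is a minimal sketch, with a hypothetical helper name and `/dev/sdb` as an assumed device, that reads the `smartctl -i` output and switches SMART on only when it is not already enabled:

```shell
# Hypothetical helper: succeeds if the `smartctl -i` output on stdin
# contains the line "SMART support is: Enabled".
smart_is_enabled() {
    grep -q '^SMART support is:[[:space:]]*Enabled'
}

# Usage (as root; /dev/sdb is an assumed device name):
#   smartctl -i /dev/sdb | smart_is_enabled || smartctl -s on /dev/sdb
```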

Once the service is started, a whole pile of disk information shows up:


=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.
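For a quick pass/fail check from a script, the verdict line can be pulled out of the `smartctl -H` output. This is only a sketch: the helper name and `/dev/sdb` are assumptions. As the "marginal Attributes" note above hints, PASSED on its own does not mean every attribute is healthy.

```shell
# Hypothetical helper: print the overall-health verdict (e.g. PASSED)
# from `smartctl -H` output on stdin.
smart_health() {
    sed -n 's/^SMART overall-health self-assessment test result: //p'
}

# Usage (as root; /dev/sdb is an assumed device name):
#   [ "$(smartctl -H /dev/sdb | smart_health)" = "PASSED" ] || echo "check /dev/sdb"
```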

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 625) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 212) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x103f) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   120   099   006    Pre-fail  Always   -           238786274
  3 Spin_Up_Time            0x0003   100   100   000    Pre-fail  Always   -           0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always   -           26
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always   -           6
  7 Seek_Error_Rate         0x000f   069   060   030    Pre-fail  Always   -           9620914
  9 Power_On_Hours          0x0032   081   081   000    Old_age   Always   -           16940
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always   -           0
 12 Power_Cycle_Count       0x0032   100   037   020    Old_age   Always   -           34
184 Unknown_Attribute       0x0032   100   100   099    Old_age   Always   -           0
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always   -           643
188 Unknown_Attribute       0x0032   100   100   000    Old_age   Always   -           0
189 High_Fly_Writes         0x003a   044   044   000    Old_age   Always   -           56
190 Airflow_Temperature_Cel 0x0022   059   041   045    Old_age   Always   In_the_past 41 (7 68 41 41)
194 Temperature_Celsius     0x0022   041   059   000    Old_age   Always   -           41 (0 25 0 0)
195 Hardware_ECC_Recovered  0x001a   034   029   000    Old_age   Always   -           238786274
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always   -           16
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline  -           16
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always   -           0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline  -           87746181860942
241 Unknown_Attribute       0x0000   100   253   000    Old_age   Offline  -           702344905
242 Unknown_Attribute       0x0000   100   253   000    Old_age   Offline  -           353203500
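The attribute table is easy to pick apart with awk, because RAW_VALUE is always the tenth whitespace-separated column. The sketch below is not part of smartmontools: the helper names, `/dev/sdb` and the 50-degree limit are assumptions for illustration.

```shell
# Hypothetical helper: print the RAW_VALUE column (10th field) of the
# attribute whose ID# is given, reading `smartctl -A` output on stdin.
smart_raw() {
    awk -v id="$1" '$1 == id { print $10 }'
}

# Hypothetical helper: list the reallocated/pending/uncorrectable sector
# attributes (IDs 5, 197, 198) whose raw value is above zero.
smart_bad_sectors() {
    awk '($1 == 5 || $1 == 197 || $1 == 198) && $10 + 0 > 0 { print $2 "=" $10 }'
}

# Usage (as root; /dev/sdb and the 50-degree limit are assumptions):
#   temp=$(smartctl -A /dev/sdb | smart_raw 194)
#   [ "$temp" -lt 50 ] || echo "disk is at ${temp} C"
#   smartctl -A /dev/sdb | smart_bad_sectors
```

Fed the table above, `smart_raw 194` gives 41, and `smart_bad_sectors` lists Reallocated_Sector_Ct=6, Current_Pending_Sector=16 and Offline_Uncorrectable=16, which is worth keeping an eye on for this disk.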

SMART Error Log Version: 1
ATA Error Count: 635 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
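The 49.710-day figure is exactly what a 32-bit millisecond counter gives before it rolls over, which is presumably where the wrap comes from. A quick check of that arithmetic, plus the hours-to-days conversion smartctl prints next to each error (the helper name is hypothetical):

```shell
# 2^32 milliseconds expressed in days
awk 'BEGIN { printf "%.3f\n", 2^32 / 1000 / 86400 }'
# prints: 49.710

# Hypothetical helper: convert power-on hours to "D days + H hours"
hours_to_days() {
    echo "$(( $1 / 24 )) days + $(( $1 % 24 )) hours"
}

hours_to_days 3659
# prints: 152 days + 11 hours
```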

Error 635 occurred at disk power-on lifetime: 3659 hours (152 days + 11 hours)
When the command that caused the error occurred, the device was active or idle.