Detailed explanation of how to detect SSD health and lifespan under CentOS
Almost every guide on the Internet to checking SSD lifespan data covers only Intel SSDs, which is of little help to those of us who can only afford Crucial and OCZ drives. It is harder still when the drives sit behind a RAID card, as mine do. So how do you check the lifespan of SSDs from other vendors?
After some research, I found that any SSD behind a RAID card has to be queried with a combination of MegaCli and smartctl. The RAID cards I currently use are the LSI Logic / Symbios Logic MegaRAID SAS 1078 and 2108, which are queried with the usual MegaCli tool.
MegaCli can be downloaded here:
MegaCli for CentOS 5
MegaCli for CentOS 6
The whole process has two steps. First, use MegaCli to obtain information about the hard disks attached to the RAID card. Second, use smartctl to display the detailed information for each disk.
Use MegaCli to obtain the hard drive information under the RAID card
Run the following command:
/opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL
This lists every physical drive attached to the RAID card. The entry for each drive looks like this:
Enclosure Device ID: 252
Slot Number: 7
Device ID: 28
Sequence Number: 2
Media Error Count: 0
Other Error Count: 1
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 119.242 GB [0xee7c2b0 Sectors]
Non Coerced Size: 118.742 GB [0xed7c2b0 Sectors]
Coerced Size: 118.277 GB [0xec8e000 Sectors]
Firmware state: Online, Spun Up
SAS Address(0): 0x1e394d57aa996b80
Connected Port Number: 7(path0)
Inquiry Data: 0000000011070303A99EC300-CTFDDAC128MAG 0007
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None
Device Speed: 6.0Gb/s
Link Speed: 1.5Gb/s
Media Type: Solid State Device
A lot of information is output, so pay attention to the following fields. Only a drive with Media Type: Solid State Device is an SSD. Write down Device ID: 28, because it is needed later when querying the drive with smartctl. The drive model appears in Inquiry Data: 0000000011070303A99EC300-CTFDDAC128MAG 0007. Firmware state: Online, Spun Up shows whether the SSD is operating normally, so if you want to set up monitoring alarms for an SSD, you can monitor this field directly.
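The field-picking described above can be automated. The following is a minimal sketch, not part of the original procedure: it assumes the PDList output has already been captured as text, that each drive record starts with an "Enclosure Device ID" line, and the function names parse_pdlist and ssd_summary are made up for illustration.

```python
def parse_pdlist(text):
    """Split `MegaCli64 -PDList -aALL` output into one dict per physical drive."""
    drives, current = [], None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "Key: value" line
        key, value = key.strip(), value.strip()
        # Assumption: every drive record begins with "Enclosure Device ID".
        if key == "Enclosure Device ID":
            current = {}
            drives.append(current)
        if current is not None:
            current[key] = value
    return drives


def ssd_summary(drives):
    """Return (device_id, firmware_state) for every drive reported as an SSD."""
    return [
        (int(d["Device ID"]), d.get("Firmware state", "unknown"))
        for d in drives
        if d.get("Media Type") == "Solid State Device" and "Device ID" in d
    ]


# A few lines taken from the sample output above.
SAMPLE = """\
Enclosure Device ID: 252
Slot Number: 7
Device ID: 28
Firmware state: Online, Spun Up
Media Type: Solid State Device
"""

if __name__ == "__main__":
    print(ssd_summary(parse_pdlist(SAMPLE)))
```

A monitoring job could feed the real command output into parse_pdlist and raise an alarm whenever the firmware state of an SSD is anything other than "Online, Spun Up".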
Use smartctl to get detailed information about the SSD
Note that the information reported differs between manufacturers and disk types; Intel drives are not covered here. In the query command below, -a displays all information and -d sets the device type. Different RAID cards may expose their drives through different interfaces, so the -d value may vary slightly.
For example, for an Intel drive, -d megaraid,27 works fine. With the RAID cards above, however, I also had to specify sat, so the command became:
smartctl -a -d sat+megaraid,27 /dev/sdb1 -s on
Replace 27 with the Device ID that MegaCli reports for the drive you want to query (28 for the drive shown above). Here sat stands for SCSI to ATA Translation, that is, an ATA device reached through a SCSI layer; other device types such as scsi and ata can also be given to -d.
The following information is then displayed:
Model Family: Crucial/Micron RealSSD C300/C400
Device Model: C300-CTFDDAC128MAG
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 100 100 000 Pre-fail Always - 0
5 Reallocated_Sector_Ct 0x0033 100 100 000 Pre-fail Always - 0
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 5572
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 3
170 Grown_Failing_Block_Ct 0x0033 100 100 000 Pre-fail Always - 0
171 Program_Fail_Count 0x0032 100 100 000 Old_age Always - 0
172 Erase_Fail_Count 0x0032 100 100 000 Old_age Always - 0
173 Wear_Levelling_Count 0x0033 090 090 000 Pre-fail Always - 536
174 Unexpect_Power_Loss_Ct 0x0032 100 100 000 Old_age Always - 1
181 Non4k_Aligned_Access 0x0022 100 100 000 Old_age Always - 0 0 0
183 SATA_Iface_Downshift 0x0032 100 100 000 Old_age Always - 0
184 End-to-End_Error 0x0033 100 100 000 Pre-fail Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
188 Command_Timeout 0x0032 100 100 000 Old_age Always - 0
189 Factory_Bad_Block_Ct 0x000e 100 100 000 Old_age Always - 250
195 Hardware_ECC_Recovered 0x003a 100 100 000 Old_age Always - 0
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0036 100 100 000 Old_age Always - 0
202 Perc_Rated_Life_Used 0x0018 090 090 000 Old_age Offline - 10
206 Write_Error_Rate 0x000e 100 100 000 Old_age Always - 0
For an OCZ drive:
Device Model: OCZ-AGILITY3
Serial Number: OCZ-1OX963Q8B5X2V684
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 086 086 050 Pre-fail Always - 135388659
5 Reallocated_Sector_Ct 0x0033 100 100 003 Pre-fail Always - 9
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 265772576277126
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 15
171 Unknown_Attribute 0x0032 000 000 000 Old_age Always - 9
172 Unknown_Attribute 0x0032 000 000 000 Old_age Always - 0
174 Unknown_Attribute 0x0030 000 000 000 Old_age Offline - 13
177 Wear_Leveling_Count 0x0000 000 000 000 Old_age Offline - 1
181 Program_Fail_Cnt_Total 0x0032 000 000 000 Old_age Always - 9
182 Erase_Fail_Count_Total 0x0032 000 000 000 Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
194 Temperature_Celsius 0x0022 030 030 000 Old_age Always - 30 (Lifetime Min/Max 30/30)
195 Hardware_ECC_Recovered 0x001c 120 120 000 Old_age Offline - 135388659
196 Reallocated_Event_Count 0x0033 100 100 003 Pre-fail Always - 9
201 Soft_Read_Error_Rate 0x001c 120 120 000 Old_age Offline - 135388659
204 Soft_ECC_Correction 0x001c 120 120 000 Old_age Offline - 135388659
230 Head_Amplitude 0x0013 100 100 000 Pre-fail Always - 100
231 Temperature_Celsius 0x0013 100 100 010 Pre-fail Always - 0
233 Media_Wearout_Indicator 0x0000 000 000 000 Old_age Offline - 2531
234 Unknown_Attribute 0x0032 000 000 000 Old_age Always - 3465
241 Total_LBAs_Written 0x0032 000 000 000 Old_age Always - 3465
242 Total_LBAs_Read 0x0032 000 000 000 Old_age Always - 2030
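Attribute tables like the two above are easier to compare across vendors once they are parsed by attribute ID. This is a rough sketch that assumes the ten-column layout shown above; the SmartAttr type and the parse_smart_attributes name are illustrative only, not part of smartmontools.

```python
from typing import NamedTuple


class SmartAttr(NamedTuple):
    id: int
    name: str
    value: int   # normalized current value
    worst: int
    thresh: int
    raw: str     # kept as text because some raw values contain spaces


def parse_smart_attributes(text):
    """Map attribute ID -> SmartAttr for each row of the attribute table."""
    attrs = {}
    for line in text.splitlines():
        # Ten columns; the last one (RAW_VALUE) keeps any embedded spaces.
        parts = line.split(None, 9)
        if len(parts) < 10 or not parts[0].isdigit():
            continue  # header row or unrelated line
        try:
            attr = SmartAttr(int(parts[0]), parts[1], int(parts[3]),
                             int(parts[4]), int(parts[5]), parts[9].strip())
        except ValueError:
            continue  # a numeric column was not numeric; skip the row
        attrs[attr.id] = attr
    return attrs


# Rows copied from the Crucial C300 output above.
SAMPLE = """\
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
170 Grown_Failing_Block_Ct 0x0033 100 100 000 Pre-fail Always - 0
173 Wear_Levelling_Count 0x0033 090 090 000 Pre-fail Always - 536
181 Non4k_Aligned_Access 0x0022 100 100 000 Old_age Always - 0 0 0
202 Perc_Rated_Life_Used 0x0018 090 090 000 Old_age Offline - 10
"""

if __name__ == "__main__":
    for attr in parse_smart_attributes(SAMPLE).values():
        print(attr.id, attr.name, attr.value, attr.raw)
```

Looking attributes up by ID rather than by name is a deliberate choice here, since the same ID can carry different names (or Unknown_Attribute) depending on the vendor and the smartctl drive database.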
Analysis of the parameters that indicate SSD health:
Note that the service life is no longer reported through the Media_Wearout_Indicator attribute as it is on Intel SSDs (OCZ has this attribute too, and on Crucial it becomes Perc_Rated_Life_Used). In practice, whether the SSD is healthy is judged mainly from two attributes: Wear_Levelling_Count (the average number of erase/write cycles of the flash cells) and Grown_Failing_Block_Ct.
Pay attention to the following two lines:
170 Grown_Failing_Block_Ct 0x0033 100 100 000 Pre-fail Always - 0
173 Wear_Levelling_Count 0x0033 090 090 000 Pre-fail Always - 536
The above two parameters are the key:
Wear Leveling Count: This is the more important parameter, so let's cover it first. For context, this SSD has been in use for one year. The raw value shown above is 536, meaning this 128 GB drive has gone through 536 full-disk program/erase (P/E) cycles, and the normalized value of 90 indicates that 90% of its life remains. The flash memory used in this drive is therefore rated for approximately 5,000 P/E cycles: 536 is about 10% of 5,000, so the value of this attribute is 90.
Grown Failing Block Count (number of new bad blocks developed in use): This is the number of bad blocks (similar to bad sectors on an HDD) that have appeared in the flash memory during use. A value of 0 means there are no bad blocks. If you are unlucky and this value rises sharply within a short time on a newly bought SSD under normal use, the disk may be faulty; contact after-sales service as soon as possible.
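The arithmetic above can be written down as a small check. This is only a sketch under stated assumptions: the 5,000-cycle rating is the approximate figure from the text, truncating to a whole percent is a guess that happens to reproduce the 10 used / 90 remaining figures in the Crucial output, and the min_remaining alarm threshold and the function names are hypothetical.

```python
def life_used_percent(avg_pe_cycles, rated_pe_cycles=5000):
    """Whole-number percentage of the rated P/E cycles consumed, capped at 100."""
    if rated_pe_cycles <= 0:
        raise ValueError("rated_pe_cycles must be positive")
    # Integer division truncates, e.g. 536 of 5000 cycles -> 10 percent.
    return min(100, avg_pe_cycles * 100 // rated_pe_cycles)


def check_ssd(wear_cycles, grown_bad_blocks, rated_pe_cycles=5000, min_remaining=10):
    """Return (remaining_percent, warnings) from the two key raw values.

    wear_cycles      -- raw value of attribute 173 (Wear_Levelling_Count)
    grown_bad_blocks -- raw value of attribute 170 (Grown_Failing_Block_Ct)
    min_remaining    -- hypothetical alarm threshold, not from the article
    """
    remaining = 100 - life_used_percent(wear_cycles, rated_pe_cycles)
    warnings = []
    if remaining <= min_remaining:
        warnings.append("rated P/E cycles nearly exhausted")
    if grown_bad_blocks > 0:
        warnings.append("bad blocks have appeared during use")
    return remaining, warnings


if __name__ == "__main__":
    # Raw values from the Crucial C300 output: 536 cycles, 0 grown bad blocks.
    print(check_ssd(536, 0))
```

Because the text warns about the bad-block count changing quickly rather than about any single value, a real monitor would probably also store the previous reading and compare the two.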
Introduction to common parameter combinations of MegaCli:
MegaCli -cfgdsply -aALL | grep "Error" [Normal is 0]
MegaCli -LDGetProp -Cache -LALL -a0
MegaCli -cfgdsply -aALL | grep "Memory" [Memory size]
MegaCli -LDInfo -Lall -aALL
MegaCli -AdpAllInfo -aALL
MegaCli -PDList -aALL
MegaCli -AdpBbuCmd -aAll
MegaCli -FwTermLog -Dsply -aALL [View RAID card log]
MegaCli -adpCount
MegaCli -AdpGetTime -aALL
MegaCli -AdpAllInfo -aAll
MegaCli -LDInfo -LALL -aAll [Display all logical disk group information]
MegaCli -PDList -aAll
MegaCli -AdpBbuCmd -GetBbuStatus -aALL | grep "Charger Status" [View charging status]
MegaCli -AdpBbuCmd -GetBbuStatus -aALL
MegaCli -AdpBbuCmd -GetBbuCapacityInfo -aALL [Display BBU capacity information]
MegaCli -AdpBbuCmd -GetBbuDesignInfo -aALL
MegaCli -AdpBbuCmd -GetBbuProperties -aALL
MegaCli -cfgdsply -aALL
Changes in disk status, from removing a disk to inserting a replacement:
Device | Normal | Damage | Rebuild | Normal
Virtual Drive | Optimal | Degraded | Degraded | Optimal
Physical Drive | Online | Failed -> Unconfigured | Rebuild | Online
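The status table above can be turned into a lookup for alerting. This is a minimal sketch, assuming the state strings begin with the words shown in the table (for example "Online, Spun Up" counts as Online); the array_stage name and the returned labels are made up for illustration.

```python
def array_stage(virtual_drive_state, physical_drive_state):
    """Classify a (virtual drive, physical drive) state pair per the table above."""
    vd = virtual_drive_state.strip()
    pd = physical_drive_state.strip()
    if vd == "Optimal" and pd.startswith("Online"):
        return "normal"
    if vd == "Degraded" and pd.startswith("Rebuild"):
        return "rebuild"
    if vd == "Degraded" and pd.startswith(("Failed", "Unconfigured")):
        return "damage"
    # Any combination the table does not list is left for a human to inspect.
    return "unknown"


if __name__ == "__main__":
    print(array_stage("Optimal", "Online, Spun Up"))
    print(array_stage("Degraded", "Rebuild"))
```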