
Detailed explanation of how to detect SSD health and lifespan under CentOS

WBOY (reposted), 2024-01-08 13:18:20

Nearly every guide online shows how to read lifespan data only for Intel SSDs, which is rather unfair to budget users like us who can only afford Crucial and OCZ drives. And for someone like me whose SSDs sit behind a RAID card, there seemed to be no way at all. Have you ever checked the lifespan of SSDs from other vendors?

After some research, I found that whenever an SSD sits behind a RAID card, you need both MegaCli and smartctl to read its usage data. The RAID cards I currently use are the LSI Logic / Symbios Logic MegaRAID SAS 1078 and 2108, which are queried with the usual MegaCli tool.

MegaCli can be downloaded here:

MegaCli for CentOS 5

MegaCli for CentOS 6

The process has two steps: first, get information about the physical disks from the RAID card with MegaCli; then use smartctl to display the detailed SMART data of each disk.

Use MegaCli to get information about the disks behind the RAID card:

Run the following command:

/opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL

This lists everything attached to the RAID card. The output for each physical disk looks like this:

Enclosure Device ID: 252

Slot Number: 7

Device ID: 28

Sequence Number: 2

Media Error Count: 0

Other Error Count: 1

Predictive Failure Count: 0

Last Predictive Failure Event Seq Number: 0

PD Type: SATA

Raw Size: 119.242 GB [0xee7c2b0 Sectors]

Non Coerced Size: 118.742 GB [0xed7c2b0 Sectors]

Coerced Size: 118.277 GB [0xec8e000 Sectors]

Firmware state: Online, Spun Up

SAS Address(0): 0x1e394d57aa996b80

Connected Port Number: 7(path0)

Inquiry Data: 0000000011070303A99EC300-CTFDDAC128MAG               0007 

FDE Capable: Not Capable

FDE Enable: Disable

Secured: Unsecured

Locked: Unlocked

Needs EKM Attention: No

Foreign State: None

Device Speed: 6.0Gb/s

Link Speed: 1.5Gb/s

Media Type: Solid State Device

The command prints a lot of information, but only a few fields matter here. Media Type: Solid State Device is what identifies the disk as an SSD. Write down the Device ID (28 in this example), because smartctl needs it later. Inquiry Data shows the drive model: 0000000011070303A99EC300-CTFDDAC128MAG               0007. Firmware state: Online, Spun Up tells you whether the SSD is working normally, so if you want alarms for your SSDs, monitoring this field directly is enough.
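If you want to script this first step, here is a minimal Python sketch that runs the PDList command from above, splits its output into one record per physical disk, and reports each SSD's Device ID together with whether its Firmware state is Online, Spun Up. It assumes each disk record starts with an "Enclosure Device ID:" line and that fields are "Key: value" pairs, as in the sample output; the function names are hypothetical, and the MegaCli64 path is the one used in this article.

```python
import subprocess

# MegaCli64 location used in this article; adjust for your installation.
MEGACLI = "/opt/MegaRAID/MegaCli/MegaCli64"


def run_pdlist(megacli: str = MEGACLI) -> str:
    """Run `MegaCli64 -PDList -aALL` and return its text output."""
    result = subprocess.run([megacli, "-PDList", "-aALL"],
                            capture_output=True, text=True, check=True)
    return result.stdout


def parse_pdlist(text: str) -> list:
    """Split PDList output into one dict per physical disk.

    Keys are lower-cased so "Device Id" and "Device ID" are treated alike.
    A new record starts at each "Enclosure Device ID" line (assumption).
    """
    disks = []
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines without "Key: value", e.g. "Adapter #0"
        key = key.strip().lower()
        if key == "enclosure device id":
            disks.append({})
        if disks:
            disks[-1][key] = value.strip()
    return disks


def ssd_report(disks: list) -> list:
    """Return (device_id, firmware_state, healthy) for each SSD only."""
    report = []
    for disk in disks:
        if disk.get("media type") != "Solid State Device":
            continue
        state = disk.get("firmware state", "")
        report.append((disk.get("device id"), state, state == "Online, Spun Up"))
    return report
```

Parsing the sample output above with ssd_report(parse_pdlist(text)) gives [("28", "Online, Spun Up", True)]; on a live system you would pass run_pdlist() instead of text and raise an alarm for any entry whose last field is False.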

Use smartctl to get detailed SSD information

Note that the SMART attributes differ between manufacturers and drive models; Intel drives are not covered here. In the query command below, -a displays all SMART information, -d sets the device type, and -s on enables SMART on the drive. Different RAID cards may expose disks through different interfaces, so the -d value can differ slightly from one setup to another.

For example, for an Intel drive, -d megaraid,27 works fine, where 27 is the Device ID that MegaCli reported for the disk. With the RAID cards above, however, I also had to specify sat, so the command became:

smartctl -s on -a -d sat+megaraid,27 /dev/sdb

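Here sat stands for SCSI-to-ATA Translation: the SATA disk is reached through the controller's SCSI layer, so smartctl wraps its ATA commands accordingly. Other device types, such as scsi and ata, can also be passed to -d.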
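As a sketch of the second step, the Python helpers below build and run the smartctl command for one disk behind a MegaRAID card, using the combined sat+megaraid,N device type for SATA SSDs or plain megaraid,N otherwise. The helper names are hypothetical, and the /dev/sdb default simply mirrors the example command; adjust it for your system.

```python
import subprocess


def smartctl_cmd(device_id: int, dev_type: str = "sat",
                 block_device: str = "/dev/sdb") -> list:
    """Build smartctl arguments for disk `device_id` behind a MegaRAID card.

    With dev_type="sat" the device type becomes "sat+megaraid,N" (SAT layer
    on top of MegaRAID pass-through); with dev_type="" it is the plain
    "megaraid,N" form that worked for the Intel drives.
    """
    dtype = f"megaraid,{device_id}"
    if dev_type:
        dtype = f"{dev_type}+{dtype}"
    # -s on: enable SMART, -a: print all SMART information, -d: device type
    return ["smartctl", "-s", "on", "-a", "-d", dtype, block_device]


def run_smartctl(device_id: int, dev_type: str = "sat",
                 block_device: str = "/dev/sdb") -> str:
    """Run smartctl and return its output (normally requires root)."""
    # smartctl's exit status is a bit mask, so a non-zero code does not
    # necessarily mean the command failed; do not raise on it here.
    result = subprocess.run(smartctl_cmd(device_id, dev_type, block_device),
                            capture_output=True, text=True)
    return result.stdout
```

For example, smartctl_cmd(27) produces the same arguments as the command shown above.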

The output looks like this:

Model Family: Crucial/Micron RealSSD C300/C400

Device Model: C300-CTFDDAC128MAG

SMART Attributes Data Structure revision number: 16

Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE

1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0

5 Reallocated_Sector_Ct   0x0033   100   100   000    Pre-fail  Always       -       0

9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       5572

12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       3

170 Grown_Failing_Block_Ct  0x0033   100   100   000    Pre-fail  Always       -       0

171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0

172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0

173 Wear_Levelling_Count    0x0033   090   090   000    Pre-fail  Always       -       536

174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       1

181 Non4k_Aligned_Access    0x0022   100   100   000    Old_age   Always       -       0 0 0

183 SATA_Iface_Downshift    0x0032   100   100   000    Old_age   Always       -       0

184 End-to-End_Error        0x0033   100   100   000    Pre-fail  Always       -       0

187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0

188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0

189 Factory_Bad_Block_Ct    0x000e   100   100   000    Old_age   Always       -       250

195 Hardware_ECC_Recovered  0x003a   100   100   000    Old_age   Always       -       0

196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0

197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0

198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0

199 UDMA_CRC_Error_Count    0x0036   100   100   000    Old_age   Always       -       0

202 Perc_Rated_Life_Used    0x0018   090   090   000    Old_age   Offline      -       10

206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0
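To work with this table programmatically, a minimal Python parser could look like the following. It assumes the standard smartctl row layout shown here (ID, name, hex flag, three numeric columns, then TYPE, UPDATED, WHEN_FAILED and the raw value); rows are keyed by ID because some drives repeat a name, as the OCZ output below does with Unknown_Attribute. Function names are hypothetical.

```python
def parse_smart_attributes(text: str) -> dict:
    """Parse the attribute table printed by `smartctl -a`.

    Returns {attribute_id: {"name", "value", "worst", "thresh", "raw"}}.
    """
    attrs = {}
    for line in text.splitlines():
        parts = line.split()
        # Row: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW...
        if (len(parts) < 10 or not parts[0].isdigit()
                or not parts[2].startswith("0x")
                or not all(p.isdigit() for p in parts[3:6])):
            continue  # header line or anything that is not a table row
        attrs[int(parts[0])] = {
            "name": parts[1],
            "value": int(parts[3]),
            "worst": int(parts[4]),
            "thresh": int(parts[5]),
            # RAW_VALUE may contain spaces, e.g. "0 0 0"
            "raw": " ".join(parts[9:]),
        }
    return attrs


def find_attribute(attrs: dict, name: str):
    """Return the first attribute with the given name, or None."""
    for attr in attrs.values():
        if attr["name"] == name:
            return attr
    return None
```

For the Crucial output above, attribute 173 parses to name Wear_Levelling_Count, normalized value 90 and raw value "536".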

For an OCZ drive:

Device Model:     OCZ-AGILITY3

Serial Number:    OCZ-1OX963Q8B5X2V684

SMART Attributes Data Structure revision number: 10

Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE

1 Raw_Read_Error_Rate     0x000f   086   086   050    Pre-fail  Always       -       135388659

5 Reallocated_Sector_Ct   0x0033   100   100   003    Pre-fail  Always       -       9

9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       265772576277126

12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       15

171 Unknown_Attribute       0x0032   000   000   000    Old_age   Always       -       9

172 Unknown_Attribute       0x0032   000   000   000    Old_age   Always       -       0

174 Unknown_Attribute       0x0030   000   000   000    Old_age   Offline      -       13

177 Wear_Leveling_Count     0x0000   000   000   000    Old_age   Offline      -       1

181 Program_Fail_Cnt_Total  0x0032   000   000   000    Old_age   Always       -       9

182 Erase_Fail_Count_Total  0x0032   000   000   000    Old_age   Always       -       0

187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0

194 Temperature_Celsius     0x0022   030   030   000    Old_age   Always       -       30 (Lifetime Min/Max 30/30)

195 Hardware_ECC_Recovered  0x001c   120   120   000    Old_age   Offline      -       135388659

196 Reallocated_Event_Count 0x0033   100   100   003    Pre-fail  Always       -       9

201 Soft_Read_Error_Rate    0x001c   120   120   000    Old_age   Offline      -       135388659

204 Soft_ECC_Correction     0x001c   120   120   000    Old_age   Offline      -       135388659

230 Head_Amplitude          0x0013   100   100   000    Pre-fail  Always       -       100

231 Temperature_Celsius     0x0013   100   100   010    Pre-fail  Always       -       0

233 Media_Wearout_Indicator 0x0000   000   000   000    Old_age   Offline      -       2531

234 Unknown_Attribute       0x0032   000   000   000    Old_age   Always       -       3465

241 Total_LBAs_Written      0x0032   000   000   000    Old_age   Always       -       3465

242 Total_LBAs_Read         0x0032   000   000   000    Old_age   Always       -       2030

Analyzing SSD health parameters:

Note that for these drives, lifespan is no longer read from the Media_Wearout_Indicator attribute as on Intel SSDs (OCZ drives report it too; on Crucial drives the equivalent is Perc_Rated_Life_Used). In practice, whether an SSD is healthy is judged mainly from two attributes: Wear_Levelling_Count (the average number of program/erase cycles of the flash chips) and Grown_Failing_Block_Ct.
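Because the attribute names differ by vendor (even the spelling: Crucial's Wear_Levelling_Count versus OCZ's Wear_Leveling_Count in the outputs above), a monitoring script needs a small lookup step. The Python sketch below picks the indicators discussed here from a hypothetical mapping of attribute name to raw value; the name lists come from the outputs in this article, and the order in which they are tried is an assumption.

```python
from typing import Optional

# Names as they appear in the smartctl output above.
WEAR_NAMES = ("Wear_Levelling_Count", "Wear_Leveling_Count")  # Crucial, OCZ
BAD_BLOCK_NAMES = ("Grown_Failing_Block_Ct",)                 # Crucial
LIFE_NAMES = ("Media_Wearout_Indicator", "Perc_Rated_Life_Used")  # Intel/OCZ, Crucial


def first_present(raw_by_name: dict, names: tuple) -> Optional[tuple]:
    """Return (name, raw) for the first name found, else None."""
    for name in names:
        if name in raw_by_name:
            return name, raw_by_name[name]
    return None


def key_indicators(raw_by_name: dict) -> dict:
    """Pick the health attributes discussed above, whatever the vendor."""
    return {
        "wear": first_present(raw_by_name, WEAR_NAMES),
        "grown_bad_blocks": first_present(raw_by_name, BAD_BLOCK_NAMES),
        "life": first_present(raw_by_name, LIFE_NAMES),
    }
```

A missing attribute simply comes back as None (the OCZ output, for instance, has no Grown_Failing_Block_Ct), so the caller can decide which checks apply to each drive.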

Pay attention to the following two lines:

170 Grown_Failing_Block_Ct 0x0033 100 100 000 Pre-fail Always - 0

173 Wear_Levelling_Count 0x0033 090 090 000 Pre-fail Always - 536

These two attributes are the key indicators:

Wear Leveling Count: this is the more important of the two, so let's start with it. For context, this SSD has been in service for one year. The raw value shown is 536: across this 128 GB drive, the flash has gone through an average of 536 full program/erase (P/E) cycles, and about 90% of its life remains. That puts the rated endurance of the flash chips in this drive at roughly 5,000 P/E cycles; 536 is about 10% of 5,000, which is why the normalized value of this attribute is 90 (attribute 0xCA, i.e. 202 Perc_Rated_Life_Used, likewise shows 90).

Grown Failing Block Count (bad blocks that appear during use): this counts the bad blocks (similar to bad sectors on an HDD) that develop as the flash is used. A value of 0 means no bad blocks have appeared. If you are unlucky, this value may climb sharply within a short time on a newly bought SSD under normal use; that suggests the disk may be faulty, so contact after-sales support as soon as possible.
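The arithmetic above can be sketched in Python as follows. RATED_PE_CYCLES uses the article's 5,000-cycle estimate; truncating the used percentage to a whole number happens to reproduce the 10 and 90 shown by the drive, but whether the firmware computes it exactly this way is an assumption. The remaining-years estimate is a simple linear extrapolation, and the bad-block threshold max_new is a hypothetical knob for "changes greatly in a short period".

```python
RATED_PE_CYCLES = 5000  # the article's estimate for this drive's flash


def life_used_pct(pe_cycles: int, rated: int = RATED_PE_CYCLES) -> int:
    """Share of rated P/E cycles already used, truncated to whole percent."""
    return 100 * pe_cycles // rated


def life_remaining_pct(pe_cycles: int, rated: int = RATED_PE_CYCLES) -> int:
    return 100 - life_used_pct(pe_cycles, rated)


def years_left(pe_cycles: int, years_in_service: float,
               rated: int = RATED_PE_CYCLES) -> float:
    """Linear extrapolation: assumes the write load stays the same."""
    if pe_cycles <= 0:
        return float("inf")
    cycles_per_year = pe_cycles / years_in_service
    return (rated - pe_cycles) / cycles_per_year


def bad_blocks_growing(readings: list, max_new: int = 0) -> bool:
    """True if Grown_Failing_Block_Ct rose by more than max_new between the
    oldest and newest reading (readings ordered oldest first)."""
    return len(readings) >= 2 and readings[-1] - readings[0] > max_new
```

With the figures above, life_remaining_pct(536) returns 90, and years_left(536, 1) is about 8.3 years if the write load stays constant.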

Common MegaCli commands (MegaCli below stands for the MegaCli binary, e.g. /opt/MegaRAID/MegaCli/MegaCli64 as used above):

MegaCli -cfgdsply -aALL | grep "Error" [Error counters; normally all 0]

MegaCli -LDGetProp -Cache -LALL -a0 [Cache/write policy of all logical drives on adapter 0]

MegaCli -cfgdsply -aALL | grep "Memory" [RAID card memory size]

MegaCli -LDInfo -Lall -aALL [Display all logical disk group information]

MegaCli -AdpAllInfo -aALL [Display RAID card (adapter) information]

MegaCli -PDList -aALL [Display physical disk information]

MegaCli -AdpBbuCmd -aAll [Display battery (BBU) information]

MegaCli -FwTermLog -Dsply -aALL [View RAID card log]

MegaCli -adpCount [Display the number of adapters]

MegaCli -AdpGetTime -aALL [Display adapter time]

MegaCli -AdpBbuCmd -GetBbuStatus -aALL | grep "Charger Status" [View charging status]

MegaCli -AdpBbuCmd -GetBbuStatus -aALL [Display BBU status]

MegaCli -AdpBbuCmd -GetBbuCapacityInfo -aALL [Display BBU capacity information]

MegaCli -AdpBbuCmd -GetBbuDesignInfo -aALL [Display BBU design parameters]

MegaCli -AdpBbuCmd -GetBbuProperties -aALL [Display current BBU properties]

MegaCli -cfgdsply -aALL [Display RAID card model, RAID configuration and disk information]
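The first check in this list (error counters that should normally be 0) can also be scripted. The Python sketch below scans full MegaCli -PDList or -cfgdsply output rather than the grep result, so that each counter can be attributed to the most recent Device ID line; the function name is hypothetical, and it assumes counters appear as "... Error Count: N" lines, as in the sample output earlier.

```python
def nonzero_error_counts(text: str) -> list:
    """Return (device_id, counter, value) for every non-zero
    "... Error Count" line in MegaCli -PDList / -cfgdsply output."""
    device_id = None
    hits = []
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key.lower() == "device id":
            device_id = value  # remember which disk the counters belong to
        elif key.endswith("Error Count") and value.isdigit() and int(value) > 0:
            hits.append((device_id, key, int(value)))
    return hits
```

Run against the sample disk shown earlier, it would flag ("28", "Other Error Count", 1).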

Changes in disk status, from pulling a failed disk to inserting a replacement:

Device         | Normal  | Disk failed            | Rebuilding | Normal

Virtual Drive  | Optimal | Degraded               | Degraded   | Optimal

Physical Drive | Online  | Failed -> Unconfigured | Rebuild    | Online
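For alerting, the states in this table can be mapped to simple categories. The Python sketch below classifies a physical drive's Firmware state (only the part before the first comma, so "Online, Spun Up" counts as Online) and a virtual drive state, then reports whether an array needs attention. The category names and functions are hypothetical; only the state strings come from the table and the output above.

```python
def classify_pd(firmware_state: str) -> str:
    """Map a physical drive's "Firmware state", e.g. "Online, Spun Up"."""
    state = firmware_state.split(",")[0].strip()
    if state == "Online":
        return "ok"
    if state.startswith("Rebuild"):
        return "rebuilding"
    if state.startswith("Failed"):
        return "failed"
    if state.startswith("Unconfigured"):
        return "unconfigured"
    return "unknown"


def classify_vd(state: str) -> str:
    """Map a virtual drive state from the table: Optimal or Degraded."""
    state = state.strip()
    if state == "Optimal":
        return "ok"
    if "Degraded" in state:
        return "degraded"
    return "unknown"


def needs_attention(vd_state: str, pd_states: list) -> bool:
    """True unless the virtual drive is Optimal and every disk is Online."""
    return classify_vd(vd_state) != "ok" or any(
        classify_pd(s) != "ok" for s in pd_states)
```

During the rebuild column of the table, for example, needs_attention("Degraded", ["Online, Spun Up", "Rebuild"]) is True, and it returns to False once both rows are back to Optimal and Online.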


Statement: this article is reproduced from jb51.net.