hi!
my question is about software RAID and Hitachi drives. I have /dev/md0, built from five identical Hitachi HDS723020BLA642 drives in raid5.
today I woke up with a headache and^W^W^W^W opened a console after complaints that some shares weren't working, and looked at /proc/mdstat, which cheerfully reported that sdb1 had failed.
here is the SMART info for all five drives:
/dev/sdb
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000b 100 100 016 Pre-fail Always - 0
2 Throughput_Performance 0x0005 136 136 054 Pre-fail Offline - 82
3 Spin_Up_Time 0x0007 156 156 024 Pre-fail Always - 423 (Average 312)
4 Start_Stop_Count 0x0012 100 100 000 Old_age Always - 11
5 Reallocated_Sector_Ct 0x0033 001 001 005 Pre-fail Always FAILING_NOW 2005
7 Seek_Error_Rate 0x000b 100 100 067 Pre-fail Always - 0
8 Seek_Time_Performance 0x0005 133 133 020 Pre-fail Offline - 27
9 Power_On_Hours 0x0012 099 099 000 Old_age Always - 9314
10 Spin_Retry_Count 0x0013 100 100 060 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 11
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 307
193 Load_Cycle_Count 0x0012 100 100 000 Old_age Always - 307
194 Temperature_Celsius 0x0002 166 166 000 Old_age Always - 36 (Lifetime Min/Max 25/46)
196 Reallocated_Event_Count 0x0032 001 001 000 Old_age Always - 2632
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always - 0
/dev/sdc
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000b 100 100 016 Pre-fail Always - 0
2 Throughput_Performance 0x0005 136 136 054 Pre-fail Offline - 81
3 Spin_Up_Time 0x0007 154 154 024 Pre-fail Always - 425 (Average 318)
4 Start_Stop_Count 0x0012 100 100 000 Old_age Always - 11
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 0
7 Seek_Error_Rate 0x000b 100 100 067 Pre-fail Always - 0
8 Seek_Time_Performance 0x0005 133 133 020 Pre-fail Offline - 27
9 Power_On_Hours 0x0012 099 099 000 Old_age Always - 9314
10 Spin_Retry_Count 0x0013 100 100 060 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 11
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 303
193 Load_Cycle_Count 0x0012 100 100 000 Old_age Always - 303
194 Temperature_Celsius 0x0002 166 166 000 Old_age Always - 36 (Lifetime Min/Max 25/46)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always - 0
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000b 100 100 016 Pre-fail Always - 0
2 Throughput_Performance 0x0005 136 136 054 Pre-fail Offline - 81
3 Spin_Up_Time 0x0007 154 154 024 Pre-fail Always - 424 (Average 319)
4 Start_Stop_Count 0x0012 100 100 000 Old_age Always - 11
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 0
7 Seek_Error_Rate 0x000b 100 100 067 Pre-fail Always - 0
8 Seek_Time_Performance 0x0005 133 133 020 Pre-fail Offline - 27
9 Power_On_Hours 0x0012 099 099 000 Old_age Always - 9314
10 Spin_Retry_Count 0x0013 100 100 060 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 11
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 308
193 Load_Cycle_Count 0x0012 100 100 000 Old_age Always - 308
194 Temperature_Celsius 0x0002 166 166 000 Old_age Always - 36 (Lifetime Min/Max 25/46)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always - 0
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000b 100 100 016 Pre-fail Always - 0
2 Throughput_Performance 0x0005 137 137 054 Pre-fail Offline - 77
3 Spin_Up_Time 0x0007 157 157 024 Pre-fail Always - 421 (Average 311)
4 Start_Stop_Count 0x0012 100 100 000 Old_age Always - 11
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 0
7 Seek_Error_Rate 0x000b 100 100 067 Pre-fail Always - 0
8 Seek_Time_Performance 0x0005 133 133 020 Pre-fail Offline - 27
9 Power_On_Hours 0x0012 099 099 000 Old_age Always - 9314
10 Spin_Retry_Count 0x0013 100 100 060 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 11
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 316
193 Load_Cycle_Count 0x0012 100 100 000 Old_age Always - 316
194 Temperature_Celsius 0x0002 166 166 000 Old_age Always - 36 (Lifetime Min/Max 25/46)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always - 0
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000b 100 100 016 Pre-fail Always - 0
2 Throughput_Performance 0x0005 137 137 054 Pre-fail Offline - 78
3 Spin_Up_Time 0x0007 153 153 024 Pre-fail Always - 430 (Average 320)
4 Start_Stop_Count 0x0012 100 100 000 Old_age Always - 11
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 0
7 Seek_Error_Rate 0x000b 100 100 067 Pre-fail Always - 0
8 Seek_Time_Performance 0x0005 130 130 020 Pre-fail Offline - 28
9 Power_On_Hours 0x0012 099 099 000 Old_age Always - 9314
10 Spin_Retry_Count 0x0013 100 100 060 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 11
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 312
193 Load_Cycle_Count 0x0012 100 100 000 Old_age Always - 312
194 Temperature_Celsius 0x0002 176 176 000 Old_age Always - 34 (Lifetime Min/Max 25/44)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always - 0
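to make it easier to spot what differs between the drives, here is a minimal Python sketch that picks out tripped attributes from the pasted rows. It assumes the whitespace-separated column layout shown above and the usual smartctl rule that an attribute counts as failed when its normalized VALUE is less than or equal to a non-zero THRESH. The function name `failing_attributes` and the sample list `sdb_rows` are just illustrative names, not part of any tool.

```python
def failing_attributes(smart_lines):
    """Return (id, name, value, thresh) for attributes with VALUE <= THRESH."""
    failing = []
    for line in smart_lines:
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # header lines such as "ID# ATTRIBUTE_NAME ..." are skipped.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id, name = int(fields[0]), fields[1]
        value, thresh = int(fields[3]), int(fields[5])
        # A threshold of 0 means the attribute can never trip.
        if thresh > 0 and value <= thresh:
            failing.append((attr_id, name, value, thresh))
    return failing


# A few rows copied from the /dev/sdb table above.
sdb_rows = [
    "ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE",
    "5 Reallocated_Sector_Ct 0x0033 001 001 005 Pre-fail Always FAILING_NOW 2005",
    "196 Reallocated_Event_Count 0x0032 001 001 000 Old_age Always - 2632",
    "197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 0",
]

print(failing_attributes(sdb_rows))
```

on the /dev/sdb rows this flags only attribute 5 (Reallocated_Sector_Ct, value 001 against threshold 005); attribute 196 is also at 001 but its threshold is 000, so it is not counted as failed. The same rows from the other four drives give an empty list.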
what I'd like to know: could SMART be wrong, and could this just be a contact problem, say, between the drive's PCB and the head connector (the one under the board)? as you can see, the drives have run for more than 9 thousand hours nonstop in an air-conditioned room. what would you advise doing? what are the chances that the remaining drives will go the way of /dev/sdb?
thanks!