For the past couple of months the system has been freezing for about two minutes at a time (while reading from the hard disk), and the following kept showing up in /var/log/messages http://bpaste.net/show/226995/ :
Mar 31 01:14:26 local_big_pc kernel: [ 5077.088103] ata3: soft resetting link
Mar 31 01:14:27 local_big_pc kernel: [ 5077.720081] ata3.00: configured for UDMA/133
Mar 31 01:14:27 local_big_pc kernel: [ 5077.720105] ata3: EH complete
Mar 31 01:15:38 local_big_pc kernel: [ 5149.024100] ata3: soft resetting link
Mar 31 01:15:38 local_big_pc kernel: [ 5149.221920] ata3.00: configured for UDMA/133
Mar 31 01:15:38 local_big_pc kernel: [ 5149.221945] sd 4:0:0:0: [sdb] Result: hostbyte=0x00 driverbyte=0x08
Mar 31 01:15:38 local_big_pc kernel: [ 5149.221951] sd 4:0:0:0: [sdb] Sense Key : 0xb [current] [descriptor]
Mar 31 01:15:38 local_big_pc kernel: [ 5149.221961] 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
Mar 31 01:15:38 local_big_pc kernel: [ 5149.221976] 00 00 00 00
Mar 31 01:15:38 local_big_pc kernel: [ 5149.221983] sd 4:0:0:0: [sdb] ASC=0x0 ASCQ=0x0
Mar 31 01:15:38 local_big_pc kernel: [ 5149.221988] sd 4:0:0:0: [sdb] CDB: cdb[0]=0x28: 28 00 04 14 2a 10 00 00 08 00
Mar 31 01:15:38 local_big_pc kernel: [ 5149.222049] ata3: EH complete
Today the journals died on two partitions, sdb1 and sdb6, and sdb5 dropped out of the raid-1 array. By then messages was logging something slightly different http://bpaste.net/show/227044/ :
Apr 23 06:32:18 local_big_pc kernel: [ 167.330542] ata3: EH complete
Apr 23 06:32:18 local_big_pc kernel: [ 170.342426] ata3.00: configured for UDMA/133
Apr 23 06:32:18 local_big_pc kernel: [ 170.342442] ata3: EH complete
Apr 23 06:32:18 local_big_pc kernel: [ 173.354329] ata3.00: configured for UDMA/133
Apr 23 06:32:18 local_big_pc kernel: [ 173.354349] sd 4:0:0:0: [sdb] Unhandled sense code
Apr 23 06:32:18 local_big_pc kernel: [ 173.354353] sd 4:0:0:0: [sdb] Result: hostbyte=0x00 driverbyte=0x08
Apr 23 06:32:18 local_big_pc kernel: [ 173.354358] sd 4:0:0:0: [sdb] Sense Key : 0x3 [current] [descriptor]
Apr 23 06:32:18 local_big_pc kernel: [ 173.354368] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Apr 23 06:32:18 local_big_pc kernel: [ 173.354384] 10 9d 90 d8
Apr 23 06:32:18 local_big_pc kernel: [ 173.354390] sd 4:0:0:0: [sdb] ASC=0x11 ASCQ=0x4
Apr 23 06:32:18 local_big_pc kernel: [ 173.354395] sd 4:0:0:0: [sdb] CDB: cdb[0]=0x28: 28 00 10 9d 90 d8 00 00 08 00
Apr 23 06:32:18 local_big_pc kernel: [ 173.354458] ata3: EH complete
also in dmesg http://bpaste.net/show/227043/ :
[ 66.534264] ata3: EH complete
[ 69.520423] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[ 69.526263] ata3.00: BMDMA stat 0x25
[ 69.531996] ata3.00: failed command: READ DMA EXT
[ 69.537675] ata3.00: cmd 25/00:f0:08:90:9d/00:00:10:00:00/e0 tag 0 dma 122880 in
[ 69.537677] res 51/40:1f:d8:90:9d/40:00:10:00:00/e0 Emask 0x9 (media error)
[ 69.549253] ata3.00: status: { DRDY ERR }
[ 69.555008] ata3.00: error: { UNC }
[ 69.598368] ata3.00: configured for UDMA/133
[ 69.603994] sd 4:0:0:0: [sdb] Unhandled sense code
[ 69.609638] sd 4:0:0:0: [sdb] Result: hostbyte=0x00 driverbyte=0x08
[ 69.615375] sd 4:0:0:0: [sdb] Sense Key : 0x3 [current] [descriptor]
[ 69.621151] Descriptor sense data with sense descriptors (in hex):
[ 69.626897] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
[ 69.632868] 10 9d 90 d8
[ 69.638644] sd 4:0:0:0: [sdb] ASC=0x11 ASCQ=0x4
[ 69.644370] sd 4:0:0:0: [sdb] CDB: cdb[0]=0x28: 28 00 10 9d 90 08 00 00 f0 00
[ 69.650327] end_request: I/O error, dev sdb, sector 278761688
[ 69.656085] JBD2: Failed to read block at offset 12062
[ 69.656091] ata3: EH complete
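For anyone reading along, the sense bytes and the CDB in the dmesg excerpt above can be decoded by hand. Below is a minimal Python sketch, assuming the standard SCSI descriptor-format sense layout (response code 0x72) and a READ(10) CDB (opcode 0x28). The function names and the small lookup tables are my own and only list the codes that actually appear in these logs.

```python
import struct

# Only the values seen in the logs above; not a complete table.
SENSE_KEYS = {0x3: "MEDIUM ERROR", 0xB: "ABORTED COMMAND"}
ASC_ASCQ = {
    (0x00, 0x00): "no additional sense information",
    (0x11, 0x04): "unrecovered read error - auto reallocate failed",
}


def decode_sense(raw):
    """Decode descriptor-format sense data (assumed layout: SPC, code 0x72)."""
    if raw[0] & 0x7F not in (0x72, 0x73):
        raise ValueError("not descriptor-format sense data")
    result = {"sense_key": raw[1] & 0x0F, "asc": raw[2], "ascq": raw[3], "lba": None}
    end = 8 + raw[7]  # byte 7 is the additional sense length
    pos = 8
    while pos + 2 <= min(end, len(raw)):
        desc_type, desc_len = raw[pos], raw[pos + 1]
        body = raw[pos + 2 : pos + 2 + desc_len]
        # Type 0x00 is the Information descriptor; bit 0x80 marks it as valid.
        if desc_type == 0x00 and len(body) >= 10 and body[0] & 0x80:
            result["lba"] = int.from_bytes(body[2:10], "big")
        pos += 2 + desc_len
    return result


def decode_read10(cdb):
    """Return (start LBA, sector count) from a 10-byte READ(10) CDB."""
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    (lba,) = struct.unpack(">I", cdb[2:6])
    (count,) = struct.unpack(">H", cdb[7:9])
    return lba, count


if __name__ == "__main__":
    # Bytes copied from the dmesg excerpt above.
    sense = decode_sense(bytes.fromhex(
        "72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 10 9d 90 d8"))
    start, count = decode_read10(bytes.fromhex("28 00 10 9d 90 08 00 00 f0 00"))
    print(SENSE_KEYS.get(sense["sense_key"], "unknown"),
          "/", ASC_ASCQ.get((sense["asc"], sense["ascq"]), "unknown"))
    print("failing LBA:", sense["lba"])
    print("request: LBA", start, "count", count,
          "covers failing LBA:", start <= sense["lba"] < start + count)
```

If the layout assumption holds, the excerpt decodes to a medium error at LBA 278761688, which is the same sector that the end_request line reports, and the 240-sector read (122880 bytes at 512 bytes per sector) covers it.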
and in syslog http://bpaste.net/show/227045/ :
Apr 23 06:32:18 local_big_pc kernel: [ 167.292666] ata3.00: status: { DRDY ERR }
Apr 23 06:32:18 local_big_pc kernel: [ 167.292669] ata3.00: error: { UNC }
Apr 23 06:32:18 local_big_pc kernel: [ 170.304549] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Apr 23 06:32:18 local_big_pc kernel: [ 170.304553] ata3.00: BMDMA stat 0x25
Apr 23 06:32:18 local_big_pc kernel: [ 170.304559] ata3.00: failed command: READ DMA EXT
Apr 23 06:32:18 local_big_pc kernel: [ 170.304566] ata3.00: cmd 25/00:08:d8:90:9d/00:00:10:00:00/e0 tag 0 dma 4096 in
Apr 23 06:32:18 local_big_pc kernel: [ 170.304568] res 51/40:08:d8:90:9d/40:00:10:00:00/e0 Emask 0x9 (media error)
Apr 23 06:32:18 local_big_pc kernel: [ 170.304572] ata3.00: status: { DRDY ERR }
Apr 23 06:32:18 local_big_pc kernel: [ 170.304575] ata3.00: error: { UNC }
Apr 23 06:32:18 local_big_pc kernel: [ 173.316451] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Apr 23 06:32:18 local_big_pc kernel: [ 173.316456] ata3.00: BMDMA stat 0x25
Apr 23 06:32:18 local_big_pc kernel: [ 173.316461] ata3.00: failed command: READ DMA EXT
Apr 23 06:32:18 local_big_pc kernel: [ 173.316470] ata3.00: cmd 25/00:08:d8:90:9d/00:00:10:00:00/e0 tag 0 dma 4096 in
Apr 23 06:32:18 local_big_pc kernel: [ 173.316471] res 51/40:08:d8:90:9d/40:00:10:00:00/e0 Emask 0x9 (media error)
Apr 23 06:32:18 local_big_pc kernel: [ 173.316476] ata3.00: status: { DRDY ERR }
Apr 23 06:32:18 local_big_pc kernel: [ 173.316479] ata3.00: error: { UNC }
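The "cmd" and "res" lines in the syslog excerpt above carry the raw ATA registers. Here is a rough Python sketch for unpacking them, assuming the field order libata uses when it prints a taskfile (command or status, feature or error, nsect, lbal, lbam, lbah, then the five high-order "hob" registers, then device). The helper names and the bit tables are my own, and the tables are not exhaustive.

```python
import re

# Twelve two-digit hex fields separated by '/' or ':' after "cmd" or "res".
TASKFILE = re.compile(r"(?:cmd|res)\s+((?:[0-9a-f]{2}[/:]){11}[0-9a-f]{2})")

STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x08: "DRQ", 0x01: "ERR"}
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x10: "IDNF", 0x04: "ABRT"}


def parse_taskfile(line):
    """Return the 12 register values from a libata 'cmd'/'res' log line."""
    match = TASKFILE.search(line)
    if match is None:
        raise ValueError("no taskfile found in line")
    return [int(field, 16) for field in re.split(r"[/:]", match.group(1))]


def lba48(regs):
    """Assemble the 48-bit LBA from the low and high-order LBA registers."""
    return (regs[3] | regs[4] << 8 | regs[5] << 16
            | regs[8] << 24 | regs[9] << 32 | regs[10] << 40)


def sector_count(regs):
    """Assemble the 16-bit sector count (hob_nsect:nsect)."""
    return regs[7] << 8 | regs[2]


def flags(value, table):
    """List the names of the bits set in value, according to table."""
    return [name for bit, name in table.items() if value & bit]


if __name__ == "__main__":
    # Lines copied from the syslog excerpt above.
    cmd = parse_taskfile("ata3.00: cmd 25/00:08:d8:90:9d/00:00:10:00:00/e0 tag 0 dma 4096 in")
    res = parse_taskfile("res 51/40:08:d8:90:9d/40:00:10:00:00/e0 Emask 0x9 (media error)")
    print("command 0x%02x, LBA %d, %d sectors" % (cmd[0], lba48(cmd), sector_count(cmd)))
    print("status:", flags(res[0], STATUS_BITS), "error:", flags(res[1], ERROR_BITS))
```

Under that assumption, the command is READ DMA EXT (0x25) for 8 sectors at LBA 278761688, and the result registers give status DRDY ERR with error UNC, which agrees with the "status" and "error" lines the kernel prints right after.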
Here is smartctl --all /dev/sdb http://bpaste.net/show/227097/ . The self-test failed, although the WHEN_FAILED column is clean:
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed: read failure 50% 3047 345648
What is happening to the disk, and could this problem be caused by a genuinely weak PSU? And what is this "soft resetting link"?
Honestly, I did search, but I could not find a case that matched mine exactly. What do these messages and codes actually mean?