Hello, I have the following problem (or rather, it has me):
The server went down with a kernel panic; before it died, this made it into the logs:
kernel.log:
Apr 9 01:38:12 dc-116 kernel: [6036876.289013] sas: command 0xffff88013930dd80, task 0xffff880274ab3d00, timed out: BLK_EH_NOT_HANDLED
Apr 9 01:38:12 dc-116 kernel: [6036876.289072] sas: Enter sas_scsi_recover_host
Apr 9 01:38:12 dc-116 kernel: [6036876.289078] sas: trying to find task 0xffff880274ab3d00
Apr 9 01:38:12 dc-116 kernel: [6036876.289082] sas: sas_scsi_find_task: aborting task 0xffff880274ab3d00
Apr 9 01:38:12 dc-116 kernel: [6036876.289155] sas: sas_scsi_find_task: task 0xffff880274ab3d00 is done
Apr 9 01:38:12 dc-116 kernel: [6036876.289160] sas: sas_eh_handle_sas_errors: task 0xffff880274ab3d00 is done
Apr 9 01:38:12 dc-116 kernel: [6036876.289165] sas: sas_ata_task_done: SAS error 8d
Apr 9 01:38:12 dc-116 kernel: [6036876.289175] ata7: sas eh calling libata port error handler
Apr 9 01:38:12 dc-116 kernel: [6036876.289191] ata8: sas eh calling libata port error handler
Apr 9 01:38:12 dc-116 kernel: [6036876.289198] ata9: sas eh calling libata port error handler
Apr 9 01:38:12 dc-116 kernel: [6036876.289205] ata10: sas eh calling libata port error handler
Apr 9 01:38:12 dc-116 kernel: [6036876.289213] ata10.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 t0
Apr 9 01:38:12 dc-116 kernel: [6036876.289235] ata10.00: failed command: FLUSH CACHE EXT
Apr 9 01:38:12 dc-116 kernel: [6036876.289255] ata10.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Apr 9 01:38:12 dc-116 kernel: [6036876.289257] res 01/04:00:00:00:00/00:00:00:00:00/00 Emask 0x3 (HSM violation)
Apr 9 01:38:12 dc-116 kernel: [6036876.289289] ata10.00: status: { ERR }
Apr 9 01:38:12 dc-116 kernel: [6036876.289300] ata10.00: error: { ABRT }
Apr 9 01:38:12 dc-116 kernel: [6036876.289317] ata10: hard resetting link
Apr 9 01:38:37 dc-116 kernel: [6036901.332959] isci 0000:04:00.0: isci_port_perform_hard_reset: iport = ffff880273aa2dd0; hard reset failed (0x21)
Apr 9 01:38:37 dc-116 kernel: [6036901.332988] isci 0000:04:00.0: isci_port_perform_hard_reset: iport = ffff880273aa2dd0; hard reset failed (0x21) - driving explicit link fail for all phys
Apr 9 01:38:37 dc-116 kernel: [6036901.333022] sas: sas_ata_hard_reset: Unable to reset I T nexus?
Apr 9 01:38:37 dc-116 kernel: [6036901.333025] sas: sas_ata_hard_reset: Found ATA device.
Apr 9 01:38:37 dc-116 kernel: [6036901.333030] sas: sas_ata_soft_reset: Unable to soft reset
Apr 9 01:38:37 dc-116 kernel: [6036901.333032] sas: sas_ata_soft_reset: Found ATA device.
Apr 9 01:38:37 dc-116 kernel: [6036901.333037] ata10: hard resetting link
Apr 9 01:38:37 dc-116 kernel: [6036901.333040] sas: sas_ata_hard_reset: Found ATA device.
Apr 9 01:38:37 dc-116 kernel: [6036901.333057] ata10.00: failed to IDENTIFY (I/O error, err_mask=0x41)
Apr 9 01:38:37 dc-116 kernel: [6036901.333061] ata10.00: revalidation failed (errno=-5)
Apr 9 01:38:37 dc-116 kernel: [6036901.333998] sd 0:0:3:0: [sdd] Synchronizing SCSI cache
Apr 9 01:38:42 dc-116 kernel: [6036906.324156] ata10: hard resetting link
Apr 9 01:38:42 dc-116 kernel: [6036906.324162] sas: sas_ata_hard_reset: Found ATA device.
Apr 9 01:38:42 dc-116 kernel: [6036906.324180] ata10.00: failed to IDENTIFY (I/O error, err_mask=0x41)
Apr 9 01:38:42 dc-116 kernel: [6036906.324183] ata10.00: revalidation failed (errno=-5)
Apr 9 01:38:47 dc-116 kernel: [6036911.315331] ata10: hard resetting link
Apr 9 01:38:47 dc-116 kernel: [6036911.315337] sas: sas_ata_hard_reset: Found ATA device.
Apr 9 01:38:47 dc-116 kernel: [6036911.315355] ata10.00: failed to IDENTIFY (I/O error, err_mask=0x41)
Apr 9 01:38:47 dc-116 kernel: [6036911.315358] ata10.00: revalidation failed (errno=-5)
Apr 9 01:38:47 dc-116 kernel: [6036911.315374] ata10.00: disabled
Apr 9 01:38:47 dc-116 kernel: [6036911.315382] ata10.00: device reported invalid CHS sector 0
Apr 9 01:38:47 dc-116 kernel: [6036911.315389] ata10: EH complete
Apr 9 01:38:47 dc-116 kernel: [6036911.315394] sas: --- Exit sas_scsi_recover_host
Apr 9 01:38:47 dc-116 kernel: [6036911.315657] sd 0:0:3:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Apr 9 01:38:47 dc-116 kernel: [6036911.315666] sd 0:0:3:0: [sdd] Stopping disk
Apr 9 01:38:47 dc-116 kernel: [6036911.315748] sd 0:0:3:0: [sdd] START_STOP FAILED
Apr 9 01:38:47 dc-116 kernel: [6036911.315754] sd 0:0:3:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Apr 9 01:38:47 dc-116 kernel: [6036911.315953] md/raid1:md4: Disk failure on sdd1, disabling device.
Apr 9 01:38:47 dc-116 kernel: [6036911.315956] md/raid1:md4: Operation continuing on 1 devices.
Apr 9 01:38:48 dc-116 kernel: [6036911.649230] RAID1 conf printout:
Apr 9 01:38:48 dc-116 kernel: [6036911.649236] --- wd:1 rd:2
Apr 9 01:38:48 dc-116 kernel: [6036911.649240] disk 0, wo:0, o:1, dev:sdc1
Apr 9 01:38:48 dc-116 kernel: [6036911.649244] disk 1, wo:1, o:0, dev:sdd1
Apr 9 01:38:48 dc-116 kernel: [6036911.714653] RAID1 conf printout:
Apr 9 01:38:48 dc-116 kernel: [6036911.714659] --- wd:1 rd:2
Apr 9 01:38:48 dc-116 kernel: [6036911.714663] disk 0, wo:0, o:1, dev:sdc1
The disk's SMART data now shows:
184 End-to-End_Error 0x0032 094 094 099 Old_age Always FAILING_NOW 6
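For anyone reading that row: as far as I understand smartctl's attribute table, the three numbers after the flag are the normalized VALUE, WORST and THRESH, and an attribute counts as failed when the value is at or below a non-zero threshold. Here 094 is below 099, hence FAILING_NOW. A minimal Python sketch of that check follows; the names `SmartAttr`, `parse_smart_row` and `is_failing` are made up for illustration and the ten-column layout is assumed from the row above.

```python
from typing import NamedTuple


class SmartAttr(NamedTuple):
    """One row of the smartctl attribute table (assumed 10-column layout)."""
    attr_id: int
    name: str
    flag: str
    value: int
    worst: int
    thresh: int
    attr_type: str
    updated: str
    when_failed: str
    raw: str


def parse_smart_row(row: str) -> SmartAttr:
    # Split on whitespace into at most 10 fields; the raw value is the last
    # field and may itself contain spaces on some drives.
    parts = row.split(None, 9)
    if len(parts) != 10:
        raise ValueError(f"unexpected SMART row format: {row!r}")
    return SmartAttr(
        attr_id=int(parts[0]),
        name=parts[1],
        flag=parts[2],
        value=int(parts[3]),   # "094" parses as 94
        worst=int(parts[4]),
        thresh=int(parts[5]),
        attr_type=parts[6],
        updated=parts[7],
        when_failed=parts[8],
        raw=parts[9],
    )


def is_failing(attr: SmartAttr) -> bool:
    # A threshold of 0 means the attribute can never be reported as failed.
    return attr.thresh > 0 and attr.value <= attr.thresh


row = "184 End-to-End_Error 0x0032 094 094 099 Old_age Always FAILING_NOW 6"
attr = parse_smart_row(row)
print(attr.name, attr.value, attr.thresh, is_failing(attr))
```

The raw value (6) is kept as a string because its meaning is vendor-specific.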
After the reboot the disk did not show up until I pulled it out of the drive bay and reinserted it, which indirectly points to a contact problem (the cable?).
The disk (Seagate) is part of a RAID1 + LVM setup; the OS is Debian Wheezy.
The server is a Supermicro, and there are no problems with power.
Folks, can you suggest a possible source of the problem? (I can post a screenshot of the panic if needed.)