Disk sda is attached to a "Silicon Image, Inc. SiI 3132 Serial ATA Raid II Controller". The disk drops off the bus while it is being written to.
Diagnostics:
lspci output for the RAID bus controller
02:00.0 RAID bus controller: Silicon Image, Inc. SiI 3132 Serial ATA Raid II Controller (rev 01)
    Subsystem: Silicon Image, Inc. Device 7132
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 16
    Region 0: Memory at f7d84000 (64-bit, non-prefetchable) [size=128]
    Region 2: Memory at f7d80000 (64-bit, non-prefetchable) [size=16K]
    Region 4: I/O ports at e000 [size=128]
    Expansion ROM at f7d00000 [disabled] [size=512K]
    Capabilities: [54] Power Management version 2
        Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
    Capabilities: [5c] MSI: Enable- Count=1/1 Maskable- 64bit+
        Address: 0000000000000000 Data: 0000
    Capabilities: [70] Express (v1) Legacy Endpoint, MSI 00
        DevCap: MaxPayload 1024 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
            ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
        DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
            RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
            MaxPayload 128 bytes, MaxReadReq 4096 bytes
        DevSta: CorrErr- UncorrErr+ FatalErr- UnsuppReq+ AuxPwr- TransPend-
        LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s, Latency L0 unlimited, L1 unlimited
            ClockPM- Surprise- LLActRep- BwNot-
        LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
    Capabilities: [100 v1] Advanced Error Reporting
        UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
        UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
        CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
        CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
        AERCap: First Error Pointer: 14, GenCap+ CGenEn- ChkCap+ ChkEn-
    Kernel driver in use: sata_sil24
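
In the lspci output the asserted flags are the ones ending in "+", and they are easy to miss among the "-" ones (here DevSta reports UncorrErr+ and UnsuppReq+). As a rough sketch, a small filter can list only the asserted flags of one status line. The function name asserted_flags is made up for this note and is not part of pciutils.

```shell
# Print every flag that is asserted ('+') on an lspci status line read from stdin.
# asserted_flags is a hypothetical helper name, not a pciutils command.
asserted_flags() {
    # One token per line, then keep tokens of the form NAME+ and drop the '+'.
    tr ' ' '\n' | sed -n 's/^\([A-Za-z0-9]\{1,\}\)+$/\1/p'
}

# Example input copied from the DevSta line of the lspci output.
printf '%s\n' 'DevSta: CorrErr- UncorrErr+ FatalErr- UnsuppReq+ AuxPwr- TransPend-' | asserted_flags
```

On a live system the same filter could be fed with `lspci -vvv -s 02:00.0 | grep 'DevSta:'`, run as root so that the capability lines are visible.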
How the system "drops" the disk
hard resetting link
Jul 29 01:04:48 bacula kernel: [28716.231729] ata2: exception Emask 0x10 SAct 0x0 SErr 0x10000 action 0xe frozen
Jul 29 01:04:48 bacula kernel: [28716.233751] ata2: SError: { PHYRdyChg }
Jul 29 01:04:48 bacula kernel: [28716.235755] ata2: hard resetting link
Jul 29 01:04:51 bacula kernel: [28718.917945] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jul 29 01:04:51 bacula kernel: [28719.046239] ata2.00: configured for UDMA/33
Jul 29 01:04:51 bacula kernel: [28719.046248] ata2: EH complete
Jul 29 01:05:20 bacula kernel: [28748.279258] ata2: exception Emask 0x10 SAct 0x0 SErr 0x10000 action 0xe frozen
Jul 29 01:05:20 bacula kernel: [28748.281283] ata2: SError: { PHYRdyChg }
Jul 29 01:05:20 bacula kernel: [28748.283288] ata2: hard resetting link
Jul 29 01:05:22 bacula kernel: [28750.580883] ata2: COMRESET failed (errno=-19)
Jul 29 01:05:22 bacula kernel: [28750.582860] ata2: reset failed (errno=-19), retrying in 8 secs
Jul 29 01:05:30 bacula kernel: [28758.278685] ata2: hard resetting link
Jul 29 01:05:32 bacula kernel: [28760.458082] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jul 29 01:05:32 bacula kernel: [28760.570380] ata2.00: configured for UDMA/33
Jul 29 01:05:32 bacula kernel: [28760.570389] ata2: EH complete
Jul 29 01:05:38 bacula kernel: [28766.813705] ata2: exception Emask 0x10 SAct 0x0 SErr 0x10000 action 0xe frozen
Jul 29 01:05:38 bacula kernel: [28766.815706] ata2: SError: { PHYRdyChg }
Jul 29 01:05:38 bacula kernel: [28766.817689] ata2: hard resetting link
Jul 29 01:05:40 bacula kernel: [28768.011906] ata2: COMRESET failed (errno=-19)
Jul 29 01:05:40 bacula kernel: [28768.013861] ata2: reset failed (errno=-19), retrying in 9 secs
Jul 29 01:05:48 bacula kernel: [28776.813394] ata2: hard resetting link
Jul 29 01:05:50 bacula kernel: [28777.977059] ata2: COMRESET failed (errno=-19)
Jul 29 01:05:50 bacula kernel: [28777.978978] ata2: reset failed (errno=-19), retrying in 9 secs
Jul 29 01:05:58 bacula kernel: [28786.810510] ata2: hard resetting link
Jul 29 01:06:00 bacula kernel: [28788.438070] ata2: COMRESET failed (errno=-19)
Jul 29 01:06:00 bacula kernel: [28788.439950] ata2: reset failed (errno=-19), retrying in 34 secs
Jul 29 01:06:33 bacula kernel: [28821.800519] ata2: hard resetting link
Jul 29 01:06:35 bacula kernel: [28823.332106] ata2: COMRESET failed (errno=-19)
Jul 29 01:06:35 bacula kernel: [28823.333955] ata2: reset failed, giving up
Jul 29 01:06:35 bacula kernel: [28823.335766] ata2.00: disabled
Jul 29 01:06:35 bacula kernel: [28823.335775] ata2: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0xe frozen t4
Jul 29 01:06:35 bacula kernel: [28823.337625] ata2: SError: { PHYRdyChg CommWake }
Jul 29 01:06:35 bacula kernel: [28823.339448] ata2: hard resetting link
Jul 29 01:06:37 bacula kernel: [28825.651448] ata2: COMRESET failed (errno=-19)
Jul 29 01:06:37 bacula kernel: [28825.653254] ata2: reset failed (errno=-19), retrying in 8 secs
Jul 29 01:06:45 bacula kernel: [28833.333251] ata2: hard resetting link
Jul 29 01:06:47 bacula kernel: [28834.960789] ata2: COMRESET failed (errno=-19)
Jul 29 01:06:47 bacula kernel: [28834.962553] ata2: reset failed (errno=-19), retrying in 9 secs
Jul 29 01:06:55 bacula kernel: [28843.330397] ata2: hard resetting link
Jul 29 01:06:56 bacula kernel: [28844.334107] ata2: COMRESET failed (errno=-19)
Jul 29 01:06:56 bacula kernel: [28844.335834] ata2: reset failed (errno=-19), retrying in 34 secs
Jul 29 01:07:30 bacula kernel: [28878.320406] ata2: hard resetting link
Jul 29 01:07:31 bacula kernel: [28879.819973] ata2: COMRESET failed (errno=-19)
Jul 29 01:07:31 bacula kernel: [28879.821664] ata2: reset failed, giving up
Jul 29 01:07:31 bacula kernel: [28879.823334] ata2: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0xe frozen t3
Jul 29 01:07:31 bacula kernel: [28879.825029] ata2: SError: { PHYRdyChg CommWake }
Jul 29 01:07:31 bacula kernel: [28879.826737] ata2: hard resetting link
Jul 29 01:07:33 bacula kernel: [28880.987639] ata2: COMRESET failed (errno=-19)
Jul 29 01:07:33 bacula kernel: [28880.989314] ata2: reset failed (errno=-19), retrying in 9 secs
Jul 29 01:07:41 bacula kernel: [28889.821095] ata2: hard resetting link
Jul 29 01:07:44 bacula kernel: [28892.496350] ata2: COMRESET failed (errno=-32)
Jul 29 01:07:44 bacula kernel: [28892.497991] ata2: reset failed (errno=-32), retrying in 8 secs
Jul 29 01:07:51 bacula kernel: [28899.818266] ata2: hard resetting link
Jul 29 01:07:52 bacula kernel: [28900.726007] ata2: COMRESET failed (errno=-19)
Jul 29 01:07:52 bacula kernel: [28900.727618] ata2: reset failed (errno=-19), retrying in 35 secs
Jul 29 01:08:26 bacula kernel: [28934.808272] ata2: hard resetting link
Jul 29 01:08:28 bacula kernel: [28936.211843] ata2: COMRESET failed (errno=-19)
Jul 29 01:08:28 bacula kernel: [28936.213422] ata2: reset failed, giving up
Jul 29 01:08:28 bacula kernel: [28936.214974] ata2: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0xe frozen t2
Jul 29 01:08:28 bacula kernel: [28936.216591] ata2: SError: { PHYRdyChg CommWake }
Jul 29 01:08:28 bacula kernel: [28936.218185] ata2: hard resetting link
Jul 29 01:08:29 bacula kernel: [28937.411524] ata2: COMRESET failed (errno=-19)
Jul 29 01:08:29 bacula kernel: [28937.413099] ata2: reset failed (errno=-19), retrying in 9 secs
Jul 29 01:08:38 bacula kernel: [28946.213007] ata2: hard resetting link
Jul 29 01:08:41 bacula kernel: [28949.120180] ata2: COMRESET failed (errno=-19)
Jul 29 01:08:41 bacula kernel: [28949.121724] ata2: reset failed (errno=-19), retrying in 8 secs
Jul 29 01:08:48 bacula kernel: [28956.210158] ata2: hard resetting link
Jul 29 01:08:49 bacula kernel: [28957.773711] ata2: COMRESET failed (errno=-19)
Jul 29 01:08:49 bacula kernel: [28957.775253] ata2: reset failed (errno=-19), retrying in 34 secs
Jul 29 01:09:23 bacula kernel: [28991.200166] ata2: hard resetting link
Jul 29 01:09:24 bacula kernel: [28992.043922] ata2: COMRESET failed (errno=-19)
Jul 29 01:09:24 bacula kernel: [28992.045424] ata2: reset failed, giving up
Jul 29 01:09:24 bacula kernel: [28992.046910] ata2: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0xe frozen t1
Jul 29 01:09:24 bacula kernel: [28992.048425] ata2: SError: { PHYRdyChg CommWake }
Jul 29 01:09:24 bacula kernel: [28992.049952] ata2: hard resetting link
Jul 29 01:09:25 bacula kernel: [28993.211561] ata2: COMRESET failed (errno=-19)
Jul 29 01:09:25 bacula kernel: [28993.213058] ata2: reset failed (errno=-19), retrying in 9 secs
Jul 29 01:09:34 bacula kernel: [29002.045068] ata2: hard resetting link
Jul 29 01:09:35 bacula kernel: [29003.672602] ata2: COMRESET failed (errno=-19)
Jul 29 01:09:35 bacula kernel: [29003.674065] ata2: reset failed (errno=-19), retrying in 9 secs
Jul 29 01:09:44 bacula kernel: [29012.042220] ata2: hard resetting link
Jul 29 01:09:45 bacula kernel: [29013.045931] ata2: COMRESET failed (errno=-19)
Jul 29 01:09:45 bacula kernel: [29013.047352] ata2: reset failed (errno=-19), retrying in 34 secs
Jul 29 01:10:19 bacula kernel: [29047.032223] ata2: hard resetting link
Jul 29 01:10:20 bacula kernel: [29047.939961] ata2: COMRESET failed (errno=-19)
Jul 29 01:10:20 bacula kernel: [29047.941350] ata2: reset failed, giving up
Jul 29 01:10:20 bacula kernel: [29047.942706] ata2: EH pending after 5 tries, giving up
Jul 29 01:10:20 bacula kernel: [29047.944099] ata2: EH complete
Jul 29 01:10:20 bacula kernel: [29047.944113] ata2.00: detaching (SCSI 2:0:0:0)
Jul 29 01:10:20 bacula kernel: [29047.945018] sd 2:0:0:0: [sda] Synchronizing SCSI cache
Jul 29 01:10:20 bacula kernel: [29047.945060] sd 2:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 29 01:10:20 bacula kernel: [29047.945065] sd 2:0:0:0: [sda] Stopping disk
Jul 29 01:10:20 bacula kernel: [29047.945073] sd 2:0:0:0: [sda] START_STOP FAILED
Jul 29 01:10:20 bacula kernel: [29047.945076] sd 2:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
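
The SErr values in this log, 0x10000 and 0x50000, are bit masks. As a minimal sketch, assuming the diagnostic bits 16 to 26 of the SATA SError register and the short names the kernel prints for them, a value can be turned back into the "{ ... }" list. Only PHYRdyChg and CommWake are confirmed by the log itself. The remaining names are quoted from memory of the libata source and should be checked against it. decode_serr is a name invented for this note.

```shell
# Decode the diagnostic bits (16..26) of a SATA SError value into short names.
# decode_serr is a hypothetical helper; the name list is an assumption based on
# libata, of which only PHYRdyChg and CommWake are confirmed by the log.
decode_serr() {
    value=$(( $1 ))          # accepts decimal or 0x-prefixed hex
    bit=16
    out=""
    for name in PHYRdyChg PHYInt CommWake 10B8B Dispar BadCRC Handshk LinkSeq TrStaTrns UnrecFIS DevExch; do
        if [ $(( (value >> bit) & 1 )) -eq 1 ]; then
            out="$out $name"
        fi
        bit=$(( bit + 1 ))
    done
    echo "{$out }"
}

decode_serr 0x10000   # value logged with the first exceptions
decode_serr 0x50000   # value logged once the resets keep failing
```

The errno values in the same log follow the usual Linux numbering: -19 is ENODEV (no such device) and -32 is EPIPE.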
The system decides that the file system contains errors
Remounting filesystem read-only
Jul 29 04:00:00 bacula kernel: [39225.515551] EXT4-fs error (device sda1): ext4_find_entry:932: inode #2: comm bacula-fd: reading directory lblock 0
Jul 29 04:00:00 bacula kernel: [39225.528285] quiet_error: 35 callbacks suppressed
Jul 29 04:00:00 bacula kernel: [39225.528291] Buffer I/O error on device sda1, logical block 30441472
Jul 29 04:00:00 bacula kernel: [39225.529738] lost page write due to I/O error on sda1
Jul 29 04:00:00 bacula kernel: [39225.529742] JBD2: I/O error detected when updating journal superblock for sda1-8.
Jul 29 04:00:00 bacula kernel: [39225.531217] Aborting journal on device sda1-8.
Jul 29 04:00:00 bacula kernel: [39225.532677] Buffer I/O error on device sda1, logical block 30441472
Jul 29 04:00:00 bacula kernel: [39225.534109] lost page write due to I/O error on sda1
Jul 29 04:00:00 bacula kernel: [39225.535014] JBD2: I/O error detected when updating journal superblock for sda1-8.
Jul 29 04:00:00 bacula kernel: [39225.536531] journal commit I/O error
Jul 29 04:00:00 bacula kernel: [39225.537952] EXT4-fs error (device sda1): ext4_journal_start_sb:327: Detected aborted journal
Jul 29 04:00:00 bacula kernel: [39225.539426] EXT4-fs (sda1): Remounting filesystem read-only
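
Whether the remount to read-only actually took effect can be checked in the mount table. The sketch below assumes a /proc/mounts style file (device, mount point, type, options, ...). The function name is_mounted_ro and the mount point /backup in the sample line are made up for illustration and do not come from the log.

```shell
# Succeed if device $1 is listed with the "ro" option in a /proc/mounts-style
# file ($2, default /proc/mounts). is_mounted_ro is a hypothetical helper name.
is_mounted_ro() {
    awk -v dev="$1" '
        $1 == dev {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++) if (opts[i] == "ro") found = 1
        }
        END { exit !found }
    ' "${2:-/proc/mounts}"
}

# Sample mount table line; the mount point /backup is an assumption.
sample=$(mktemp)
printf '%s\n' '/dev/sda1 /backup ext4 ro,relatime,data=ordered 0 0' > "$sample"
is_mounted_ro /dev/sda1 "$sample" && echo "sda1 is mounted read-only"
rm -f "$sample"
```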
In fact the disk is already gone: /dev/sda no longer exists.
# smartctl -a /dev/sda
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-4-amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
Smartctl open device: /dev/sda failed: No such device
But it can be found again by rescanning the SCSI host
echo "- - -" >/sys/class/scsi_host/host2/scan
Jul 29 10:03:34 bacula kernel: [61033.006707] sd 2:0:0:0: [sda] 488397168 512-byte logical blocks: (250 GB/232 GiB)
Jul 29 10:03:34 bacula kernel: [61033.006760] sd 2:0:0:0: [sda] Write Protect is off
Jul 29 10:03:34 bacula kernel: [61033.006763] sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
Jul 29 10:03:34 bacula kernel: [61033.006785] sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jul 29 10:03:34 bacula kernel: [61033.007290] sd 2:0:0:0: Attached scsi generic sg0 type 0
Jul 29 10:03:34 bacula kernel: [61033.027146] sda: sda1
Jul 29 10:03:34 bacula kernel: [61033.027376] sd 2:0:0:0: [sda] Attached SCSI disk
Jul 29 10:03:35 bacula kernel: [61034.578778] EXT4-fs (sda1): warning: mounting fs with errors, running e2fsck is recommended
Jul 29 10:03:35 bacula kernel: [61034.579301] EXT4-fs (sda1): recovery complete
Jul 29 10:03:35 bacula kernel: [61034.579304] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)
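
When it is not obvious which hostN the port belongs to, the same "- - -" write (wildcard channel, target and LUN) can be sent to every SCSI host. This is only a sketch: rescan_scsi_hosts is a made-up name, and the optional directory argument exists just so the loop can be tried against a scratch directory instead of the real sysfs tree.

```shell
# Ask every SCSI host under $1 (default /sys/class/scsi_host) to rescan all
# channels, targets and LUNs. rescan_scsi_hosts is a hypothetical helper name.
rescan_scsi_hosts() {
    root=${1:-/sys/class/scsi_host}
    for scan in "$root"/host*/scan; do
        [ -e "$scan" ] || continue      # skip when the glob matched nothing
        echo "- - -" > "$scan"          # wildcard channel, target, LUN
        echo "rescanned ${scan%/scan}"
    done
}
```

Called without arguments as root, it writes to the real /sys/class/scsi_host entries, including host2 from the command above.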
And its SMART data can be read
=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda 7200.10
Device Model: ST3250310AS
Serial Number: 9RY17X2G
Firmware Version: 3.AAC
User Capacity: 250,059,350,016 bytes [250 GB]
Sector Size: 512 bytes logical/physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: 7
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Sat Jul 29 10:07:17 2017 MSK
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   110   082   006    Pre-fail  Always       -       208897990
  3 Spin_Up_Time            0x0003   099   097   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   099   099   020    Old_age   Always       -       2032
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   090   060   030    Pre-fail  Always       -       145392
  9 Power_On_Hours          0x0032   046   046   000    Old_age   Always       -       47552
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   099   099   020    Old_age   Always       -       2036
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   078   078   000    Old_age   Always       -       22
190 Airflow_Temperature_Cel 0x0022   057   047   045    Old_age   Always       -       43 (Min/Max 42/44)
194 Temperature_Celsius     0x0022   043   053   000    Old_age   Always       -       43 (0 14 0 0)
195 Hardware_ECC_Recovered  0x001a   056   049   000    Old_age   Always       -       139832161
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   161   000    Old_age   Always       -       159614
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 Data_Address_Mark_Errs  0x0032   100   253   000    Old_age   Always       -       0
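
In this table the reallocated and pending sector counters are 0, while UDMA_CRC_Error_Count has a raw value of 159614. CRC errors are counted on the link between drive and controller, so a large value usually points at the cable, connector or port rather than at the platters. To watch whether that counter keeps growing, its raw value can be pulled out of the attribute table. A minimal sketch, assuming the ten-column layout shown above (smart_raw is a made-up name):

```shell
# Print the RAW_VALUE column of one SMART attribute, given attribute-table
# text on stdin. smart_raw is a hypothetical helper name.
smart_raw() {
    # Column 1 is the attribute ID, column 10 is where RAW_VALUE starts.
    awk -v id="$1" '$1 == id { print $10 }'
}

# Example input copied from the attribute table.
printf '%s\n' '199 UDMA_CRC_Error_Count 0x003e 200 161 000 Old_age Always - 159614' | smart_raw 199
```

On the live system the input would come from `smartctl -A /dev/sda`.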