
SATA hard resetting link

Disk sda is connected to a «Silicon Image, Inc. SiI 3132 Serial ATA Raid II Controller». It drops off the bus while data is being written to it.

Diagnostics:
lspci output for the RAID bus controller

02:00.0 RAID bus controller: Silicon Image, Inc. SiI 3132 Serial ATA Raid II Controller (rev 01)
	Subsystem: Silicon Image, Inc. Device 7132
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 16
	Region 0: Memory at f7d84000 (64-bit, non-prefetchable) [size=128]
	Region 2: Memory at f7d80000 (64-bit, non-prefetchable) [size=16K]
	Region 4: I/O ports at e000 [size=128]
	Expansion ROM at f7d00000 [disabled] [size=512K]
	Capabilities: [54] Power Management version 2
		Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
	Capabilities: [5c] MSI: Enable- Count=1/1 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [70] Express (v1) Legacy Endpoint, MSI 00
		DevCap:	MaxPayload 1024 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, MaxReadReq 4096 bytes
		DevSta:	CorrErr- UncorrErr+ FatalErr- UnsuppReq+ AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 2.5GT/s, Width x1, ASPM L0s, Latency L0 unlimited, L1 unlimited
			ClockPM- Surprise- LLActRep- BwNot-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
	Capabilities: [100 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		AERCap:	First Error Pointer: 14, GenCap+ CGenEn- ChkCap+ ChkEn-
	Kernel driver in use: sata_sil24

How the system «resets» the disk
hard resetting link

Jul 29 01:04:48 bacula kernel: [28716.231729] ata2: exception Emask 0x10 SAct 0x0 SErr 0x10000 action 0xe frozen
Jul 29 01:04:48 bacula kernel: [28716.233751] ata2: SError: { PHYRdyChg }
Jul 29 01:04:48 bacula kernel: [28716.235755] ata2: hard resetting link
Jul 29 01:04:51 bacula kernel: [28718.917945] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jul 29 01:04:51 bacula kernel: [28719.046239] ata2.00: configured for UDMA/33
Jul 29 01:04:51 bacula kernel: [28719.046248] ata2: EH complete
Jul 29 01:05:20 bacula kernel: [28748.279258] ata2: exception Emask 0x10 SAct 0x0 SErr 0x10000 action 0xe frozen
Jul 29 01:05:20 bacula kernel: [28748.281283] ata2: SError: { PHYRdyChg }
Jul 29 01:05:20 bacula kernel: [28748.283288] ata2: hard resetting link
Jul 29 01:05:22 bacula kernel: [28750.580883] ata2: COMRESET failed (errno=-19)
Jul 29 01:05:22 bacula kernel: [28750.582860] ata2: reset failed (errno=-19), retrying in 8 secs
Jul 29 01:05:30 bacula kernel: [28758.278685] ata2: hard resetting link
Jul 29 01:05:32 bacula kernel: [28760.458082] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jul 29 01:05:32 bacula kernel: [28760.570380] ata2.00: configured for UDMA/33
Jul 29 01:05:32 bacula kernel: [28760.570389] ata2: EH complete
Jul 29 01:05:38 bacula kernel: [28766.813705] ata2: exception Emask 0x10 SAct 0x0 SErr 0x10000 action 0xe frozen
Jul 29 01:05:38 bacula kernel: [28766.815706] ata2: SError: { PHYRdyChg }
Jul 29 01:05:38 bacula kernel: [28766.817689] ata2: hard resetting link
Jul 29 01:05:40 bacula kernel: [28768.011906] ata2: COMRESET failed (errno=-19)
Jul 29 01:05:40 bacula kernel: [28768.013861] ata2: reset failed (errno=-19), retrying in 9 secs
Jul 29 01:05:48 bacula kernel: [28776.813394] ata2: hard resetting link
Jul 29 01:05:50 bacula kernel: [28777.977059] ata2: COMRESET failed (errno=-19)
Jul 29 01:05:50 bacula kernel: [28777.978978] ata2: reset failed (errno=-19), retrying in 9 secs
Jul 29 01:05:58 bacula kernel: [28786.810510] ata2: hard resetting link
Jul 29 01:06:00 bacula kernel: [28788.438070] ata2: COMRESET failed (errno=-19)
Jul 29 01:06:00 bacula kernel: [28788.439950] ata2: reset failed (errno=-19), retrying in 34 secs
Jul 29 01:06:33 bacula kernel: [28821.800519] ata2: hard resetting link
Jul 29 01:06:35 bacula kernel: [28823.332106] ata2: COMRESET failed (errno=-19)
Jul 29 01:06:35 bacula kernel: [28823.333955] ata2: reset failed, giving up
Jul 29 01:06:35 bacula kernel: [28823.335766] ata2.00: disabled
Jul 29 01:06:35 bacula kernel: [28823.335775] ata2: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0xe frozen t4
Jul 29 01:06:35 bacula kernel: [28823.337625] ata2: SError: { PHYRdyChg CommWake }
Jul 29 01:06:35 bacula kernel: [28823.339448] ata2: hard resetting link
Jul 29 01:06:37 bacula kernel: [28825.651448] ata2: COMRESET failed (errno=-19)
Jul 29 01:06:37 bacula kernel: [28825.653254] ata2: reset failed (errno=-19), retrying in 8 secs
Jul 29 01:06:45 bacula kernel: [28833.333251] ata2: hard resetting link
Jul 29 01:06:47 bacula kernel: [28834.960789] ata2: COMRESET failed (errno=-19)
Jul 29 01:06:47 bacula kernel: [28834.962553] ata2: reset failed (errno=-19), retrying in 9 secs
Jul 29 01:06:55 bacula kernel: [28843.330397] ata2: hard resetting link
Jul 29 01:06:56 bacula kernel: [28844.334107] ata2: COMRESET failed (errno=-19)
Jul 29 01:06:56 bacula kernel: [28844.335834] ata2: reset failed (errno=-19), retrying in 34 secs
Jul 29 01:07:30 bacula kernel: [28878.320406] ata2: hard resetting link
Jul 29 01:07:31 bacula kernel: [28879.819973] ata2: COMRESET failed (errno=-19)
Jul 29 01:07:31 bacula kernel: [28879.821664] ata2: reset failed, giving up
Jul 29 01:07:31 bacula kernel: [28879.823334] ata2: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0xe frozen t3
Jul 29 01:07:31 bacula kernel: [28879.825029] ata2: SError: { PHYRdyChg CommWake }
Jul 29 01:07:31 bacula kernel: [28879.826737] ata2: hard resetting link
Jul 29 01:07:33 bacula kernel: [28880.987639] ata2: COMRESET failed (errno=-19)
Jul 29 01:07:33 bacula kernel: [28880.989314] ata2: reset failed (errno=-19), retrying in 9 secs
Jul 29 01:07:41 bacula kernel: [28889.821095] ata2: hard resetting link
Jul 29 01:07:44 bacula kernel: [28892.496350] ata2: COMRESET failed (errno=-32)
Jul 29 01:07:44 bacula kernel: [28892.497991] ata2: reset failed (errno=-32), retrying in 8 secs
Jul 29 01:07:51 bacula kernel: [28899.818266] ata2: hard resetting link
Jul 29 01:07:52 bacula kernel: [28900.726007] ata2: COMRESET failed (errno=-19)
Jul 29 01:07:52 bacula kernel: [28900.727618] ata2: reset failed (errno=-19), retrying in 35 secs
Jul 29 01:08:26 bacula kernel: [28934.808272] ata2: hard resetting link
Jul 29 01:08:28 bacula kernel: [28936.211843] ata2: COMRESET failed (errno=-19)
Jul 29 01:08:28 bacula kernel: [28936.213422] ata2: reset failed, giving up
Jul 29 01:08:28 bacula kernel: [28936.214974] ata2: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0xe frozen t2
Jul 29 01:08:28 bacula kernel: [28936.216591] ata2: SError: { PHYRdyChg CommWake }
Jul 29 01:08:28 bacula kernel: [28936.218185] ata2: hard resetting link
Jul 29 01:08:29 bacula kernel: [28937.411524] ata2: COMRESET failed (errno=-19)
Jul 29 01:08:29 bacula kernel: [28937.413099] ata2: reset failed (errno=-19), retrying in 9 secs
Jul 29 01:08:38 bacula kernel: [28946.213007] ata2: hard resetting link
Jul 29 01:08:41 bacula kernel: [28949.120180] ata2: COMRESET failed (errno=-19)
Jul 29 01:08:41 bacula kernel: [28949.121724] ata2: reset failed (errno=-19), retrying in 8 secs
Jul 29 01:08:48 bacula kernel: [28956.210158] ata2: hard resetting link
Jul 29 01:08:49 bacula kernel: [28957.773711] ata2: COMRESET failed (errno=-19)
Jul 29 01:08:49 bacula kernel: [28957.775253] ata2: reset failed (errno=-19), retrying in 34 secs
Jul 29 01:09:23 bacula kernel: [28991.200166] ata2: hard resetting link
Jul 29 01:09:24 bacula kernel: [28992.043922] ata2: COMRESET failed (errno=-19)
Jul 29 01:09:24 bacula kernel: [28992.045424] ata2: reset failed, giving up
Jul 29 01:09:24 bacula kernel: [28992.046910] ata2: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0xe frozen t1
Jul 29 01:09:24 bacula kernel: [28992.048425] ata2: SError: { PHYRdyChg CommWake }
Jul 29 01:09:24 bacula kernel: [28992.049952] ata2: hard resetting link
Jul 29 01:09:25 bacula kernel: [28993.211561] ata2: COMRESET failed (errno=-19)
Jul 29 01:09:25 bacula kernel: [28993.213058] ata2: reset failed (errno=-19), retrying in 9 secs
Jul 29 01:09:34 bacula kernel: [29002.045068] ata2: hard resetting link
Jul 29 01:09:35 bacula kernel: [29003.672602] ata2: COMRESET failed (errno=-19)
Jul 29 01:09:35 bacula kernel: [29003.674065] ata2: reset failed (errno=-19), retrying in 9 secs
Jul 29 01:09:44 bacula kernel: [29012.042220] ata2: hard resetting link
Jul 29 01:09:45 bacula kernel: [29013.045931] ata2: COMRESET failed (errno=-19)
Jul 29 01:09:45 bacula kernel: [29013.047352] ata2: reset failed (errno=-19), retrying in 34 secs
Jul 29 01:10:19 bacula kernel: [29047.032223] ata2: hard resetting link
Jul 29 01:10:20 bacula kernel: [29047.939961] ata2: COMRESET failed (errno=-19)
Jul 29 01:10:20 bacula kernel: [29047.941350] ata2: reset failed, giving up
Jul 29 01:10:20 bacula kernel: [29047.942706] ata2: EH pending after 5 tries, giving up
Jul 29 01:10:20 bacula kernel: [29047.944099] ata2: EH complete
Jul 29 01:10:20 bacula kernel: [29047.944113] ata2.00: detaching (SCSI 2:0:0:0)
Jul 29 01:10:20 bacula kernel: [29047.945018] sd 2:0:0:0: [sda] Synchronizing SCSI cache
Jul 29 01:10:20 bacula kernel: [29047.945060] sd 2:0:0:0: [sda]  Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 29 01:10:20 bacula kernel: [29047.945065] sd 2:0:0:0: [sda] Stopping disk
Jul 29 01:10:20 bacula kernel: [29047.945073] sd 2:0:0:0: [sda] START_STOP FAILED
Jul 29 01:10:20 bacula kernel: [29047.945076] sd 2:0:0:0: [sda]  Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
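
To read the log above more easily, here is a minimal Python sketch that decodes the SErr bitmask and the errno values from the reset messages. The bit-to-name table is an assumption: it was written down from memory of the SATA SError register layout and the short names libata prints, and only bits 16 (PHYRdyChg) and 18 (CommWake) are confirmed by this log. The function names are made up for illustration.

```python
import errno

# Assumed SError bit positions and the short names libata prints for them.
# Only bits 16 and 18 are confirmed by the log in this thread.
SERR_BITS = {
    0: "RecovData", 1: "RecovComm",
    8: "UnrecovData", 9: "Persist", 10: "Proto", 11: "HostInt",
    16: "PHYRdyChg", 17: "PHYInt", 18: "CommWake", 19: "10B8B",
    20: "Dispar", 21: "BadCRC", 22: "Handshk", 23: "LinkSeq",
    24: "TrStaTrns", 25: "UnrecFIS", 26: "DevExch",
}


def decode_serror(value):
    """Return the names of all known bits set in an SErr value, lowest bit first."""
    return [name for bit, name in sorted(SERR_BITS.items()) if value & (1 << bit)]


def errno_name(code):
    """Map a kernel-style negative errno (for example -19) to its symbolic name."""
    return errno.errorcode.get(abs(code), "unknown")


for serr in (0x10000, 0x50000):
    print(hex(serr), decode_serror(serr))
for code in (-19, -32):
    print(code, errno_name(code))
```

In this log every exception carries PHYRdyChg, that is, the PHY ready state changed, and errno -19 is ENODEV: during COMRESET the controller did not see a device on the link at all. That pattern is consistent with a link-level problem (cable, connector, power), although it does not prove one.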

The system decides that the filesystem contains errors
Remounting filesystem read-only

Jul 29 04:00:00 bacula kernel: [39225.515551] EXT4-fs error (device sda1): ext4_find_entry:932: inode #2: comm bacula-fd: reading directory lblock 0
Jul 29 04:00:00 bacula kernel: [39225.528285] quiet_error: 35 callbacks suppressed
Jul 29 04:00:00 bacula kernel: [39225.528291] Buffer I/O error on device sda1, logical block 30441472
Jul 29 04:00:00 bacula kernel: [39225.529738] lost page write due to I/O error on sda1
Jul 29 04:00:00 bacula kernel: [39225.529742] JBD2: I/O error detected when updating journal superblock for sda1-8.
Jul 29 04:00:00 bacula kernel: [39225.531217] Aborting journal on device sda1-8.
Jul 29 04:00:00 bacula kernel: [39225.532677] Buffer I/O error on device sda1, logical block 30441472
Jul 29 04:00:00 bacula kernel: [39225.534109] lost page write due to I/O error on sda1
Jul 29 04:00:00 bacula kernel: [39225.535014] JBD2: I/O error detected when updating journal superblock for sda1-8.
Jul 29 04:00:00 bacula kernel: [39225.536531] journal commit I/O error
Jul 29 04:00:00 bacula kernel: [39225.537952] EXT4-fs error (device sda1): ext4_journal_start_sb:327: Detected aborted journal
Jul 29 04:00:00 bacula kernel: [39225.539426] EXT4-fs (sda1): Remounting filesystem read-only

In practice the disk is gone: /dev/sda no longer exists.

# smartctl -a /dev/sda
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-4-amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

Smartctl open device: /dev/sda failed: No such device

But it can be found again by rescanning the SCSI host

echo "- - -" >/sys/class/scsi_host/host2/scan
Jul 29 10:03:34 bacula kernel: [61033.006707] sd 2:0:0:0: [sda] 488397168 512-byte logical blocks: (250 GB/232 GiB)
Jul 29 10:03:34 bacula kernel: [61033.006760] sd 2:0:0:0: [sda] Write Protect is off
Jul 29 10:03:34 bacula kernel: [61033.006763] sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
Jul 29 10:03:34 bacula kernel: [61033.006785] sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jul 29 10:03:34 bacula kernel: [61033.007290] sd 2:0:0:0: Attached scsi generic sg0 type 0
Jul 29 10:03:34 bacula kernel: [61033.027146]  sda: sda1
Jul 29 10:03:34 bacula kernel: [61033.027376] sd 2:0:0:0: [sda] Attached SCSI disk
Jul 29 10:03:35 bacula kernel: [61034.578778] EXT4-fs (sda1): warning: mounting fs with errors, running e2fsck is recommended
Jul 29 10:03:35 bacula kernel: [61034.579301] EXT4-fs (sda1): recovery complete
Jul 29 10:03:35 bacula kernel: [61034.579304] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)
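
The rescan above can also be scripted. This is a minimal Python sketch, assuming the same sysfs layout as in the echo command (`/sys/class/scsi_host/<host>/scan`); the helper name `rescan_scsi_host` and its `sysfs_root` parameter are hypothetical. On a real system it needs root, so the demo at the bottom runs against a throwaway directory instead of the real sysfs.

```python
import tempfile
from pathlib import Path


def rescan_scsi_host(host, sysfs_root="/sys/class/scsi_host"):
    """Write the wildcard "- - -" (any channel, target, LUN) to the host's scan file."""
    scan_file = Path(sysfs_root) / host / "scan"
    if not scan_file.exists():
        raise FileNotFoundError(f"no scan attribute for {host}: {scan_file}")
    scan_file.write_text("- - -\n")  # same bytes that echo "- - -" would send
    return scan_file


# Dry run against a fake sysfs tree so the sketch can be tried without root.
with tempfile.TemporaryDirectory() as fake_root:
    fake_scan = Path(fake_root) / "host2" / "scan"
    fake_scan.parent.mkdir()
    fake_scan.touch()
    written = rescan_scsi_host("host2", sysfs_root=fake_root).read_text()

print(repr(written))
```

On the machine from this thread the call would be `rescan_scsi_host("host2")`, since the disk sits at SCSI address 2:0:0:0.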

And then SMART can be read

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.10
Device Model:     ST3250310AS
Serial Number:    9RY17X2G
Firmware Version: 3.AAC
User Capacity:    250,059,350,016 bytes [250 GB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Sat Jul 29 10:07:17 2017 MSK
SMART support is: Available - device has SMART capability.
SMART support is: Enabled


SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   110   082   006    Pre-fail  Always       -       208897990
  3 Spin_Up_Time            0x0003   099   097   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   099   099   020    Old_age   Always       -       2032
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   090   060   030    Pre-fail  Always       -       145392
  9 Power_On_Hours          0x0032   046   046   000    Old_age   Always       -       47552
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   099   099   020    Old_age   Always       -       2036
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   078   078   000    Old_age   Always       -       22
190 Airflow_Temperature_Cel 0x0022   057   047   045    Old_age   Always       -       43 (Min/Max 42/44)
194 Temperature_Celsius     0x0022   043   053   000    Old_age   Always       -       43 (0 14 0 0)
195 Hardware_ECC_Recovered  0x001a   056   049   000    Old_age   Always       -       139832161
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   161   000    Old_age   Always       -       159614
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 Data_Address_Mark_Errs  0x0032   100   253   000    Old_age   Always       -       0
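
A table like this can be screened mechanically. Below is a minimal Python sketch that parses `smartctl -A` style rows and flags two things: a normalized VALUE at or below a non-zero THRESH, and a non-zero raw count on a few counters. Which counters to watch is an assumption made here, not something smartctl defines, and all names in the code are hypothetical. The sample rows are copied from the table above.

```python
import re
from typing import NamedTuple

SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
190 Airflow_Temperature_Cel 0x0022   057   047   045    Old_age   Always       -       43 (Min/Max 42/44)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x003e   200   161   000    Old_age   Always       -       159614
"""

# Assumption: counters whose raw value is a plain event count worth watching.
WATCHED_RAW = {
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
}


class SmartAttr(NamedTuple):
    attr_id: int
    name: str
    value: int
    worst: int
    thresh: int
    raw: str


def parse_smart_attributes(text):
    """Parse attribute rows; the header and any non-attribute lines are skipped."""
    attrs = []
    for line in text.splitlines():
        fields = line.split(None, 9)  # the 10th field keeps the whole raw value
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attrs.append(SmartAttr(int(fields[0]), fields[1], int(fields[3]),
                               int(fields[4]), int(fields[5]), fields[9].strip()))
    return attrs


def leading_int(raw):
    """Return the integer at the start of a raw value, or None if there is none."""
    match = re.match(r"\d+", raw)
    return int(match.group()) if match else None


def findings(attrs):
    """Return (name, reason) pairs for attributes that deserve a closer look."""
    result = []
    for a in attrs:
        if a.thresh > 0 and a.value <= a.thresh:
            result.append((a.name, "normalized value at or below threshold"))
            continue
        count = leading_int(a.raw)
        if a.name in WATCHED_RAW and count:
            result.append((a.name, f"raw count {count}"))
    return result


for name, reason in findings(parse_smart_attributes(SAMPLE)):
    print(name, reason)
```

On these rows only UDMA_CRC_Error_Count is flagged. CRC errors are counted on the link between drive and controller, so a large raw value usually points at the cable or connectors rather than at the platters.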

Last edited by: petav (1 edit in total)

A high UDMA_CRC_Error_Count usually hints at cable problems. As for High_Fly_Writes:

High Fly Writes means that the write head was flying higher above the surface than it should. At that moment the magnetic field may be too weak to write the medium reliably. Possible causes are an external impact (shock), vibration, a media defect, or contamination (of the heads).

I see nothing else unusual in the SMART output.

menzoberronzan

Replace the cable first, and inspect the connectors.

Radjah ★★★★★

In reply to: comment by menzoberronzan

I repaired the filesystem, mounted it, and pushed a large chunk of data through the disk for both reading and writing. The speed is as expected and syslog is quiet. I'll keep watching!

petav ★★★★★ (thread author)