LINUX.ORG.RU

Is my hard drive dying?


I noticed entries like these in messages:

Apr  5 21:30:49 schultz kernel: ata1.00: exception Emask 0x10 SAct 0x1 SErr 0x4010000 action 0xe frozen
Apr  5 21:30:49 schultz kernel: ata1.00: irq_stat 0x00400040, connection status changed
Apr  5 21:30:49 schultz kernel: ata1: SError: { PHYRdyChg DevExch }
Apr  5 21:30:49 schultz kernel: ata1.00: failed command: READ FPDMA QUEUED
Apr  5 21:30:49 schultz kernel: ata1.00: cmd 60/20:00:07:ee:e2/00:00:02:00:00/40 tag 0 ncq 16384 in
Apr  5 21:30:49 schultz kernel: res 40/00:04:07:ee:e2/00:00:02:00:00/40 Emask 0x10 (ATA bus error)
Apr  5 21:30:49 schultz kernel: ata1.00: status: { DRDY }
Apr  5 21:30:49 schultz kernel: ata1: hard resetting link
Apr  5 21:30:55 schultz kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr  5 21:30:55 schultz kernel: ata1.00: configured for UDMA/133
Apr  5 21:30:55 schultz kernel: ata1: EH complete
Apr  5 21:30:56 schultz kernel: ata1.00: exception Emask 0x10 SAct 0x1 SErr 0x4010000 action 0xe frozen
Apr  5 21:30:56 schultz kernel: ata1.00: irq_stat 0x00400040, connection status changed
Apr  5 21:30:56 schultz kernel: ata1: SError: { PHYRdyChg DevExch }
Apr  5 21:30:56 schultz kernel: ata1.00: failed command: READ FPDMA QUEUED
Apr  5 21:30:56 schultz kernel: ata1.00: cmd 60/08:00:d7:f4:e1/00:00:02:00:00/40 tag 0 ncq 4096 in
Apr  5 21:30:56 schultz kernel: res 40/00:04:d7:f4:e1/00:00:02:00:00/40 Emask 0x10 (ATA bus error)
Apr  5 21:30:56 schultz kernel: ata1.00: status: { DRDY }
Apr  5 21:30:56 schultz kernel: ata1: hard resetting link
Apr  5 21:31:02 schultz kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr  5 21:31:02 schultz kernel: ata1.00: configured for UDMA/133
Apr  5 21:31:02 schultz kernel: ata1: EH complete

Is it the hard drive or the controller acting up?

That's the cable flapping "in the wind". Either reseat it or swap in a different one.

darkshvein ☆☆
()
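The `SErr 0x4010000` value in the log above can be unpacked by hand. Below is a minimal Python sketch, assuming the usual SATA SError bit positions (bit 16 for PHYRdyChg, bit 26 for DevExch). The function name `decode_serr` and the shortened bit table are illustrative only and not part of any existing tool:

```python
# Subset of SATA SError bits, named the way the kernel log prints them.
# Assumption: only a few link-related bits are listed; anything else
# is reported generically as "bitN".
SERR_BITS = {
    16: "PHYRdyChg",  # PHY ready state changed
    21: "BadCRC",     # CRC error on the link
    22: "Handshk",    # handshake error
    26: "DevExch",    # device presence changed
}


def decode_serr(value: int) -> list:
    """Return the names of all bits set in an SError value, lowest bit first."""
    names = []
    for bit in range(32):
        if value & (1 << bit):
            names.append(SERR_BITS.get(bit, "bit%d" % bit))
    return names


print(decode_serr(0x4010000))  # ['PHYRdyChg', 'DevExch']
```

Both flags describe events on the link itself (the PHY dropping out of the ready state, the device appearing to be swapped), which fits the cable suggestion, though a flaky port or power connector could look the same.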

I also got this once:

Apr  5 23:13:12 schultz kernel: ata1.00: exception Emask 0x0 SAct 0x3 SErr 0x0 action 0x6 frozen
Apr  5 23:13:12 schultz kernel: ata1.00: failed command: READ FPDMA QUEUED
Apr  5 23:13:12 schultz kernel: ata1.00: cmd 60/00:00:be:cd:5c/01:00:25:00:00/40 tag 0 ncq 131072 in
Apr  5 23:13:12 schultz kernel: res 40/00:04:5f:4e:35/00:00:00:00:00/40 Emask 0x4 (timeout)
Apr  5 23:13:12 schultz kernel: ata1.00: status: { DRDY }
Apr  5 23:13:12 schultz kernel: ata1.00: failed command: READ FPDMA QUEUED
Apr  5 23:13:12 schultz kernel: ata1.00: cmd 60/00:08:be:ce:5c/01:00:25:00:00/40 tag 1 ncq 131072 in
Apr  5 23:13:12 schultz kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  5 23:13:12 schultz kernel: ata1.00: status: { DRDY }
Apr  5 23:13:12 schultz kernel: ata1: hard resetting link
Apr  5 23:13:17 schultz kernel: ata1: link is slow to respond, please be patient (ready=0)
Apr  5 23:13:22 schultz kernel: ata1: COMRESET failed (errno=-16)
Apr  5 23:13:22 schultz kernel: ata1: hard resetting link
Apr  5 23:13:27 schultz kernel: ata1: link is slow to respond, please be patient (ready=0)
Apr  5 23:13:32 schultz kernel: ata1: COMRESET failed (errno=-16)
Apr  5 23:13:32 schultz kernel: ata1: hard resetting link
Apr  5 23:13:37 schultz kernel: ata1: link is slow to respond, please be patient (ready=0)
Apr  5 23:14:07 schultz kernel: ata1: COMRESET failed (errno=-16)
Apr  5 23:14:07 schultz kernel: ata1: limiting SATA link speed to 1.5 Gbps
Apr  5 23:14:07 schultz kernel: ata1: hard resetting link
Apr  5 23:14:12 schultz kernel: ata1: COMRESET failed (errno=-16)
Apr  5 23:14:12 schultz kernel: ata1: reset failed, giving up
Apr  5 23:14:12 schultz kernel: ata1.00: disabled
Apr  5 23:14:12 schultz kernel: ata1.00: device reported invalid CHS sector 0
Apr  5 23:14:12 schultz kernel: ata1: EH complete
Apr  5 23:14:12 schultz kernel: sd 0:0:0:0: [sda] Unhandled error code
Apr  5 23:14:12 schultz kernel: sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Apr  5 23:14:12 schultz kernel: sd 0:0:0:0: [sda] CDB: Read(10): 28 00 25 5c ce be 00 01 00 00
Apr  5 23:14:12 schultz kernel: end_request: I/O error, dev sda, sector 626839230
Apr  5 23:14:12 schultz kernel: sd 0:0:0:0: [sda] Unhandled error code
Apr  5 23:14:12 schultz kernel: sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Apr  5 23:14:12 schultz kernel: sd 0:0:0:0: [sda] CDB: Read(10): 28 00 25 5c cd be 00 01 00 00
Apr  5 23:14:12 schultz kernel: end_request: I/O error, dev sda, sector 626838974
Apr  5 23:14:12 schultz kernel: sd 0:0:0:0: [sda] Unhandled error code
Apr  5 23:14:12 schultz kernel: sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Apr  5 23:14:12 schultz kernel: sd 0:0:0:0: [sda] CDB: Read(10): 28 00 25 5c cd be 00 00 08 00
Apr  5 23:14:12 schultz kernel: end_request: I/O error, dev sda, sector 626838974

After that I had to reboot with the reset button.

Rubystar ★★
() topic starter
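The `CDB: Read(10)` lines and the `end_request` sector numbers in the log above describe the same requests. Here is a small Python sketch of the mapping, assuming the standard READ(10) layout (opcode 0x28, a 4-byte big-endian LBA in bytes 2 to 5, a 2-byte transfer length in bytes 7 to 8); `decode_read10` is a hypothetical helper name:

```python
def decode_read10(cdb_hex: str) -> tuple:
    """Return (lba, block_count) for a READ(10) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # transfer length in blocks
    return lba, blocks


print(decode_read10("28 00 25 5c ce be 00 01 00 00"))  # (626839230, 256)
print(decode_read10("28 00 25 5c cd be 00 00 08 00"))  # (626838974, 8)
```

The decoded LBAs match the sectors reported by `end_request`, and 256 blocks of 512 bytes (an assumed sector size) give the 131072 bytes shown in the `ncq 131072 in` lines.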
In reply to: comment by GotF

I replaced the cable. No errors so far, it seems. Thanks, I'll keep testing.

Rubystar ★★
() topic starter

Check SMART:

smartctl -a /dev/sda
If it shows no errors, only a few, or only old ones, then the disk is fine and the controller is the one acting up.

pupok ★★
()
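Scanning the `smartctl -a` attribute table by eye works, but the check can also be scripted. The Python sketch below assumes the column layout seen in this thread; the choice of attributes to watch (IDs 5, 197, 198, 199) is my assumption about which ones matter most here, and `nonzero_watched` is a made-up name:

```python
import re

# Assumption: these IDs are the ones worth flagging when their raw value is non-zero.
WATCHED = {5, 197, 198, 199}

# ID, name, flag, value, worst, thresh, type, updated, when_failed, raw value
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+\d+\s+\d+\s+\d+"
    r"\s+\S+\s+\S+\s+\S+\s+(\d+)"
)


def nonzero_watched(smart_text: str) -> dict:
    """Map attribute name to raw value for watched attributes with raw > 0."""
    found = {}
    for line in smart_text.splitlines():
        m = ROW.match(line)
        if m and int(m.group(1)) in WATCHED and int(m.group(3)) > 0:
            found[m.group(2)] = int(m.group(3))
    return found


SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       2
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
"""
print(nonzero_watched(SAMPLE))  # {'Current_Pending_Sector': 2}
```

An empty result does not prove the disk is healthy; it only means none of the watched counters has moved.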
In reply to: comment by GotF
schultz rubystar # smartctl -a /dev/sda
smartctl version 5.38 [x86_64-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Second Generation Serial ATA family
Device Model:     WDC WD5000AAKS-65YGA0
Serial Number:    WD-WCAS85399010
Firmware Version: 12.01C02
User Capacity:    500 107 862 016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Thu Apr  8 07:15:40 2010 YEKST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (13200) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 154) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x303f) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0003   253   183   021    Pre-fail  Always       -       2116
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       272
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000e   200   200   051    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   076   076   000    Old_age   Always       -       17599
 10 Spin_Retry_Count        0x0012   100   100   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0012   100   100   051    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       262
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       168
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       275
194 Temperature_Celsius     0x0022   107   096   000    Old_age   Always       -       43
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   051    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
Rubystar ★★
() topic starter
In reply to: comment by Rubystar

The values look normal. The disk is fairly old. It would be worth running it in another computer for a while. If no problems show up there, then, as already said above, it may be the controller.

GotF ★★★★★
()
In reply to: comment by Rubystar

I have a similar situation:

Hard drive: WD1000EADS, bought on September 20 and formatted as ext3

Controller: an auxiliary Promise (I think) sata378 (the 2 native Intel ports are taken by other drives).

The dmesg output matches yours. There was one more symptom, though: cat /dev/sdc triggered these errors within a few seconds of starting, so the problem is near the start of the disk.

I ran fsck.ext3 -c (with the bad-block check) 4 times. The same sectors were flagged every time. The drive now has 15 GB free; I've switched it to read-only and will keep an eye on it.

smartctl -a /dev/sdc:

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD10EADS-00M2B0
Serial Number:    WD-WCAV51085686
Firmware Version: 01.00A01
User Capacity:    1,000,204,886,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Sat Apr 10 15:25:54 2010 MSD
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)	Offline data collection activity
					was completed without error.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		 (21180) seconds.
Offline data collection
capabilities: 			 (0x7b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 ( 244) minutes.
Conveyance self-test routine
recommended polling time: 	 (   5) minutes.
SCT capabilities: 	       (0x303f)	SCT Status supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       67
  3 Spin_Up_Time            0x0027   149   107   021    Pre-fail  Always       -       5541
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       253
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   094   094   000    Old_age   Always       -       4803
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       248
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       227
193 Load_Cycle_Count        0x0032   190   190   000    Old_age   Always       -       32218
194 Temperature_Celsius     0x0022   100   092   000    Old_age   Always       -       47
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       2
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       2
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       1

SMART Error Log Version: 1
Warning: ATA error count 462 inconsistent with error log pointer 1

ATA Error Count: 462 (device log contains only the most recent five errors)
	CR = Command Register [HEX]
	FR = Features Register [HEX]
	SC = Sector Count Register [HEX]
	SN = Sector Number Register [HEX]
	CL = Cylinder Low Register [HEX]
	CH = Cylinder High Register [HEX]
	DH = Device/Head Register [HEX]
	DC = Device Command Register [HEX]
	ER = Error register [HEX]
	ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
smartctl only started showing all these errors after the fsck; before that the output looked the same as yours.

Warbozz
()
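The `cat /dev/sdc` symptom described above can be narrowed down to an approximate offset. A rough Python sketch follows, assuming read access to the device and that a failing read surfaces as an `OSError`; `first_read_error` and its parameters are hypothetical, and the result is only accurate to one chunk:

```python
import os
from typing import Optional


def first_read_error(path: str, chunk_size: int = 1 << 20,
                     limit: Optional[int] = None) -> Optional[int]:
    """Read sequentially; return the offset of the first failing chunk, or None."""
    fd = os.open(path, os.O_RDONLY)
    try:
        offset = 0
        while limit is None or offset < limit:
            try:
                data = os.read(fd, chunk_size)
            except OSError:
                return offset  # the kernel reported an I/O error at this chunk
            if not data:
                return None    # reached the end without errors
            offset += len(data)
        return None            # stopped at the limit without errors
    finally:
        os.close(fd)


# Example (needs root; read-only, but it still stresses a failing drive):
# print(first_read_error("/dev/sdc", limit=10 * 1024 ** 3))
```

The scan only reads, yet every retry on a marginal drive or link triggers more resets like the ones in dmesg, so a small `limit` is a sensible starting point.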
In reply to: comment by Warbozz

As of today the problems have gotten worse. The computer where it was the boot drive can no longer boot from it (well, almost :)). I connected it to another computer and can read the data without any problems. Spooky. I suspect the controller.

Rubystar ★★
() topic starter