One of my disks keeps getting remounted read-only (ro), sometimes even during the initial mount.
Here is how it unfolds in the kernel log:
# dmesg | grep sdb
[ 2.210392] sd 2:0:0:0: [sdb] 1250263728 512-byte logical blocks: (640 GB/596 GiB)
[ 2.210438] sd 2:0:0:0: [sdb] Write Protect is off
[ 2.210442] sd 2:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[ 2.210460] sd 2:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 2.686823] sdb: sdb1
[ 2.687051] sd 2:0:0:0: [sdb] Attached SCSI disk
[ 35.311370] dhfis 0x1 dmafis 0x1 sdbfis 0x0
[ 35.311446] ata3: tag : dhfis dmafis sdbfis sacitve
...
[ 38.546345] dhfis 0x1 dmafis 0x1 sdbfis 0x0
[ 38.550967] ata3: tag : dhfis dmafis sdbfis sacitve
[ 39.090495] sd 2:0:0:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 39.090498] sd 2:0:0:0: [sdb] Sense Key : Aborted Command [current] [descriptor]
[ 39.090514] sd 2:0:0:0: [sdb] Add. Sense: Scsi parity error
[ 39.090519] sd 2:0:0:0: [sdb] CDB: Write(10): 2a 00 00 00 08 08 00 01 30 00
[ 39.090526] end_request: I/O error, dev sdb, sector 2056
[ 39.092058] Buffer I/O error on device sdb1, logical block 1
[ 39.093607] lost page write due to I/O error on sdb1
...
[ 39.105388] Buffer I/O error on device sdb1, logical block 10
[ 39.106783] lost page write due to I/O error on sdb1
[ 39.212846] dhfis 0x1 dmafis 0x1 sdbfis 0x0
[ 39.217022] ata3: tag : dhfis dmafis sdbfis sacitve
...
[ 41.812896] dhfis 0x1 dmafis 0x1 sdbfis 0x0
[ 41.817033] ata3: tag : dhfis dmafis sdbfis sacitve
[ 42.573877] EXT4-fs (sdb1): mounted filesystem with ordered data mode. Opts: (null)
[ 126.523884] dhfis 0x1 dmafis 0x1 sdbfis 0x0
[ 126.523888] ata3: tag : dhfis dmafis sdbfis sacitve
...
[ 129.657289] dhfis 0x1 dmafis 0x1 sdbfis 0x0
[ 129.657293] ata3: tag : dhfis dmafis sdbfis sacitve
[ 130.180472] sd 2:0:0:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 130.180475] sd 2:0:0:0: [sdb] Sense Key : Aborted Command [current] [descriptor]
[ 130.180489] sd 2:0:0:0: [sdb] Add. Sense: Scsi parity error
[ 130.180493] sd 2:0:0:0: [sdb] CDB: Write(10): 2a 00 1c 80 09 00 00 00 10 00
[ 130.180499] end_request: I/O error, dev sdb, sector 478152960
[ 130.180505] Buffer I/O error on device sdb1, logical block 59768864
[ 130.180506] lost page write due to I/O error on sdb1
[ 130.180511] Buffer I/O error on device sdb1, logical block 59768865
[ 130.180513] lost page write due to I/O error on sdb1
[ 140.124085] dhfis 0x1 dmafis 0x1 sdbfis 0x0
[ 140.124089] ata3: tag : dhfis dmafis sdbfis sacitve
...
[ 143.279760] dhfis 0x1F dmafis 0x1 sdbfis 0x0
[ 143.279764] ata3: tag : dhfis dmafis sdbfis sacitve
[ 143.800386] sd 2:0:0:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 143.800389] sd 2:0:0:0: [sdb] Sense Key : Aborted Command [current] [descriptor]
[ 143.800403] sd 2:0:0:0: [sdb] Add. Sense: Scsi parity error
[ 143.800407] sd 2:0:0:0: [sdb] CDB: Write(10): 2a 00 25 44 08 50 00 00 08 00
[ 143.800413] end_request: I/O error, dev sdb, sector 625215568
...
[ 143.800544] sd 2:0:0:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 143.800546] sd 2:0:0:0: [sdb] Sense Key : Aborted Command [current] [descriptor]
[ 143.800558] sd 2:0:0:0: [sdb] Add. Sense: Scsi parity error
[ 143.800561] sd 2:0:0:0: [sdb] CDB: Write(10): 2a 00 25 44 08 28 00 00 08 00
[ 143.800566] end_request: I/O error, dev sdb, sector 625215528
[ 143.800577] Aborting journal on device sdb1-8.
[ 143.901953] dhfis 0x1 dmafis 0x1 sdbfis 0x0
[ 143.901957] ata3: tag : dhfis dmafis sdbfis sacitve
...
[ 147.035314] dhfis 0x1 dmafis 0x1 sdbfis 0x0
[ 147.035318] ata3: tag : dhfis dmafis sdbfis sacitve
[ 147.561290] sd 2:0:0:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 147.561293] sd 2:0:0:0: [sdb] Sense Key : Aborted Command [current] [descriptor]
[ 147.561309] sd 2:0:0:0: [sdb] Add. Sense: Scsi parity error
[ 147.561313] sd 2:0:0:0: [sdb] CDB: Write(10): 2a 00 25 44 08 00 00 00 08 00
[ 147.561320] end_request: I/O error, dev sdb, sector 625215488
[ 147.561324] Buffer I/O error on device sdb1, logical block 78151680
[ 147.561326] lost page write due to I/O error on sdb1
[ 147.561343] JBD2: I/O error detected when updating journal superblock for sdb1-8.
[ 290.354226] EXT4-fs error (device sdb1): ext4_journal_start_sb:260: Detected aborted journal
[ 290.354232] EXT4-fs (sdb1): Remounting filesystem read-only
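As a cross-check on the log above, the sector in each failed write can be recovered from the CDB bytes by hand. In the standard 10-byte WRITE(10) layout, bytes 2-5 hold the big-endian logical block address and bytes 7-8 the transfer length. A minimal Python sketch (the helper name `decode_write10` is made up for this illustration):

```python
def decode_write10(cdb_hex):
    """Return (lba, block_count) for a WRITE(10) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a 10-byte WRITE(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")    # bytes 2..5: logical block address
    count = int.from_bytes(cdb[7:9], "big")  # bytes 7..8: transfer length in blocks
    return lba, count


# CDBs copied from the dmesg output above
for cdb in (
    "2a 00 00 00 08 08 00 01 30 00",
    "2a 00 1c 80 09 00 00 00 10 00",
    "2a 00 25 44 08 50 00 00 08 00",
):
    print(decode_write10(cdb))
# (2056, 304)
# (478152960, 16)
# (625215568, 8)
```

The decoded LBAs match the "end_request: I/O error ... sector N" lines, and the failing writes are spread across the whole disk rather than clustered in one region.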
Interestingly, mount does not agree that the filesystem has been remounted read-only:
# mount | grep storage
/dev/sdb1 on /mnt/storage type ext4 (rw)
bindfs on /mnt/storage type fuse.bindfs (rw)
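One possible explanation for the mismatch, stated as an assumption about this system rather than something the output confirms: where /etc/mtab is a regular file, mount prints what is recorded there, and that file is not updated when the kernel itself remounts a filesystem read-only. The kernel's own view is in /proc/mounts. A small Python sketch that reads it (the function name `mount_options` is hypothetical):

```python
import os


def mount_options(mount_point, mounts_text):
    """Return [(device, fstype, [options])] for every entry mounted at mount_point."""
    entries = []
    for line in mounts_text.splitlines():
        # /proc/mounts fields: device, mount point, fstype, options, dump, pass
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            entries.append((fields[0], fields[2], fields[3].split(",")))
    return entries


# Only meaningful on Linux; skipped silently where /proc/mounts does not exist.
if os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as f:
        for device, fstype, opts in mount_options("/mnt/storage", f.read()):
            print(device, fstype, "ro" if "ro" in opts else "rw")
```

If the ext4 entry shows `ro` here while `mount` still prints `(rw)`, the two outputs are simply reading different sources.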
Here is what SMART reports:
# smartctl --all /dev/sdb
smartctl 5.40 2010-07-12 r3124 [x86_64-unknown-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Caviar Green family
Device Model: WDC WD6400AADS-00M2B0
Serial Number: WD-WCAV57630362
Firmware Version: 01.00A01
User Capacity: 640 135 028 736 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Wed Apr 4 01:52:32 2012 MSK
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x84) Offline data collection activity
was suspended by an interrupting command from host.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (15060) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 175) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x3037) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0027 128 103 021 Pre-fail Always - 6558
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 266
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 100 253 000 Old_age Always - 0
9 Power_On_Hours 0x0032 078 078 000 Old_age Always - 16462
10 Spin_Retry_Count 0x0032 100 100 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 100 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 262
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 140
193 Load_Cycle_Count 0x0032 001 001 000 Old_age Always - 689002
194 Temperature_Celsius 0x0022 113 096 000 Old_age Always - 34
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 198 000 Old_age Always - 8509
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 14069 -
# 2 Extended offline Interrupted (host reset) 90% 14069 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
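To make the attribute table above easier to scan, the raw values can be pulled out programmatically. This is only a sketch: the regular expression assumes the ten-column layout printed by this smartctl version, and `raw_values` is a made-up helper name. The sample rows are copied from the output above.

```python
import re

# ID, name, flag, value, worst, thresh, type, updated, when_failed, raw
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]{4}\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+(\S+)\s+(\S+)\s+(\S+)\s+(\d+)"
)


def raw_values(smart_text):
    """Map attribute name -> raw value for each row of the attribute table."""
    values = {}
    for line in smart_text.splitlines():
        m = ROW.match(line)
        if m:
            values[m.group(2)] = int(m.group(9))
    return values


sample = """\
  5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 198 000 Old_age Always - 8509
"""
vals = raw_values(sample)
for name, raw in vals.items():
    print(name, raw)
```

In this output the media-related counters (5, 197, 198) are all zero, while UDMA_CRC_Error_Count is 8509. Attribute 199 is normally described as counting CRC errors on the link between the drive and the controller.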
Earlier the problem annoyed me so much that I reformatted the disk. It worked fine for a couple of weeks, and then the same thing started happening all over again...
Where should I look?