Messages like these keep showing up in the log (dmesg) from time to time:
Feb 13 22:29:55 sysresccd kernel: end_request: I/O error, dev cciss/c0d0, sector 0
Feb 13 22:29:55 sysresccd kernel: end_request: I/O error, dev cciss/c0d0, sector 0
Feb 13 22:29:57 sysresccd kernel: end_request: I/O error, dev cciss/c0d0, sector 0
Feb 13 22:29:57 sysresccd kernel: end_request: I/O error, dev cciss/c0d0, sector 0
Feb 13 22:29:58 sysresccd kernel: end_request: I/O error, dev cciss/c0d0, sector 0
Feb 13 22:29:58 sysresccd kernel: end_request: I/O error, dev cciss/c0d0, sector 0
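To see at a glance which device and sector the errors hit, the log lines can be tallied with a short script. This is a minimal sketch that assumes the exact `end_request: I/O error, dev ..., sector N` wording shown above; the names `tally_io_errors` and `SAMPLE_LOG` are hypothetical, chosen for illustration.

```python
import re
from collections import Counter

# Assumption: kernel messages use the 2.6-era wording seen in the log above, e.g.
# "end_request: I/O error, dev cciss/c0d0, sector 0"
END_REQUEST_RE = re.compile(
    r"end_request: I/O error, dev (?P<dev>\S+), sector (?P<sector>\d+)"
)

# Hypothetical sample: the six lines quoted in the post.
SAMPLE_LOG = """\
Feb 13 22:29:55 sysresccd kernel: end_request: I/O error, dev cciss/c0d0, sector 0
Feb 13 22:29:55 sysresccd kernel: end_request: I/O error, dev cciss/c0d0, sector 0
Feb 13 22:29:57 sysresccd kernel: end_request: I/O error, dev cciss/c0d0, sector 0
Feb 13 22:29:57 sysresccd kernel: end_request: I/O error, dev cciss/c0d0, sector 0
Feb 13 22:29:58 sysresccd kernel: end_request: I/O error, dev cciss/c0d0, sector 0
Feb 13 22:29:58 sysresccd kernel: end_request: I/O error, dev cciss/c0d0, sector 0
"""


def tally_io_errors(lines):
    """Count end_request I/O errors per (device, sector) pair; other lines are ignored."""
    counts = Counter()
    for line in lines:
        match = END_REQUEST_RE.search(line)
        if match:
            counts[(match.group("dev"), int(match.group("sector")))] += 1
    return counts


if __name__ == "__main__":
    for (dev, sector), n in tally_io_errors(SAMPLE_LOG.splitlines()).items():
        print(f"{dev} sector {sector}: {n} error(s)")
```

On the sample it prints `cciss/c0d0 sector 0: 6 error(s)`, i.e. every failure is on sector 0 of the same logical drive. Because the regex uses `search`, the same function should also work on plain `dmesg` output without the syslog timestamp prefix.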
I'm seeing this under systemrescuecd:
Linux sysresccd 2.6.31.12-std135-amd64 #1 SMP Mon Jan 18 19:19:54 UTC 2010 x86_64 Intel(R) Xeon(TM) CPU 3.40GHz GenuineIntel GNU/Linux
00:00.0 Host bridge [0600]: Intel Corporation E7520 Memory Controller Hub [8086:3590] (rev 0c)
00:02.0 PCI bridge [0604]: Intel Corporation E7525/E7520/E7320 PCI Express Port A [8086:3595] (rev 0c)
00:04.0 PCI bridge [0604]: Intel Corporation E7525/E7520 PCI Express Port B [8086:3597] (rev 0c)
00:05.0 PCI bridge [0604]: Intel Corporation E7520 PCI Express Port B1 [8086:3598] (rev 0c)
00:1d.0 USB Controller [0c03]: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #1 [8086:24d2] (rev 02)
00:1d.1 USB Controller [0c03]: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #2 [8086:24d4] (rev 02)
00:1d.2 USB Controller [0c03]: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #3 [8086:24d7] (rev 02)
00:1d.3 USB Controller [0c03]: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #4 [8086:24de] (rev 02)
00:1d.7 USB Controller [0c03]: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB2 EHCI Controller [8086:24dd] (rev 02)
00:1e.0 PCI bridge [0604]: Intel Corporation 82801 PCI Bridge [8086:244e] (rev c2)
00:1f.0 ISA bridge [0601]: Intel Corporation 82801EB/ER (ICH5/ICH5R) LPC Interface Bridge [8086:24d0] (rev 02)
00:1f.1 IDE interface [0101]: Intel Corporation 82801EB/ER (ICH5/ICH5R) IDE Controller [8086:24db] (rev 02)
01:03.0 VGA compatible controller [0300]: ATI Technologies Inc Rage XL [1002:4752] (rev 27)
01:04.0 System peripheral [0880]: Compaq Computer Corporation Integrated Lights Out Controller [0e11:b203] (rev 01)
01:04.2 System peripheral [0880]: Compaq Computer Corporation Integrated Lights Out Processor [0e11:b204] (rev 01)
02:00.0 PCI bridge [0604]: Intel Corporation 6700PXH PCI Express-to-PCI Bridge A [8086:0329] (rev 09)
02:00.2 PCI bridge [0604]: Intel Corporation 6700PXH PCI Express-to-PCI Bridge B [8086:032a] (rev 09)
03:01.0 PCI bridge [0604]: IBM PCI-X to PCI-X Bridge [1014:01a7] (rev 03)
03:03.0 SCSI storage controller [0100]: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI [1000:0030] (rev 08)
03:03.1 SCSI storage controller [0100]: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI [1000:0030] (rev 08)
04:04.0 RAID bus controller [0104]: Compaq Computer Corporation Smart Array 64xx [0e11:0046] (rev 01)
07:02.0 Ethernet controller [0200]: Broadcom Corporation NetXtreme BCM5703 Gigabit Ethernet [14e4:16c7] (rev 10)
07:03.0 Ethernet controller [0200]: Broadcom Corporation NetXtreme BCM5703 Gigabit Ethernet [14e4:16c7] (rev 10)
% smartctl -d cciss,0 -a /dev/cciss/c0d0
smartctl version 5.38 [i486-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
Device: COMPAQ BD1468A4C5 Version: HPB4
Serial number: 3KS5WFYW000097201L2U
Device type: disk
Transport protocol: Parallel SCSI (SPI-4)
Local Time is: Sat Feb 13 22:40:04 2010 UTC
Device supports SMART and is Enabled
Temperature Warning Enabled
SMART Health Status: OK
Current Drive Temperature: 30 C
Drive Trip Temperature: 68 C
Elements in grown defect list: 0
Vendor (Seagate) cache information
Blocks sent to initiator = 210401724
Blocks received from initiator = 1209101223
Blocks read from cache and sent to initiator = 987843349
Number of read and write commands whose size <= segment size = 210061562
Number of read and write commands whose size > segment size = 15514
Vendor (Seagate/Hitachi) factory information
number of hours powered up = 23829.72
number of minutes until next internal SMART test = 94
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0          0          0.000           0
write:         0        0         0         0          0          0.000           0
Non-medium error count: 3
SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background short  Completed                   -       0                 - [-   -    -]
Long (extended) Self Test duration: 2643 seconds [44.0 minutes]
This is my first time working with hardware like this (HP ProLiant ML370), so I'm asking more experienced people for help.
P.S. More generally, is it possible to connect the disk bypassing the hardware RAID controller? There is only one disk anyway, so the controller is of no use. It does cause problems, though: for example, hdparm cannot retrieve any information about the disk...
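hdparm is aimed at ATA/IDE drives, while this one is a parallel SCSI disk hidden behind a Smart Array logical drive, so it is not surprising that hdparm sees nothing useful. The `smartctl -d cciss,N` form already used above addresses physical drive N behind the controller. A minimal sketch that builds and runs one `smartctl -i` query per physical drive index; the function name `smartctl_commands` and the default of a single drive (`n_disks=1`) are assumptions for illustration.

```python
import shutil
import subprocess


def smartctl_commands(device="/dev/cciss/c0d0", n_disks=1):
    """Build one 'smartctl -d cciss,N -i DEVICE' command per physical drive index.

    Assumption: drives behind the controller are numbered 0..n_disks-1.
    """
    return [["smartctl", "-d", f"cciss,{i}", "-i", device] for i in range(n_disks)]


if __name__ == "__main__":
    if shutil.which("smartctl") is None:
        print("smartctl not found in PATH")
    else:
        for cmd in smartctl_commands(n_disks=1):
            # check=False: smartctl's exit status is a bitmask, so non-zero
            # does not necessarily mean the query failed.
            result = subprocess.run(cmd, capture_output=True, text=True, check=False)
            print(" ".join(cmd))
            print(result.stdout)
```

This only reads identity information; it does not answer whether the controller can be bypassed.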