LINUX.ORG.RU

Periodic stalls when reading from an HDD


Lately I've been getting strange pauses when working with the disk: the activity LED is lit and top shows wa close to 100%, but iotop says nothing is being read or written. After about half a minute it comes back and keeps working normally. It can recur at very different intervals, from more often than once every 5 minutes to once an hour. It happens whenever something tries to read: Chrome goes into its cache, or I switch to the IDEA window.
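
For context on the symptom: top's wa is the share of CPU time the CPUs spent idle while an I/O request was still outstanding. It can therefore sit near 100% even when almost no bytes complete, which is plausibly consistent with iotop showing 0 B/s while the drive is busy retrying one request internally. Below is a minimal Python sketch, assuming the standard Linux /proc/stat field order, of roughly how such a percentage is derived from two samples. It is not top's actual source, and the sample lines are made up for illustration, not measured on this machine.

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into integer tick counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    return [int(x) for x in fields[1:]]


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two 'cpu' lines.

    Linux field order: user nice system idle iowait irq softirq steal guest guest_nice.
    Only the first eight are summed, because guest time is already counted in user/nice.
    """
    b = parse_cpu_line(before)
    a = parse_cpu_line(after)
    deltas = [new - old for new, old in zip(a, b)]
    total = sum(deltas[:8])
    if total == 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is iowait


def read_cpu_line(path="/proc/stat"):
    """Return the first line of /proc/stat (the all-CPU totals); Linux only."""
    with open(path) as f:
        return f.readline()


if __name__ == "__main__":
    # Hypothetical samples one second apart on a two-core machine (200 ticks at
    # USER_HZ=100) where nearly all time went to waiting on a stuck disk request.
    t0 = "cpu  1000 0 500 8000 100 0 0 0 0 0"
    t1 = "cpu  1001 0 501 8002 296 0 0 0 0 0"
    print(f"wa = {iowait_percent(t0, t1):.1f}%")
```

On a live system you would call read_cpu_line() twice, a second or so apart, and pass both results to iowait_percent().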

I rebooted into Windows 8, and it reproduces there too.

Meanwhile, SMART shows nothing suspicious:

# smartctl -a /dev/sda
smartctl 5.42 2011-10-20 r3458 [x86_64-linux-3.10.0-pf] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Hitachi Travelstar 5K500.B
Device Model:     Hitachi HTS545050B9A300
...
Firmware Version: PB4OC66G
User Capacity:    500,107,862,016 bytes [500 GB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 6
Local Time is:    Wed Jan  1 20:48:13 2014 MSK
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever 
                                        been run.
Total time to complete Offline 
data collection:                (  645) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine 
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 158) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   095   095   062    Pre-fail  Always       -       917504
  2 Throughput_Performance  0x0005   100   100   040    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0007   155   155   033    Pre-fail  Always       -       1
  4 Start_Stop_Count        0x0012   099   099   000    Old_age   Always       -       2808
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   100   100   040    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0012   056   056   000    Old_age   Always       -       19478
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       820
191 G-Sense_Error_Rate      0x000a   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       30
193 Load_Cycle_Count        0x0012   097   097   000    Old_age   Always       -       33891
194 Temperature_Celsius     0x0002   141   141   000    Old_age   Always       -       39 (Min/Max 13/50)
196 Reallocated_Event_Count 0x0032   094   094   000    Old_age   Always       -       701
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0
223 Load_Retry_Count        0x000a   100   100   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     19454         -
# 2  Extended offline    Interrupted (host reset)      90%     19451         -
# 3  Short offline       Completed without error       00%     19450         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
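
An attribute table like the one above is easy to misread: the drive reports PASSED because the normalized VALUE columns are still far above THRESH, yet the RAW_VALUE of a remap counter such as Reallocated_Event_Count (701 here) already says sectors are being remapped. Below is a minimal Python sketch that pulls RAW_VALUE out of `smartctl -a` attribute rows and flags non-zero remap/pending counters. The watch list is my own illustrative choice, not something smartctl defines, and the regex assumes the column layout shown in this dump.

```python
import re

# Attribute rows look like:
#   196 Reallocated_Event_Count 0x0032   094   094   000    Old_age   Always       -       701
ROW = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<name>\S+)\s+0x[0-9a-fA-F]+\s+"
    r"(?P<value>\d+)\s+(?P<worst>\d+)\s+(?P<thresh>\d+)\s+"
    r"\S+\s+\S+\s+\S+\s+(?P<raw>\d+)"  # TYPE, UPDATED, WHEN_FAILED, then RAW_VALUE
)

# Hypothetical watch list: counters whose RAW value should normally stay at 0.
WATCH = {
    "Reallocated_Sector_Ct",
    "Reallocated_Event_Count",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
}

# A few rows copied from the dump above.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0002   141   141   000    Old_age   Always       -       39 (Min/Max 13/50)
196 Reallocated_Event_Count 0x0032   094   094   000    Old_age   Always       -       701
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
"""


def parse_attributes(text):
    """Return {attribute_name: raw_value} for every attribute row in smartctl -a output."""
    attrs = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            attrs[m.group("name")] = int(m.group("raw"))
    return attrs


def suspicious(attrs):
    """Watched attributes whose raw counter is non-zero."""
    return {name: raw for name, raw in attrs.items() if name in WATCH and raw > 0}


if __name__ == "__main__":
    print(suspicious(parse_attributes(SAMPLE)))
```

Running it on the saved output at two points in time also makes it easy to see whether such a counter is still growing.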

So why could this be happening (or what did I miss in smartctl)? Is it time to run out and buy a new one?

what did I miss in smartctl

196 Reallocated_Event_Count 0x0032   094   094   000    Old_age   Always       -       701

And besides SMART, there's also dmesg, top and iotop.

Gotf ★★★

whdd, to check the surface. Or mhdd/victoria...

NiTr0 ★★★★★

Gotf,

top shows about 100% wa, iotop shows disk read/write 0 B/s

196 Reallocated_Event_Count 0x0032 094 094 000 Old_age Always - 701

Right: it seemed like there weren't that many remaps, but the value keeps growing; in 12 hours it reached 704.

NiTr0,

whdd, to check the surface

I ran whdd and read the whole disk.

Here's what I got:

<3ms    3311858   
<10ms   485741    
<50ms   15292     
<150ms  2257      
<500ms  331       
>500ms  42        
ERR     0         
TIME    0         
UNC     0         
IDNF    0         
ABRT    0         
AMNF    0
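
To put the whdd histogram in proportion: as I understand whdd's output, each latency bucket counts how many read operations finished within that band, and the error rows (ERR, UNC, IDNF, ...) are all zero. Below is a minimal Python sketch that totals the buckets and computes the share of slow reads. Treating everything at or above 150 ms as "slow" is an arbitrary cutoff chosen for illustration, not a whdd convention.

```python
# whdd latency histogram from this thread: bucket label -> number of reads.
HISTOGRAM = {
    "<3ms": 3311858,
    "<10ms": 485741,
    "<50ms": 15292,
    "<150ms": 2257,
    "<500ms": 331,
    ">500ms": 42,
}

# Hypothetical cutoff: the "<500ms" and ">500ms" buckets, i.e. reads taking >= 150 ms.
SLOW_BUCKETS = ("<500ms", ">500ms")


def summarize(hist, slow_buckets):
    """Return (total reads, slow reads, slow share in parts per million)."""
    total = sum(hist.values())
    slow = sum(hist[b] for b in slow_buckets)
    return total, slow, 1_000_000 * slow / total


if __name__ == "__main__":
    total, slow, ppm = summarize(HISTOGRAM, SLOW_BUCKETS)
    print(f"{total} reads, {slow} slower than 150 ms ({ppm:.1f} per million)")
```

For these numbers that is 3,815,521 reads, 373 of them slower than 150 ms. The share is tiny, but any process that happens to hit one of those spots waits for it, so no read errors does not mean the surface is healthy.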

Meanwhile, errors started pouring into dmesg:

[ 7366.536812] sd 0:0:0:0: [sda] Unhandled error code
[ 7366.536819] sd 0:0:0:0: [sda]  
[ 7366.536822] Result: hostbyte=0x00 driverbyte=0x06
[ 7366.536825] sd 0:0:0:0: [sda] CDB: 
[ 7366.536827] cdb[0]=0x2a: 2a 00 31 71 a2 2c 00 01 48 00
[ 7366.536842] end_request: I/O error, dev sda, sector 829530668
[ 7366.536846] Buffer I/O error on device sda8, logical block 2674560
...
[ 7366.536950] EXT4-fs warning (device sda8): ext4_end_bio:286: I/O error writing to inode 1756839 (offset 524288 size 167936 starting block 103691374)
[ 7366.536957] sd 0:0:0:0: [sda] Unhandled error code
[ 7366.536958] sd 0:0:0:0: [sda]  
[ 7366.536959] Result: hostbyte=0x00 driverbyte=0x06
[ 7366.536960] sd 0:0:0:0: [sda] CDB: 
[ 7366.536961] cdb[0]=0x2a: 2a 00 31 71 d3 6c 00 01 10 00
[ 7366.536965] end_request: I/O error, dev sda, sector 829543276
[ 7366.536967] Buffer I/O error on device sda8, logical block 2676136
...
[ 7366.537009] EXT4-fs warning (device sda8): ext4_end_bio:286: I/O error writing to inode 1769193 (offset 2408448 size 139264 starting block 103692943)
[ 7366.537013] sd 0:0:0:0: [sda] Unhandled error code
[ 7366.537014] sd 0:0:0:0: [sda]  
[ 7366.537015] Result: hostbyte=0x00 driverbyte=0x06
[ 7366.537016] sd 0:0:0:0: [sda] CDB: 
[ 7366.537016] cdb[0]=0x2a: 2a 00 31 d7 57 4c 00 00 08 00
[ 7366.537021] end_request: I/O error, dev sda, sector 836196172
[ 7366.537022] Buffer I/O error on device sda8, logical block 3507748
[ 7366.537024] EXT4-fs warning (device sda8): ext4_end_bio:286: I/O error writing to inode 1766648 (offset 49152 size 4096 starting block 104524522)
[ 7366.537028] sd 0:0:0:0: [sda] Unhandled error code
[ 7366.537029] sd 0:0:0:0: [sda]  
[ 7366.537030] Result: hostbyte=0x00 driverbyte=0x06
[ 7366.537031] sd 0:0:0:0: [sda] CDB: 
[ 7366.537032] cdb[0]=0x2a: 2a 00 31 d7 57 bc 00 00 08 00
[ 7366.537036] end_request: I/O error, dev sda, sector 836196284
[ 7366.537037] Buffer I/O error on device sda8, logical block 3507762
[ 7366.537039] EXT4-fs warning (device sda8): ext4_end_bio:286: I/O error writing to inode 1766648 (offset 106496 size 4096 starting block 104524536)
[ 7366.537042] sd 0:0:0:0: [sda] Unhandled error code
[ 7366.537043] sd 0:0:0:0: [sda]  
[ 7366.537044] Result: hostbyte=0x00 driverbyte=0x06
[ 7366.537045] sd 0:0:0:0: [sda] CDB: 
[ 7366.537045] cdb[0]=0x2a: 2a 00 31 d7 58 4c 00 00 08 00
[ 7366.537049] end_request: I/O error, dev sda, sector 836196428
[ 7366.537051] Buffer I/O error on device sda8, logical block 3507780
[ 7366.537053] EXT4-fs warning (device sda8): ext4_end_bio:286: I/O error writing to inode 1766648 (offset 180224 size 4096 starting block 104524554)
....
[ 7366.537302] sd 0:0:0:0: [sda] CDB: 
[ 7366.537303] cdb[0]=0x2a: 2a 00 33 6b a4 64 00 00 08 00
[ 7378.865906] ------------[ cut here ]------------
[ 7378.865921] WARNING: at fs/buffer.c:1120 mark_buffer_dirty+0x75/0x80()
[ 7378.865922] Modules linked in: ipv6 rfcomm bnep fuse easy_slow_down_manager(O) gspca_main nf_nat vboxnetflt(O) vboxnetadp(O) vboxdrv(O) usb_storage uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core videodev usbhid snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel coretemp snd_hda_codec btusb bluetooth samsung_backlight(O) samsung_laptop kvm_intel snd_hwdep kvm ehci_pci ehci_hcd fglrx(PO) snd_pcm snd_page_alloc i2c_i801 snd_timer i2c_core sky2 agpgart lib80211_crypt_tkip wl(PO) thermal battery cfg80211 lib80211 fan video backlight ac button acpi_cpufreq mperf processor thermal_sys
[ 7378.865968] CPU: 1 PID: 2684 Comm: jbd2/sda8-8 Tainted: P           O 3.10.0-pf #2
[ 7378.865970] Hardware name: SAMSUNG ELECTRONICS CO., LTD. R540/R580/R780/SA41/E452/E852/R540/R580/R780/SA41/E452/E852, BIOS 03KP.M008.20100927.LDG 09/27/20
[ 7378.865972]  ffffffff815054df 0000000000000000 ffffffff81033c9a ffff880207156478
[ 7378.865974]  ffff88020a29e700 ffff88020e323fd8 ffff880207156478 ffff88020715647a
[ 7378.865976]  ffffffff811149e5 ffff88020c2253f0 ffffffff811ebb49 ffff880210727bbc
[ 7378.865978] Call Trace:
[ 7378.865986]  [<ffffffff815054df>] ? dump_stack+0xd/0x17
[ 7378.865993]  [<ffffffff81033c9a>] ? warn_slowpath_common+0x6a/0xa0
[ 7378.865997]  [<ffffffff811149e5>] ? mark_buffer_dirty+0x75/0x80
[ 7378.866003]  [<ffffffff811ebb49>] ? __jbd2_journal_unfile_buffer+0x9/0x20
[ 7378.866007]  [<ffffffff811ef932>] ? jbd2_journal_commit_transaction+0xf72/0x1800
[ 7378.866011]  [<ffffffff81040ce3>] ? lock_timer_base.isra.27+0x33/0x70
[ 7378.866016]  [<ffffffff811f3080>] ? kjournald2+0xb0/0x240
[ 7378.866020]  [<ffffffff810542d0>] ? finish_wait+0x90/0x90
[ 7378.866022]  [<ffffffff811f2fd0>] ? journal_init_common+0x160/0x160
[ 7378.866024]  [<ffffffff81053d23>] ? kthread+0xb3/0xc0
[ 7378.866028]  [<ffffffff81050000>] ? __task_pid_nr_ns+0x70/0xd0
[ 7378.866029]  [<ffffffff81053c70>] ? kthread_create_on_node+0x120/0x120
[ 7378.866032]  [<ffffffff8150f8ec>] ? ret_from_fork+0x7c/0xb0
[ 7378.866034]  [<ffffffff81053c70>] ? kthread_create_on_node+0x120/0x120
[ 7378.866035] ---[ end trace 0f6075781a00213f ]---
[49774.316311] sd 0:0:0:0: [sda] Unhandled error code
[49774.316315] sd 0:0:0:0: [sda]  
[49774.316318] Result: hostbyte=0x00 driverbyte=0x06
[49774.316319] sd 0:0:0:0: [sda] CDB: 
[49774.316320] cdb[0]=0x2a: 2a 00 33 6b 7c d4 00 00 08 00
[49774.316328] blk_update_request: 14 callbacks suppressed
[49774.316329] end_request: I/O error, dev sda, sector 862682324
[49774.316331] quiet_error: 7 callbacks suppressed
...
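
The `CDB:` lines show which commands failed. cdb[0]=0x2a is the SCSI WRITE(10) opcode. In a 10-byte READ/WRITE CDB, bytes 2 to 5 hold the starting LBA (big-endian) and bytes 7 to 8 the transfer length in logical blocks. That is why each decoded LBA matches the `end_request: I/O error, dev sda, sector ...` line that follows it, and why these errors are failed writes (here from the ext4 journal and writeback), not reads. If I read the kernel's scsi.h of that era correctly, driverbyte=0x06 is DRIVER_TIMEOUT, i.e. the commands timed out rather than returning a medium error. Below is a minimal Python sketch of the decoding; the function name is my own.

```python
import struct

WRITE_10 = 0x2A  # SCSI WRITE(10) opcode (READ(10) is 0x28)


def decode_rw10(cdb_hex):
    """Decode a 10-byte READ(10)/WRITE(10) CDB given as space-separated hex bytes.

    Layout: opcode, flags, 4-byte LBA (big-endian), group number,
    2-byte transfer length in logical blocks (big-endian), control.
    Returns (opcode, lba, length).
    """
    raw = bytes.fromhex(cdb_hex)
    if len(raw) != 10:
        raise ValueError("expected a 10-byte CDB")
    opcode, _flags, lba, _group, length, _control = struct.unpack(">BBIBHB", raw)
    return opcode, lba, length


if __name__ == "__main__":
    # CDBs copied from the dmesg excerpt above.
    for cdb in ("2a 00 31 71 a2 2c 00 01 48 00",
                "2a 00 33 6b 7c d4 00 00 08 00"):
        opcode, lba, length = decode_rw10(cdb)
        kind = "WRITE(10)" if opcode == WRITE_10 else hex(opcode)
        print(f"{kind}: LBA {lba}, {length} sectors")
```

The two CDBs decode to LBA 829530668 (328 sectors) and LBA 862682324 (8 sectors), the same sectors reported in the corresponding end_request lines.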

Thanks, I'm off to get a new one before everything falls apart for good.

BlackHawk
(topic author)
Reply to: comment by BlackHawk

WARNING: at fs/buffer.c:1120 mark_buffer_dirty+0x75/0x80()

This thing can also cause slowdowns.

Programmist11180 ★★★

Soft bads.
They usually mean problems with the electronics, or simply that the drive was recently shaken or dropped. Or just that it was running tilted at an angle to the ground, on someone's lap.

pekmop1024 ★★★★★