I have two iSCSI storage servers that replicate LUNs to each other over DRBD for failover (managed by Pacemaker).
When I run "drbdadm verify <res>" on any DRBD resource, the verification starts, but after some random amount of time it dies with a timeout:
[25658.263180] block drbd0: Starting Online Verify from sector 0
[31424.442758] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3779] sock_sendmsg time expired, ko = 6
[31514.609426] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3779] sock_sendmsg time expired, ko = 6
[36769.957660] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3779] sock_sendmsg time expired, ko = 6
[36779.938477] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3779] sock_sendmsg time expired, ko = 5
[38429.218483] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3779] sock_sendmsg time expired, ko = 6
[38439.199298] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3779] sock_sendmsg time expired, ko = 5
[38449.180123] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3779] sock_sendmsg time expired, ko = 4
[38459.160945] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3779] sock_sendmsg time expired, ko = 3
[38469.141767] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3779] sock_sendmsg time expired, ko = 2
[38478.943765] d-con VM_STORAGE2_1: sock_sendmsg returned -104
[38478.943804] block drbd0: Online Verify reached sector 2773202184
[38478.943842] d-con VM_STORAGE2_1: peer( Primary -> Unknown ) conn( VerifyS -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
[38478.943897] block drbd0: drbd_alloc_pages interrupted!
[38478.943929] d-con VM_STORAGE2_1: error receiving OVReply, e: -12 l: 20!
[38478.947573] d-con VM_STORAGE2_1: meta connection shut down by peer.
[38478.947605] d-con VM_STORAGE2_1: asender terminated
[38478.947633] d-con VM_STORAGE2_1: Terminating drbd_a_VM_STORA
[38478.948676] d-con VM_STORAGE2_1: Connection closed
[38478.948716] d-con VM_STORAGE2_1: conn( BrokenPipe -> Unconnected )
[38478.948746] d-con VM_STORAGE2_1: receiver terminated
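To put numbers on the log above, here is a rough Python sketch that measures the spacing between consecutive "time expired" messages in the final countdown and the average verify throughput up to the point of failure. The 512-byte sector size is an assumption (the log does not state it), and all function names are made up for illustration.

```python
import re

# Excerpt of the kernel log above (only the lines this sketch needs).
LOG = """\
[25658.263180] block drbd0: Starting Online Verify from sector 0
[38429.218483] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3779] sock_sendmsg time expired, ko = 6
[38439.199298] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3779] sock_sendmsg time expired, ko = 5
[38449.180123] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3779] sock_sendmsg time expired, ko = 4
[38459.160945] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3779] sock_sendmsg time expired, ko = 3
[38469.141767] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3779] sock_sendmsg time expired, ko = 2
[38478.943804] block drbd0: Online Verify reached sector 2773202184
"""

SECTOR_BYTES = 512  # assumption: sectors in the log are 512 bytes

LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")
KO_RE = re.compile(r"time expired, ko = (\d+)")
START_RE = re.compile(r"Starting Online Verify from sector (\d+)")
END_RE = re.compile(r"Online Verify reached sector (\d+)")


def parse(log):
    """Return a list of (timestamp_seconds, message) tuples."""
    events = []
    for line in log.splitlines():
        m = LINE_RE.match(line)
        if m:
            events.append((float(m.group(1)), m.group(2)))
    return events


def ko_gaps(events):
    """Seconds between consecutive expirations where ko drops by exactly 1."""
    kos = [(ts, int(m.group(1))) for ts, msg in events
           if (m := KO_RE.search(msg))]
    return [round(b[0] - a[0], 3) for a, b in zip(kos, kos[1:])
            if b[1] == a[1] - 1]


def verify_rate(events):
    """Average bytes per second between verify start and the abort point."""
    start = end = None
    for ts, msg in events:
        if (m := START_RE.search(msg)):
            start = (ts, int(m.group(1)))
        elif (m := END_RE.search(msg)):
            end = (ts, int(m.group(1)))
    if start is None or end is None:
        return None
    return (end[1] - start[1]) * SECTOR_BYTES / (end[0] - start[0])


if __name__ == "__main__":
    events = parse(LOG)
    print("gaps between ko decrements (s):", ko_gaps(events))
    print("average verify rate (MB/s):", verify_rate(events) / 1e6)
```

On this excerpt the gaps come out at just under 10 seconds each, and the average rate is on the order of 110 MB/s. That spacing looks consistent with a 10-second send timeout, but that is only a reading of the timestamps, not a confirmed diagnosis.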
I have tried changing various timeouts, to no avail.
Config:
global {
    usage-count no;
}
common {
    protocol B;
    handlers {
    }
    startup {
        wfc-timeout 10;
    }
    disk {
        c-plan-ahead 0;
        al-extents 6433;
        resync-rate 400M;
        disk-barrier no;
        disk-flushes no;
        disk-drain yes;
    }
    net {
        sndbuf-size 1024k;
        rcvbuf-size 1024k;
        max-buffers 8192; # x PAGE_SIZE
        max-epoch-size 8192; # x PAGE_SIZE
        unplug-watermark 8192;
        timeout 100;
        ping-int 15;
        ping-timeout 60; # x 0.1sec
        connect-int 15;
        timeout 50; # x 0.1sec
        verify-alg sha1;
        csums-alg sha1;
        data-integrity-alg crc32c;
        cram-hmac-alg sha1;
        shared-secret "xxx";
        use-rle;
    }
}
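Since the net section mixes units (tenths of a second for timeout and ping-timeout, whole seconds for ping-int and connect-int) and lists timeout twice, here is a small Python sketch that converts both timeout candidates to seconds. The ko-count of 7 is an assumption: it is not set in the config and is only guessed from the first logged "ko = 6". Which of the two timeout lines DRBD actually applies is not something this sketch determines.

```python
# Values copied from the net section of the config above.
NET = {"ping-int": 15, "connect-int": 15, "ping-timeout": 60}
TIMEOUT_CANDIDATES = [100, 50]   # the config lists "timeout" twice
ASSUMED_KO_COUNT = 7             # assumption: not present in the config


def tenths_to_seconds(value):
    """Convert a value given in units of 0.1 s to seconds."""
    return value / 10


def summarize(timeout_tenths):
    """Express one timeout candidate in seconds and derive related figures."""
    timeout_s = tenths_to_seconds(timeout_tenths)
    return {
        "timeout_s": timeout_s,
        "ping_timeout_s": tenths_to_seconds(NET["ping-timeout"]),
        # The timeout is expected to stay below ping-int and connect-int.
        "below_intervals": timeout_s < min(NET["ping-int"], NET["connect-int"]),
        # Time until ko-count expirations in a row have elapsed.
        "ko_budget_s": timeout_s * ASSUMED_KO_COUNT,
    }


if __name__ == "__main__":
    for candidate in TIMEOUT_CANDIDATES:
        print(candidate, summarize(candidate))
```

With these inputs, timeout 100 corresponds to 10 seconds per expiry and timeout 50 to 5 seconds; both stay below the 15-second ping-int and connect-int.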
The replication link is a bond of 4 x 1 Gbit; the links do not drop and there are no problems with it at all. Kernel is 3.4.42, DRBD is 8.4.3 (I will upgrade to 3.10 and 8.4.4 soon, but I have no idea whether that will help).