Hi all! I've thoroughly racked my brain over this and I'm out of ideas. I have a new Dell PowerEdge R630 server with two NVMe drives in it: Intel DC P3605 HHHL. On top of them sits a ZFS mirror. So far so good, textbook stuff, but that would be too easy. Inside a guest VM I'm seeing a catastrophic drop in filesystem performance. Testing the pool on bare metal gives roughly 12k to 90k IOPS (the latter with sync=disabled), while inside a VM it all drops by 10x or more (60-5k IOPS). I've gone through every option: changed the disk controllers in the guests, played with block sizes, etc. I would understand drops like this on consumer-grade drives, but this is enterprise-level hardware. Has anyone run into this problem?
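For reference, this is roughly the kind of controller and cache juggling I've been doing on the Proxmox side (a sketch only: the VM ID 100 and disk name are placeholders, not my exact config):

```shell
# Attach the guest disk via VirtIO SCSI with a dedicated iothread
# and no host page cache (cache=none), one of the combinations tried:
qm set 100 --scsihw virtio-scsi-single
qm set 100 --scsi0 nvtank:vm-100-disk-0,cache=none,iothread=1

# Compare against host-side writeback caching:
qm set 100 --scsi0 nvtank:vm-100-disk-0,cache=writeback
```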
Setup:
Hypervisor:
- Intel(R) Xeon(R) CPU E5-2697 v4 @ 2.30GHz
- 256 GB of RAM
- pve-manager/6.4-15/af7986e6 (running kernel: 5.4.203-1-pve)
ZFS pool creation:
zpool create -o ashift=12 nvtank mirror /dev/nvme0n1 /dev/nvme1n1
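For completeness, the properties I'd check on the pool and on the guest's zvol (the zvol name below is illustrative, not my exact dataset):

```shell
# Confirm the pool was created with ashift=12 (4K sectors):
zpool get ashift nvtank

# Inspect the zvol backing the guest disk; volblocksize vs. the guest's
# 4k I/O size is a common source of read-modify-write amplification:
zfs get volblocksize,sync,compression,atime nvtank/vm-100-disk-0

# sync=disabled was set on the pool for the bare-metal run below:
zfs set sync=disabled nvtank
```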
The same fio test is used in every case:
fio --name TEST --eta-newline=5s --filename=fio-tempfile.dat --rw=randrw --size=500m --io_size=10g --blocksize=4k --ioengine=libaio --fsync=1 --iodepth=1 --direct=1 --numjobs=1 --runtime=60 --group_reporting
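Note that --fsync=1 issues a flush after every write, so with --iodepth=1 this test effectively measures single-threaded sync-write latency, and inside a VM every flush also has to cross the virtio layer. A variant I'd run to separate flush cost from the raw I/O path (same parameters, just without the per-write fsync; this is an assumption about where the time goes, not a result):

```shell
fio --name TEST-nofsync --eta-newline=5s --filename=fio-tempfile.dat \
    --rw=randrw --size=500m --io_size=10g --blocksize=4k \
    --ioengine=libaio --iodepth=1 --direct=1 \
    --numjobs=1 --runtime=60 --group_reporting
```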
ZFS test on bare metal (sync=disabled):
TEST: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
fio-3.12
Starting 1 process
Jobs: 1 (f=1): [m(1)][50.0%][r=418MiB/s,w=416MiB/s][r=107k,w=106k IOPS][eta 00m:07s]
Jobs: 1 (f=1): [m(1)][92.9%][r=282MiB/s,w=281MiB/s][r=72.2k,w=71.0k IOPS][eta 00m:01s]
Jobs: 1 (f=1): [m(1)][100.0%][r=418MiB/s,w=415MiB/s][r=107k,w=106k IOPS][eta 00m:00s]
TEST: (groupid=0, jobs=1): err= 0: pid=28781: Mon Aug 28 14:23:39 2023
read: IOPS=93.6k, BW=366MiB/s (383MB/s)(5133MiB/14040msec)
slat (usec): min=2, max=415, avg= 3.32, stdev= 4.34
clat (nsec): min=523, max=53651, avg=583.63, stdev=495.25
lat (usec): min=2, max=417, avg= 3.96, stdev= 4.41
clat percentiles (nsec):
| 1.00th=[ 532], 5.00th=[ 532], 10.00th=[ 532], 20.00th=[ 532],
| 30.00th=[ 540], 40.00th=[ 540], 50.00th=[ 540], 60.00th=[ 540],
| 70.00th=[ 548], 80.00th=[ 588], 90.00th=[ 684], 95.00th=[ 684],
| 99.00th=[ 892], 99.50th=[ 964], 99.90th=[ 2040], 99.95th=[17024],
| 99.99th=[18304]
bw ( KiB/s): min=164648, max=428672, per=99.94%, avg=374139.11, stdev=74716.19, samples=28
iops : min=41162, max=107168, avg=93534.75, stdev=18679.07, samples=28
write: IOPS=93.1k, BW=364MiB/s (381MB/s)(5107MiB/14040msec); 0 zone resets
slat (usec): min=4, max=577, avg= 5.50, stdev= 4.97
clat (nsec): min=532, max=216252, avg=614.70, stdev=535.96
lat (usec): min=4, max=581, avg= 6.18, stdev= 5.06
clat percentiles (nsec):
| 1.00th=[ 540], 5.00th=[ 548], 10.00th=[ 548], 20.00th=[ 556],
| 30.00th=[ 564], 40.00th=[ 564], 50.00th=[ 572], 60.00th=[ 580],
| 70.00th=[ 588], 80.00th=[ 620], 90.00th=[ 708], 95.00th=[ 732],
| 99.00th=[ 948], 99.50th=[ 1048], 99.90th=[ 2192], 99.95th=[17024],
| 99.99th=[18304]
bw ( KiB/s): min=165336, max=426904, per=99.94%, avg=372271.57, stdev=74077.66, samples=28
iops : min=41334, max=106726, avg=93067.86, stdev=18519.35, samples=28
lat (nsec) : 750=97.62%, 1000=1.84%
lat (usec) : 2=0.43%, 4=0.01%, 10=0.01%, 20=0.08%, 50=0.01%
lat (usec) : 100=0.01%, 250=0.01%
cpu : usr=20.49%, sys=79.44%, ctx=60, majf=0, minf=37
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=1313957,1307483,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
READ: bw=366MiB/s (383MB/s), 366MiB/s-366MiB/s (383MB/s-383MB/s), io=5133MiB (5382MB), run=14040-14040msec
WRITE: bw=364MiB/s (381MB/s), 364MiB/s-364MiB/s (381MB/s-381MB/s), io=5107MiB (5355MB), run=14040-14040msec
The same test inside the VM:
TEST: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
fio-3.1
Starting 1 process
Jobs: 1 (f=1): [m(1)][11.7%][r=12.1MiB/s,w=12.1MiB/s][r=3095,w=3108 IOPS][eta 00m:53s]
Jobs: 1 (f=1): [m(1)][21.7%][r=12.1MiB/s,w=12.1MiB/s][r=3097,w=3099 IOPS][eta 00m:47s]
Jobs: 1 (f=1): [m(1)][31.7%][r=14.1MiB/s,w=14.1MiB/s][r=3598,w=3622 IOPS][eta 00m:41s]
Jobs: 1 (f=1): [m(1)][41.7%][r=12.9MiB/s,w=12.7MiB/s][r=3296,w=3261 IOPS][eta 00m:35s]
Jobs: 1 (f=1): [m(1)][51.7%][r=9852KiB/s,w=9168KiB/s][r=2463,w=2292 IOPS][eta 00m:29s]
Jobs: 1 (f=1): [m(1)][61.7%][r=10.4MiB/s,w=10.0MiB/s][r=2661,w=2565 IOPS][eta 00m:23s]
Jobs: 1 (f=1): [m(1)][71.7%][r=11.9MiB/s,w=12.1MiB/s][r=3052,w=3088 IOPS][eta 00m:17s]
Jobs: 1 (f=1): [m(1)][81.7%][r=13.3MiB/s,w=13.1MiB/s][r=3392,w=3355 IOPS][eta 00m:11s]
Jobs: 1 (f=1): [m(1)][91.7%][r=12.6MiB/s,w=12.7MiB/s][r=3217,w=3250 IOPS][eta 00m:05s]
Jobs: 1 (f=1): [m(1)][100.0%][r=12.4MiB/s,w=11.8MiB/s][r=3187,w=3032 IOPS][eta 00m:00s]
TEST: (groupid=0, jobs=1): err= 0: pid=1318: Mon Aug 28 12:03:04 2023
read: IOPS=2887, BW=11.3MiB/s (11.8MB/s)(677MiB/60001msec)
slat (usec): min=3, max=2241, avg=10.77, stdev= 8.60
clat (nsec): min=1767, max=16231k, avg=21005.89, stdev=53925.90
lat (usec): min=6, max=16249, avg=33.02, stdev=56.27
clat percentiles (usec):
| 1.00th=[ 3], 5.00th=[ 3], 10.00th=[ 3], 20.00th=[ 3],
| 30.00th=[ 3], 40.00th=[ 3], 50.00th=[ 17], 60.00th=[ 28],
| 70.00th=[ 30], 80.00th=[ 32], 90.00th=[ 45], 95.00th=[ 78],
| 99.00th=[ 106], 99.50th=[ 118], 99.90th=[ 149], 99.95th=[ 165],
| 99.99th=[ 404]
bw ( KiB/s): min= 5096, max=15296, per=99.93%, avg=11540.55, stdev=2112.41, samples=119
iops : min= 1274, max= 3824, avg=2885.12, stdev=528.11, samples=119
write: IOPS=2871, BW=11.2MiB/s (11.8MB/s)(673MiB/60001msec)
slat (usec): min=7, max=471, avg=16.02, stdev= 5.90
clat (usec): min=2, max=42852, avg=44.43, stdev=117.43
lat (usec): min=28, max=42880, avg=61.79, stdev=117.92
clat percentiles (usec):
| 1.00th=[ 26], 5.00th=[ 28], 10.00th=[ 29], 20.00th=[ 31],
| 30.00th=[ 32], 40.00th=[ 34], 50.00th=[ 36], 60.00th=[ 39],
| 70.00th=[ 44], 80.00th=[ 51], 90.00th=[ 80], 95.00th=[ 90],
| 99.00th=[ 122], 99.50th=[ 137], 99.90th=[ 178], 99.95th=[ 217],
| 99.99th=[ 1254]
bw ( KiB/s): min= 5136, max=15320, per=99.91%, avg=11475.44, stdev=2083.43, samples=119
iops : min= 1284, max= 3830, avg=2868.84, stdev=520.87, samples=119
lat (usec) : 2=0.34%, 4=24.49%, 10=0.05%, 20=0.84%, 50=60.16%
lat (usec) : 100=11.90%, 250=2.19%, 500=0.02%, 750=0.01%, 1000=0.01%
lat (msec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
cpu : usr=9.66%, sys=16.22%, ctx=919252, majf=0, minf=13
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwt: total=173240,172305,0, short=0,0,0, dropped=0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
READ: bw=11.3MiB/s (11.8MB/s), 11.3MiB/s-11.3MiB/s (11.8MB/s-11.8MB/s), io=677MiB (710MB), run=60001-60001msec
WRITE: bw=11.2MiB/s (11.8MB/s), 11.2MiB/s-11.2MiB/s (11.8MB/s-11.8MB/s), io=673MiB (706MB), run=60001-60001msec
Disk stats (read/write):
sda: ios=86374/546386, merge=0/29494, ticks=4084/40624, in_queue=0, util=0.00%
PS: There is also a test from a Windows guest VM. Interestingly, that one was deployed on LVM on one of the drives:
CrystalDiskMark 6.0.2 x64 (C) 2007-2018 hiyohiyo
Crystal Dew World : https://crystalmark.info/
-----------------------------------------------------------------------
* MB/s = 1,000,000 bytes/s [SATA/600 = 600,000,000 bytes/s]
* KB = 1000 bytes, KiB = 1024 bytes
Sequential Read (Q= 32,T= 1) : 7618.642 MB/s
Sequential Write (Q= 32,T= 1) : 7200.250 MB/s
Random Read 4KiB (Q= 8,T= 8) : 673.224 MB/s [ 164361.3 IOPS]
Random Write 4KiB (Q= 8,T= 8) : 735.549 MB/s [ 179577.4 IOPS]
Random Read 4KiB (Q= 32,T= 1) : 328.916 MB/s [ 80301.8 IOPS]
Random Write 4KiB (Q= 32,T= 1) : 371.193 MB/s [ 90623.3 IOPS]
Random Read 4KiB (Q= 1,T= 1) : 51.624 MB/s [ 12603.5 IOPS]
Random Write 4KiB (Q= 1,T= 1) : 45.044 MB/s [ 10997.1 IOPS]
Test : 16384 MiB [C: 36.4% (11.5/31.5 GiB)] (x5) [Interval=5 sec]
Date : 2023/08/25 22:50:15
OS : Windows Server 2016 Server Standard (full installation) [10.0 Build 17763] (x64)
I no longer have the ZFS results for the Windows guest, but the drop there was also roughly 10x.