Hi all.
I asked in the OpenNebula Telegram group and nobody answered. I also posted on the official OpenNebula forum; one person replied, but that did not help either. Google turns up nothing. The only documentation I found did not help, even though I followed it: https://docs.opennebula.io/6.0/open_cluster_deployment/kvm_node/pci_passthrough.html
OpenNebula 6.
OpenNebula host
OS: CentOS 8.3
Kernel: Linux 4.18.0-240.22.1.el8_3.x86_64
IP: 192.168.10.171
Installed components:
yum -y install opennebula opennebula-sunstone opennebula-fireedge opennebula-gate opennebula-flow opennebula-provision
KVM host (added to OpenNebula)
OS: CentOS 8.3
Kernel: Linux 4.18.0-240.22.1.el8_3.x86_64
IP: 192.168.10.169
Installed components:
yum -y install opennebula-node-kvm
Goal: get OpenNebula to see the NVIDIA RTX 2080 Ti.
What I did on the KVM host:
[root@kvm-gpu-test ~]# lspci -nn | grep -i nvidia
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti Rev. A] [10de:1e07] (rev a1)
01:00.1 Audio device [0403]: NVIDIA Corporation TU102 High Definition Audio Controller [10de:10f7] (rev a1)
01:00.2 USB controller [0c03]: NVIDIA Corporation TU102 USB 3.1 Host Controller [10de:1ad6] (rev a1)
01:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU102 USB Type-C UCSI Controller [10de:1ad7] (rev a1)
[root@kvm-gpu-test ~]# lspci -vs 01:00.0
01:00.0 VGA compatible controller: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti Rev. A] (rev a1) (prog-if 00 [VGA controller])
Subsystem: eVga.com. Corp. Device 2489
Flags: bus master, fast devsel, latency 0, IRQ 141
Memory at 95000000 (32-bit, non-prefetchable) [size=16M]
Memory at 80000000 (64-bit, prefetchable) [size=256M]
Memory at 90000000 (64-bit, prefetchable) [size=32M]
I/O ports at 6000 [size=128]
Expansion ROM at 96000000 [disabled] [size=512K]
Capabilities: [60] Power Management version 3
Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities: [78] Express Legacy Endpoint, MSI 00
Capabilities: [100] Virtual Channel
Capabilities: [250] Latency Tolerance Reporting
Capabilities: [258] L1 PM Substates
Capabilities: [128] Power Budgeting <?>
Capabilities: [420] Advanced Error Reporting
Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
Capabilities: [900] Secondary PCI Express
Capabilities: [bb0] Resizable BAR <?>
Kernel driver in use: nouveau
Kernel modules: nouveau
First I set up vfio-pci:
vi /etc/modprobe.d/vfio.conf
options vfio-pci ids=10de:1e07,10de:10f7,10de:1ad6,10de:1ad7
echo 'vfio-pci' > /etc/modules-load.d/vfio-pci.conf
grubby --update-kernel=ALL --args="rd.driver.blacklist=nouveau nouveau.modeset=0"
mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
dracut /boot/initramfs-$(uname -r).img $(uname -r)
echo 'blacklist nouveau' > /etc/modprobe.d/nouveau-blacklist.conf
reboot
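After a reboot like this it is worth confirming that all four functions of the card (01:00.0 through 01:00.3) were claimed by vfio-pci, not only the VGA function. A minimal sketch of such a check follows; the helper name driver_in_use is made up for illustration, and it only parses the "Kernel driver in use:" line that lspci prints.

```shell
# Hypothetical helper: read `lspci -k` (or `lspci -v`) output on stdin and
# print the value of the "Kernel driver in use:" line, if there is one.
driver_in_use() {
    sed -n 's/^[[:space:]]*Kernel driver in use:[[:space:]]*//p'
}

# Usage on the KVM host (slot numbers taken from the lspci listing above):
#   for fn in 0 1 2 3; do
#       printf '01:00.%s ' "$fn"
#       lspci -ks "01:00.$fn" | driver_in_use
#   done
```

If a function has no driver bound, lspci prints no such line and the helper prints nothing for it.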
After the reboot the kernel driver is vfio-pci:
[root@kvm-gpu-test ~]# lspci -vs 01:00.0
01:00.0 VGA compatible controller: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti Rev. A] (rev a1) (prog-if 00 [VGA controller])
Subsystem: eVga.com. Corp. Device 2489
Flags: fast devsel, IRQ 11
Memory at 95000000 (32-bit, non-prefetchable) [disabled] [size=16M]
Memory at 80000000 (64-bit, prefetchable) [disabled] [size=256M]
Memory at 90000000 (64-bit, prefetchable) [disabled] [size=32M]
I/O ports at 6000 [disabled] [size=128]
Expansion ROM at 96000000 [disabled] [size=512K]
Capabilities: [60] Power Management version 3
Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [78] Express Legacy Endpoint, MSI 00
Capabilities: [100] Virtual Channel
Capabilities: [250] Latency Tolerance Reporting
Capabilities: [258] L1 PM Substates
Capabilities: [128] Power Budgeting <?>
Capabilities: [420] Advanced Error Reporting
Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
Capabilities: [900] Secondary PCI Express
Capabilities: [bb0] Resizable BAR <?>
Kernel driver in use: vfio-pci
Kernel modules: nouveau
Next I listed the IOMMU groups and then added the GPU's group to /etc/libvirt/qemu.conf:
[root@kvm-gpu-test ~]# find /sys/kernel/iommu_groups/ -type l
/sys/kernel/iommu_groups/7/devices/0000:00:17.0
/sys/kernel/iommu_groups/15/devices/0000:03:00.0
/sys/kernel/iommu_groups/5/devices/0000:00:15.1
/sys/kernel/iommu_groups/5/devices/0000:00:15.0
/sys/kernel/iommu_groups/13/devices/0000:00:1f.0
/sys/kernel/iommu_groups/13/devices/0000:00:1f.5
/sys/kernel/iommu_groups/13/devices/0000:00:1f.4
/sys/kernel/iommu_groups/3/devices/0000:00:12.0
/sys/kernel/iommu_groups/11/devices/0000:00:1c.1
**/sys/kernel/iommu_groups/1/devices/0000:00:01.0**
**/sys/kernel/iommu_groups/1/devices/0000:01:00.2**
**/sys/kernel/iommu_groups/1/devices/0000:01:00.0**
**/sys/kernel/iommu_groups/1/devices/0000:01:00.3**
/sys/kernel/iommu_groups/1/devices/0000:01:00.1
/sys/kernel/iommu_groups/8/devices/0000:00:1b.0
/sys/kernel/iommu_groups/16/devices/0000:06:00.0
/sys/kernel/iommu_groups/16/devices/0000:05:00.0
/sys/kernel/iommu_groups/6/devices/0000:00:16.4
/sys/kernel/iommu_groups/6/devices/0000:00:16.0
/sys/kernel/iommu_groups/14/devices/0000:02:00.0
/sys/kernel/iommu_groups/4/devices/0000:00:14.2
/sys/kernel/iommu_groups/4/devices/0000:00:14.0
/sys/kernel/iommu_groups/12/devices/0000:00:1e.0
/sys/kernel/iommu_groups/2/devices/0000:00:08.0
/sys/kernel/iommu_groups/10/devices/0000:00:1c.0
/sys/kernel/iommu_groups/0/devices/0000:00:00.0
/sys/kernel/iommu_groups/9/devices/0000:00:1b.5
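The listing above is easier to read when it is reduced to the one group that contains the GPU. A small sketch, assuming the path layout shown above (/sys/kernel/iommu_groups/GROUP/devices/ADDRESS); the function name iommu_peers is hypothetical.

```shell
# Hypothetical helper: read IOMMU sysfs paths on stdin and print every device
# that shares a group with the PCI address given as $1.
iommu_peers() {
    awk -F/ -v dev="$1" '
        # Split on "/": field 5 is the group number, field 7 the PCI address.
        { group[$7] = $5; members[$5] = members[$5] " " $7 }
        END { if (dev in group) print "group " group[dev] ":" members[group[dev]] }
    '
}

# Usage on the KVM host:
#   find /sys/kernel/iommu_groups/ -type l | iommu_peers 0000:01:00.0
```

In the listing above, group 1 holds the 00:01.0 device together with all four functions of the card, which is why /dev/vfio/1 is the node that matters here.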
vi /etc/libvirt/qemu.conf
cgroup_device_acl = [
"/dev/null", "/dev/full", "/dev/zero",
"/dev/random", "/dev/urandom",
"/dev/ptmx", "/dev/kvm", "/dev/kqemu",
"/dev/rtc","/dev/hpet", "/dev/vfio/vfio",
"/dev/vfio/1"
]
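The "/dev/vfio/1" entry has to match the IOMMU group of the card. One way to derive it without reading the whole group listing is to follow the iommu_group symlink that sysfs exposes for each PCI device. This is only a sketch: the function name vfio_node_for and its optional second argument (an alternative sysfs root, handy for trying it outside the real host) are assumptions of this example.

```shell
# Hypothetical helper: print the /dev/vfio node for a PCI device.
# $1 = full PCI address (for example 0000:01:00.0)
# $2 = sysfs root, defaults to /sys
vfio_node_for() {
    # iommu_group is a symlink whose last path component is the group number.
    link=$(readlink "${2:-/sys}/bus/pci/devices/$1/iommu_group") || return 1
    printf '/dev/vfio/%s\n' "${link##*/}"
}

# Usage on the KVM host:
#   vfio_node_for 0000:01:00.0
```

The function returns a non-zero status when the device has no iommu_group link, which usually means the IOMMU is not enabled for it.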
Then, on the OpenNebula host, I edited the PCI probe configuration:
vi /var/lib/one/remotes/etc/im/kvm-probes.d/pci.conf
# #
# Unless required by applicable law or agreed to in writing, software #
# distributed under the License is distributed on an "AS IS" BASIS, #
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. #
# See the License for the specific language governing permissions and #
# limitations under the License. #
#--------------------------------------------------------------------------- #
# This option specifies the main filters for PCI card monitoring. The format
# is the same as used by lspci to filter on PCI card by vendor:device(:class)
# identification. Several filters can be added as a list, or separated
# by commas. The NULL filter will retrieve all PCI cards.
#
# From lspci help:
# -d [<vendor>]:[<device>][:<class>]
# Show only devices with specified vendor, device and class ID.
# The ID's are given in hexadecimal and may be omitted or given
# as "*", both meaning "any value"#
#
# For example:
# :filter:
# - '10de:*' # all NVIDIA VGA cards
# - '10de:11bf' # only GK104GL [GRID K2]
# - '*:10d3' # only 82574L Gigabit Network cards
# - '8086::0c03' # only Intel USB controllers
#
# or
#
# :filter: '*:*' # all devices
#
# or
#
# :filter: '0:0' # no devices
#
:filter:
- '10de:*'
# The PCI cards list restricted by the :filter option above can be even more
# filtered by the list of exact PCI addresses (bus:device.func).
#
# For example:
# :short_address:
# - '07:00.0'
# - '06:00.0'
#
:short_address: []
# The PCI cards list restricted by the :filter option above can be even more
# filtered by matching the device name against the list of regular expression
# case-insensitive patterns.
#
# For example:
# :device_name:
# - 'Virtual Function'
# - 'Gigabit Network'
# - 'USB.*Host Controller'
# - '^MegaRAID'
#
:device_name:
- 'VGA'
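One thing that may be worth checking with this configuration: the comment above says the :device_name: patterns are matched against the device name. For this card, lspci reports "VGA compatible controller" as the class and "TU102 [GeForce RTX 2080 Ti Rev. A]" as the device name. The sketch below only compares the 'VGA' pattern with those two strings using grep. It is an assumption that the probe looks at the device name alone and never at the class text, and grep's extended regexes stand in for whatever regex engine the probe really uses.

```shell
# Return success if the case-insensitive extended regex $1 matches the string $2.
matches() {
    printf '%s\n' "$2" | grep -qiE -e "$1"
}

# Both strings are copied from the lspci output for 01:00.0 earlier in this post.
class_name='VGA compatible controller'
device_name='TU102 [GeForce RTX 2080 Ti Rev. A]'

for pattern in 'VGA' 'GeForce'; do
    for text in "$class_name" "$device_name"; do
        if matches "$pattern" "$text"; then verdict='match'; else verdict='no match'; fi
        printf '%s against "%s": %s\n' "$pattern" "$text" "$verdict"
    done
done
```

If the assumption holds, 'VGA' would filter this card out, while a pattern such as 'GeForce' or an empty :device_name: list would not.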
reboot
After rebooting the KVM host, I ran a sync to it from the OpenNebula host:
[oneadmin@on-test ~]$ onehost sync -f
* Adding 192.168.10.169 to upgrade
[========================================] 1/1 192.168.10.169
All hosts updated successfully.
[oneadmin@on-test ~]$ onehost forceupdate
All hosts updated successfully.
Checking:
[root@on-test ~]# onehost show 1
HOST 1 INFORMATION
ID : 1
NAME : 192.168.10.169
CLUSTER : default
STATE : MONITORED
IM_MAD : kvm
VM_MAD : kvm
LAST MONITORING TIME : 05/09 17:16:35
HOST SHARES
RUNNING VMS : 0
MEMORY
TOTAL : 31.1G
TOTAL +/- RESERVED : 31.1G
USED (REAL) : 388.8M
USED (ALLOCATED) : 0K
CPU
TOTAL : 400
TOTAL +/- RESERVED : 400
USED (REAL) : 8
USED (ALLOCATED) : 0
MONITORING INFORMATION
ARCH="x86_64"
CPUSPEED="4433"
HOSTNAME="kvm-gpu-test"
HYPERVISOR="kvm"
IM_MAD="kvm"
KVM_CPU_MODEL="Skylake-Client-IBRS"
KVM_CPU_MODELS="486 pentium pentium2 pentium3 pentiumpro coreduo n270 core2duo qemu32 kvm32 cpu64-rhel5 cpu64-rhel6 qemu64 kvm64 Conroe Penryn Nehalem Nehalem-IBRS Westmere Westmere-IBRS SandyBridge SandyBridge-IBRS IvyBridge IvyBridge-IBRS Haswell-noTSX Haswell-noTSX-IBRS Haswell Haswell-IBRS Broadwell-noTSX Broadwell-noTSX-IBRS Broadwell Broadwell-IBRS Skylake-Client Skylake-Client-IBRS Skylake-Client-noTSX-IBRS Skylake-Server Skylake-Server-IBRS Skylake-Server-noTSX-IBRS Cascadelake-Server Cascadelake-Server-noTSX Icelake-Client Icelake-Client-noTSX Icelake-Server Icelake-Server-noTSX Cooperlake athlon phenom Opteron_G1 Opteron_G2 Opteron_G3 Opteron_G4 Opteron_G5 EPYC EPYC-IBPB Dhyana"
KVM_MACHINES="pc-i440fx-rhel7.6.0 pc pc-i440fx-rhel7.0.0 pc-q35-rhel7.6.0 pc-i440fx-rhel7.5.0 pc-q35-rhel8.2.0 q35 pc-i440fx-rhel7.1.0 pc-i440fx-rhel7.2.0 pc-q35-rhel7.3.0 pc-q35-rhel7.4.0 pc-i440fx-rhel7.3.0 pc-q35-rhel8.0.0 pc-i440fx-rhel7.4.0 pc-q35-rhel8.1.0 pc-q35-rhel7.5.0"
MODELNAME="Intel(R) Xeon(R) E-2224 CPU @ 3.40GHz"
RESERVED_CPU=""
RESERVED_MEM=""
VERSION="6.0.0.1"
VM_MAD="kvm"
NUMA NODES
ID CORES USED FREE
0 - - - - 0 4
NUMA MEMORY
NODE_ID TOTAL USED_REAL USED_ALLOCATED FREE
0 31G 0K 0K 0K
NUMA HUGEPAGES
NODE_ID SIZE TOTAL FREE USED
0 2M 0 0 0
0 1024M 0 0 0
WILD VIRTUAL MACHINES
NAME IMPORT_ID CPU MEMORY
VIRTUAL MACHINES
ID USER GROUP NAME STAT CPU MEM HOST TIME
[root@on-test ~]#
The output should contain a section roughly like this:
PCI DEVICES
VM ADDR TYPE NAME
00:00.0 8086:0a04:0600 Haswell-ULT DRAM Controller
00:02.0 8086:0a16:0300 Haswell-ULT Integrated Graphics Controller
123 00:03.0 8086:0a0c:0403 Haswell-ULT HD Audio Controller
00:14.0 8086:9c31:0c03 8 Series USB xHCI HC
00:16.0 8086:9c3a:0780 8 Series HECI #0
00:1b.0 8086:9c20:0403 8 Series HD Audio Controller
00:1c.0 8086:9c10:0604 8 Series PCI Express Root Port 1
00:1c.2 8086:9c14:0604 8 Series PCI Express Root Port 3
00:1d.0 8086:9c26:0c03 8 Series USB EHCI #1
00:1f.0 8086:9c43:0601 8 Series LPC Controller
00:1f.2 8086:9c03:0106 8 Series SATA Controller 1 [AHCI mode]
00:1f.3 8086:9c22:0c05 8 Series SMBus Controller
02:00.0 8086:08b1:0280 Wireless 7260
But as you can see, there is no PCI DEVICES section at all ((