Proxmox 7 vGPU – v2

Intro

This is a complete rewrite of my previous tutorial on how to get vGPUs working with a consumer-grade Nvidia GPU in Proxmox 7. A lot has changed since DualCoder’s initial release of his vgpu_unlock code, and things have gotten a lot simpler if you have a compatible Nvidia GPU supported by driver 460.73.01. If your GPU is not supported by this driver, you’ll have to patch and build the driver yourself.

I have written a script that automatically installs the host vGPU driver all by itself.

You can check it out in my new blog post at https://wvthoog.nl/proxmox-7-vgpu-v3/

Ready? Let’s go!

Supported GPUs

Nvidia card                  | GPU chip      | vGPU unlock profile
GTX 900 Series (first gen)   | GM107 x4      | Tesla M10
GTX 900 Series (second gen)  | GM204 x2      | Tesla M60
GTX 1000 Series              | GP102/104/106 | Tesla P40
Titan V, Quadro GV100        | GV100         | Tesla V100 16GB
RTX 2000 Series              | TU102/104     | Quadro RTX 6000
RTX 3000 Series              | GA10x         | Ampere is not supported

In this tutorial I’m focusing on consumer-grade GPUs like the GTX and RTX series. From what I’ve read, it should also be possible to set it up on business-oriented GPUs like the Tesla M40, which have vGPU support built in. (credits PolloLoco)

The first step is setting up the basic stuff in Proxmox

Dependencies

Comment out the pve-enterprise repository

sed -e '/deb/ s/^#*/#/' -i /etc/apt/sources.list.d/pve-enterprise.list

Install dependencies for vgpu_unlock

apt update && apt upgrade -y
apt install -y git build-essential pve-headers-`uname -r` dkms jq cargo mdevctl unzip uuid

Enable IOMMU

Configure IOMMU

nano /etc/default/grub

For Intel CPUs, edit this line

GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"

For AMD CPUs, edit this line

GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt"

Save file and update grub

update-grub

Load VFIO modules at boot

nano /etc/modules-load.d/modules.conf

Insert these lines

vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd

Create a couple of files in modprobe.d

echo "options vfio_iommu_type1 allow_unsafe_interrupts=1" > /etc/modprobe.d/iommu_unsafe_interrupts.conf
echo "options kvm ignore_msrs=1" > /etc/modprobe.d/kvm.conf
echo "blacklist nouveau" >> /etc/modprobe.d/blacklist.conf

Do you have two or more GPUs in your system? Then you need to pass through all the other cards you don’t intend to use with vgpu_unlock, since it only supports vGPUs on one GPU.

First retrieve all the Nvidia cards in your system

lspci|grep -i nvidia

Example output

root@pve:~# lspci|grep -i nvidia
27:00.0 VGA compatible controller: NVIDIA Corporation GP106 [GeForce GTX 1060 6GB] (rev a1)
27:00.1 Audio device: NVIDIA Corporation GP106 High Definition Audio Controller (rev a1)
28:00.0 VGA compatible controller: NVIDIA Corporation TU104 [GeForce RTX 2070 SUPER] (rev a1)
28:00.1 Audio device: NVIDIA Corporation TU104 HD Audio Controller (rev a1)
28:00.2 USB controller: NVIDIA Corporation TU104 USB 3.1 Host Controller (rev a1)
28:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU104 USB Type-C UCSI Controller (rev a1)

You can see that I have a GTX 1060 6GB and an RTX 2070 Super 8GB in my system. I'm going to pass through the GTX 1060 6GB directly to a Windows 11 VM. For that I need all the PCI IDs of that device (VGA and audio controller). As you can see, this card is located at PCI bus 27:00.

lspci -n -s 27:00

When I run this command it returns all the associated PCI IDs I need to pass through

Example output

root@pve:~# lspci -n -s 27:00
27:00.0 0300: 10de:1c03 (rev a1)
27:00.1 0403: 10de:10f1 (rev a1)

Now that I have all the relevant information, I'm going to create a file in modprobe.d

nano /etc/modprobe.d/vfio.conf

And insert this line

options vfio-pci ids=10de:1c03,10de:10f1

Update initramfs

update-initramfs -u

Reboot Proxmox

reboot

And verify that IOMMU is enabled

dmesg | grep -e DMAR -e IOMMU

Example output

[    0.954526] pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported
[    0.958348] pci 0000:00:00.2: AMD-Vi: Found IOMMU cap 0x40
[    0.959202] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).

vgpu_unlock

Git clone vgpu_unlock

git clone https://github.com/DualCoder/vgpu_unlock
chmod -R +x vgpu_unlock

The easy way

If you’re in luck and your GPU is supported and doesn’t throw any error messages when booting up a vGPU, we can do it the easy way: installing the merged and patched driver from Erin Allison. I’ve tested this on my RTX 2070 Super and GTX 1060, which worked right out of the box.

Nvidia Driver

Download the merged and patched Nvidia vGPU (460.73.01) driver here and install. (credits Erin Allison)

chmod +x NVIDIA-Linux-x86_64-460.73.01-grid-vgpu-kvm-v5.run
./NVIDIA-Linux-x86_64-460.73.01-grid-vgpu-kvm-v5.run --dkms

vgpu_unlock-rs

Git clone vgpu_unlock-rs (credits Matt Bilker)

git clone https://github.com/mbilker/vgpu_unlock-rs.git

Cargo build vgpu_unlock-rs

cd vgpu_unlock-rs
cargo build --release

Copy the shared library over to /lib/nvidia/

cp target/release/libvgpu_unlock_rs.so /lib/nvidia/libvgpu_unlock_rs.so

And reboot

reboot

If everything went according to plan, you should now have a working vGPU. Please consult journalctl for any errors

journalctl -u nvidia-vgpud.service
journalctl -u nvidia-vgpu-mgr.service

So no errors ? And nvidia-smi reports a working GPU ?

Good, proceed to Mdevctl to create the vGPU profiles.

The hard way

So the easy way didn’t work for you. Not to worry, patching the driver yourself and setting up the services isn’t that hard. Please keep in mind that only drivers up to 460.73.01 are reported to work correctly. Versions released after 460.73.01 build correctly on the host (Proxmox) but have huge stability issues, to the point that the VMs are not usable at all.

In an effort to get the 2060 12GB version (TU106) running, I’ve tested driver version 460.107, which builds correctly on the host (Proxmox) and recognizes the card. The vGPU hooks also work, but when firing up the VM the log gets flooded with XID 43 and NVlog messages.

So please stay at or below version 460.73.01 when patching the driver yourself. Driver 460.107 is used here only in a test environment.

Nvidia driver

Download Nvidia’s vGPU drivers. You need to apply for a trial period to download those drivers here. Copy them to the Proxmox server using SCP and unzip.

scp nvidia/NVIDIA-GRID-Linux-KVM-460.107-460.106.00-463.15.zip root@192.168.2.10:~
unzip NVIDIA-GRID-Linux-KVM-460.107-460.106.00-463.15.zip

For Proxmox 7 (based on Debian Bullseye) with kernel version 5.15 I needed to patch a couple of files.

Extract the Nvidia driver

chmod +x NVIDIA-Linux-x86_64-460.107-vgpu-kvm.run
./NVIDIA-Linux-x86_64-460.107-vgpu-kvm.run -x

Kernel version

vgpu_unlock has been reported to work with kernel versions pve-kernel-5.11 and pve-kernel-5.15. (pve-kernel-5.13 has issues)

Patching

Kernel version  | Use patch
pve-kernel-5.11 | twelve.patch
pve-kernel-5.15 | proxmox515.patch

Download either twelve.patch or proxmox515.patch, whichever is applicable to your setup, and apply it.

proxmox515.patch is a patch I’ve created by merging the twelve and fourteen patches and removing the DRM-related lines.

For pve-kernel-5.11:

wget https://raw.githubusercontent.com/rupansh/vgpu_unlock_5.12/master/twelve.patch
cd NVIDIA-Linux-x86_64-460.107-vgpu-kvm/
patch -p0 < ../twelve.patch

For pve-kernel-5.15:

wget https://raw.githubusercontent.com/wvthoog/nvidia_vgpu_proxmox_7/main/proxmox515.patch
cd NVIDIA-Linux-x86_64-460.107-vgpu-kvm/
patch -p0 < ../proxmox515.patch

Install the driver, which hopefully won’t give you any errors (tested on 460.107). If the build fails, analyse /var/lib/dkms/nvidia/<version>/build/make.log for hints on why it failed.

chmod +x nvidia-installer
./nvidia-installer --dkms

vgpu_unlock-rs

Git clone vgpu_unlock-rs (credits Matt Bilker)

git clone https://github.com/mbilker/vgpu_unlock-rs.git

Cargo build vgpu_unlock-rs

cd vgpu_unlock-rs
cargo build --release

Copy the shared library over to /lib/nvidia/

cp target/release/libvgpu_unlock_rs.so /lib/nvidia/libvgpu_unlock_rs.so

Systemctl

Edit the two Nvidia service files. Start with nvidia-vgpud

sed -i '/^ExecStopPost=.*/a Environment=LD_PRELOAD=/usr/lib/nvidia/libvgpu_unlock_rs.so\nEnvironment=__RM_NO_VERSION_CHECK=1' /lib/systemd/system/nvidia-vgpud.service

Do the same for nvidia-vgpu-mgr

sed -i '/^ExecStopPost=.*/a Environment=LD_PRELOAD=/usr/lib/nvidia/libvgpu_unlock_rs.so\nEnvironment=__RM_NO_VERSION_CHECK=1' /lib/systemd/system/nvidia-vgpu-mgr.service

Reload the systemd daemon and enable both services to make sure they start at boot

systemctl daemon-reload

systemctl enable nvidia-vgpud.service
systemctl enable nvidia-vgpu-mgr.service

Reboot Proxmox

reboot

If everything went according to plan, you should now have a working vGPU.

Launch nvidia-smi to verify.

Please consult journalctl for any errors

journalctl -u nvidia-vgpud.service
journalctl -u nvidia-vgpu-mgr.service

If there are no errors, we can proceed to creating the mediated devices (mdevs)

Mdevctl

Now that the vGPU driver loads correctly without any errors, we can proceed to create a vGPU for each VM.

vGPU types

The first step in creating a vGPU is to list all the different types the GPU offers by executing mdevctl types on the command line (CLI)

You’ll see a lot of different types; they are split up into four distinct categories

Type | Intended purpose
A    | Virtual Applications (vApps)
B    | Virtual Desktops (vPC)
C    | AI/Machine Learning/Training (vCS or vWS)
Q    | Virtual Workstations (vWS)

The type Q profile is most likely the one you want, since it enables you to fully utilize the GPU over a remote desktop (e.g. Parsec). The next step is selecting the right Q profile for your GPU, which depends mainly on the available VRAM your GPU offers. My RTX 2070 Super has 8GB of VRAM and I want to create 4 vGPUs, so I would choose a profile that has 2GB of VRAM (8GB / 4 vGPUs = 2GB), which is the nvidia-257 (RTX6000-2Q) profile.
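As a quick sanity check, the per-vGPU VRAM math can be done on the command line. The numbers below are the example from above (an 8GB card split four ways); substitute your own card’s VRAM and desired vGPU count:

```shell
# Per-vGPU VRAM = total VRAM divided by the number of vGPUs you want.
# 8192 MiB / 4 vGPUs = 2048 MiB, which matches a 2Q (2GB) profile.
TOTAL_MIB=8192
NUM_VGPUS=4
echo "$((TOTAL_MIB / NUM_VGPUS)) MiB per vGPU"
```

You can then search the `mdevctl types` output for a matching Q profile, e.g. `mdevctl types | grep 2Q`.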

Mdev creation

Now that we have selected the profile we want, we can assign it to a VM. There are three different ways to assign a vGPU (mdev) to your VM. All three of them need a UUID to function correctly.

Mdev creation                             | Driver
Assign mdev through the Proxmox web GUI   | Manually assign only a VM UUID. Requires a time-restricted GRID driver in the VM
Assign mdev through CLI                   | Manually assign both mdev and VM UUID. Requires a time-restricted GRID driver in the VM
Assign a spoofed mdev through CLI         | Manually assign both mdev and VM UUID plus a PCI ID. Requires an unlicensed Quadro driver in the VM (seems to work only in Windows)

Generate Mdev UUIDs

This step is not necessary if you choose to assign the mdev through the Proxmox web GUI; you can skip directly to the vGPU assignment section.

If you choose to assign an mdev or spoofed mdev through the CLI, we need to generate a couple of extra UUIDs for mdevctl to use in the creation of the vGPUs. I’m going to split the RTX 2070 Super into 4 vGPUs, so I need to create 4 UUIDs.

uuid -n 4
33fac024-b107-11ec-828d-23fa8d25415b
33fac100-b107-11ec-828e-ab18326737a3
33fac132-b107-11ec-828f-b325d98dab10
33fac15a-b107-11ec-8290-2f3f6ae84054

The -n parameter indicates how many UUIDs are created

Create vGPU profiles

At this point I recommend opening a text editor to edit the commands for creating the different vGPUs.

Grab the PCI ID of the GPU using nvidia-smi

Omit the first four zeros from the PCI ID: 00000000:28:00.0 -> 0000:28:00.0
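If you want to script that trim, a small sketch (the example bus ID is from my system; substitute the value nvidia-smi reports on your host):

```shell
# nvidia-smi reports the bus ID as 00000000:28:00.0, but mdevctl wants
# 0000:28:00.0 -- strip the first four leading zeros with sed.
FULL_ID="00000000:28:00.0"   # e.g. from: nvidia-smi --query-gpu=pci.bus_id --format=csv,noheader
PCI_BUS=$(echo "$FULL_ID" | sed 's/^0000//')
echo "$PCI_BUS"
```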

Putting the final commands together we now have everything we need for each vGPU profile.

  • UUID
  • PCI Bus ID
  • vGPU Profile

This results (in my case) in the following commands: four vGPU profiles, each with a unique UUID, on PCI parent bus ID 0000:28:00.0 of type nvidia-257

mdevctl start -u 33fac024-b107-11ec-828d-23fa8d25415b -p 0000:28:00.0 -t nvidia-257
mdevctl start -u 33fac100-b107-11ec-828e-ab18326737a3 -p 0000:28:00.0 -t nvidia-257
mdevctl start -u 33fac132-b107-11ec-828f-b325d98dab10 -p 0000:28:00.0 -t nvidia-257
mdevctl start -u 33fac15a-b107-11ec-8290-2f3f6ae84054 -p 0000:28:00.0 -t nvidia-257

Define the profiles so that they will be persistent after reboot

mdevctl define -a -u 33fac024-b107-11ec-828d-23fa8d25415b
mdevctl define -a -u 33fac100-b107-11ec-828e-ab18326737a3
mdevctl define -a -u 33fac132-b107-11ec-828f-b325d98dab10
mdevctl define -a -u 33fac15a-b107-11ec-8290-2f3f6ae84054
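The four start/define pairs above can also be generated from a loop. A sketch that only prints the commands so you can review them before piping into sh; the UUIDs, bus ID and profile are the example values from this tutorial:

```shell
# Print the mdevctl start/define commands for each UUID instead of
# typing them out by hand. Substitute your own UUIDs, bus ID and profile.
PCI_BUS="0000:28:00.0"
PROFILE="nvidia-257"
UUIDS="33fac024-b107-11ec-828d-23fa8d25415b
33fac100-b107-11ec-828e-ab18326737a3
33fac132-b107-11ec-828f-b325d98dab10
33fac15a-b107-11ec-8290-2f3f6ae84054"

for u in $UUIDS; do
  echo "mdevctl start -u $u -p $PCI_BUS -t $PROFILE"
  echo "mdevctl define -a -u $u"
done
```

Once the printed commands look right, re-run the loop piped into sh to execute them.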

Verify the profiles are created correctly

mdevctl list

Example output

33fac15a-b107-11ec-8290-2f3f6ae84054 0000:28:00.0 nvidia-257 (defined)
33fac024-b107-11ec-828d-23fa8d25415b 0000:28:00.0 nvidia-257 (defined)
33fac100-b107-11ec-828e-ab18326737a3 0000:28:00.0 nvidia-257 (defined)
33fac132-b107-11ec-828f-b325d98dab10 0000:28:00.0 nvidia-257 (defined)

vGPU assignment

Assign Mdev through the Proxmox Web GUI

Assign the UUID to your Proxmox vGPU VM. (Replace with your own config file)

nano /etc/pve/qemu-server/100.conf

Add this line. Replace with your own VM ID

args: -uuid 00000000-0000-0000-0000-000000000100

Save and exit out of the file

Log in to the Proxmox web GUI and navigate to the VM you want to assign a vGPU to.

  • Click on Hardware
  • Click on Add
  • Click on PCI Device
  • Select your Nvidia card from the Device drop-down list
  • Select the mdev (vGPU) profile from the MDev Type drop-down list
  • Select Primary GPU and PCI-Express
  • Click on Add

Assign Mdev through CLI

Assign the mdev UUID and VM UUID to your Proxmox vGPU VM. (Replace with your own config file)

nano /etc/pve/qemu-server/100.conf

Add this line. Replace with your own UUID and VM ID

args: -device 'vfio-pci,sysfsdev=/sys/bus/mdev/devices/33fac024-b107-11ec-828d-23fa8d25415b' -uuid 00000000-0000-0000-0000-000000000100

Save and exit out of the file

Assign a spoofed Mdev through CLI

Assign the mdev UUID, VM UUID and spoofed PCI ID to your Proxmox vGPU VM. (Replace with your own config file)

nano /etc/pve/qemu-server/100.conf

If you're using a GTX 1050/60/70/80, you can spoof it to a Quadro P2200. Add this line, replacing with your own UUID and VM ID

args: -device 'vfio-pci,sysfsdev=/sys/bus/mdev/devices/33fac024-b107-11ec-828d-23fa8d25415b,display=off,id=hostpci0.0,bus=ich9-pcie-port-1,addr=0x0.0,x-pci-vendor-id=0x10de,x-pci-device-id=0x1c31' -uuid 00000000-0000-0000-0000-000000000100

If you're using an RTX 2050/60/70/80, you can spoof it to a Quadro RTX6000. Add this line, replacing with your own UUID and VM ID

args: -device 'vfio-pci,sysfsdev=/sys/bus/mdev/devices/33fac024-b107-11ec-828d-23fa8d25415b,display=off,id=hostpci0.0,bus=ich9-pcie-port-1,addr=0x0.0,x-pci-vendor-id=0x10de,x-pci-device-id=0x1e30,x-pci-sub-vendor-id=0x10de,x-pci-sub-device-id=0x12ba' -uuid 00000000-0000-0000-0000-000000000100

Save and exit out of the file

This will create a Quadro device in your VM which does not require the licensed GRID driver, but instead uses an unlicensed regular Quadro driver. Search the Nvidia drivers page for Quadro P2200 or RTX6000 to install the appropriate drivers

This ONLY works in Windows. I haven't been able to get this working in (Ubuntu) Linux

Create a Profile Override

Optionally you can create a profile override for the vGPU. With such an override you can manually define the maximum resolution, whether or not to enable CUDA, and whether to disable the frame limiter for each profile. When left untouched, the vGPU defaults to its original settings created by the Nvidia vGPU driver.

mkdir /etc/vgpu_unlock/ && nano /etc/vgpu_unlock/profile_override.toml

Edit these settings to your liking

[profile.nvidia-257]
num_displays = 1
display_width = 1920
display_height = 1080
max_pixels = 2073600
cuda_enabled = 1
frl_enabled = 0

  • profile.nvidia-257 = the mdev profile to override
  • max_pixels = display_width * display_height
  • frl_enabled = 0 disables the frame limiter
  • cuda_enabled = 1 enables CUDA
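If you change the resolution, recompute max_pixels as width times height. A quick shell check using the example profile’s values:

```shell
# max_pixels must equal display_width * display_height.
WIDTH=1920
HEIGHT=1080
echo $((WIDTH * HEIGHT))   # 2073600 for 1920x1080
```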

Nvidia-smi vGPU wrapper

Download the nvidia-smi vGPU wrapper script (credits Erin Allison)

wget https://raw.githubusercontent.com/wvthoog/nvidia_vgpu_proxmox_7/main/nvidia-smi

Rename the original nvidia-smi executable

mv /usr/bin/nvidia-smi /usr/bin/nvidia-smi.orig

Copy the wrapper script over to /usr/bin/ and make it executable

cp nvidia-smi /usr/bin/
chmod +x /usr/bin/nvidia-smi

Now when you execute nvidia-smi vgpu you’ll see the active vGPU instances listed

VM driver

Ubuntu 20.04

Launch the Ubuntu VM using the web interface or command line

qm start 100

and start off by verifying that the GPU has been created. In Ubuntu this is: lspci -nnk | grep -i nvidia

Then it’s time to install the driver in the VM. You can find this driver in the ZIP file we downloaded at the beginning of this tutorial. I’d recommend not deviating much from the version installed on the host machine. I have 460.73.01 installed on the host (Proxmox), so I’m going to install 460.91.03 in the VM. Download NVIDIA-GRID-Linux-KVM-460.91.03-462.96.zip from Nvidia’s website and unzip it (or get it from Google Cloud). Inside you will find the file NVIDIA-Linux-x86_64-460.91.03-grid.run

Note: as kernel versions on the client side more frequently get above 5.11, I’d advise downloading the latest GRID driver from Google, since those build on newer kernel versions without patching. Just select the highest available version matching the installed host (Proxmox) driver. I have the 460 driver installed on Proxmox, so I use the highest available 460.x driver from Google in my client VMs (currently 460.106.00).

chmod +x NVIDIA-Linux-x86_64-460.106.00-grid.run
./NVIDIA-Linux-x86_64-460.106.00-grid.run --dkms

Blacklist Nouveau

echo "blacklist nouveau" | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
echo "options nouveau modeset=0" | sudo tee -a /etc/modprobe.d/blacklist-nouveau.conf

Update ramdisk

sudo update-initramfs -u

And reboot

sudo reboot

When successful, you should see that the driver is loaded in dmesg and lsmod. For a final check, run nvidia-smi.

And there you have it: an RTX6000-2Q vGPU profile with CUDA version 11.2 on a consumer-grade Nvidia RTX 2070 Super.

Not so fast

Since we’re using the GRID driver, which has been crippled by time restrictions due to licensing fees, we need to append some arguments when loading the Nvidia driver to bypass these limitations. (credits neggles) What this does is set the “Unlicensed Unrestricted Time” to 1 day. After that you have to reload the nvidia module or reboot the VM.

echo 'options nvidia NVreg_RegistryDwords="UnlicensedUnrestrictedStateTimeout=0x5A0;UnlicensedRestricted1StateTimeout=0x5A0"' | sudo tee /etc/modprobe.d/nvidia.conf

sudo update-initramfs -u

sudo reboot

Windows 10

Launch the Windows VM using the web interface or command line

qm start 101

When the Windows VM boots up, we have to choose which driver to install: the GRID (non-spoofed) or the Quadro (spoofed) driver. I have to admit that when I installed the Quadro driver and switched back to a non-spoofed profile (GRID vGPU), it picked up the card anyway. The official procedure is to open the NVIDIA-GRID-Linux-KVM-460.91.03-462.96.zip file, extract the 462.96_grid_win10_server2016_server2019_64bit_international.exe Windows GRID driver (or download the latest GRID driver from Google), and run the file.

For installing the Quadro drivers, please visit Nvidia’s website and download a 46x.x driver version.

If you have any old Nvidia drivers installed, first remove them using DDU Uninstaller after booting Windows into safe mode.

Reboot and consult Device Manager, Task Manager and GPU-Z to check for any problems.

If there aren’t any, you can use Remote Desktop Connection (RemoteFX with GFX), or even better Parsec or Looking Glass, to fully utilize the newly created vGPU.

Hold your horses

If you didn’t spoof your vGPU PCI ID and installed the GRID driver, you need to install vGPU_LicenseBypass to bypass the driver’s time limitations. What this script does is set the unlicensed allowed time to 1 day and get rid of license notification messages. Just double-click the bat file and allow administrator access. (credits Krutav Shah)

Performance

I was curious how well the vGPUs performed compared to the GPU they run on, so I ran a few Unigine Heaven tests to verify. I ran these tests using only one active vGPU profile; performance is likely to drop when running multiple vGPU profiles.

GTX 1060 – 6GB

GTX 1060 – 6GB (pass-through) vs Tesla P40-3Q (non-spoofed) vs Quadro P2200 (3Q/spoofed)

Results in table format: (Basic 640×480 – Extreme 1280×720 2xAA windowed)

GPU                       | Basic score | Extreme score
GTX 1060 – 6GB            | 5765        | 2072
Tesla P40-3Q              | not tested  | not tested
Quadro P2200 (3Q/spoofed) | 5699        | 2001

RTX 2070 Super – 8GB

RTX 2070 Super – 8GB (pass-through) vs RTX6000-2Q (non-spoofed) vs Quadro RTX6000 (2Q/spoofed)

Results in table format: (Basic 640×480 – Extreme 1280×720 2xAA windowed)

GPU                         | Basic score | Extreme score
RTX 2070 Super – 8GB        | not tested  | not tested
Quadro RTX6000-2Q           | 7087        | 4307
Quadro RTX6000 (2Q/spoofed) | 6980        | 4223

Common errors

  • warning: vfio 0b5fd3fb-2389-4a22-ba70-52969a26b9d5: Could not enable error recovery for the device
    • Just a warning which you can ignore
  • Verify all devices in group 24 are bound to vfio-<bus> or pci-stub and not already in use
    • Indicates vGPU host driver issue
      • Use another driver on the host machine (Proxmox)
  • XID 43 detected on physical_chid:0x25, guest_chid:0x1d
    • XID 43 error = GPU stopped processing
      • Use a driver no newer than version 460.73.01 on the host machine (Proxmox)
  • Code 43
    • Code 43 means that the driver unloads in the VM. This is happening because the Nvidia driver detects it’s running in a VM
      • Disable default video in hardware tab of the VM
      • Set CPU to host instead of KVM64
  • Failed CC version check. Bailing out! while installing the Nvidia driver
    • Run the .run file with the --no-cc-version-check flag

Discord

For any other issues please visit the vgpu_unlock Discord server

PayPal

If you like my work, please consider supporting.