Proxmox 7 vGPU

Intro

For a long time I have wanted to split up a consumer grade Nvidia graphics card and pass it through to (Proxmox) VMs. Not so long ago a project appeared on Github under the name of vgpu_unlock (credits DualCoder) which makes exactly that possible. The result is this lengthy tutorial; lengthy because Proxmox 7 ships with the 5.11+ kernel, which needs a few extra steps.

I have written a script that automatically installs the host vGPU driver all by itself.

You can check it out in my new blog post at https://wvthoog.nl/proxmox-7-vgpu-v3/

Requirements

There are a few requirements in order to get vgpu_unlock to work. It only works on specific cards (see list below) and on kernels 5.11 and lower.

Supported GPUs

Nvidia card                    GPU chip   vGPU unlock profile
GTX 900 Series (first gen)     GM107 x4   Tesla M10
GTX 900 Series (second gen)    GM204 x2   Tesla M60
GTX 1000 Series                GP102      Tesla P40
Titan V, Quadro GV100          GV100      Tesla V100 16GB
RTX 2000 Series                TU102      Quadro RTX 6000
RTX 3000 Series                GA10x      Ampere is not supported

Older cards like the 900 and 1000 series will present a Tesla card to the VM. That means we have to use the GRID driver inside the VM to get the card working; keep reading to see how that is done. If you own a 2000 series card, however, you are in luck: vgpu_unlock will present a Quadro card to the VM, which can use the regular Nvidia drivers without the time (licensing) restrictions the 900/1000 (Tesla) cards have.

Kernel version

Proxmox and Debian continuously update their packages and kernel. For vgpu_unlock to work we have to stick to kernel version 5.11; kernel 5.13 will not work. To boot kernel version 5.11 we're going to edit the Grub configuration file.

First add this line to your sources.list to make older kernel versions available for installation.

echo 'deb http://download.proxmox.com/debian/pve bullseye pve-no-subscription' | tee -a /etc/apt/sources.list

Then update your packages and upgrade if available

apt update && apt upgrade -y

This will download and install kernel version 5.13, which is not the kernel we want. To list all the kernels available on your system, execute the following command

grep menuentry /boot/grub/grub.cfg

In my case, the grep output shows that kernel version 5.11.22-7 is currently the highest available version of the 5.11 kernel, so that is the version I am going to use.
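If 5.11.22-7 (or whichever 5.11 kernel you pick) does not show up in that list yet, it has to be installed first. A minimal sketch, assuming the package names pve-kernel-5.11.22-7-pve and pve-headers-5.11.22-7-pve (verify the exact names with apt search pve-kernel-5.11):

# Package names are assumptions, check them before installing
apt install pve-kernel-5.11.22-7-pve pve-headers-5.11.22-7-pve

Now we have to edit Grub to boot this kernel by default.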

nano /etc/default/grub

Replace GRUB_DEFAULT with the following line. Remember to use your own kernel version, otherwise your system will not be able to boot. Also take note of the > character that joins the two entries (Advanced options and the kernel version) together

GRUB_DEFAULT="Advanced options for Proxmox VE GNU/Linux>Proxmox VE GNU/Linux, with Linux 5.11.22-7-pve"
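If you want to copy the entry titles verbatim instead of typing them out, a small convenience sketch:

# Print only the Grub submenu and menu entry titles
grep -e "menuentry '" -e "submenu '" /boot/grub/grub.cfg | cut -d"'" -f2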

Save and exit the file, then update Grub

update-grub

And restart the server. After the server has successfully restarted, check that the kernel version is correct

uname -a

Example output

Linux pve 5.11.22-7-pve #1 SMP PVE 5.11.22-12 (Sun, 07 Nov 2021 21:46:36 +0100) x86_64 GNU/Linux

This should mean you're good to go. Proxmox will stick to the 5.11 kernel, so vgpu_unlock will work and keep working.

Next we can install the dependencies required.

Dependencies

Install dependencies for vgpu_unlock

apt update && apt upgrade -y
apt install -y git build-essential pve-headers-`uname -r` dkms jq unzip python3 python3-pip

Install frida

pip3 install frida

Clone the vgpu_unlock repository

git clone https://github.com/DualCoder/vgpu_unlock
chmod -R +x vgpu_unlock

Enable IOMMU

Configure IOMMU

nano /etc/default/grub

For Intel CPUs edit this line

GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"

For AMD CPUs edit this line

GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt"

Save file and update grub

update-grub

Load VFIO modules at boot

nano /etc/modules

Insert these lines

vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd

Create a couple of files in modprobe.d

echo "options vfio_iommu_type1 allow_unsafe_interrupts=1" > /etc/modprobe.d/iommu_unsafe_interrupts.conf
echo "options kvm ignore_msrs=1" > /etc/modprobe.d/kvm.conf
echo "blacklist nouveau" >> /etc/modprobe.d/blacklist.conf

Update initramfs

update-initramfs -u

Reboot Proxmox

reboot

And verify that IOMMU is enabled

dmesg | grep -e DMAR -e IOMMU

Example output

[    0.954526] pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported
[    0.958348] pci 0000:00:00.2: AMD-Vi: Found IOMMU cap 0x40
[    0.959202] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).
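Optionally, you can also check how the PCI devices are split into IOMMU groups; this is not required for vGPU itself, but it is useful if you plan to pass a second card through directly:

# List every PCI device per IOMMU group
find /sys/kernel/iommu_groups/ -type l | sort -V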

Nvidia drivers

Download Nvidia's vGPU drivers. You need to apply for a trial period to download these drivers here. Copy them to the Proxmox server using SCP and unzip.

scp nvidia/NVIDIA-GRID-Linux-KVM-460.32.04-460.32.03-461.09.zip root@192.168.2.10:~
unzip NVIDIA-GRID-Linux-KVM-460.32.04-460.32.03-461.09.zip

Since I'm using Proxmox 7, based on Debian Bullseye with kernel version 5.11, I needed to patch a couple of files.

Extract the Nvidia driver

chmod +x NVIDIA-Linux-x86_64-460.32.04-vgpu-kvm.run
./NVIDIA-Linux-x86_64-460.32.04-vgpu-kvm.run -x

Download these patches and apply them to the extracted source

wget https://raw.githubusercontent.com/rupansh/vgpu_unlock_5.12/master/twelve.patch
wget https://raw.githubusercontent.com/wvthoog/nvidia_vgpu_proxmox_7/main/nv-caps.patch
wget https://raw.githubusercontent.com/wvthoog/nvidia_vgpu_proxmox_7/main/nv-frontend.patch
wget https://raw.githubusercontent.com/wvthoog/nvidia_vgpu_proxmox_7/main/nvidia-vgpu-vfio.patch

cd NVIDIA-Linux-x86_64-460.32.04-vgpu-kvm

patch -p0 < ../twelve.patch
patch -p0 < ../nv-caps.patch
patch -p0 < ../nv-frontend.patch
patch -p0 < ../nvidia-vgpu-vfio.patch

Install the driver

chmod +x nvidia-installer
./nvidia-installer --dkms

Edit the Nvidia vGPU systemd service files

nano /lib/systemd/system/nvidia-vgpud.service

# replace ExecStart with:
ExecStart=/root/vgpu_unlock/vgpu_unlock /usr/bin/nvidia-vgpud

nano /lib/systemd/system/nvidia-vgpu-mgr.service

# replace ExecStart with:
ExecStart=/root/vgpu_unlock/vgpu_unlock /usr/bin/nvidia-vgpu-mgr
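If you prefer to make these edits non-interactively, a sed sketch; this assumes the stock unit files contain ExecStart=/usr/bin/nvidia-vgpud and ExecStart=/usr/bin/nvidia-vgpu-mgr, so double-check before running:

# Prefix the original ExecStart binaries with the vgpu_unlock wrapper
sed -i 's|^ExecStart=/usr/bin/nvidia-vgpud$|ExecStart=/root/vgpu_unlock/vgpu_unlock /usr/bin/nvidia-vgpud|' /lib/systemd/system/nvidia-vgpud.service
sed -i 's|^ExecStart=/usr/bin/nvidia-vgpu-mgr$|ExecStart=/root/vgpu_unlock/vgpu_unlock /usr/bin/nvidia-vgpu-mgr|' /lib/systemd/system/nvidia-vgpu-mgr.service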

Reload the systemd daemon

systemctl daemon-reload

Edit some Nvidia driver source files

nano /usr/src/nvidia-460.32.04/nvidia/os-interface.c

Under #include "nv-time.h" insert this line

#include "/root/vgpu_unlock/vgpu_unlock_hooks.c"

Edit the next file

nano /usr/src/nvidia-460.32.04/nvidia/nvidia.Kbuild

Add this line at the bottom of the file

ldflags-y += -T /root/vgpu_unlock/kern.ld

Remove the original Nvidia dkms module

dkms remove -m nvidia -v 460.32.04 --all

Rebuild the module

dkms install -m nvidia -v 460.32.04

Reboot

reboot

And verify that vGPU is running

dmesg|grep -i vgpu

You should see something like this

[   31.222948] vGPU unlock patch applied.

Launch nvidia-smi and write down your device PCI Bus Address(es)

As you can see, I have two GPUs installed. The first one is a GTX 1060 with ID 00000000:2B:00.0. The second one is a GTX 1050 Ti with ID 00000000:2C:00.0, which will be passed through directly to a Windows 10 VM, since vgpu_unlock only supports one GPU (the first one it sees).
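If you only need the bus addresses, nvidia-smi can also print them directly; a small convenience query (the plain nvidia-smi table shows the same information):

# Print index, name and PCI bus address for every GPU
nvidia-smi --query-gpu=index,name,pci.bus_id --format=csv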

Create the vGPUs

Mdevctl

Install mdevctl

apt install mdevctl

List the different types your GPU offers

mdevctl types

This gives the following (example) output, among many other types

nvidia-47
  Available instances: 12
  Device API: vfio-pci
  Name: GRID P40-2Q
  Description: num_heads=4, frl_config=60, framebuffer=2048M, max_resolution=7680x4320, max_instance=12
nvidia-48
  Available instances: 8
  Device API: vfio-pci
  Name: GRID P40-3Q
  Description: num_heads=4, frl_config=60, framebuffer=3072M, max_resolution=7680x4320, max_instance=8
nvidia-49
  Available instances: 6
  Device API: vfio-pci
  Name: GRID P40-4Q
  Description: num_heads=4, frl_config=60, framebuffer=4096M, max_resolution=7680x4320, max_instance=6

This will list all kinds of profile names and available instances. I've decided to break up my GTX 1060 into two vGPUs (instances) of type Q, so I went for the nvidia-48 profile. Type Q is the profile we want, since it supports all the features the GPU offers, like NVENC, CUDA, etc. The profile you select depends on the amount of VRAM the GPU has: my GTX 1060 has 6GB, so I had to choose a profile that consumes 3GB per instance (2 vGPUs x 3GB = 6GB total). Also take note of the 'Name' field; this is the card the vGPU driver will present to the VM, in my case a (Tesla) P40-3Q.
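The full mdevctl types listing is very long; to narrow it down to just the fields that matter here, a small grep sketch:

# Show only the type id, name and instance count for each profile
mdevctl types | grep -E 'nvidia-[0-9]+|Name:|Available instances'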

Generate UUIDs

Now it's time to generate some UUIDs for mdevctl to use when creating the vGPUs. As I mentioned earlier, I'm going to split the GTX 1060 into two vGPUs, so I need two UUIDs.

uuid -n 2
0b5fd3fb-2389-4a22-ba70-52969a26b9d5
924cfc77-4803-4118-a5f4-bd8de589ddf6
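The uuid command comes from the uuid package (apt install uuid). If you'd rather not install it, uuidgen from util-linux, which should already be present on Proxmox, does the same job one UUID at a time:

# Generate two UUIDs with uuidgen instead of uuid -n 2
for i in 1 2; do uuidgen; done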

Create vGPU profiles

At this point I recommend opening a text editor to assemble the commands for creating the different vGPUs.

Grab the PCI ID of the GPU

dmesg|grep nvidia

Example output

[    8.056238] nvidia 0000:2b:00.0: enabling device (0000 -> 0003)
[    8.056325] nvidia 0000:2b:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
[    8.172280] nvidia 0000:2c:00.0: enabling device (0000 -> 0003)
[    8.172348] nvidia 0000:2c:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none

From nvidia-smi I could determine that 0000:2c:00.0 is my GTX 1050 Ti and 0000:2b:00.0 is the GTX 1060. Write down the PCI Bus ID of the card you intend to use.

Putting the final commands together we now have everything we need for each vGPU profile.

  • UUID
  • PCI Bus ID
  • vGPU Profile

This results (in my case) in the following commands: two vGPU profiles, each with a unique UUID, on parent PCI Bus ID 0000:2b:00.0 (the GTX 1060) of type nvidia-48.

mdevctl start -u 0b5fd3fb-2389-4a22-ba70-52969a26b9d5 -p 0000:2b:00.0 -t nvidia-48
mdevctl start -u 924cfc77-4803-4118-a5f4-bd8de589ddf6 -p 0000:2b:00.0 -t nvidia-48

Define the profiles so that they persist across reboots

mdevctl define -a -u 0b5fd3fb-2389-4a22-ba70-52969a26b9d5
mdevctl define -a -u 924cfc77-4803-4118-a5f4-bd8de589ddf6

Verify the profiles are created correctly

mdevctl list

Example output

924cfc77-4803-4118-a5f4-bd8de589ddf6 0000:2b:00.0 nvidia-48 (defined)
0b5fd3fb-2389-4a22-ba70-52969a26b9d5 0000:2b:00.0 nvidia-48 (defined)

That's it. We now have two mediated devices ready to be used by a Proxmox VM.

Assign vGPU to VM

Ubuntu 20.04

So far we've set up the mdev part; now it's time to create a new VM (or use an existing one) of machine type q35 and assign it a vGPU.
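If you prefer creating the VM from the command line, a minimal sketch; VM ID 100, the resource sizes and the local-lvm/vmbr0 names are just examples, adjust them to your environment:

# Create a q35 VM; storage and bridge names below are assumptions
qm create 100 --name ubuntu-vgpu --machine q35 --memory 4096 --cores 4 \
  --scsihw virtio-scsi-pci --scsi0 local-lvm:32 --net0 virtio,bridge=vmbr0 --ostype l26

Either way, create the VM and then edit its conf file.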

nano /etc/pve/qemu-server/100.conf

Add the following line, replacing the placeholder values

args: -device 'vfio-pci,sysfsdev=/sys/bus/mdev/devices/[UUID]' -uuid 00000000-0000-0000-0000-000000000[ID-of-VM]

You need to change two values. The first is the UUID of one of the mdev devices we created earlier. The second is the ID of the VM. A complete line would look something like this

args: -device 'vfio-pci,sysfsdev=/sys/bus/mdev/devices/0b5fd3fb-2389-4a22-ba70-52969a26b9d5' -uuid 00000000-0000-0000-0000-000000000100

Save and exit out of the file.

Then we can launch the VM using the web interface or command line

qm start 100

You can ignore the following warning

warning: vfio 0b5fd3fb-2389-4a22-ba70-52969a26b9d5: Could not enable error recovery for the device

Log into the VM and verify that the card is recognized.

In an Ubuntu 20.04 VM, lspci shows

00:02.0 VGA compatible controller: NVIDIA Corporation GP102GL [Tesla P40] (rev a1)

It's time to install the driver in the VM. You can find this driver in the ZIP file we downloaded at the beginning of this tutorial. I recommend not deviating much from the version installed on the host machine: I have 460.32.04 installed on the host (Proxmox) and am going to install 460.91.03 in the VM. Download NVIDIA-GRID-Linux-KVM-460.91.03-462.96.zip from Nvidia's website (or from Google Cloud) and unzip it. Inside you will find the file NVIDIA-Linux-x86_64-460.91.03-grid.run

*** As client-side kernel versions increasingly move past 5.11, I'd advise downloading the latest GRID driver from Google, since those builds support newer kernels. Just pick the highest available version that matches the host (Proxmox) driver branch: I have the 460 driver installed on Proxmox, so I use the highest available 460.x driver from Google in my client VMs (currently 460.106.00).

chmod +x NVIDIA-Linux-x86_64-460.91.03-grid.run
./NVIDIA-Linux-x86_64-460.91.03-grid.run --dkms

Once installed, reboot the VM

reboot

When successful, you should see that the driver is loaded in dmesg and lsmod. For a final check, run nvidia-smi.
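For example, inside the VM:

# Confirm the module is loaded and the vGPU is visible
dmesg | grep -i nvidia | tail
lsmod | grep nvidia
nvidia-smi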

And there you have it: a P40-3Q vGPU profile with CUDA version 11.2 on a consumer grade Nvidia GTX 1060.

Not so fast

Since we're using the GRID driver, which is crippled by time restrictions due to licensing fees, we need to pass some options when loading the nvidia driver to bypass these limitations (credits neggles). What this does is set the unlicensed unrestricted time to 1 day (0x5A0 = 1440 minutes = 24 hours). After that you have to reload the nvidia driver (module) or reboot the VM.

echo 'options nvidia NVreg_RegistryDwords="UnlicensedUnrestrictedStateTimeout=0x5A0;UnlicensedRestricted1StateTimeout=0x5A0"' | sudo tee /etc/modprobe.d/nvidia.conf

sudo update-initramfs -u

sudo reboot
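If you'd rather reload the module than reboot once the 24-hour window has run out, something along these lines should work, but only while nothing is using the GPU (skip any module in the list that isn't loaded):

# Unload and reload the nvidia modules; this fails while the GPU is in use
sudo rmmod nvidia_uvm nvidia_drm nvidia_modeset nvidia
sudo modprobe nvidia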

Windows 10

In Windows the process is more or less the same. Once again, create a VM or use an existing one. This time we're going to assign the second vGPU we created. Edit the second (Windows) Proxmox VM's conf file

nano /etc/pve/qemu-server/101.conf

And insert the following line. Keep in mind that I'm using my own UUID and VM ID here

args: -device 'vfio-pci,sysfsdev=/sys/bus/mdev/devices/924cfc77-4803-4118-a5f4-bd8de589ddf6' -uuid 00000000-0000-0000-0000-000000000101

Save and exit out of the file.

Launch the VM using the web interface or command line

qm start 101

When the Windows VM has booted, open the NVIDIA-GRID-Linux-KVM-460.91.03-462.96.zip file once more, but this time extract the Windows driver 462.96_grid_win10_server2016_server2019_64bit_international.exe and run it.

Reboot and check Device Manager for any problems. There shouldn't be any. After that you can use Remote Desktop Connection (with RemoteFX), or better yet Looking Glass or Parsec, to fully utilize the newly created vGPU.

Hold your horses

Again, since we're using the GRID driver, we need to run vGPU_LicenseBypass to work around the driver's time limitations. What this script does is set the unlicensed allowed time to 1 day and get rid of the license notification messages. Just double-click the bat file and allow Administrator access (credits Krutav Shah).

Conclusion

Hopefully you found this tutorial informative. It took me quite a bit of time to get everything working the way I think it should in Proxmox. An alternative method would be to rewrite the mdev PCI ID in the Proxmox config and load the Nvidia Quadro driver in the VM, but I didn't have any success going that route.

PayPal

If you like my work, please consider donating