Proxmox 7 vGPU

Intro

For a long time I wanted to be able to split a consumer-grade Nvidia graphics card into several virtual GPUs and pass them through to (Proxmox) VMs. Not long ago a project called vgpu_unlock (credits DualCoder) appeared on GitHub that makes exactly that possible. Because I'm running Proxmox 7, which uses the 5.11+ kernel, a few extra steps were needed, which is why this tutorial is rather long.

Dependencies

Install a few dependencies needed for vGPU. First, add the Proxmox no-subscription repository using this command:

echo 'deb http://download.proxmox.com/debian/pve bullseye pve-no-subscription' | tee -a /etc/apt/sources.list

*** For Proxmox 6.x (Buster) run the following command instead

echo 'deb http://download.proxmox.com/debian/pve buster pve-no-subscription' | tee -a /etc/apt/sources.list
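The two commands above differ only in the Debian codename, and because they use tee -a, running one of them twice adds a duplicate line. A minimal sketch, assuming a POSIX shell, that only appends the repository line when it is missing. The function names pve_repo_line and add_pve_repo are hypothetical helpers, not part of Proxmox.

```shell
#!/bin/sh
# Hypothetical helper: print the Proxmox no-subscription repository line
# for a Debian codename (bullseye for Proxmox 7, buster for Proxmox 6).
pve_repo_line() {
    printf 'deb http://download.proxmox.com/debian/pve %s pve-no-subscription\n' "$1"
}

# Hypothetical helper: append the repository line to a sources file only if
# that exact line is not present yet, so re-running the step is harmless.
add_pve_repo() {
    codename=$1
    sources_file=$2
    repo_line=$(pve_repo_line "$codename")
    if ! grep -qxF "$repo_line" "$sources_file" 2>/dev/null; then
        printf '%s\n' "$repo_line" >> "$sources_file"
    fi
}
```

For example, sourcing /etc/os-release first provides VERSION_CODENAME (bullseye on Proxmox 7, buster on Proxmox 6):

. /etc/os-release && add_pve_repo "$VERSION_CODENAME" /etc/apt/sources.list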

Update the system and install the dependencies

apt update && apt upgrade -y
apt install -y git build-essential pve-headers dkms jq unzip python3 python3-pip

Install frida

pip3 install frida

Clone vgpu_unlock into root's home directory (later steps expect it at /root/vgpu_unlock)

git clone https://github.com/DualCoder/vgpu_unlock
chmod -R +x vgpu_unlock

Enable IOMMU

Configure IOMMU

nano /etc/default/grub

For Intel CPUs, edit this line

GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"

For AMD CPUs, edit this line

GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt"
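If you are unsure which CPU vendor the host has, the vendor_id field in /proc/cpuinfo tells you (GenuineIntel or AuthenticAMD). A small sketch with hypothetical helper names that prints the matching value for GRUB_CMDLINE_LINUX_DEFAULT. It only prints the value; editing /etc/default/grub is still up to you.

```shell
#!/bin/sh
# Hypothetical helper: map a CPU vendor string (the vendor_id field of
# /proc/cpuinfo) to the GRUB_CMDLINE_LINUX_DEFAULT value shown above.
iommu_cmdline() {
    case "$1" in
        GenuineIntel) echo 'quiet intel_iommu=on iommu=pt' ;;
        AuthenticAMD) echo 'quiet amd_iommu=on iommu=pt' ;;
        *) echo "unknown CPU vendor: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: read the vendor of the first CPU from a
# cpuinfo-formatted file (defaults to /proc/cpuinfo).
cpu_vendor() {
    awk -F': ' '/^vendor_id/ { print $2; exit }' "${1:-/proc/cpuinfo}"
}
```

On the Proxmox host, run:

iommu_cmdline "$(cpu_vendor)"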

Save the file and update GRUB

update-grub

Load VFIO modules at boot

nano /etc/modules

Insert these lines

vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
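The same lines can be appended from the shell instead of nano. A sketch using a hypothetical helper that skips modules already listed, so running it twice does not create duplicates. The initramfs update and reboot below are still needed afterwards.

```shell
#!/bin/sh
# Hypothetical helper: append each VFIO module to a modules file
# (normally /etc/modules) unless a line with that exact name already exists.
add_vfio_modules() {
    modules_file=$1
    for mod in vfio vfio_iommu_type1 vfio_pci vfio_virqfd; do
        grep -qxF "$mod" "$modules_file" 2>/dev/null || echo "$mod" >> "$modules_file"
    done
}
```

Usage on the host:

add_vfio_modules /etc/modules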

Create a couple of files in modprobe.d

echo "options vfio_iommu_type1 allow_unsafe_interrupts=1" > /etc/modprobe.d/iommu_unsafe_interrupts.conf
echo "options kvm ignore_msrs=1" > /etc/modprobe.d/kvm.conf
echo "blacklist nouveau" >> /etc/modprobe.d/blacklist.conf

Update initramfs

update-initramfs -u

Reboot Proxmox

reboot

And verify that IOMMU is enabled

dmesg | grep -e DMAR -e IOMMU

Example output

[    0.954526] pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported
[    0.958348] pci 0000:00:00.2: AMD-Vi: Found IOMMU cap 0x40
[    0.959202] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).
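Besides dmesg, you can look at /sys/kernel/iommu_groups: when IOMMU is active it contains one directory per group holding the PCI devices in that group, and it is typically empty otherwise. A sketch with a hypothetical helper that prints each device with its group number, which is handy for checking the group of a card you want to pass through directly (like the GTX 1050 Ti later in this tutorial).

```shell
#!/bin/sh
# Hypothetical helper: print "<group> <PCI address>" for every device in an
# iommu_groups directory (normally /sys/kernel/iommu_groups).
# Prints nothing when the directory has no groups.
list_iommu_groups() {
    root=${1:-/sys/kernel/iommu_groups}
    for dev in "$root"/*/devices/*; do
        # An unmatched glob stays literal; skip it.
        [ -e "$dev" ] || continue
        group=$(basename "$(dirname "$(dirname "$dev")")")
        printf '%s %s\n' "$group" "$(basename "$dev")"
    done
}
```

Run list_iommu_groups without arguments on the host to inspect the real groups.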

Nvidia drivers

Download Nvidia's vGPU drivers. You need to apply for a trial (evaluation) license on Nvidia's website to download them. Copy the ZIP file to the Proxmox server using SCP and unzip it there.

scp nvidia/NVIDIA-GRID-Linux-KVM-460.32.04-460.32.03-461.09.zip root@192.168.2.10:~
unzip NVIDIA-GRID-Linux-KVM-460.32.04-460.32.03-461.09.zip

Since I'm using Proxmox 7, based on Debian Bullseye with kernel version 5.11, I needed to patch a couple of files.

*** This step is not necessary if you're on Proxmox 6.x (Buster). You can simply run ./NVIDIA-Linux-x86_64-460.32.04-vgpu-kvm.run --dkms

Extract the Nvidia driver

chmod +x NVIDIA-Linux-x86_64-460.32.04-vgpu-kvm.run
./NVIDIA-Linux-x86_64-460.32.04-vgpu-kvm.run -x

Download a few patches and apply them to the extracted source

wget https://raw.githubusercontent.com/rupansh/vgpu_unlock_5.12/master/twelve.patch
wget https://raw.githubusercontent.com/wvthoog/nvidia_vgpu_proxmox_7/main/nv-caps.patch
wget https://raw.githubusercontent.com/wvthoog/nvidia_vgpu_proxmox_7/main/nv-frontend.patch
wget https://raw.githubusercontent.com/wvthoog/nvidia_vgpu_proxmox_7/main/nvidia-vgpu-vfio.patch

cd NVIDIA-Linux-x86_64-460.32.04-vgpu-kvm

patch -p0 < ../twelve.patch
patch -p0 < ../nv-caps.patch
patch -p0 < ../nv-frontend.patch
patch -p0 < ../nvidia-vgpu-vfio.patch

Install the driver

chmod +x nvidia-installer
./nvidia-installer --dkms

*** Pick up from this point if you're using Proxmox 6.x (Buster)

Edit the Nvidia vGPU systemd service files

nano /lib/systemd/system/nvidia-vgpud.service

# replace ExecStart with:
ExecStart=/root/vgpu_unlock/vgpu_unlock /usr/bin/nvidia-vgpud

nano /lib/systemd/system/nvidia-vgpu-mgr.service

# replace ExecStart with:
ExecStart=/root/vgpu_unlock/vgpu_unlock /usr/bin/nvidia-vgpu-mgr
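Unit files under /lib/systemd/system belong to the package and may be overwritten when the driver is reinstalled. An untested alternative to editing them directly is a systemd drop-in override. In a drop-in, an empty ExecStart= line clears the original command before the wrapped one is set. A sketch, with write_vgpu_override and the file name vgpu_unlock.conf as hypothetical choices:

```shell
#!/bin/sh
# Hypothetical helper: write a systemd drop-in that wraps a service's
# ExecStart with vgpu_unlock instead of editing the unit file itself.
# Usage: write_vgpu_override <service> <binary> [unit-dir]
write_vgpu_override() {
    service=$1
    binary=$2
    unit_dir=${3:-/etc/systemd/system}
    dropin_dir="$unit_dir/$service.service.d"
    mkdir -p "$dropin_dir"
    # The empty ExecStart= clears the packaged command before setting ours.
    printf '[Service]\nExecStart=\nExecStart=/root/vgpu_unlock/vgpu_unlock %s\n' "$binary" \
        > "$dropin_dir/vgpu_unlock.conf"
}
```

Usage for both services, followed by the daemon reload:

write_vgpu_override nvidia-vgpud /usr/bin/nvidia-vgpud
write_vgpu_override nvidia-vgpu-mgr /usr/bin/nvidia-vgpu-mgr
systemctl daemon-reload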

Reload the systemd daemon

systemctl daemon-reload

Edit some Nvidia driver source files

nano /usr/src/nvidia-460.32.04/nvidia/os-interface.c

Under #include "nv-time.h" insert this line

#include "/root/vgpu_unlock/vgpu_unlock_hooks.c"

Edit the next file

nano /usr/src/nvidia-460.32.04/nvidia/nvidia.Kbuild

Add this line at the bottom of the file

ldflags-y += -T /root/vgpu_unlock/kern.ld
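Both source edits can also be scripted. A sketch with hypothetical helpers that insert the include right below the nv-time.h line and append the ldflags line, each only once; add_unlock_include returns a failure if the nv-time.h anchor line is not found.

```shell
#!/bin/sh
# Hypothetical helper: insert the vgpu_unlock hooks include directly below
# '#include "nv-time.h"' in os-interface.c, unless it is already there.
add_unlock_include() {
    src=$1
    hook='#include "/root/vgpu_unlock/vgpu_unlock_hooks.c"'
    grep -qxF "$hook" "$src" && return 0
    work=$(mktemp)
    awk -v hook="$hook" '
        { print }
        $0 == "#include \"nv-time.h\"" { print hook }
    ' "$src" > "$work" && cat "$work" > "$src"
    rm -f "$work"
    # Fail if the anchor line was missing and nothing was inserted.
    grep -qxF "$hook" "$src"
}

# Hypothetical helper: append the linker script flag to nvidia.Kbuild once.
add_unlock_ldflags() {
    kbuild=$1
    line='ldflags-y += -T /root/vgpu_unlock/kern.ld'
    grep -qxF "$line" "$kbuild" || printf '%s\n' "$line" >> "$kbuild"
}
```

Usage with the paths from this tutorial:

add_unlock_include /usr/src/nvidia-460.32.04/nvidia/os-interface.c
add_unlock_ldflags /usr/src/nvidia-460.32.04/nvidia/nvidia.Kbuild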

Remove the original Nvidia dkms module

dkms remove -m nvidia -v 460.32.04 --all

Rebuild the module

dkms install -m nvidia -v 460.32.04

Reboot

reboot

And verify that vGPU is running

dmesg|grep -i vgpu

You should see something like this

[   31.222948] vGPU unlock patch applied.

Launch nvidia-smi and write down your device PCI Bus Address(es)

In my case nvidia-smi lists two GPUs. The first one is a GTX 1060 with ID 00000000:2B:00.0. The second one is a GTX 1050 Ti with ID 00000000:2C:00.0, which will be passed through directly to a Windows 10 VM, since vgpu_unlock only supports one GPU (the first one it sees).

Create the vGPUs

Mdevctl

Install mdevctl

apt install mdevctl

List the different types your GPU offers

mdevctl types

This gives output like the following (an excerpt; the full list contains many more types):

'''
  nvidia-47
    Available instances: 12
    Device API: vfio-pci
    Name: GRID P40-2Q
    Description: num_heads=4, frl_config=60, framebuffer=2048M, max_resolution=7680x4320, max_instance=12
  nvidia-48
    Available instances: 8
    Device API: vfio-pci
    Name: GRID P40-3Q
    Description: num_heads=4, frl_config=60, framebuffer=3072M, max_resolution=7680x4320, max_instance=8
  nvidia-49
    Available instances: 6
    Device API: vfio-pci
    Name: GRID P40-4Q
    Description: num_heads=4, frl_config=60, framebuffer=4096M, max_resolution=7680x4320, max_instance=6
'''

You can see all kinds of different names and available instances. I decided to split my GTX 1060 into 2 vGPUs (instances) of type Q, so I went with the nvidia-48 profile. Type Q is the profile we want since it supports all the features the GPU offers, such as NVENC and CUDA. Which profile you select depends on the amount of VRAM the GPU has: my GTX 1060 has 6 GB, so I had to choose a profile with 3 GB of framebuffer (2 vGPUs x 3 GB = 6 GB total). Also take note of the Name field. This is the card the vGPU driver creates, so in my case that is a (Tesla) P40-3Q.
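The arithmetic behind that choice: a profile fits when its framebuffer times the number of vGPUs does not exceed the card's VRAM (3072M x 2 = 6144M for a 6 GB card). A sketch with a hypothetical helper that applies this rule to the mdevctl types output and lists the Q profiles that fit. The VRAM figure in MB is an input you supply yourself, and the parsing assumes the output layout shown above.

```shell
#!/bin/sh
# Hypothetical helper: read `mdevctl types` output on stdin and print the
# Q-series profiles whose framebuffer lets <count> instances fit in <vram_mb>.
# Output lines: "<type-id> <name> <framebuffer-MB>"
fitting_q_profiles() {
    count=$1
    vram_mb=$2
    awk -v n="$count" -v vram="$vram_mb" '
        /^ *nvidia-[0-9]+ *$/ { type = $1 }
        /Name:/               { name = $2 " " $3 }
        /framebuffer=/ {
            match($0, /framebuffer=[0-9]+M/)
            # Strip "framebuffer=" (12 chars) and the trailing "M".
            fb = substr($0, RSTART + 12, RLENGTH - 13) + 0
            if (name ~ /Q$/ && fb * n <= vram) print type, name, fb
        }'
}
```

For 2 vGPUs on a 6 GB card:

mdevctl types | fitting_q_profiles 2 6144

With the example output above this lists nvidia-47 (P40-2Q) and nvidia-48 (P40-3Q); the last line is the largest profile that still fits.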

Generate UUIDs

Now it's time to generate some UUIDs for mdevctl to use when creating the vGPUs. As I mentioned, I'm splitting the GTX 1060 into 2 vGPUs, so I needed 2 UUIDs. Generate your own with an online UUID generator or with uuidgen on Linux. These are mine:

0b5fd3fb-2389-4a22-ba70-52969a26b9d5
924cfc77-4803-4118-a5f4-bd8de589ddf6
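On the Proxmox host itself you can generate the UUIDs without a website. A sketch: uuidgen is provided by the uuid-runtime package on Debian, and the kernel's /proc/sys/kernel/random/uuid serves as a fallback. is_uuid is a hypothetical format check (8-4-4-4-12 hex digits) you can run before pasting a UUID into mdevctl.

```shell
#!/bin/sh
# Generate one random UUID: uuidgen if installed, otherwise the kernel's
# /proc/sys/kernel/random/uuid.
new_uuid() {
    if command -v uuidgen >/dev/null 2>&1; then
        uuidgen
    else
        cat /proc/sys/kernel/random/uuid
    fi
}

# Hypothetical check: succeed if the argument is an 8-4-4-4-12 hex UUID.
is_uuid() {
    printf '%s\n' "$1" | grep -Eqx '[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}'
}
```

Calling new_uuid twice gives the two UUIDs needed for two vGPUs.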

Create vGPU profiles

At this point I recommend opening a text editor to put together the commands for creating each vGPU.

Grab the PCI ID of the GPU

dmesg|grep nvidia

Example output

[    8.056238] nvidia 0000:2b:00.0: enabling device (0000 -> 0003)
[    8.056325] nvidia 0000:2b:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
[    8.172280] nvidia 0000:2c:00.0: enabling device (0000 -> 0003)
[    8.172348] nvidia 0000:2c:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none

From nvidia-smi I could determine that 0000:2c:00.0 is my GTX 1050 Ti and 0000:2b:00.0 is the GTX 1060. Write down the PCI Bus ID of the card you intend to split.
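nvidia-smi prints bus IDs with an 8-digit domain in upper case (00000000:2B:00.0), while dmesg, sysfs and mdevctl use a 4-digit domain in lower case (0000:2b:00.0). A sketch of that conversion with a hypothetical helper; nvidia-smi --query-gpu=pci.bus_id,name --format=csv,noheader lists each card's bus ID next to its name.

```shell
#!/bin/sh
# Hypothetical helper: convert an nvidia-smi bus ID (00000000:2B:00.0) into
# the form mdevctl expects (0000:2b:00.0): 4-digit domain, lowercase hex.
to_sysfs_bus_id() {
    printf '%s\n' "$1" | sed -E 's/^0*([0-9A-Fa-f]{4}:)/\1/' | tr 'A-F' 'a-f'
}
```

For example, to_sysfs_bus_id 00000000:2B:00.0 prints 0000:2b:00.0.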

Putting it all together, we now have everything we need for each vGPU profile:

  • UUID
  • PCI Bus ID
  • vGPU Profile

This results (in my case) in the following commands: two vGPUs, each with a unique UUID, on parent PCI Bus ID 0000:2b:00.0 (the GTX 1060), of type nvidia-48.

mdevctl start -u 0b5fd3fb-2389-4a22-ba70-52969a26b9d5 -p 0000:2b:00.0 -t nvidia-48
mdevctl start -u 924cfc77-4803-4118-a5f4-bd8de589ddf6 -p 0000:2b:00.0 -t nvidia-48

Define the profiles so that they persist after a reboot

mdevctl define -a -u 0b5fd3fb-2389-4a22-ba70-52969a26b9d5
mdevctl define -a -u 924cfc77-4803-4118-a5f4-bd8de589ddf6
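Since these commands differ only in the UUID, they can also be generated, in the spirit of assembling them in a text editor. A sketch with a hypothetical helper that prints the start and define commands for review instead of running them:

```shell
#!/bin/sh
# Hypothetical helper: print the mdevctl start and define commands for one
# parent GPU, one mdev type and any number of UUIDs, without running them.
# Usage: mdev_commands <parent-bus-id> <type> <uuid>...
mdev_commands() {
    parent=$1
    mdev_type=$2
    shift 2
    for uuid in "$@"; do
        echo "mdevctl start -u $uuid -p $parent -t $mdev_type"
    done
    for uuid in "$@"; do
        echo "mdevctl define -a -u $uuid"
    done
}
```

For the GTX 1060 at 0000:2b:00.0:

mdev_commands 0000:2b:00.0 nvidia-48 0b5fd3fb-2389-4a22-ba70-52969a26b9d5 924cfc77-4803-4118-a5f4-bd8de589ddf6

Review the output and pipe it to sh once it looks right.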

Verify the profiles are created correctly

mdevctl list

Example output

924cfc77-4803-4118-a5f4-bd8de589ddf6 0000:2b:00.0 nvidia-48 (defined)
0b5fd3fb-2389-4a22-ba70-52969a26b9d5 0000:2b:00.0 nvidia-48 (defined)

That's it. We now have 2 mediated devices ready to be used by Proxmox VMs.

Assign vGPU to VM

Ubuntu 20.04

Thus far we've set up the mdev part; now it's time to create a new VM (or use an existing one) of machine type q35 and assign it a vGPU. Create the VM and edit its conf file.

nano /etc/pve/qemu-server/100.conf

Add the following line to the file:

args: -device 'vfio-pci,sysfsdev=/sys/bus/mdev/devices/[UUID]' -uuid 00000000-0000-0000-0000-000000000[ID-of-VM]

You need to change two values. The first one is the UUID of one of the mdev devices we created earlier. The second one is the ID of the VM, zero-padded to fill the last 12-digit group of the -uuid value. A complete line looks something like this:

args: -device 'vfio-pci,sysfsdev=/sys/bus/mdev/devices/0b5fd3fb-2389-4a22-ba70-52969a26b9d5' -uuid 00000000-0000-0000-0000-000000000100
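The -uuid value is the VM ID padded with zeros to 12 digits in the last group, so VM 100 becomes 000000000100 and VM 101 becomes 000000000101. A sketch with a hypothetical helper that prints the complete line for a given mdev UUID and VM ID, ready to paste into the conf file:

```shell
#!/bin/sh
# Hypothetical helper: print the args: line for /etc/pve/qemu-server/<vmid>.conf.
# The VM ID is zero-padded into the last 12-digit group of the -uuid value,
# e.g. VM 100 -> 00000000-0000-0000-0000-000000000100.
vgpu_args_line() {
    mdev_uuid=$1
    vmid=$2
    printf "args: -device 'vfio-pci,sysfsdev=/sys/bus/mdev/devices/%s' -uuid 00000000-0000-0000-0000-%012d\n" \
        "$mdev_uuid" "$vmid"
}
```

For example, vgpu_args_line 0b5fd3fb-2389-4a22-ba70-52969a26b9d5 100 prints exactly the line shown above.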

Save and exit out of the file.

Then we can launch the VM using the web interface or command line

qm start 100

You can ignore the following warning

warning: vfio 0b5fd3fb-2389-4a22-ba70-52969a26b9d5: Could not enable error recovery for the device

Log into the VM and verify that the card is recognized.

In the Ubuntu 20.04 VM, lspci shows:

00:02.0 VGA compatible controller: NVIDIA Corporation GP102GL [Tesla P40] (rev a1)

It's time to install the driver in the VM. A Linux guest driver is included in the ZIP file we downloaded at the beginning of this tutorial. I'd recommend not deviating much from the version installed on the host: I have 460.32.04 installed on the host (Proxmox) and installed 460.91.03 in the VM. To do the same, download NVIDIA-GRID-Linux-KVM-460.91.03-462.96.zip from Nvidia's website and unzip it. Inside you will find the file NVIDIA-Linux-x86_64-460.91.03-grid.run

chmod +x NVIDIA-Linux-x86_64-460.91.03-grid.run
./NVIDIA-Linux-x86_64-460.91.03-grid.run --dkms

Once it is installed, reboot the VM

reboot

If successful, dmesg and lsmod should show that the driver is loaded. As a final check, run nvidia-smi.

And there you have it: a P40-3Q vGPU profile with CUDA version 11.2 on a consumer-grade Nvidia GTX 1060.

Since we're using the GRID driver, which is limited by licensing time restrictions, we need to pass some options to the nvidia driver when it loads to work around these limitations (credits neggles). This sets the "Unlicensed Unrestricted Time" to 1 day. After that you have to reload the nvidia driver (module) or reboot the VM.

echo 'options nvidia NVreg_RegistryDwords="UnlicensedUnrestrictedStateTimeout=0x5A0;UnlicensedRestricted1StateTimeout=0x5A0"' | sudo tee /etc/modprobe.d/nvidia.conf
sudo update-initramfs -u
sudo reboot
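0x5A0 is hexadecimal for 1440, the number of minutes in a day, which matches the 1 day mentioned above; that the unit is minutes is my inference from those two figures. A sketch with hypothetical helpers that compute the value and the options line for a given number of hours. Only the 24-hour value comes from the options used above; other values are untested assumptions.

```shell
#!/bin/sh
# Hypothetical helper: convert hours into the hexadecimal minute count used
# for the Unlicensed*StateTimeout registry values (unit assumed to be minutes).
timeout_hex() {
    printf '0x%X\n' $(( $1 * 60 ))
}

# Hypothetical helper: build the modprobe options line for a number of hours.
nvidia_timeout_options() {
    hex=$(timeout_hex "$1")
    printf 'options nvidia NVreg_RegistryDwords="UnlicensedUnrestrictedStateTimeout=%s;UnlicensedRestricted1StateTimeout=%s"\n' "$hex" "$hex"
}
```

nvidia_timeout_options 24 reproduces the exact line written to /etc/modprobe.d/nvidia.conf above.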

Windows 10

In Windows the process is more or less the same. Once again, create a VM or use an existing one. This time we're going to assign the second vGPU we created. Edit the config of the second (Windows) Proxmox VM

nano /etc/pve/qemu-server/101.conf

And insert the following line. Please keep in mind that I'm using my own UUID and VM ID here

args: -device 'vfio-pci,sysfsdev=/sys/bus/mdev/devices/924cfc77-4803-4118-a5f4-bd8de589ddf6' -uuid 00000000-0000-0000-0000-000000000101

Save and exit out of the file.

Launch the VM using the web interface or command line

qm start 101

Once the Windows VM has booted, open the NVIDIA-GRID-Linux-KVM-460.91.03-462.96.zip file once more, but this time extract the Windows driver 462.96_grid_win10_server2016_server2019_64bit_international.exe and run it.

Reboot and check Device Manager for any problems.

In my case there weren't any. After that you can use Remote Desktop Connection (with RemoteFX) or, even better, Looking Glass or Parsec to make full use of the newly created vGPU.

Again, since we're using the GRID driver, we need to install vGPU_LicenseBypass to work around the driver's time limitations. This script sets the unlicensed allowed time to 1 day and gets rid of the license notification messages. Just double-click the .bat file and allow Administrator access (credits Krutav Shah).

Conclusion

Hopefully you found this tutorial informative. It took me a fair amount of time to get to the point where everything works the way I think it should in Proxmox. An alternative method would be to rewrite the mdev PCI ID in the Proxmox config and load the Nvidia Quadro driver in the VM, but I didn't have any success going that route.
