This article is now obsolete and has been replaced with a new tutorial
For a long time I have wanted to split up a consumer grade Nvidia graphics card and pass the parts through to (Proxmox) VMs. Not so long ago a project called vgpu_unlock (credits Dual Coder) appeared on GitHub that makes exactly that possible. This tutorial turned out lengthy because Proxmox 7 uses the 5.11+ kernel, which requires some extra patching.
There are a few requirements in order to get vgpu_unlock to work. It only works on specific cards (see list below) and on kernels 5.11 and lower.
| Nvidia card | GPU chip | vGPU unlock profile |
|---|---|---|
| GTX 900 Series (first gen) | GM107 x4 | Tesla M10 |
| GTX 900 Series (second gen) | GM204 x2 | Tesla M60 |
| GTX 1000 Series | GP102 | Tesla P40 |
| Titan V, Quadro GV100 | GV100 | Tesla V100 16GB |
| RTX 2000 Series | TU102 | Quadro RTX 6000 |
| RTX 3000 Series | GA10x | Ampere is not supported |
Older cards like the 900 and 1000 series will show up as a Tesla card in the VM. That means we have to use the GRID driver in the VM to get the card working. Keep reading to see how this can be achieved. If you own a 2000 series card, however, you are in luck: vgpu_unlock will create a Quadro card for the VM, which can use the regular Nvidia drivers without the time (licensing) restriction that the 900/1000 (Tesla) cards have.
Proxmox and Debian continuously update their packages and kernel. For vgpu_unlock to work we have to stick to kernel version 5.11. Kernel 5.13 is not going to work. To boot kernel version 5.11 we’re going to edit the Grub configuration file.
First add this line to your sources.list to make older kernel versions available for installation.
echo 'deb http://download.proxmox.com/debian/pve bullseye pve-no-subscription' | tee -a /etc/apt/sources.list
Then update your packages and upgrade if available
apt update && apt upgrade -y
This will download and install kernel version 5.13, which is not the kernel we want. To list all the kernels available on your system, execute the following command
grep menuentry /boot/grub/grub.cfg
In my case kernel version 5.11.22-7 is the highest available version of the 5.11 kernel, and this is the version I am going to use. So now we have to edit Grub to boot this kernel by default.
Open /etc/default/grub and replace the GRUB_DEFAULT line with the following line. Remember to use your own kernel version, otherwise your system will not be able to boot. Also take note of the > character that joins the two entries (Advanced options and kernel version) together
GRUB_DEFAULT="Advanced options for Proxmox VE GNU/Linux>Proxmox VE GNU/Linux, with Linux 5.11.22-7-pve"
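Typing the entry by hand is easy to get wrong, so the lookup can be scripted. The following is a minimal sketch, assuming the entries in grub.cfg follow the "with Linux 5.11.x-y-pve" naming used in the line above. The function names latest_511_kernel and grub_default_line are made up for this example and are not part of Proxmox or Grub.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of Proxmox or Grub.

# Print the newest 5.11 PVE kernel mentioned in a grub.cfg-style file.
# $1: path to the file (defaults to /boot/grub/grub.cfg)
latest_511_kernel() {
    grep -o 'Linux 5\.11\.[0-9]*-[0-9]*-pve' "${1:-/boot/grub/grub.cfg}" \
        | awk '{print $2}' \
        | sort -u -V \
        | tail -n 1
}

# Print the GRUB_DEFAULT line for a given kernel version.
# $1: kernel version, e.g. 5.11.22-7-pve
grub_default_line() {
    printf 'GRUB_DEFAULT="Advanced options for Proxmox VE GNU/Linux>Proxmox VE GNU/Linux, with Linux %s"\n' "$1"
}
```

Running grub_default_line "$(latest_511_kernel)" prints a line that can be pasted into /etc/default/grub. Check it against the grep output before rebooting.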
Save the file, exit the editor and update Grub
update-grub
Then restart the server. After the server has restarted, check that the kernel version is correct
uname -a
Linux pve 5.11.22-7-pve #1 SMP PVE 5.11.22-12 (Sun, 07 Nov 2021 21:46:36 +0100) x86_64 GNU/Linux
This should mean you’re good to go. Proxmox will stick to the 5.11 kernel and vgpu_unlock will be able to work and remain working.
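Since everything that follows depends on the 5.11 kernel, it can help to guard later steps with a quick check. This is only a sketch; the function name kernel_is_511 is an assumption made for this example.

```shell
#!/usr/bin/env bash
# Hypothetical guard: succeeds only when the given (or running) kernel is 5.11.x.
# $1: kernel release string (defaults to the output of uname -r)
kernel_is_511() {
    case "${1:-$(uname -r)}" in
        5.11.*) return 0 ;;
        *)      return 1 ;;
    esac
}
```

For example, kernel_is_511 || echo "wrong kernel, check GRUB_DEFAULT" can be placed at the top of a setup script.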
Next we can install the dependencies required.
Install dependencies for vgpu_unlock
apt update && apt upgrade -y
apt install -y git build-essential pve-headers-`uname -r` dkms jq unzip python3 python3-pip
pip3 install frida
Git clone vgpu_unlock
git clone https://github.com/DualCoder/vgpu_unlock
chmod -R +x vgpu_unlock
For Intel CPUs, edit this line in /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"
For AMD CPUs, edit this line
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt"
Save the file and update Grub
update-grub
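The choice between the two lines above depends only on the CPU vendor, so it can be derived from /proc/cpuinfo. A minimal sketch, assuming the usual vendor strings GenuineIntel and AuthenticAMD; the function name iommu_cmdline is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the GRUB_CMDLINE_LINUX_DEFAULT line for a CPU vendor.
# $1: vendor string as it appears in the vendor_id field of /proc/cpuinfo
iommu_cmdline() {
    local flag
    case "$1" in
        GenuineIntel) flag="intel_iommu=on" ;;
        AuthenticAMD) flag="amd_iommu=on" ;;
        *) echo "unsupported CPU vendor: $1" >&2; return 1 ;;
    esac
    printf 'GRUB_CMDLINE_LINUX_DEFAULT="quiet %s iommu=pt"\n' "$flag"
}
```

The vendor string can be read with awk -F': ' '/^vendor_id/ {print $2; exit}' /proc/cpuinfo and passed as the first argument.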
Load VFIO modules at boot
Insert these lines into /etc/modules
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
Create a couple of files in modprobe.d
echo "options vfio_iommu_type1 allow_unsafe_interrupts=1" > /etc/modprobe.d/iommu_unsafe_interrupts.conf
echo "options kvm ignore_msrs=1" > /etc/modprobe.d/kvm.conf
echo "blacklist nouveau" >> /etc/modprobe.d/blacklist.conf
And verify that IOMMU is enabled
dmesg | grep -e DMAR -e IOMMU
[ 0.954526] pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported
[ 0.958348] pci 0000:00:00.2: AMD-Vi: Found IOMMU cap 0x40
[ 0.959202] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).
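The same check can be wrapped in a function so a script can stop early when nothing IOMMU related shows up in the kernel log. This is a rough sketch: it mirrors the grep above and only tells you that such lines exist, not that passthrough will work. The name has_iommu_lines is made up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the kernel log text on stdin mentions DMAR or IOMMU.
has_iommu_lines() {
    grep -q -e DMAR -e IOMMU
}
```

Typical use would be dmesg | has_iommu_lines || echo "no IOMMU messages found".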
Download Nvidia’s vGPU drivers. You need to apply for a trial at Nvidia to be able to download these drivers. Copy them to the Proxmox server using SCP and unzip.
scp nvidia/NVIDIA-GRID-Linux-KVM-460.32.04-460.32.03-461.09.zip email@example.com:~
unzip NVIDIA-GRID-Linux-KVM-460.32.04-460.32.03-461.09.zip
Since I’m using Proxmox 7, based on Debian Bullseye with kernel version 5.11, I needed to patch a couple of files.
Extract the Nvidia driver
chmod +x NVIDIA-Linux-x86_64-460.32.04-vgpu-kvm.run
./NVIDIA-Linux-x86_64-460.32.04-vgpu-kvm.run -x
Download these patches and apply to the source
wget https://raw.githubusercontent.com/rupansh/vgpu_unlock_5.12/master/twelve.patch
wget https://raw.githubusercontent.com/wvthoog/nvidia_vgpu_proxmox_7/main/nv-caps.patch
wget https://raw.githubusercontent.com/wvthoog/nvidia_vgpu_proxmox_7/main/nv-frontend.patch
wget https://raw.githubusercontent.com/wvthoog/nvidia_vgpu_proxmox_7/main/nvidia-vgpu-vfio.patch
cd NVIDIA-Linux-x86_64-460.32.04-vgpu-kvm
patch -p0 < ../twelve.patch
patch -p0 < ../nv-caps.patch
patch -p0 < ../nv-frontend.patch
patch -p0 < ../nvidia-vgpu-vfio.patch
Install the driver
chmod +x nvidia-installer
./nvidia-installer --dkms
Edit the Nvidia vGPU systemd service files
nano /lib/systemd/system/nvidia-vgpud.service
# replace ExecStart with:
ExecStart=/root/vgpu_unlock/vgpu_unlock /usr/bin/nvidia-vgpud
nano /lib/systemd/system/nvidia-vgpu-mgr.service
# replace ExecStart with:
ExecStart=/root/vgpu_unlock/vgpu_unlock /usr/bin/nvidia-vgpu-mgr
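The two edits follow the same pattern: the original ExecStart command is prefixed with the vgpu_unlock wrapper. A minimal sketch of doing that with GNU sed instead of an editor, assuming the unit files contain a plain ExecStart=/usr/bin/nvidia-vgpu... line. The function name wrap_execstart is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prefix the ExecStart command of an Nvidia vGPU unit file
# with the vgpu_unlock wrapper. Running it twice does not wrap the line twice,
# because the pattern only matches an unwrapped /usr/bin/nvidia-vgpu... command.
# $1: path to the unit file
# $2: path to the vgpu_unlock wrapper script
wrap_execstart() {
    sed -i -E "s|^ExecStart=(/usr/bin/nvidia-vgpu[a-z-]*)|ExecStart=$2 \1|" "$1"
}
```

For the paths used in this tutorial that would be wrap_execstart /lib/systemd/system/nvidia-vgpud.service /root/vgpu_unlock/vgpu_unlock, and the same for nvidia-vgpu-mgr.service.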
Reload the systemd daemon
systemctl daemon-reload
Edit some Nvidia driver source files
Under #include "nv-time.h" insert this line
Edit the next file
Add this line at the bottom of the file
ldflags-y += -T /root/vgpu_unlock/kern.ld
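For reference, the two source edits can be sketched as a script. The file names and the include line below are assumptions taken from the vgpu_unlock README as I remember it (os-interface.c and nvidia.Kbuild in the DKMS source directory, and vgpu_unlock_hooks.c in the vgpu_unlock checkout), so verify them against your own tree. The function name patch_driver_source is made up, and the sed one-liner relies on GNU sed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: apply the two vgpu_unlock source edits to a driver tree.
# $1: driver source directory (assumed: /usr/src/nvidia-460.32.04/nvidia)
# $2: vgpu_unlock checkout (in this tutorial: /root/vgpu_unlock)
patch_driver_source() {
    local src=$1 unlock=$2

    # Add the hooks include directly below the nv-time.h include, once.
    grep -q 'vgpu_unlock_hooks.c' "$src/os-interface.c" ||
        sed -i "/#include \"nv-time.h\"/a #include \"$unlock/vgpu_unlock_hooks.c\"" "$src/os-interface.c"

    # Append the linker script flag to the bottom of the Kbuild file, once.
    grep -q 'kern.ld' "$src/nvidia.Kbuild" ||
        echo "ldflags-y += -T $unlock/kern.ld" >> "$src/nvidia.Kbuild"
}
```

Both steps are skipped when the line is already present, so the function can be rerun after a failed build.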
Remove the original Nvidia dkms module
dkms remove -m nvidia -v 460.32.04 --all
Rebuild the module
dkms install -m nvidia -v 460.32.04
And verify that vGPU is running
dmesg|grep -i vgpu
You should see something like this
[ 31.222948] vGPU unlock patch applied.
Launch nvidia-smi and write down your device PCI Bus Address(es)
I have two GPUs installed. The first one is a GTX 1060 with ID 00000000:2B:00.0. The second one is a GTX 1050 Ti with ID 00000000:2C:00.0, which will be passed through directly to a Windows 10 VM since vgpu_unlock only supports one GPU (the first one it sees).
Create the vGPUs
apt install mdevctl
List the different types your GPU offers
mdevctl types
Which gives the following (example) output among many other types
nvidia-47
  Available instances: 12
  Device API: vfio-pci
  Name: GRID P40-2Q
  Description: num_heads=4, frl_config=60, framebuffer=2048M, max_resolution=7680x4320, max_instance=12
nvidia-48
  Available instances: 8
  Device API: vfio-pci
  Name: GRID P40-3Q
  Description: num_heads=4, frl_config=60, framebuffer=3072M, max_resolution=7680x4320, max_instance=8
nvidia-49
  Available instances: 6
  Device API: vfio-pci
  Name: GRID P40-4Q
  Description: num_heads=4, frl_config=60, framebuffer=4096M, max_resolution=7680x4320, max_instance=6
This lists all kinds of different names and available instances. I’ve decided to split my GTX 1060 into 2 vGPUs (instances) of type Q, so I went for the nvidia-48 profile. Type Q is the profile we want since it supports all the features the GPU offers, like NVENC, CUDA, etc. The type you select depends on the amount of VRAM the GPU has available. My GTX 1060 has 6GB, so I had to choose a profile that consumes 3GB (2 vGPUs x 3GB = 6GB total) of memory. Also take note of the ‘Name’. This will be the card the vGPU driver creates. In my case that would be a (Tesla) P40-3Q
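The VRAM arithmetic is a plain integer division of the card's memory by the profile's framebuffer size. A small sketch, with the hypothetical function name vgpu_instances and all sizes in MB:

```shell
#!/usr/bin/env bash
# Hypothetical helper: how many vGPUs of a given profile fit on the physical card.
# $1: total VRAM of the card in MB (6144 for a 6GB GTX 1060)
# $2: framebuffer size of the profile in MB (3072 for nvidia-48 / P40-3Q)
vgpu_instances() {
    echo $(( $1 / $2 ))
}
```

vgpu_instances 6144 3072 gives 2, which matches the two 3GB vGPUs chosen above. The "Available instances" numbers in the mdevctl output describe the Tesla P40 profile, not the real card, so the card's own VRAM is the limit that matters here.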
Now it’s time to generate some UUIDs for mdevctl to use when creating the vGPUs. As I mentioned previously, I’m going to split the GTX 1060 into 2 vGPUs, so I needed to create 2 UUIDs.
uuid -n 2
0b5fd3fb-2389-4a22-ba70-52969a26b9d5
924cfc77-4803-4118-a5f4-bd8de589ddf6
Create vGPU profiles
At this point I recommend opening a text editor to put together the commands for creating the different vGPUs.
Grab the PCI ID of the GPU from the kernel log (dmesg)
[ 8.056238] nvidia 0000:2b:00.0: enabling device (0000 -> 0003)
[ 8.056325] nvidia 0000:2b:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
[ 8.172280] nvidia 0000:2c:00.0: enabling device (0000 -> 0003)
[ 8.172348] nvidia 0000:2c:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
From nvidia-smi I could determine that 0000:2c:00.0 is my GTX 1050 Ti and 0000:2b:00.0 is the GTX 1060. Write down the PCI Bus ID of the card you intend to use.
Putting the final commands together we now have everything we need for each vGPU profile.
- UUID
- PCI Bus ID
- vGPU Profile
This results (in my case) in the following commands: two vGPU profiles, each with a unique UUID, on parent PCI Bus ID 0000:2b:00.0 (the GTX 1060) of type nvidia-48.
mdevctl start -u 0b5fd3fb-2389-4a22-ba70-52969a26b9d5 -p 0000:2b:00.0 -t nvidia-48
mdevctl start -u 924cfc77-4803-4118-a5f4-bd8de589ddf6 -p 0000:2b:00.0 -t nvidia-48
Define the profiles so that they will be persistent after reboot
mdevctl define -a -u 0b5fd3fb-2389-4a22-ba70-52969a26b9d5
mdevctl define -a -u 924cfc77-4803-4118-a5f4-bd8de589ddf6
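Instead of editing the commands by hand, they can be generated from the three ingredients. The sketch below only prints the commands so they can be reviewed first. The function name mdev_commands is hypothetical; the parent address is the GTX 1060 at 0000:2b:00.0 as determined from nvidia-smi.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the mdevctl start/define commands for a set of UUIDs.
# $1: parent PCI Bus ID, $2: vGPU type, remaining arguments: one UUID per vGPU
mdev_commands() {
    local parent=$1 type=$2 uuid
    shift 2
    for uuid in "$@"; do
        printf 'mdevctl start -u %s -p %s -t %s\n' "$uuid" "$parent" "$type"
        printf 'mdevctl define -a -u %s\n' "$uuid"
    done
}
```

Called as mdev_commands 0000:2b:00.0 nvidia-48 0b5fd3fb-2389-4a22-ba70-52969a26b9d5 924cfc77-4803-4118-a5f4-bd8de589ddf6, it prints four lines that can be checked and then piped into sh.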
Verify the profiles are created correctly
mdevctl list
924cfc77-4803-4118-a5f4-bd8de589ddf6 0000:2b:00.0 nvidia-48 (defined)
0b5fd3fb-2389-4a22-ba70-52969a26b9d5 0000:2b:00.0 nvidia-48 (defined)
That’s it. Now we’ve got 2 mediated devices ready to be used by a Proxmox VM
Assign vGPU to VM
So far we’ve set up the mdev part. Now it’s time to create a new VM (or use an existing one) of type q35 and assign it a vGPU. Create the VM and edit its conf file (/etc/pve/qemu-server/[ID-of-VM].conf).
Please review the following line
args: -device 'vfio-pci,sysfsdev=/sys/bus/mdev/devices/[UUID]' -uuid 00000000-0000-0000-0000-000000000[ID-of-VM]
You need to change two values. The first one is the UUID of one of the mdev devices we previously created. The second one is the ID of the VM. A complete line would look something like this
args: -device 'vfio-pci,sysfsdev=/sys/bus/mdev/devices/0b5fd3fb-2389-4a22-ba70-52969a26b9d5' -uuid 00000000-0000-0000-0000-000000000100
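The args line can also be generated, which avoids miscounting the zeros in the second UUID. A minimal sketch with the made-up function name vgpu_args; it pads the VM ID to the 12 digits of the last UUID group.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the args line for a Proxmox VM conf file.
# $1: UUID of the mdev device
# $2: numeric ID of the VM (Proxmox IDs start at 100, so no leading zeros)
vgpu_args() {
    printf "args: -device 'vfio-pci,sysfsdev=/sys/bus/mdev/devices/%s' -uuid 00000000-0000-0000-0000-%012d\n" "$1" "$2"
}
```

vgpu_args 0b5fd3fb-2389-4a22-ba70-52969a26b9d5 100 prints the complete line shown above.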
Save and exit out of the file.
Then we can launch the VM using the web interface or command line
qm start 100
You can ignore the following warning
warning: vfio 0b5fd3fb-2389-4a22-ba70-52969a26b9d5: Could not enable error recovery for the device
Log into the VM and verify that the card is recognized.
In an Ubuntu 20.04 VM this looks as follows (lspci)
00:02.0 VGA compatible controller: NVIDIA Corporation GP102GL [Tesla P40] (rev a1)
It’s time to install the driver in the VM. You can find this driver in the ZIP file we downloaded at the beginning of this tutorial. I’d recommend not deviating much from the version installed on the host machine. I have 460.32.04 installed on the host machine (Proxmox) and am going to install 460.91.03 in the VM. Download NVIDIA-GRID-Linux-KVM-460.91.03-462.96.zip from Nvidia’s website (or from Google Cloud) and unzip it. Inside you will find the file NVIDIA-Linux-x86_64-460.91.03-grid.run
*** As kernel versions on the client side increasingly go beyond 5.11, I’d advise downloading the latest GRID driver from Google, since those can be built on newer kernel versions. Just select the highest available version that matches the driver branch installed on the host (Proxmox). I have the 460 driver installed on Proxmox, so I use the highest available 460.x driver from Google in my client VMs (currently 460.106.00)
chmod +x NVIDIA-Linux-x86_64-460.91.03-grid.run
./NVIDIA-Linux-x86_64-460.91.03-grid.run --dkms
Once installed, reboot the VM
When successful you should see that the driver is loaded in dmesg and lsmod. For a final check run nvidia-smi
And there you have it. A P40-3Q vGPU profile with CUDA version 11.2 on a consumer grade Nvidia GTX 1060.
Not so fast
Since we’re using the GRID driver which has been crippled by time restrictions due to licensing fees, we need to append some arguments when loading the nvidia driver to bypass these limitations. (credits neggles) What this does is set the “Unlicensed Unrestricted Time” to 1 day. After that you have to reload the nvidia driver (module) or reboot the VM.
echo 'options nvidia NVreg_RegistryDwords="UnlicensedUnrestrictedStateTimeout=0x5A0;UnlicensedRestricted1StateTimeout=0x5A0"' | sudo tee /etc/modprobe.d/nvidia.conf
sudo update-initramfs -u
sudo reboot
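The value 0x5A0 is hexadecimal for 1440, which fits the "1 day" above if the timeout is counted in minutes (24 x 60 = 1440). That unit is an inference from the numbers, not something confirmed by Nvidia documentation. The conversion itself can be checked in the shell; both function names are made up for this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for converting the timeout values used in NVreg_RegistryDwords.

# $1: number of minutes (decimal); prints the value in the 0x... form used above
minutes_to_hex() {
    printf '0x%X\n' "$1"
}

# $1: value in 0x... form; prints it as a decimal number
hex_to_minutes() {
    echo $(( $1 ))
}
```

hex_to_minutes 0x5A0 prints 1440 and minutes_to_hex 1440 prints 0x5A0.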
In Windows the process is more or less the same. Once again create a VM or use an existing one. This time we’re going to insert the second vGPU we’ve created. Edit the second (Windows) Proxmox VM
And insert the following line. Please keep in mind that I’m using my own UUID and VM ID here
args: -device 'vfio-pci,sysfsdev=/sys/bus/mdev/devices/924cfc77-4803-4118-a5f4-bd8de589ddf6' -uuid 00000000-0000-0000-0000-000000000101
Save and exit out of the file.
Launch the VM using the web interface or command line
qm start 101
When the Windows VM has booted, open the NVIDIA-GRID-Linux-KVM-460.91.03-462.96.zip file once more, but this time extract the Windows driver 462.96_grid_win10_server2016_server2019_64bit_international.exe and run it.
Reboot and consult Device Manager if there are any problems
Hold your horses
Again since we’re using the GRID driver we need to install vGPU_LicenseBypass to bypass time limitations of the driver. What this script does is set the unlicensed allowed time to 1 day and get rid of license notification messages. Just double click the bat file and allow Administrator access. (credits Krutav Shah)
Hopefully you will find this tutorial informative. It cost me a bit of time to get to the point where it functions the way I think it should in Proxmox. An alternative method would be to rewrite the mdev PCI ID in the Proxmox config and load the Nvidia Quadro driver in the VM, but I didn’t have any success going that route.