Rebind of GPU fails if GPU is bound before X/Wayland is started and unbound while X/Wayland is running
Brief summary of the problem:
Unbinding the dGPU, passing it through to a VM, and rebinding it to amdgpu on the host works, but only if the dGPU is first bound to vfio-pci before X/Wayland starts. If the dGPU is bound to amdgpu before X/Wayland starts and is unbound while X/Wayland is running, the amdgpu driver errors out on rebind (although passthrough to the VM can still work if the dGPU is rebound to vfio-pci).
It appears that the initial unbind fails silently, leaving the driver in a bad state and a stale sysfs file behind, which makes the driver fail on the subsequent rebind.
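One way to look for the suspected stale entry after a failed rebind is to scan the kernel log for the generic sysfs duplicate-name warning. This is only a sketch: the helper name find_dup_sysfs is hypothetical, and the matched text "cannot create duplicate filename" is assumed from the kernel's generic sysfs warning rather than quoted from the logs in this report.

```shell
# Hypothetical helper: reads kernel log text on stdin and prints only the
# lines that contain the generic sysfs duplicate-name warning.
# Assumption: the warning text is "cannot create duplicate filename".
find_dup_sysfs() {
    grep -F "cannot create duplicate filename"
}

# Usage (reading the kernel log may require root):
#   dmesg | find_dup_sysfs
```

The function exits non-zero when no such line is found, so it can also be used as a condition in a script.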
Hardware description:
- CPU: Ryzen 5700G
- GPU:
03:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 XL/XT [Radeon RX Vega 56/64] [1002:687f] (rev c1)
09:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Cezanne [1002:1638] (rev c8)
- System Memory: 32GB
- Display(s): 1 Dell P2416D attached to the dGPU via DisplayPort and to the iGPU via HDMI
- Type of Display Connection: See above
System information:
- Distro name and Version: Arch Linux
- Kernel version: Linux thor 5.15.10-arch1-1 #1 SMP PREEMPT Fri, 17 Dec 2021 11:17:37 +0000 x86_64 GNU/Linux
- Custom kernel: N/A
- AMD official driver version: N/A
How to reproduce the issue:
- Don't bind the dGPU to vfio-pci at boot, or bind it to amdgpu before starting GDM
- Start GDM with Xorg or Wayland
- Unbind amdgpu: echo 0000:03:00.0 > /sys/bus/pci/devices/0000\:03\:00.0/driver/unbind
- Bind to vfio-pci: echo 0000:03:00.0 > /sys/bus/pci/drivers/vfio-pci/bind
- Unbind vfio-pci: echo 0000:03:00.0 > /sys/bus/pci/devices/0000\:03\:00.0/driver/unbind
- Bind to amdgpu: echo 0000:03:00.0 > /sys/bus/pci/drivers/amdgpu/bind
- Observe dmesg failing to rebind GPU, complaining about duplicate sysfs file. Additional attempts to rebind result in even more dmesg errors.
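The unbind/bind sequence above can be wrapped in a few small helpers, roughly as sketched below. The function names and the SYSFS_ROOT variable are hypothetical (SYSFS_ROOT exists only so the helpers can be pointed at a test directory; on a real system it stays /sys). The sketch also assumes, as in the steps above, that vfio-pci already accepts the device ID, and that the commands run as root.

```shell
#!/bin/sh
# Sketch only: helpers around the sysfs bind/unbind files used in the steps.
# SYSFS_ROOT is an assumption for testability; it defaults to the real /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Print the driver currently bound to PCI device $1, or "none".
current_driver() {
    link="$SYSFS_ROOT/bus/pci/devices/$1/driver"
    if [ -L "$link" ]; then
        basename "$(readlink "$link")"
    else
        echo none
    fi
}

# Unbind PCI device $1 from whatever driver holds it (no-op if unbound).
unbind_device() {
    if [ -e "$SYSFS_ROOT/bus/pci/devices/$1/driver/unbind" ]; then
        echo "$1" > "$SYSFS_ROOT/bus/pci/devices/$1/driver/unbind"
    fi
}

# Bind PCI device $1 to the driver named $2.
bind_device() {
    echo "$1" > "$SYSFS_ROOT/bus/pci/drivers/$2/bind"
}

# Sequence from the report, with the driver printed between steps:
#   unbind_device 0000:03:00.0; current_driver 0000:03:00.0
#   bind_device 0000:03:00.0 vfio-pci; current_driver 0000:03:00.0
#   unbind_device 0000:03:00.0
#   bind_device 0000:03:00.0 amdgpu; current_driver 0000:03:00.0
```

Printing current_driver after each step makes it easier to see whether an unbind actually took effect, which is the step suspected of failing silently here.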
Attached files:
Workaround
- Bind the dGPU to vfio-pci before starting X/Wayland, and bind it to amdgpu only after logging in. The GPU can then be unbound from amdgpu and rebound without problems
Edited by John Pham