Unable to unbind GPU from amdgpu
Submitted by wed..@..dex.ru
Assigned to Default DRI bug account
Created attachment 144877
dmesg kernel 5.2.1
Kernel version: 5.2.1
I have two GPUs in my system: an integrated Intel GPU and a Sapphire Pulse Vega 56.
I boot with the Intel GPU as my primary GPU and use the Vega for VFIO (GPU passthrough) and GPU offloading.
What I'm trying to do is boot with the amdgpu driver bound to the Vega and rebind it to vfio-pci when I start a VM (qemu).
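For reference, the intended handover could be sketched roughly as below. This is only an illustration, not the exact script in use: it relies on the standard sysfs files `driver/unbind`, `driver_override` and `drivers_probe`, and assumes that the vfio-pci module is already loaded and that the script runs as root. The function name `rebind_to_vfio` and the `SYSFS_ROOT` variable are hypothetical and exist only so the sequence can be dry-run against a scratch directory.

```shell
#!/bin/sh
# Sketch: hand a PCI device over from its current driver to vfio-pci.
# SYSFS_ROOT is a hypothetical override for dry runs; it defaults to /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# rebind_to_vfio is a hypothetical helper name, not part of any tool.
rebind_to_vfio() {
    addr="$1"                                   # e.g. 0000:03:00.0
    dev="$SYSFS_ROOT/bus/pci/devices/$addr"

    [ -d "$dev" ] || { echo "no such PCI device: $addr" >&2; return 1; }

    # Release the device from whatever driver currently owns it.
    # With amdgpu this is the step that fails in this report.
    if [ -e "$dev/driver/unbind" ]; then
        printf '%s' "$addr" > "$dev/driver/unbind" || return 1
    fi

    # Restrict driver matching to vfio-pci, then ask the PCI core to re-probe.
    printf '%s' "vfio-pci" > "$dev/driver_override" || return 1
    printf '%s' "$addr" > "$SYSFS_ROOT/bus/pci/drivers_probe"
}
```

It would be called as `rebind_to_vfio 0000:03:00.0` before starting qemu. Using `driver_override` rather than writing to the vfio-pci `bind` file avoids having to register the device ID with vfio-pci first.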
The problem occurs when I try to unbind Vega from amdgpu driver using this command:
echo -n "0000:03:00.0" > /sys/bus/pci/drivers/amdgpu/unbind
This results in a segfault, with the following error in dmesg (the full dmesg from boot to shutdown is attached):
[drm:amdgpu_pci_remove [amdgpu]] ERROR Device removal is currently not supported outside of fbcon
After that, I'm unable to rebind the device to amdgpu or to any other driver:
echo "0000:03:00.0" > /sys/bus/pci/drivers/amdgpu/bind
bash: echo: write error: No such device
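To see what state the device is left in after a failed unbind or bind, the `driver` symlink in the device's sysfs directory can be inspected. The snippet below is a minimal sketch of such a check; `current_driver` and `SYSFS_ROOT` are hypothetical names introduced for illustration, and the check only reports what sysfs claims, not whether the driver's internal state is still consistent.

```shell
#!/bin/sh
# Sketch: report which driver (if any) sysfs says is bound to a PCI device.
# SYSFS_ROOT is a hypothetical override for dry runs; it defaults to /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# current_driver is a hypothetical helper name, not part of any tool.
current_driver() {
    link="$SYSFS_ROOT/bus/pci/devices/$1/driver"
    if [ -L "$link" ]; then
        # The symlink points at the driver's directory; its last
        # path component is the driver name.
        basename "$(readlink "$link")"
    else
        # No symlink means no driver is bound according to sysfs.
        echo "none"
    fi
}
```

Calling `current_driver 0000:03:00.0` before and after the unbind attempt shows whether the kernel still considers the device bound.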
I'm also unable to shut down properly. The shutdown process gets stuck at some point, and only holding the power button helps.
I've attached the relevant lspci -vvv output from before and after the unbind attempt, in case it's useful.
I also tried unbinding under kernel 4.19.60; there it just hangs after the command is executed. I've attached the log of this attempt (the error is different from the one on 5.2.1).
Attachment 144877, "dmesg kernel 5.2.1":