Rebinding AMDGPU causes initialization errors [R9 290]
Submitted by Robin
Assigned to Default DRI bug account
Link to original bug (#101946)
Description
Created attachment 133068
The script used to reproduce the error.
I ran into this issue while trying to hotplug my R9 290 for a VM gaming setup.
The main error in kern.log is:
[ 160.013733] [drm:ci_dpm_enable [amdgpu]] ERROR ci_start_dpm failed
[ 160.014134] [drm:amdgpu_device_init [amdgpu]] ERROR hw_init of IP block<amdgpu_powerplay> failed -22
[ 160.014531] amdgpu 0000:01:00.0: amdgpu_init failed
My setup uses a Kaby Lake iGPU running i915, with the R9 290 bound to either vfio-pci or amdgpu.
Software: Ubuntu 17.04 (kernel 4.10.0-28-generic) with Mesa 17.1.4 from the padoka stable PPA.
I'm able to reproduce this as follows.
1. Boot with vfio-pci capturing the card and amdgpu blacklisted. Kernel flags:
> intel_iommu=on iommu=pt vfio-pci.ids=1002:67b1,1002:aac8
2. Since I run GNOME 3 on Ubuntu 17.04, this brings me to a Wayland greeter running on the iGPU. Switch to a free TTY without logging in; this prevents Xorg from reacting to the AMD card becoming available.
3. Run the attached script "rebind-amd.sh" as root to bind back and forth between vfio-pci and amdgpu in an infinite loop.
The script will:
A. modprobe both drivers to be sure they're loaded.
B. Print information about the driver and card usage.
C. Use the new_id > unbind > bind > remove_id sequence to switch drivers.
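For readers without access to the attachment, the sequence in step C could look roughly like the sketch below. This is not the attached rebind-amd.sh; the function names `current_driver` and `rebind_device` and the `SYSFS_ROOT` override are made up for illustration. It only assumes the standard sysfs PCI driver files (`new_id`, `unbind`, `bind`, `remove_id`).

```shell
#!/bin/sh
# Hypothetical sketch of the rebind sequence; NOT the attached rebind-amd.sh.
# SYSFS_ROOT defaults to the real sysfs mount, but can be pointed at a scratch
# directory to dry-run the logic without touching hardware.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Print the name of the driver currently bound to a PCI device, or "none".
current_driver() {
    link="$SYSFS_ROOT/bus/pci/devices/$1/driver"
    if [ -L "$link" ]; then
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
}

# rebind_device <pci-address> <"vendor device" hex pair> <target-driver>
rebind_device() {
    dev="$1"; id="$2"; drv="$3"
    drv_dir="$SYSFS_ROOT/bus/pci/drivers/$drv"
    dev_dir="$SYSFS_ROOT/bus/pci/devices/$dev"

    # new_id: teach the target driver this vendor/device pair.
    echo "$id" > "$drv_dir/new_id" || return 1
    # unbind: release the device from whichever driver holds it now, if any.
    if [ -e "$dev_dir/driver/unbind" ]; then
        echo "$dev" > "$dev_dir/driver/unbind" || return 1
    fi
    # bind: hand the device to the target driver.
    echo "$dev" > "$drv_dir/bind" || return 1
    # remove_id: drop the dynamic ID again so it does not linger.
    echo "$id" > "$drv_dir/remove_id" || return 1
}
```

With the card at 0000:01:00.0 (the address in the log above) and the ID from the kernel flags, a call as root would be `rebind_device 0000:01:00.0 "1002 67b1" amdgpu`, and `current_driver 0000:01:00.0` reports which driver holds the card afterwards. The HDMI audio function (1002:aac8) would presumably need the same treatment with its own driver.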
What happens is:
vfio-pci -> vfio-pci: no problems, of course.
vfio-pci -> amdgpu: this works, and the amdgpu driver initializes the card. Attached monitor(s) start searching for signals.
amdgpu -> vfio-pci: since no Xorg is using the dGPU, this works without problems.
vfio-pci -> amdgpu: fails to initialize the dGPU, with the kernel error above.
I've attached the script, the output of the script and the full kern.log.
**Attachment 133068**, "The script used to reproduce the error.":
rebind-amd.sh