Leaked connector on amdgpu unbind leads to a kernel oops (amdgpu.dc=0)
Brief summary of the problem:
Sometimes when I unbind the amdgpu driver, it complains about a leaked connector and crashes a bit later. I haven't yet tracked down the combination of conditions needed to trigger this, but it has happened to me about 3 times already.
I added a WARN_ON(1) to __drm_connector_put_safe to see which caller triggers the delayed work that frees the connector when it is too late.
I attached a backtrace with the above WARN_ON and the crash (connector_leak_bug.txt). For reference, I also attached the 'amdgpu_unbind' script that I use to unbind the amdgpu driver.
Hardware description:
- CPU: 3970X
- GPU: WX4100
- System Memory: 128GB
- Display(s): 2 monitors and 1 TV
- Type of Display Connection: 3 HDMI outputs connected via DP->HDMI adapters
System information:
- Distro name and Version: Fedora 32
- Kernel version: 5.10.0
- Custom kernel: compiled from mainline git
- AMD package version: from distro (AMD Radeon (TM) Pro WX 4100 (POLARIS11, DRM 3.40.0, 5.10.0.stable, LLVM 10.0.1))
How to reproduce the issue:
Load/unload the amdgpu driver repeatedly, each time starting and stopping X and the GNOME GUI. The script I use is attached. Eventually the crash happens.
amdgpu.dc=1 is not stable on this GPU, so this was tested with amdgpu.dc=0. The leak happens in the non-DC code, so the DC code is likely not affected by this bug.
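
For readers without the attachment, the unbind/rebind step of the reproduction loop can be sketched roughly as below. This is not the attached 'amdgpu_unbind' script, only a minimal illustration built on the standard sysfs driver `bind`/`unbind` files. The helper names and the `sysfs_root` parameter are hypothetical (the latter exists only so the logic can be dry-run against a scratch directory), and any PCI address passed in is a placeholder for the real GPU address.

```python
from pathlib import Path


def driver_dir(driver="amdgpu", sysfs_root="/sys"):
    """Return the sysfs directory of a PCI driver (hypothetical helper)."""
    return Path(sysfs_root) / "bus" / "pci" / "drivers" / driver


def is_bound(pci_addr, driver="amdgpu", sysfs_root="/sys"):
    """A bound device appears as an entry named after its PCI address."""
    return (driver_dir(driver, sysfs_root) / pci_addr).exists()


def unbind(pci_addr, driver="amdgpu", sysfs_root="/sys"):
    """Detach the device from the driver; return False if it was not bound."""
    if not is_bound(pci_addr, driver, sysfs_root):
        return False
    # Writing the PCI address to 'unbind' asks the kernel to detach the device.
    (driver_dir(driver, sysfs_root) / "unbind").write_text(pci_addr)
    return True


def bind(pci_addr, driver="amdgpu", sysfs_root="/sys"):
    """Attach the device to the driver; return False if it is already bound."""
    if is_bound(pci_addr, driver, sysfs_root):
        return False
    # Writing the PCI address to 'bind' asks the kernel to probe the device.
    (driver_dir(driver, sysfs_root) / "bind").write_text(pci_addr)
    return True
```

On a real system this needs root, and X and GNOME have to be stopped before calling `unbind("<GPU PCI address>")` and restarted after `bind(...)`, as described in the reproduction steps.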