amdgpu crashes randomly on WX4100 when connecting displays (amdgpu.dc=1)
Brief summary of the problem:
The amdgpu driver often crashes when a display is plugged in.
To test this deliberately with 'amdgpu.dc=1', I slowly plugged and unplugged a display connector, waiting for the output to stabilize between cycles. The issue still reproduced after about a dozen tries. It only happens when the connector is plugged in, never when it is unplugged.
I then unloaded the amdgpu driver and reloaded it with dc=0. This more or less works, but takes a long time. The dmesg output is attached (amdgpu_dc1_plug_bug.txt).
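When switching between dc=1 and dc=0 by reloading the module, it is easy to lose track of which mode a given reproduction attempt ran under. The sketch below reads the parameter back from sysfs. It assumes the standard `/sys/module/amdgpu/parameters/dc` path, and `read_dc_mode` is a hypothetical helper name. Note that a value of -1 (auto) only reports the requested setting, not which display path the driver actually chose.

```python
from pathlib import Path

# Assumed location of amdgpu's 'dc' module parameter once the module is loaded.
DEFAULT_PARAM = Path("/sys/module/amdgpu/parameters/dc")


def read_dc_mode(param_path: Path = DEFAULT_PARAM) -> str:
    """Return a human-readable description of the amdgpu 'dc' parameter."""
    try:
        raw = param_path.read_text().strip()
    except FileNotFoundError:
        # The parameter file only exists while the module is loaded.
        return "amdgpu not loaded"
    meanings = {"1": "DC enabled", "0": "DC disabled", "-1": "auto"}
    return meanings.get(raw, f"unknown value {raw!r}")


if __name__ == "__main__":
    print(read_dc_mode())
```

Running this right after each reload and noting the result next to each test cycle keeps the dc=1 and dc=0 results clearly separated.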
I tried increasing the number of retries in dm_helpers_read_local_edid to something excessive, such as 1000, with no effect.
I also tried removing the code below the 'Abort detection for non-DP connectors if we have no EDID' comment, which did not help either.
This bug makes it practically impossible to use the card day to day, since I connect and disconnect monitors often, especially because of VFIO usage.
- Running without the new DC framework (amdgpu.dc=0) avoids this issue completely, but at the cost of HDMI audio: HDMI audio only works with amdgpu.dc=1.
- CPU: 3970X
- GPU: WX4100
- System Memory: 128GB
- Display(s): Two 1080p displays + 1080p television
- Type of Display Connection: 3 HDMI connections via DP->HDMI adapters
- Distro name and Version: Fedora 32
- Kernel version: 5.10.0
- Custom kernel: compiled from mainline git
- AMD package version: from distro (AMD Radeon (TM) Pro WX 4100 (POLARIS11, DRM 3.40.0, 5.10.0.stable, LLVM 10.0.1))
How to reproduce the issue:
Repeatedly connect and disconnect displays (either physically or via an HDMI switch). After a while the screen goes black and the errors shown in the attached log appear. This happens on all 3 outputs.
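To correlate each plug event with the 'No EDID read' error, it may help to log the connector state the kernel exposes in sysfs while reproducing. The sketch below assumes the usual `/sys/class/drm/cardN-<connector>` layout with `status` and `edid` attributes; with DP->HDMI adapters the connectors would likely appear as DP entries (e.g. `card0-DP-1`). The helper names are hypothetical. A connector reporting `connected` with a zero-length EDID would be consistent with a failed EDID read, though whether sysfs actually shows that combination during this bug is an assumption.

```python
from __future__ import annotations

import time
from pathlib import Path

DRM_ROOT = Path("/sys/class/drm")  # assumed standard DRM sysfs root


def snapshot_connectors(root: Path = DRM_ROOT) -> dict[str, tuple[str, int]]:
    """Map each connector (e.g. 'card0-DP-1') to (status, EDID size in bytes)."""
    result: dict[str, tuple[str, int]] = {}
    for conn in sorted(root.glob("card*-*")):
        status_file = conn / "status"
        if not status_file.is_file():
            continue  # not a connector directory
        status = status_file.read_text().strip()
        edid_file = conn / "edid"
        edid_len = len(edid_file.read_bytes()) if edid_file.is_file() else 0
        result[conn.name] = (status, edid_len)
    return result


def describe_changes(previous: dict[str, tuple[str, int]],
                     current: dict[str, tuple[str, int]]) -> list[str]:
    """Return one line per connector whose (status, EDID size) changed."""
    lines = []
    for name, (status, edid_len) in current.items():
        if previous.get(name) == (status, edid_len):
            continue
        note = " (connected but no EDID)" if status == "connected" and edid_len == 0 else ""
        lines.append(f"{name}: {status}, EDID {edid_len} bytes{note}")
    return lines


def watch(interval: float = 0.5) -> None:
    """Poll forever, printing a timestamped line whenever a connector changes."""
    previous: dict[str, tuple[str, int]] = {}
    while True:
        current = snapshot_connectors()
        for line in describe_changes(previous, current):
            print(time.strftime("%H:%M:%S"), line)
        previous = current
        time.sleep(interval)
```

Calling `watch()` in a terminal while cycling the connectors would give timestamps that can be lined up against the dmesg timestamps in the attached log.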
The first error is [drm:dc_link_detect_helper [amdgpu]] ERROR No EDID read.
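To confirm which amdgpu error appears first in a long dmesg dump, a small scan like the one below can help. It is a sketch that assumes DRM error lines follow the `[drm:<function> [amdgpu]] *ERROR* <message>` shape; the asterisks are treated as optional so the pattern also matches the form quoted here. `first_amdgpu_error` is a hypothetical helper name.

```python
from __future__ import annotations

import re

# Matches e.g. "[drm:dc_link_detect_helper [amdgpu]] *ERROR* No EDID read."
ERROR_RE = re.compile(r"\[drm:(\w+) \[amdgpu\]\] \*?ERROR\*?\s*(.*)")


def first_amdgpu_error(log_text: str) -> tuple[str, str] | None:
    """Return (function, message) for the earliest amdgpu DRM error, or None."""
    for line in log_text.splitlines():
        match = ERROR_RE.search(line)
        if match:
            return match.group(1), match.group(2).strip()
    return None
```

For example, passing the contents of amdgpu_dc1_plug_bug.txt to `first_amdgpu_error` would be expected to return `("dc_link_detect_helper", "No EDID read.")` if that message is indeed the first error in the log.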
- Dmesg log: amdgpu_dc1_plug_bug.txt