[bisected] driver crash in 5.15-rc1 on Radeon WX4100 when in power saving mode
Brief summary of the problem:
When resuming from power saving mode (activated earlier by a screensaver), the driver crashes and the display shows garbage (or stays "off"). The problem first appeared in the 5.15-rc1 build, and I have since reproduced it with rc2, rc3 and rc4. Nothing like that was observed in 5.13.x or 5.14.0.
A workaround is to add "amdgpu.runpm=0" to the kernel command line.
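To confirm after a reboot that the workaround is really in effect, the command line of the running kernel can be checked. A minimal sketch, assuming the parameter was passed on the kernel command line (and not through a modprobe configuration file); the helper name `runpm_disabled` is made up for illustration:

```python
def runpm_disabled(cmdline: str) -> bool:
    """Return True if "amdgpu.runpm=0" is one of the kernel parameters.

    cmdline is the whitespace-separated parameter string, e.g. the
    content of /proc/cmdline.
    """
    # Compare whole tokens so that e.g. "amdgpu.runpm=01" does not match.
    return "amdgpu.runpm=0" in cmdline.split()
```

For the running system it would be called as `runpm_disabled(open("/proc/cmdline").read())`.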
...
Sep 17 09:22:50 talos.danny.cz kernel: amdgpu 0000:01:00.0: refused to change power state from D0 to D3hot
Sep 17 09:22:51 talos.danny.cz kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
Sep 17 09:22:51 talos.danny.cz kernel: [drm] UVD and UVD ENC initialized successfully.
Sep 17 09:22:52 talos.danny.cz kernel: [drm] VCE initialized successfully.
Sep 17 09:23:08 talos.danny.cz kernel: amdgpu 0000:01:00.0: refused to change power state from D0 to D3hot
Sep 17 09:23:11 talos.danny.cz kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
Sep 17 09:23:11 talos.danny.cz kernel: [drm] UVD and UVD ENC initialized successfully.
Sep 17 09:23:11 talos.danny.cz kernel: [drm] VCE initialized successfully.
Sep 17 09:23:11 talos.danny.cz kernel: amdgpu 0000:01:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on sdma0 (-22).
Sep 17 09:23:11 talos.danny.cz kernel: amdgpu 0000:01:00.0: amdgpu: couldn't schedule ib on ring <sdma0>
Sep 17 09:23:11 talos.danny.cz kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
Sep 17 09:23:11 talos.danny.cz kernel: [drm:amdgpu_device_delayed_init_work_handler [amdgpu]] *ERROR* ib ring test failed (-22).
Sep 17 09:23:11 talos.danny.cz kernel: amdgpu 0000:01:00.0: amdgpu: couldn't schedule ib on ring <sdma0>
Sep 17 09:23:11 talos.danny.cz kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
Sep 17 09:23:11 talos.danny.cz kernel: amdgpu 0000:01:00.0: amdgpu: couldn't schedule ib on ring <sdma0>
Sep 17 09:23:11 talos.danny.cz kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
...
The "refused to change power state" messages seem to be harmless(?). Unfortunately, I haven't yet found what exactly triggers the crash.
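To see how often each of these messages shows up in a longer journal, the lines can be counted mechanically. A rough sketch only: the search strings are copied from the log excerpt above, while the category names and the function name `summarize` are made up for illustration:

```python
from collections import Counter

# Substrings taken verbatim from the log excerpt in this report.
# The dictionary keys are just illustrative labels.
PATTERNS = {
    "d3hot_refused": "refused to change power state from D0 to D3hot",
    "ib_test_failed": "IB test failed on",
    "ib_sched_error": "Error scheduling IBs",
}


def summarize(lines):
    """Count how often each known message appears in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        for name, needle in PATTERNS.items():
            if needle in line:
                counts[name] += 1
    return counts
```

It takes any iterable of lines, so kernel messages saved to a file (for example from `journalctl -k`) can be passed in as an open file object.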
I have started the bisecting process at the amd-drm-next-5.15-2021-09-01 tag, hoping the crash is caused by a change in the drm-next-5.15 branch between the v5.14-rc3 tag and "today".
Hardware description:
- CPU: IBM Power9
- GPU: Radeon Pro WX 4100
- System Memory: 64GB
- Display(s): Dell U2412M
- Type of Display Connection: DP
System information:
- Distro name and Version: Fedora 34
- Kernel version: kernel-5.15.0-0.rc1.12.fc36.ppc64le
- Custom kernel: N/A
- AMD package version: N/A
How to reproduce the issue:
- activate screensaver and let the monitor enter power saving mode
- wait for an unknown amount of time
- leave the power saving mode by moving the mouse or pressing a key
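Because the required waiting time is unknown, it may help to watch the runtime PM state of the GPU while reproducing. A sketch that reads the standard PCI sysfs attribute `power/runtime_status`; the device address 0000:01:00.0 is taken from the kernel log above, and the helper name `runtime_status` is made up for illustration:

```python
from pathlib import Path

# PCI address of the GPU as it appears in the kernel log of this report.
DEVICE = Path("/sys/bus/pci/devices/0000:01:00.0")


def runtime_status(device: Path = DEVICE) -> str:
    """Return the runtime PM status string, e.g. "active" or "suspended"."""
    return (device / "power" / "runtime_status").read_text().strip()
```

Calling it in a loop (say once per second) while the monitor is blanked should show whether and when the card leaves the "active" state. Whether the crash depends on that transition is only a guess at this point.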