[5.2/5.3][drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out or interrupted!
Seen on both AMD 2400G and 3400G APUs. We find these in the dmesg of 5.3.5:
[   85.232749] fuse: init (API version 7.31)
[18161.173791] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] ERROR Waiting for fences timed out or interrupted!
[18166.037697] [drm:amdgpu_job_timedout [amdgpu]] ERROR ring gfx timeout, but soft recovered
[18171.153568] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] ERROR Waiting for fences timed out or interrupted!
[18186.261621] [drm:amdgpu_job_timedout [amdgpu]] ERROR ring gfx timeout, but soft recovered
or on 5.2.17 sometimes:
[ 7596.392996] sd 11:0:0:0: [sde] Synchronize Cache(10) failed: Result: hostbyte=0x01 driverbyte=0x00
[97954.657336] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] ERROR Waiting for fences timed out or interrupted!
[97959.535278] [drm:amdgpu_job_timedout [amdgpu]] ERROR ring gfx timeout, signaled seq=2542528, emitted seq=2542531
[97959.535342] [drm:amdgpu_job_timedout [amdgpu]] ERROR Process information: process pid 0 thread pid 0
[97959.535346] [drm] GPU recovery disabled.
Then the graphics stop working and the machine GUI is unusable until reboot.
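For anyone trying to compare how often this happens across kernels, here is a minimal sketch that counts the amdgpu messages quoted above in a saved dmesg dump. The regular expressions are built only from the lines in this report, and the function name `summarize_amdgpu_errors` and the category labels are made up for illustration; other kernel versions may word the messages differently.

```python
import re
from collections import Counter

# Patterns derived from the dmesg excerpts in this report (assumption:
# other kernel versions may phrase these messages differently).
PATTERNS = {
    "fence_timeout": re.compile(
        r"amdgpu_dm_atomic_commit_tail.*Waiting for fences timed out"),
    "ring_timeout_soft_recovered": re.compile(
        r"amdgpu_job_timedout.*ring \w+ timeout, but soft recovered"),
    "ring_timeout_not_recovered": re.compile(
        r"amdgpu_job_timedout.*ring \w+ timeout, signaled seq=\d+, emitted seq=\d+"),
    "gpu_recovery_disabled": re.compile(r"GPU recovery disabled"),
}


def summarize_amdgpu_errors(dmesg_text):
    """Count how many lines of dmesg_text match each known amdgpu message."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
                break  # a line is counted under at most one category
    return counts
```

With a dump saved via `dmesg > dmesg.txt`, calling `summarize_amdgpu_errors(open("dmesg.txt").read())` gives a per-category count that can be pasted into a comment next to the kernel version.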
5.3.x (13,16) can be quite stable.
5.4.x shows this issue quite quickly though. Much quicker than 5.3.
This is with git mesa, kernel.org and git xf86-video-amdgpu.
This means I can easily reproduce the issue on 5.4; please send patches or settings to try so we can find the root cause.
Do you have any specific xorg config? (if yes attach it please)
Could you attach xorg.log and a full dmesg log please?
Do you have any environment variable that might affect the drivers (e.g: AMD_DEBUG)?
Also: what desktop environment are you using (assuming you're using one)?
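To answer the environment variable question quickly, a small sketch like the following lists variables that look driver-related. The prefix list is an assumption (a few commonly used Mesa and AMD driver prefixes, not an exhaustive or official list), and `driver_env_vars` is a hypothetical helper name.

```python
import os

# Assumed, non-exhaustive prefixes of variables that can affect Mesa/AMD drivers.
DRIVER_ENV_PREFIXES = ("AMD_", "RADV_", "MESA_", "LIBGL_", "R600_", "GALLIUM_")


def driver_env_vars(environ=None):
    """Return the variables whose names start with one of the assumed prefixes."""
    env = os.environ if environ is None else environ
    return {
        name: value
        for name, value in sorted(env.items())
        if name.startswith(DRIVER_ENV_PREFIXES)
    }
```

Run it from the same session that starts the desktop (for example `print(driver_env_vars())` in a terminal inside it), since variables set only in one shell will not show up elsewhere.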
If GPU reset worked, that would at least be a workaround.
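Since the 5.2.17 log above ends with "GPU recovery disabled.", it may be worth checking what the amdgpu `gpu_recovery` module parameter is set to. The sketch below assumes the parameter is exposed read-only under `/sys/module/amdgpu/parameters/gpu_recovery` (it may not be on every kernel); `read_gpu_recovery` is a hypothetical helper name. Per the kernel's amdgpu module parameter documentation, 1 means enabled, 0 disabled and -1 auto.

```python
from pathlib import Path

# Assumed sysfs location of the amdgpu gpu_recovery module parameter.
GPU_RECOVERY_PARAM = Path("/sys/module/amdgpu/parameters/gpu_recovery")


def read_gpu_recovery(path=GPU_RECOVERY_PARAM):
    """Return the parameter as an int, or None if it is missing or unreadable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None
```

If this returns -1 or 0, booting with `amdgpu.gpu_recovery=1` on the kernel command line is one thing to try. Whether the reset then actually succeeds on these APUs is not something this thread has confirmed.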
At the time of the errors we only had a gnome desktop, some terminals, thunderbird and perhaps transmission running. No fancy or weird graphics related software.
The issue of WARNING: CPU: 4 PID: 559 at drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.c:1464 dcn_bw_update_from_pplib+0xa5/0x2e0 [amdgpu] (visible in dmesgs) has been open for months...
Since Linux 5.4, my Ryzen 5 2500U notebook (Vega 8 GPU, HP Envy x360) has been completely unstable. To make the notebook usable again I had to go back to linux-lts 4.19, but standby is broken there, so it is not a good workaround.
Basically, on Plasma I only have to open the system settings or Firefox with WebRender enabled. The GPU then crashes immediately and tries to reset. I do not use any special boot parameters for amdgpu!
Dez 16 22:25:11 pc kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
Dez 16 22:25:11 pc kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
I can confirm, as mentioned in the Arch Linux bug report, that after downgrading to the 5.3(.13) kernel the hangs no longer occur (so far, 30 minutes in, having run WebGL benchmarks in Firefox with WebRender enabled). With the latest 5.4 Arch kernel, hangs occurred just a couple of minutes into using Firefox, and they seem to be triggered more quickly by opening Godot and Aseprite.
My hardware is a Ryzen 2200G using the iGPU only, with temperatures below 50 °C when stress testing. Currently on Mesa 19.3.1 using RADV.