Kwin and other programs cause UTCL2 page faults at amdgpu_ctx_create, possibly related to a use-after-free and a pageflip timeout in amdgpu
Brief summary of the problem:
Some programs (e.g. xdg-desktop-portal-kde, kwin_wayland, xwaylandvideobridge, possibly xwayland) crash at startup, causing the system to hang and the GPU to display artifacts on screen.
I first discovered the problem when using the desktop portal and attributed the crash to it, but after further investigation I found that other programs (mostly KDE 6 ones) crash as well, and I think it's amdgpu's fault.
Looking through the logs, I found that when a program crashes, amdgpu reports a series of UTCL2 page faults, as can be seen in all of the provided logs. After going through multiple logs I found that journalctl can't always report which function the program crashed in, but when it does, it is amdgpu_ctx_create(), as can be seen in log-kwin.log (same file as below).
In one of the logs (use-after-free.log, same file as below) I found that the kernel reports a use-after-free in amdgpu. I can't tell whether this is related or a coincidence.
In some of the logs (pageflip.log, same file as below), kwin reports a page flip timeout and calls it a kernel bug.
Resetting the GPU and soft-rebooting sometimes solves the issue, but just restarting sddm is not enough.
Lastly, I tried swapping the GPU for a different one (also AMD), and I wasn't able to reproduce the bug with it. It might be in a Vega-specific code path?
Hardware description:
- CPU: AMD Ryzen 7 7700X
- GPU: Radeon RX Vega 56
- System Memory: 32 GB
- Display(s): External 2560x1080 LG display
- Type of Display Connection: DP
System information:
- Distro name and Version: Fedora 40 (also happened in 39 with kde 6)
- Kernel version: 6.8.7-300.fc40.x86_64 (also happens in 6.6.28-200.fc40.x86_64, which is an LTS kernel)
- Custom kernel: N/A
- AMD official driver version: N/A
How to reproduce the issue:
- On Fedora 40
- Log into KDE 6.0.3 Wayland
- Several attempts may be necessary
- The system hangs and sometimes shows artifacts on screen
Alternatively:
- Try some flatpaks that use desktop portal
- Try to interact with the portal
- Some will cause the same behaviour, and do so consistently
- Examples I found
- On fedora 39 with kde 6.0.4 opening a file received from Telegram-desktop with "open with"
- On fedora 39 with kde 6.0.4 opening a website (e.g. speedtest.net) from a flatpak browser that asks for geolocation
I also get the same behaviour when exporting something with GIMP (non-Flatpak) on Fedora 40 with KDE 6.0.3. I noticed this while trying to crop the image of the artifacts, and the logs with the pageflip error are from one of those crashes. This means the problem is not limited to KDE programs.
Screenshots/video files
Log files (for system lockups / game freezes / crashes)
Some journalctl -p 3 logs
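For anyone trying to reproduce this, here is a minimal sketch of how comparable logs could be collected. The function name collect_amdgpu_logs, the output directory and the file names are hypothetical and not part of the attached logs. The boot offset argument is an assumption added because the crash often has to be read after a reboot (0 is the current boot, -1 the previous one).

```shell
# Hypothetical helper: saves error-level journal entries and amdgpu kernel
# messages for one boot into a directory. Names and paths are illustrative.
collect_amdgpu_logs() {
    out_dir="${1:-./amdgpu-logs}"   # where to write the logs (assumed default)
    boot="${2:-0}"                  # 0 = current boot, -1 = previous boot

    mkdir -p "$out_dir"

    # Entries of priority "err" (3) or more severe, as in "journalctl -p 3".
    journalctl --boot="$boot" --priority=3 --no-pager > "$out_dir/errors.log"

    # Kernel messages only, narrowed to lines that mention amdgpu.
    # "|| true" keeps the function from failing when nothing matches.
    journalctl --boot="$boot" --dmesg --no-pager \
        | grep -i 'amdgpu' > "$out_dir/amdgpu-kernel.log" || true
}
```

After a hang and reboot, it could be called as `collect_amdgpu_logs ./logs -1` to capture the boot in which the crash happened.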