Interface freeze with kernel 6.11.3 on Fedora Linux 40 (Radeon RX 6600)
Since I updated to kernel 6.11.3, I have been experiencing interface freezes, which require a reboot to resolve. These freezes happen consistently, after every boot, a few minutes after logging into the desktop. I resolved the issue by reverting to kernel 6.10.12. Here are some logs; I hope they will be helpful.
This sounds very similar to what others are experiencing with amdgpu. Everything up to and including 6.10.12 seems to be safe; anything newer seems to be affected. I've got issues on a system with a 7900XTX and a 6750XT: the 6750XT shows similar logs, while with the 7900XTX most crashes are just a complete freeze.
Yeah, I tried 6.10.14 & 6.11.2 with RX 7600. Both crash as soon as the amdgpu driver gets initialized. The worst part is that I am unable to get any logs using journalctl. Even Alt+SysRq+S fails.
ott 15 21:07:56 fedora kernel: UBSAN: array-index-out-of-bounds in drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:8929:26
ott 15 21:07:56 fedora kernel: index 3 is out of range for type 'dc_surface_update [3]'
For me that was enough to crash the system, but I guess it could also just destabilize your system at random and let it live longer with corrupted memory.
but it's unusual that you experience that with any desktop compositor other than Weston, what are you running?
I recall KWin merged a change recently that uses Overlay layer as a fallback Cursor layer under some conditions, maybe that could be relevant
Some further information: I also have a notebook with AMD hardware, specifically an AMD Ryzen 5 5500U. On this one, I haven't had any freezes so far. The software configuration is the same (Cosmic + Kernel: Linux fedora-note 6.11.4-300.fc41.x86_64), but on Fedora 41. Perhaps the logs from this notebook, where I haven't experienced any freezes, might be useful.
grep_amdgpu
Update: I only encounter the issue when starting a session with Cosmic; the graphical session crashes after a few seconds. I can't reproduce it when starting the session with GNOME Shell.
The main cosmic-comp developer has mentioned that COSMIC is one of the only compositors widely utilizing overlay planes. It's possible you're not seeing the crash because GNOME Shell isn't using that.
Right, I've sent a bug fix pointing out that the cursor overlay mode uncovered this array-index-out-of-bounds issue here. AFAIU, a major rework is coming, but I've asked for the bug fix first (the right number of surfaces is 4) without success... I've just pinged again in the same ML thread.
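For readers following the log above, here is a minimal sketch of the failure mode: a fixed-size array of 3 entries receives a 4th plane, so the write lands at index 3. This is not the driver's code. `MAX_SURFACES` is the constant named elsewhere in this thread (set to 3 here to match the `dc_surface_update [3]` message), while `struct surface_update`, `struct update_bundle` and `fill_bundle` are hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 3 matches the 'dc_surface_update [3]' type in the UBSAN
 * message; the thread suggests 4 is the right number of surfaces. */
#define MAX_SURFACES 3

/* Hypothetical stand-in for a per-plane update record. */
struct surface_update {
    int plane_id;
};

/* Hypothetical bundle holding a fixed-size array of updates. */
struct update_bundle {
    struct surface_update updates[MAX_SURFACES];
    size_t count;
};

/*
 * Store up to MAX_SURFACES plane ids in the bundle.
 * Returns the number of planes that did not fit (0 means all were stored).
 * Without the bounds check, a 4th plane would be written to updates[3],
 * one element past the end of the array.
 */
size_t fill_bundle(struct update_bundle *bundle, const int *plane_ids,
                   size_t num_planes)
{
    size_t dropped = 0;

    bundle->count = 0;
    for (size_t i = 0; i < num_planes; i++) {
        if (bundle->count >= MAX_SURFACES) {
            dropped++; /* would have been an out-of-bounds write */
            continue;
        }
        bundle->updates[bundle->count].plane_id = plane_ids[i];
        bundle->count++;
    }
    assert(bundle->count <= MAX_SURFACES);
    return dropped;
}
```

With four planes (for example primary, two overlays and a cursor falling back to an overlay), `fill_bundle` reports one dropped plane instead of writing past the array. Raising `MAX_SURFACES` to 4 in this sketch makes all four fit.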
Just chiming in to say that I also experience this issue (Framework 13, AMD 7840U).
I applied Zaeem's patch on top of 6.12-rc7, but it didn't help. The only relevant logs captured by journald were:
Nov 14 16:28:02 frmwrk kernel: BUG: unable to handle page fault for address: 000000000174e354
Nov 14 16:28:02 frmwrk kernel: #PF: supervisor read access in kernel mode
which seems to indicate that it's not hitting the new logic.
If it helps any, I've noticed the hang happens when cosmic-panel touches / overlaps with Firefox's window.
I have been running your original suggestion to set MAX_SURFACES and MAX_SURFACE_NUM = 4, which prevents the Firefox issue, with the tradeoff that some applets crash when the applet menu is opened.
When this happens dmesg doesn't spit anything out, but cosmic-panel complains with the following when the applet crashes:
Does cosmic-panel still complain if MAX_SURFACES = 6 as in this series?
So after applying that patch series the applets behave (no crashing, no complaining). Firefox doesn't, however, and I got some logs when it crashed this time. Here's the relevant bit: journald
Can you share the outputs of drm_info and # cat /sys/kernel/debug/dri/0/amdgpu_dm_dtn_log?
It's worth noting here that I'm also applying a patch reverting 338567d1762 for another issue with DSC / bandwidth calc (#3735) for my own QOL. I can revert that to isolate if you think it might be interfering.
I didn't manage to reproduce the index-out-of-bounds on Cosmic yet, but I got this page fault with amd-staging-drm-next kernel with this commit reverted: .
I wonder if the index-out-of-bounds is still present on 6.12-rc7 (without Zaeem's patch) or only the page-fault error.
If so, maybe dmesg shows both many index-out-of-bounds errors and the page-fault as the last one? Can you double check your dmesg?
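One way to answer that question from a saved log is to count both kinds of message and note which came last. The sketch below is only an illustration: `struct log_summary` and `summarize_log` are hypothetical names, and the two match strings are taken from the log lines quoted earlier in this thread.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical summary of the two error kinds discussed in this thread. */
struct log_summary {
    size_t oob_count;        /* lines with "array-index-out-of-bounds" */
    size_t page_fault_count; /* lines with "unable to handle page fault" */
    int last_is_page_fault;  /* 1 if the last matching line was a page fault */
};

/* Scan n log lines (already read into memory) and summarize them. */
struct log_summary summarize_log(const char *const *lines, size_t n)
{
    struct log_summary s = {0, 0, 0};

    for (size_t i = 0; i < n; i++) {
        if (strstr(lines[i], "array-index-out-of-bounds") != NULL) {
            s.oob_count++;
            s.last_is_page_fault = 0;
        } else if (strstr(lines[i], "unable to handle page fault") != NULL) {
            s.page_fault_count++;
            s.last_is_page_fault = 1;
        }
    }
    return s;
}
```

If `oob_count` is nonzero and `last_is_page_fault` is 1, the log matches the pattern asked about: many index-out-of-bounds errors followed by the page fault as the final one.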
I've since moved to using 6.12.0 and I can't seem to reproduce the freeze with cosmic-comp commit 7e8cb91 or at least not in the same way. I'll keep using 6.12.0 for now and I'll report back with logs if I see the issue.
I can also switch back to 6.12-rc7 and compare / bisect to determine what might have fixed it if you'd like that info.
Compiling from the "amd-staging-drm-next" branch [1], I don't have any issues.
However, I don't know if it's expected, but from that branch, I have version 6.10.* of the kernel. The problem occurs with kernel 6.11.
I'm not sure if I'm doing something wrong or if this is expected. If you need, let me know what I can do to help with testing. Thanks.
The Fedora kernel: 6.10.10-200.fc40
Compiled kernel: Linux fedora 6.10.0+ #2 SMP PREEMPT_DYNAMIC Tue Nov 26 23:58:43 CET 2024 x86_64 GNU/Linux
Hi @emanuc, thanks for checking it out and reporting!
amd-staging-drm-next is the upstream branch for the AMD driver, and it's currently kernel 6.10 plus all AMD driver changes from that kernel version until now. It doesn't include core kernel branches, but you get almost all AMD driver changes.
I just managed to consistently reproduce the page fault, and I think it has been the cause of the interface freeze from the beginning (not the array-index-out-of-bounds).
On Fedora 41 with kernel 6.11.4, I see both the array-index-out-of-bounds errors and, as the last error, the page fault:
I just sent another proposal to fix this issue, following a different approach since I noticed the page fault is the current cause of the system freeze.