Freeze/crash on login in desktop environment [680M]
Brief summary of the problem:
As soon as I log in to my desktop environment (GNOME) and interact with clickable or hoverable elements (or just wait), the system becomes unresponsive. Sometimes I also see graphical glitches. If I don't log in to the DE and use only ttyX, everything works. The problem occurs with both Xorg and Wayland.
While trying different versions of the yellow_carp firmware, I accidentally booted a kernel with no firmware. That made the system stable, and I could log in and use my computer without freezes or crashes. After looking at dmesg, I concluded that the firmware file causing the problem is /lib/firmware/amdgpu/yellow_carp_pfp.bin. If I remove it and update the initramfs, I no longer have glitches or freezes.
I should also note that I did not have this problem until a few days ago. I had been running Ubuntu 23.04 for a few months, upgrading the system packages once in a while. But last week I woke the laptop up as I do every day, and 30 minutes later it froze. I had to reboot, and since then it freezes every time unless I remove the file mentioned above.
Hardware description:
- CPU: AMD Ryzen 5 PRO 6650U
- GPU: [AMD/ATI] Rembrandt [Radeon 680M] [1002:1681] (rev d2)
- System Memory: 16 GB
- Display(s): internal
- Type of Display Connection: internal
System information:
- Distro name and Version: Ubuntu 23.10 (it also happens on Ubuntu 23.04; I upgraded to see whether that would solve the problem)
- Kernel version: 6.5.10-x64v3-xanmod1 #0~20231102.g537eb9e SMP PREEMPT_DYNAMIC Thu Nov 2 09:31:54 UTC x86_64 x86_64 x86_64 GNU/Linux
- Custom kernel: xanmod
- AMD official driver version: N/A
- Mesa
ii mesa-va-drivers:amd64 23.2.1-1ubuntu3 amd64 Mesa VA-API video acceleration drivers
ii mesa-vdpau-drivers:amd64 23.2.1-1ubuntu3 amd64 Mesa VDPAU video acceleration drivers
ii mesa-vulkan-drivers:amd64 23.2.1-1ubuntu3 amd64 Mesa Vulkan graphics drivers
Other kernels tested:
- mainline 6.1.11, 6.1.57, 6.2.7, 6.2.16, 6.3.13, 6.4.9, 6.4.16, 6.5.7
- Ubuntu 6.2 and 6.5.0
Tested firmware versions: 20231030, 20230919, 20230804, 20230404, 20230117, 20221214... and I think I pretty much gave up at that one.
How to reproduce the issue:
I:
- put the /lib/firmware/amdgpu/yellow_carp_pfp.bin file back
- run update-initramfs -u
- reboot
- log in to the desktop environment
- the system freezes after a few seconds
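For anyone who wants to switch between the broken and the working state, the steps above (and the reverse workaround of removing the file) can be sketched as a small dry-run helper. This is only an illustration: the backup directory `/root/fw-disabled` and the function name `plan_commands` are hypothetical choices, not something I actually used, and the script only prints the commands instead of running them.

```python
from pathlib import PurePosixPath

# Firmware path taken from this report; the backup location is a hypothetical choice.
FIRMWARE = PurePosixPath("/lib/firmware/amdgpu/yellow_carp_pfp.bin")
BACKUP_DIR = PurePosixPath("/root/fw-disabled")


def plan_commands(action, firmware=FIRMWARE, backup_dir=BACKUP_DIR):
    """Return the commands (as argument lists) for one action.

    "disable" moves the firmware file away (the workaround);
    "restore" puts it back (which reproduces the freeze).
    Nothing is executed here; the caller decides whether to run them as root.
    """
    backup = backup_dir / firmware.name
    if action == "disable":
        move = [["mkdir", "-p", str(backup_dir)],
                ["mv", str(firmware), str(backup)]]
    elif action == "restore":
        move = [["mv", str(backup), str(firmware)]]
    else:
        raise ValueError(f"unknown action: {action!r}")
    # Rebuild the initramfs so the change takes effect at boot, then reboot.
    return move + [["update-initramfs", "-u"], ["reboot"]]


if __name__ == "__main__":
    for cmd in plan_commands("restore"):
        print(" ".join(cmd))
```

Calling `plan_commands("disable")` gives the sequence for the workaround. Moving the file to a backup directory rather than deleting it is just a convenience so it can be put back for testing.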
Log files (for system lockups / game freezes / crashes)
Dmesg:
amdgpu 0000:33:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:4 pasid:32769, for process Xorg pid 4779 thread Xorg:cs0 pid 4813)
amdgpu 0000:33:00.0: amdgpu: in page starting at address 0x0000800005b6f000 from client 0x1b (UTCL2)
amdgpu 0000:33:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00401431
amdgpu 0000:33:00.0: amdgpu: Faulty UTCL2 client ID: SQC (data) (0xa)
amdgpu 0000:33:00.0: amdgpu: MORE_FAULTS: 0x1
amdgpu 0000:33:00.0: amdgpu: WALKER_ERROR: 0x0
amdgpu 0000:33:00.0: amdgpu: PERMISSION_FAULTS: 0x3
amdgpu 0000:33:00.0: amdgpu: MAPPING_ERROR: 0x0
amdgpu 0000:33:00.0: amdgpu: RW: 0x0
amdgpu 0000:33:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:4 pasid:32769, for process Xorg pid 4779 thread Xorg:cs0 pid 4813)
amdgpu 0000:33:00.0: amdgpu: in page starting at address 0x000080001bc60000 from client 0x1b (UTCL2)
amdgpu 0000:33:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00401431
amdgpu 0000:33:00.0: amdgpu: Faulty UTCL2 client ID: SQC (data) (0xa)
amdgpu 0000:33:00.0: amdgpu: MORE_FAULTS: 0x1
amdgpu 0000:33:00.0: amdgpu: WALKER_ERROR: 0x0
amdgpu 0000:33:00.0: amdgpu: PERMISSION_FAULTS: 0x3
amdgpu 0000:33:00.0: amdgpu: MAPPING_ERROR: 0x0
amdgpu 0000:33:00.0: amdgpu: RW: 0x0
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=21745971, emitted seq=21745973
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 4779 thread Xorg:cs0 pid 4813
amdgpu 0000:33:00.0: amdgpu: GPU reset begin!
amdgpu 0000:33:00.0: amdgpu: Guilty job already signaled, skipping HW reset
amdgpu 0000:33:00.0: amdgpu: GPU reset(16) succeeded!
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, but soft recovered
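
To make the fault lines above easier to read, here is a minimal sketch that splits the raw GCVM_L2_PROTECTION_FAULT_STATUS value into its fields. The bit layout is an assumption on my part, taken from my reading of the amdgpu register headers for this GPU generation, so please double-check it against the kernel source. The names `FIELDS` and `decode_fault_status` are made up for this example.

```python
# Assumed bit layout of GCVM_L2_PROTECTION_FAULT_STATUS: name -> (shift, width in bits).
# This is an assumption based on the amdgpu register headers, not an official reference.
FIELDS = {
    "MORE_FAULTS": (0, 1),
    "WALKER_ERROR": (1, 3),
    "PERMISSION_FAULTS": (4, 4),
    "MAPPING_ERROR": (8, 1),
    "CID": (9, 9),   # UTCL2 client ID
    "RW": (18, 1),
    "VMID": (20, 4),
}


def decode_fault_status(status):
    """Split the raw status word into its named bit fields."""
    return {name: (status >> shift) & ((1 << width) - 1)
            for name, (shift, width) in FIELDS.items()}


if __name__ == "__main__":
    # Value copied from the dmesg output in this report.
    for name, value in decode_fault_status(0x00401431).items():
        print(f"{name}: {value:#x}")
```

With the value from the log, the decoded fields agree with what the driver printed: MORE_FAULTS 0x1, WALKER_ERROR 0x0, PERMISSION_FAULTS 0x3, MAPPING_ERROR 0x0, RW 0x0, and client ID 0xa (SQC data).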