Upgrading mesa from 24.2.8 causes unexplained freezes on amdgpu
After a recent Gentoo upgrade (starting with mesa) I began experiencing crashes while using Chromium. The screen freezes, after which kworker/u32:0+amdgpu-reset-dev causes an extremely high load. This typically happens while a YouTube video is playing or while working with SketchUp Web. I have already reverted the firmware to the 20241110 release, but the device still crashes.
I then downgraded to mesa 24.2.8, and the freeze has not occurred again since (so far).
[ 8332.887809] amdgpu 0000:05:00.0: amdgpu: failed to write reg 28b4 wait reg 28c6
[ 8353.217798] amdgpu 0000:05:00.0: amdgpu: failed to write reg 1a6f4 wait reg 1a706
[ 8353.881008] amdgpu 0000:05:00.0: amdgpu: Dumping IP State
[ 8368.900873] hrtimer: interrupt took 394179891 ns
[ 8476.863375] INFO: task chrome:disk$0:1047 blocked for more than 122 seconds.
[ 8476.863384] Tainted: P O 6.12.7-gentoo #1
[ 8476.863387] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 8476.863388] task:chrome:disk$0 state:D stack:0 pid:1047 tgid:992 ppid:964 flags:0x00004002
[ 8476.863395] Call Trace:
[ 8476.863398] <TASK>
[ 8476.863401] __schedule+0x3e4/0xa40
[ 8476.863410] ? xas_load+0x8/0xc0
[ 8476.863415] schedule+0x26/0xf0
[ 8476.863419] schedule_timeout+0x121/0x150
[ 8476.863423] ? trace_hardirqs_on+0x1d/0x80
[ 8476.863428] ? preempt_count_add+0x69/0xa0
[ 8476.863432] dma_fence_default_wait+0x17d/0x1f0
[ 8476.863438] ? dma_fence_signal+0x50/0x50
[ 8476.863443] dma_fence_wait_timeout+0x104/0x130
[ 8476.863449] amdgpu_vm_fini+0x11a/0x4f0 [amdgpu]
[ 8476.863569] ? idr_destroy+0x72/0xb0
[ 8476.863572] amdgpu_driver_postclose_kms+0x184/0x260 [amdgpu]
[ 8476.863673] drm_file_free+0x21e/0x270 [drm]
[ 8476.863689] drm_release+0xa6/0x100 [drm]
[ 8476.863704] __fput+0xd6/0x290
[ 8476.863710] task_work_run+0x55/0x90
[ 8476.863714] do_exit+0x2c3/0x9e0
[ 8476.863717] ? __futex_wake_mark+0x50/0x50
[ 8476.863722] do_group_exit+0x2c/0x80
[ 8476.863724] get_signal+0x83f/0x8d0
[ 8476.863728] arch_do_signal_or_restart+0x2a/0x200
[ 8476.863732] ? do_futex+0xc7/0x190
[ 8476.863735] ? __x64_sys_futex+0x108/0x1c0
[ 8476.863737] ? _raw_spin_unlock_irq+0x18/0x40
[ 8476.863740] ? get_signal+0x762/0x8d0
[ 8476.863743] syscall_exit_to_user_mode+0xde/0x150
[ 8476.863747] do_syscall_64+0x85/0x120
[ 8476.863749] ? xfd_validate_state+0x1e/0x80
[ 8476.863752] ? restore_fpregs_from_fpstate+0x38/0xc0
[ 8476.863756] ? switch_fpu_return+0x55/0xd0
[ 8476.863759] ? trace_hardirqs_on_prepare+0x1d/0x80
[ 8476.863762] ? syscall_exit_to_user_mode+0x6e/0x150
[ 8476.863765] ? do_syscall_64+0x85/0x120
[ 8476.863767] ? sched_setaffinity+0x17b/0x210
[ 8476.863771] ? fpregs_assert_state_consistent+0x21/0x50
[ 8476.863775] ? trace_hardirqs_on_prepare+0x1d/0x80
[ 8476.863777] ? syscall_exit_to_user_mode+0x6e/0x150
[ 8476.863780] ? do_syscall_64+0x85/0x120
[ 8476.863782] ? trace_hardirqs_on_prepare+0x1d/0x80
[ 8476.863785] entry_SYSCALL_64_after_hwframe+0x55/0x5d
[ 8476.863789] RIP: 0033:0x7fa22f5540e5
[ 8476.863791] RSP: 002b:00007fa2134fb8d0 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
[ 8476.863794] RAX: fffffffffffffe00 RBX: 0000000000000000 RCX: 00007fa22f5540e5
[ 8476.863797] RDX: 0000000000000000 RSI: 0000000000000189 RDI: 00000bac00d9c0a8
[ 8476.863798] RBP: 0000000000000000 R08: 0000000000000000 R09: 00000000ffffffff
[ 8476.863800] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[ 8476.863801] R13: 0000000000000000 R14: 00000bac00d9c050 R15: 00000bac00d9c0a8
[ 8476.863804] </TASK>
Kernel: Linux thinkpad 6.12.7-gentoo
GPU: 05:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Raven Ridge [Radeon Vega Series / Radeon Vega Mobile Series] (rev c4)
The freezes started with mesa 24.3.x; 24.2.8 works.
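In case it helps anyone comparing logs between mesa versions, here is a rough Python sketch that pulls the relevant events out of a dmesg dump. The patterns are copied only from the messages quoted above, so other kernels or drivers may word them differently. The names `summarize`, `PATTERNS` and `SAMPLE` are made up for this illustration.

```python
import re

# Matches a dmesg line of the form "[ 8332.887809] message".
LINE_RE = re.compile(r"^\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<msg>.*)$")

# Assumption: these patterns mirror the exact wording seen in this report.
PATTERNS = {
    "reg_write_failed": re.compile(r"amdgpu: failed to write reg \w+ wait reg \w+"),
    "ip_state_dump": re.compile(r"amdgpu: Dumping IP State"),
    "hung_task": re.compile(r"INFO: task .+?:\d+ blocked for more than \d+ seconds"),
}

# A few lines taken from the log in this report.
SAMPLE = """\
[ 8332.887809] amdgpu 0000:05:00.0: amdgpu: failed to write reg 28b4 wait reg 28c6
[ 8353.881008] amdgpu 0000:05:00.0: amdgpu: Dumping IP State
[ 8368.900873] hrtimer: interrupt took 394179891 ns
[ 8476.863375] INFO: task chrome:disk$0:1047 blocked for more than 122 seconds.
"""


def summarize(log_text):
    """Return (timestamp, kind, message) for every line matching a known pattern."""
    events = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines without a dmesg timestamp
        for kind, pattern in PATTERNS.items():
            if pattern.search(m.group("msg")):
                events.append((float(m.group("ts")), kind, m.group("msg")))
                break
    return events


if __name__ == "__main__":
    for ts, kind, msg in summarize(SAMPLE):
        print(f"{ts:>12.6f}  {kind:<16}  {msg}")
```

To use it on a real log, pass the text of a saved `dmesg` dump to `summarize` instead of `SAMPLE`. It only filters and labels lines; it does not diagnose the cause of the reset.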