[Vega 64] Newer Mesa-git revisions crash hardware-accelerated apps
Brief summary of the problem:
Any hardware-accelerated program (tested with Chrome and Heroic Games Launcher) fails a couple of seconds after launch and leads to a system crash on newer Mesa-git revisions (last known good: 23.2.0_devel.172957.6e5eb0afd3f) with LLVM 17.0.0_r464772.f8a1d021ed34 installed.
Dmesg shows this entry after such a crash:
[ 68.300171] ------------[ cut here ]------------
[ 68.300173] WARNING: CPU: 2 PID: 327 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:598 amdgpu_irq_put+0xc4/0xe0 [amdgpu]
[ 68.300334] Modules linked in: fuse snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi intel_rapl_msr intel_rapl_common sb_edac snd_hda_intel snd_intel_dspcfg x86_pkg_temp_thermal intel_powerclamp snd_hda_codec coretemp snd_hwdep crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic gf128mul snd_hda_core ghash_clmulni_intel sha512_ssse3 aesni_intel crypto_simd snd_pcm cryptd snd_timer igb vfat mei_wdt i2c_i801 snd i2c_smbus lpc_ich soundcore fat mousedev razerkbd(O) acpi_cpufreq usbip_host usbip_core pkcs8_key_parser crypto_user loop zram bpf_preload ip_tables x_tables ext4 crc32c_generic mbcache crc16 jbd2 usbhid amdgpu mfd_core drm_buddy video drm_ttm_helper crc32c_intel ttm i2c_algo_bit drm_display_helper cec xhci_pci xhci_pci_renesas gpu_sched wmi
[ 68.300362] CPU: 2 PID: 327 Comm: kworker/2:1 Tainted: G O 6.3.8-4.1-cachyos-lto #1 8a571a29fbc4266dc4f6067ae1010282fb7d49d9
[ 68.300364] Hardware name: LENOVO GAMING TF/X99-TF Gaming, BIOS CX99DE26 10/10/2020
[ 68.300365] Workqueue: events drm_mode_rmfb_work_fn
[ 68.300369] RIP: 0010:amdgpu_irq_put+0xc4/0xe0 [amdgpu]
[ 68.300481] Code: f6 89 da ff 10 89 c3 4c 89 e7 4c 89 ee e8 f4 54 dc f8 89 d8 eb 05 b8 fe ff ff ff 5b 41 5c 41 5d 41 5e 41 5f 5d c3 31 c0 eb f1 <0f> 0b eb ed 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 68.300483] RSP: 0018:ffff9dfc6036b858 EFLAGS: 00010046
[ 68.300485] RAX: 00000000ffffffea RBX: 0000000000000000 RCX: ffff9dfc60054d40
[ 68.300486] RDX: ffffffffc0d46038 RSI: ffff9e06282665c8 RDI: ffff9e0628260000
[ 68.300486] RBP: 0000000000000000 R08: ffffffffc0c4dd11 R09: 0000000000000000
[ 68.300487] R10: 000000000000002f R11: ffff9e04481fd000 R12: ffff9e0323f7a000
[ 68.300488] R13: ffff9e0628276a68 R14: ffff9e06282665c8 R15: ffff9e0628260010
[ 68.300489] FS: 0000000000000000(0000) GS:ffff9e0abf680000(0000) knlGS:0000000000000000
[ 68.300490] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 68.300491] CR2: 00007f9faeffca38 CR3: 000000074600f006 CR4: 00000000001706e0
[ 68.300492] Call Trace:
[ 68.300494] <TASK>
[ 68.300496] ? __warn+0x9e/0x160
[ 68.300498] ? amdgpu_irq_put+0xc4/0xe0 [amdgpu e95be992b7cf564ceb16dad4d7e267de8634996a]
[ 68.300609] ? report_bug+0x112/0x180
[ 68.300613] ? handle_bug+0x3d/0x80
[ 68.300615] ? exc_invalid_op+0x16/0x40
[ 68.300617] ? asm_exc_invalid_op+0x16/0x20
[ 68.300621] ? amdgpu_irq_put+0xc4/0xe0 [amdgpu e95be992b7cf564ceb16dad4d7e267de8634996a]
[ 68.300731] dm_disable_vblank+0xb0/0x180 [amdgpu e95be992b7cf564ceb16dad4d7e267de8634996a]
[ 68.300843] drm_vblank_disable_and_save+0x9e/0x100
[ 68.300847] drm_crtc_vblank_off+0xb7/0x280
[ 68.300850] ? amdgpu_irq_put+0xac/0xe0 [amdgpu e95be992b7cf564ceb16dad4d7e267de8634996a]
[ 68.300961] amdgpu_dm_atomic_commit_tail+0x18f/0x39e0 [amdgpu e95be992b7cf564ceb16dad4d7e267de8634996a]
[ 68.301072] ? bw_calcs+0x1998/0x26a0 [amdgpu e95be992b7cf564ceb16dad4d7e267de8634996a]
[ 68.301184] ? dce112_validate_bandwidth+0x93/0x2a0 [amdgpu e95be992b7cf564ceb16dad4d7e267de8634996a]
[ 68.301294] ? kvmalloc_node+0x23/0xc0
[ 68.301298] ? dc_validate_global_state+0x48c/0x500 [amdgpu e95be992b7cf564ceb16dad4d7e267de8634996a]
[ 68.301409] ? amdgpu_dm_atomic_check+0x1d8d/0x1ec0 [amdgpu e95be992b7cf564ceb16dad4d7e267de8634996a]
[ 68.301520] ? dma_resv_get_fences+0x12d/0x4e0
[ 68.301522] ? dma_resv_get_singleton+0x21/0xe0
[ 68.301524] ? wait_for_completion_interruptible+0x54/0x1a0
[ 68.301526] ? drm_gem_plane_helper_prepare_fb+0x6d/0x1c0
[ 68.301527] ? wait_for_completion_timeout+0x4a/0x1c0
[ 68.301529] ? drm_atomic_helper_swap_state+0x1f1/0x460
[ 68.301531] drm_atomic_helper_commit+0x926/0xd00
[ 68.301534] drm_atomic_commit+0x81/0xa0
[ 68.301537] ? __drm_printfn_seq_file+0x20/0x20
[ 68.301540] drm_framebuffer_remove+0x24b/0x440
[ 68.301542] drm_mode_rmfb_work_fn+0x70/0x80
[ 68.301544] process_one_work+0x147/0x620
[ 68.301548] worker_thread+0x2d0/0x4c0
[ 68.301550] ? drm_self_refresh_helper_update_avg_times+0x120/0x120
[ 68.301552] kthread+0xe9/0x140
[ 68.301554] ? fn_enter+0x80/0x80
[ 68.301556] ret_from_fork+0x1f/0x30
[ 68.301558] </TASK>
[ 68.301559] ---[ end trace 0000000000000000 ]---
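For triage, it can help to separate the frames the kernel unwinder trusts from the ones it prefixes with "?" (addresses found on the stack that may be stale). The following is a minimal sketch, not part of the original report: the function name `reliable_frames` and the regular expression are illustrative assumptions, and the sample input is a subset of the lines above.

```python
import re

# Matches a call-trace line such as
#   "[ 68.300731] dm_disable_vblank+0xb0/0x180 [amdgpu ...]"
# A leading "? " marks a frame the unwinder could not verify.
FRAME_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def reliable_frames(log_text):
    """Return the symbols between <TASK> and </TASK> that have no '?' prefix."""
    frames = []
    in_trace = False
    for line in log_text.splitlines():
        if "</TASK>" in line:
            in_trace = False
            continue
        if "<TASK>" in line:
            in_trace = True
            continue
        if not in_trace:
            continue
        match = FRAME_RE.match(line.strip())
        if match and not match.group("unreliable"):
            frames.append(match.group("symbol"))
    return frames


# Hypothetical usage with a few lines copied from the trace above.
SAMPLE = """\
[ 68.300494] <TASK>
[ 68.300496] ? __warn+0x9e/0x160
[ 68.300621] ? amdgpu_irq_put+0xc4/0xe0 [amdgpu e95be992b7cf564ceb16dad4d7e267de8634996a]
[ 68.300731] dm_disable_vblank+0xb0/0x180 [amdgpu e95be992b7cf564ceb16dad4d7e267de8634996a]
[ 68.300843] drm_vblank_disable_and_save+0x9e/0x100
[ 68.301558] </TASK>
"""

print(reliable_frames(SAMPLE))
# ['dm_disable_vblank', 'drm_vblank_disable_and_save']
```

Run against the full trace above, this leaves the chain from dm_disable_vblank through drm_crtc_vblank_off, amdgpu_dm_atomic_commit_tail and drm_framebuffer_remove down to ret_from_fork, which is the path that ended in the amdgpu_irq_put warning.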
Hardware description and System Information:
System: Host: klx99 Kernel: 6.3.8-4.1-cachyos-lto arch: x86_64 bits: 64 Desktop: KDE Plasma v: 5.27.6 Distro: CachyOS
Mobo: Lenovo model: X99-TF Gaming v: G368J V1.1 serial: UEFI: American Megatrends v: CX99DE26 date: 10/10/2020
CPU: Info: 18-core model: Intel Xeon E5-2696 v3 bits: 64 type: MT MCP cache: L2: 4.5 MiB
Graphics: Device-1: AMD Vega 10 XL/XT [Radeon RX 56/64] driver: amdgpu v: kernel
Display: x11 server: X.Org v: 21.1.99 with: Xwayland v: 23.1.2 driver: X: loaded: amdgpu unloaded: modesetting dri: radeonsi gpu: amdgpu resolution: 2560x1440~165Hz
API: OpenGL v: 4.6 Mesa 23.2.0-devel (git-6e5eb0afd3) renderer: AMD Radeon RX Vega (vega10 LLVM 17.0.0 DRM 3.52 6.3.8-4.1-cachyos-lto)
How to reproduce the issue:
Any Mesa-git revision as of June 22nd, 2023 should reproduce the issue. I carry some in-development patches on top, each named after its Mesa MR number (except 100.mymesapatch, which was taken from Clear Linux).
Last known good Mesa revision: 23.2.0_devel.172957.6e5eb0afd3f
Last known good LLVM revision: 17.0.0_r464772.f8a1d021ed34
I've reproduced the failure on Mesa revision 23.2.0_devel.173203.b2ed33fb4d0 (with the patches as uploaded in my repo), both when compiling Mesa with Clang 17 (same revision as the known-good LLVM) and when using the distro's default GCC 13.1 toolchain.
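The two revision strings give a rough idea of how large the regression window is. The sketch below is not from the original report; it assumes the middle field of the package version is a `git rev-list --count` style commit count (as is common in mesa-git packaging), and the helper name `parse_pkgver` is hypothetical. If that assumption does not hold for this package, the numbers are meaningless.

```python
import math


def parse_pkgver(pkgver):
    """Split '23.2.0_devel.172957.6e5eb0afd3f' into (version, commit_count, short_hash).

    Assumption: the second-to-last field is a commit count and the last
    field is an abbreviated commit hash.
    """
    version, count, short_hash = pkgver.rsplit(".", 2)
    return version, int(count), short_hash


good = parse_pkgver("23.2.0_devel.172957.6e5eb0afd3f")  # last known good
bad = parse_pkgver("23.2.0_devel.173203.b2ed33fb4d0")   # known bad

# Number of commits between the two builds, under the assumption above.
span = bad[1] - good[1]

# A binary search over `span` commits needs about ceil(log2(span)) test builds.
steps = math.ceil(math.log2(span))

print(f"good={good[2]} bad={bad[2]} commits={span} approx_bisect_steps={steps}")
# good=6e5eb0afd3f bad=b2ed33fb4d0 commits=246 approx_bisect_steps=8
```

Under that assumption, a `git bisect` between 6e5eb0afd3f and b2ed33fb4d0 would take roughly eight build-and-test rounds; merges in the history can shift the exact number.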
Attached files:
- Dmesg log dmsg-amdgpu.log.txt