Kernel panic during amdgpu IRQ
Brief summary of the problem:
Triggering gamma changes from wlsunset (using the Wayland protocol wlr_gamma_control_unstable_v1) or hotplugging some outputs occasionally results in kernel panics on my system.
I think this started happening with kernel 6.7.x.
Because the system freezes, I had to use a serial console to capture the kernel panic.
Hardware description:
- CPU: AMD Ryzen 7 5800X3D
- GPU: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21 [Radeon RX 6800/6800 XT / 6900 XT] [1002:73bf] (rev c1)
- (ASUS TUF Radeon RX 6800 XT)
- System Memory: 2x 16GB DDR4-3200 (G.Skill F4-3200C16-16GIS)
- Display(s): 1x LG UltraGear GN800P-B, 1x Samsung S24E650PL
- Type of Display Connection: DisplayPort, UltraGear uses FreeSync
System information:
- Distro name and Version: NixOS (nixos-unstable channel commit 632751bf0ceeefc74af7a9d2335ea923ad9c831a)
- Kernel version:
Linux andromeda 6.7.2-zen1 #1-NixOS ZEN SMP PREEMPT_DYNAMIC Tue Jan 1 00:00:00 UTC 1980 x86_64 GNU/Linux
- Custom kernel: Zen kernel patches (Kernel config at https://github.com/NixOS/nixpkgs/blob/nixos-unstable/pkgs/os-specific/linux/kernel/zen-kernels.nix)
- Kernel cmdline:
initrd=\efi\nixos\b6a5z5cs896jh7bd8n3yngig29gnsp9v-initrd-linux-6.7.2-initrd.efi init=/nix/store/adppc324plfdz5gaq2a3na9p118pal0n-nixos-system-andromeda-24.05.20240131.632751b/init amd_pstate=active no_console_suspend console=ttyS0,115200 loglevel=4
How to reproduce the issue:
- Boot into a wlroots window manager
- Start wlsunset with settings that actually change the gamma values (e.g. by adjusting the sunset/sunrise times and temperature values)
- Stop wlsunset
- Repeat from step 2 until the system crashes
Alternatively:
- Turn off the secondary monitor
- Turn the secondary monitor back on
- Repeat until the system crashes
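The wlsunset loop above can be automated so it runs unattended while the serial console is attached. The following is a minimal Python sketch, assuming that wlsunset is in PATH, that the script is started from a terminal inside the wlroots session (so the Wayland environment is inherited), and that wlsunset accepts `-t`/`-T` for the low/high temperature and `-S`/`-s` for manual sunrise/sunset times (check `wlsunset -h` on your build). The parameter sets, the hold time, and the helper names `wlsunset_args`/`run_loop` are illustrative assumptions, not the exact values used in this report.

```python
import itertools
import subprocess
import time

# Hypothetical parameter sets: alternating between them makes each new
# wlsunset instance compute a different gamma ramp than the previous one.
PARAM_SETS = [
    {"low": 3000, "high": 5000, "sunrise": "06:00", "sunset": "18:00"},
    {"low": 4000, "high": 6000, "sunrise": "09:00", "sunset": "15:00"},
]


def wlsunset_args(params):
    """Build a wlsunset command line from one parameter set."""
    return [
        "wlsunset",
        "-t", str(params["low"]),    # low (night) temperature in K
        "-T", str(params["high"]),   # high (day) temperature in K
        "-S", params["sunrise"],     # manual sunrise time, HH:MM
        "-s", params["sunset"],      # manual sunset time, HH:MM
    ]


def run_loop(iterations, hold_seconds=3.0):
    """Start wlsunset, let it apply a gamma ramp, stop it, and repeat."""
    for i, params in zip(range(iterations), itertools.cycle(PARAM_SETS)):
        proc = subprocess.Popen(wlsunset_args(params))
        time.sleep(hold_seconds)   # give wlsunset time to set the gamma table
        proc.terminate()           # stop wlsunset with SIGTERM
        proc.wait(timeout=10)
        print(f"iteration {i + 1} done", flush=True)
```

Calling `run_loop(100)` from a terminal in the wlroots session repeats the start/stop cycle until the machine either survives 100 iterations or panics. The hotplug variant has no equivalent here, since physically power-cycling a monitor cannot be scripted this way.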
Attached files:
Log files (for system lockups / game freezes / crashes)
[ 127.441624] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 127.448579] #PF: supervisor read access in kernel mode
[ 127.453717] #PF: error_code(0x0000) - not-present page
[ 127.458855] PGD 0 P4D 0
[ 127.461393] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 127.465750] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G O 6.7.2-zen1 #1-NixOS
[ 127.474005] Hardware name: System manufacturer System Product Name/ROG STRIX B450-F GAMING II, BIOS 5201 08/10/2023
[ 127.484417] RIP: 0010:dcn10_set_drr+0xa6/0x100 [amdgpu]
[ 127.489842] Code: 48 8b 80 28 01 00 00 48 85 c0 74 0a 48 8d 74 24 04 e8 7e 77 75 ef 45 85 e4 74 c5 45 85 ed 74 c0 48 8b 03 48 8b b8 f8 00 00 00 <48> 8b 07 48 8b 80 40 01 00 00 48 85 c0 74 a7 48 83 c3 08 ba 02 00
[ 127.508583] RSP: 0018:ffffa54180324dc0 EFLAGS: 00010002
[ 127.513806] RAX: ffff9b001c700248 RBX: ffffa54180324e10 RCX: 0000000000000000
[ 127.520936] RDX: 0000000080010015 RSI: ffff9afe414ece80 RDI: 0000000000000000
[ 127.528059] RBP: ffffa54180324e00 R08: 0000000000000030 R09: 0000000060c0f800
[ 127.535187] R10: 0000000000000008 R11: 0000000000000008 R12: 0000000000000c0d
[ 127.542308] R13: 0000000000000c0d R14: ffffa54180324e18 R15: ffff9afe42c63180
[ 127.549430] FS: 0000000000000000(0000) GS:ffff9b055e380000(0000) knlGS:0000000000000000
[ 127.557512] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 127.563258] CR2: 0000000000000000 CR3: 00000001344d4000 CR4: 0000000000f50ef0
[ 127.570387] PKRU: 55555554
[ 127.573089] Call Trace:
[ 127.575536] <IRQ>
[ 127.577555] ? __die+0x23/0x70
[ 127.580610] ? page_fault_oops+0x17d/0x4b0
[ 127.584710] ? srso_alias_return_thunk+0x5/0xfbef5
[ 127.589499] ? generic_reg_set_ex+0x128/0x190 [amdgpu]
[ 127.594787] ? exc_page_fault+0x72/0x160
[ 127.598710] ? asm_exc_page_fault+0x26/0x30
[ 127.602894] ? dcn10_set_drr+0xa6/0x100 [amdgpu]
[ 127.607694] dc_stream_adjust_vmin_vmax+0xaa/0xd0 [amdgpu]
[ 127.613332] dm_crtc_high_irq+0x159/0x160 [amdgpu]
[ 127.618296] amdgpu_dm_irq_handler+0x85/0x220 [amdgpu]
[ 127.623605] amdgpu_irq_dispatch+0xd0/0x210 [amdgpu]
[ 127.628711] amdgpu_ih_process+0x83/0x100 [amdgpu]
[ 127.633637] amdgpu_irq_handler+0x23/0x60 [amdgpu]
[ 127.638567] __handle_irq_event_percpu+0x4d/0x1b0
[ 127.643269] handle_irq_event+0x3e/0x80
[ 127.647106] handle_edge_irq+0x9d/0x280
[ 127.650944] __common_interrupt+0x42/0xb0
[ 127.654954] common_interrupt+0x83/0xa0
[ 127.658792] </IRQ>
[ 127.660889] <TASK>
[ 127.662986] asm_common_interrupt+0x26/0x40
[ 127.667169] RIP: 0010:cpuidle_enter_state+0xcd/0x440
[ 127.672131] Code: 09 9d 5f ff e8 94 ef ff ff 8b 53 04 49 89 c5 0f 1f 44 00 00 31 ff e8 32 b3 5e ff 45 84 ff 0f 85 59 02 00 00 fb 0f 1f 44 00 00 <45> 85 f6 0f 88 86 01 00 00 49 63 d6 48 8d 04 52 48 8d 04 82 49 8d
[ 127.690862] RSP: 0018:ffffa5418019fe90 EFLAGS: 00000246
[ 127.696088] RAX: ffff9b055e3b2bc0 RBX: ffff9afe41dcb400 RCX: 000000000000001f
[ 127.703216] RDX: 0000000000000003 RSI: 0000000025a5a65f RDI: 0000000000000000
[ 127.710339] RBP: 0000000000000002 R08: 0000000000000002 R09: 000000000000014e
[ 127.717459] R10: 0000000000000018 R11: ffff9b055e3b1664 R12: ffffffffb11b44e0
[ 127.724581] R13: 0000001dac1b538b R14: 0000000000000002 R15: 0000000000000000
[ 127.731709] cpuidle_enter+0x2d/0x40
[ 127.735284] do_idle+0x1da/0x230
[ 127.738516] cpu_startup_entry+0x2a/0x30
[ 127.742439] start_secondary+0x11e/0x140
[ 127.746364] secondary_startup_64_no_verify+0x18f/0x19b
[ 127.751590] </TASK>
[ 127.753778] Modules linked in: snd_seq_dummy snd_hrtimer snd_seq rfcomm af_packet xt_CHECKSUM xt_MASQUERADE ipt_REJECT nf_reject_ipv4 nft_chain_nat nf_nat cmac algif_hash algif_skcipher af_alg bnep msr nls_iso8859_1 nls_cp437 vfat fat snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel edac_mce_amd snd_intel_dspcfg edac_core btusb intel_rapl_msr snd_intel_sdw_acpi snd_usb_audio intel_rapl_common btrtl snd_hda_codec crc32_pclmul eeepc_wmi btintel polyval_clmulni polyval_generic gf128mul asus_wmi ghash_clmulni_intel btbcm sha512_ssse3 snd_usbmidi_lib btmtk sha512_generic battery uvcvideo snd_hda_core snd_rawmidi sha256_ssse3 ledtrig_audio sha1_ssse3 snd_hwdep snd_seq_device videobuf2_vmalloc cfg80211 bluetooth asus_wmi_sensors aesni_intel sparse_keymap snd_pcm igb uvc videobuf2_memops platform_profile crypto_simd cryptd snd_timer sp5100_tco videobuf2_v4l2 i8042 ecdh_generic ptp snd watchdog ecc pps_core rapl videobuf2_common rfkill wmi_bmof mxm_wmi libaes k10temp dca i2c_piix4 soundcore joydev
[ 127.753838] input_leds evdev mousedev mac_hid gpio_amdpt gpio_generic tiny_power_button button xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip6t_rpfilter ipt_rpfilter xt_pkttype xt_LOG nf_log_syslog xt_tcpudp nft_compat sch_fq_codel nf_tables libcrc32c uinput hid_xpadneo(O) ff_memless wireguard curve25519_x86_64 libchacha20poly1305 chacha_x86_64 poly1305_x86_64 libcurve25519_generic libchacha ip6_udp_tunnel udp_tunnel atkbd libps2 serio vivaldi_fmap loop tun tap macvlan bridge stp llc v4l2loopback(O) videodev mc led_class kvm_amd ccp kvm irqbypass fuse efi_pstore configfs nfnetlink efivarfs dmi_sysfs ip_tables x_tables dm_mod dax hid_generic usbhid hid ext4 sd_mod crc32c_generic crc16 mbcache jbd2 amdgpu ahci libahci xhci_pci xhci_pci_renesas libata nvme xhci_hcd nvme_core t10_pi usbcore scsi_mod crc64_rocksoft crc64 crc_t10dif crc32c_intel crct10dif_generic crct10dif_pclmul usb_common tpm_crb scsi_common crct10dif_common rtc_cmos tpm_tis tpm_tis_core i2c_algo_bit drm_ttm_helper ttm agpgart video wmi
[ 127.843671] drm_exec drm_suballoc_helper amdxcp drm_buddy gpu_sched drm_display_helper drm_kms_helper drm backlight firmware_class tpm rng_core autofs4
[ 127.946730] CR2: 0000000000000000
[ 127.950040] ---[ end trace 0000000000000000 ]---
[ 128.100411] RIP: 0010:dcn10_set_drr+0xa6/0x100 [amdgpu]
[ 128.105984] Code: 48 8b 80 28 01 00 00 48 85 c0 74 0a 48 8d 74 24 04 e8 7e 77 75 ef 45 85 e4 74 c5 45 85 ed 74 c0 48 8b 03 48 8b b8 f8 00 00 00 <48> 8b 07 48 8b 80 40 01 00 00 48 85 c0 74 a7 48 83 c3 08 ba 02 00
[ 128.124721] RSP: 0018:ffffa54180324dc0 EFLAGS: 00010002
[ 128.129947] RAX: ffff9b001c700248 RBX: ffffa54180324e10 RCX: 0000000000000000
[ 128.137076] RDX: 0000000080010015 RSI: ffff9afe414ece80 RDI: 0000000000000000
[ 128.144209] RBP: ffffa54180324e00 R08: 0000000000000030 R09: 0000000060c0f800
[ 128.151337] R10: 0000000000000008 R11: 0000000000000008 R12: 0000000000000c0d
[ 128.158472] R13: 0000000000000c0d R14: ffffa54180324e18 R15: ffff9afe42c63180
[ 128.165602] FS: 0000000000000000(0000) GS:ffff9b055e380000(0000) knlGS:0000000000000000
[ 128.173687] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 128.179432] CR2: 0000000000000000 CR3: 00000001344d4000 CR4: 0000000000f50ef0
[ 128.186562] PKRU: 55555554
[ 128.189274] Kernel panic - not syncing: Fatal exception in interrupt
[ 129.452836] Shutting down cpus with NMI
[ 129.456700] Kernel Offset: 0x2e800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 129.625197] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
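Reading the `Code:` line by hand: the byte in angle brackets is where RIP points. The bytes just before and at it decode (x86-64) as `48 8b 03` = `mov rax, [rbx]`, `48 8b b8 f8 00 00 00` = `mov rdi, [rax+0xf8]`, and the faulting `<48> 8b 07` = `mov rax, [rdi]`. With RDI = 0 and CR2 = 0 in the register dump, this is consistent with a pointer loaded from offset 0xf8 of some structure being NULL and then dereferenced inside `dcn10_set_drr`; which field that offset corresponds to is not determined here. The Python sketch below (helper names `parse_code_line` and `reliable_frames` are hypothetical, and the sample strings are copied from this log) extracts the marked instruction bytes and the call-trace frames not prefixed with `?`, which can help compare this oops against other reports.

```python
import re

# Excerpts copied verbatim from the serial-console log in this report.
CODE_LINE = (
    "Code: 48 8b 80 28 01 00 00 48 85 c0 74 0a 48 8d 74 24 04 e8 7e 77 75 ef "
    "45 85 e4 74 c5 45 85 ed 74 c0 48 8b 03 48 8b b8 f8 00 00 00 <48> 8b 07 "
    "48 8b 80 40 01 00 00 48 85 c0 74 a7 48 83 c3 08 ba 02 00"
)
TRACE = """\
[ 127.602894] ? dcn10_set_drr+0xa6/0x100 [amdgpu]
[ 127.607694] dc_stream_adjust_vmin_vmax+0xaa/0xd0 [amdgpu]
[ 127.613332] dm_crtc_high_irq+0x159/0x160 [amdgpu]
[ 127.618296] amdgpu_dm_irq_handler+0x85/0x220 [amdgpu]
"""

# Matches "[ timestamp] [? ]function+0xoffset/0xsize"; a leading "? " marks
# a frame the unwinder found on the stack but could not confirm as reliable.
FRAME_RE = re.compile(r"^\[\s*[\d.]+\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def parse_code_line(line):
    """Return (code bytes, index of the <..>-marked byte at RIP)."""
    if "Code:" not in line:
        raise ValueError("not an oops Code: line")
    data, fault_idx = [], None
    for tok in line.split("Code:", 1)[1].split():
        if tok.startswith("<") and tok.endswith(">"):
            fault_idx = len(data)   # first byte of the faulting instruction
            tok = tok[1:-1]
        data.append(int(tok, 16))
    return bytes(data), fault_idx


def reliable_frames(log_lines):
    """Function names from call-trace lines, skipping '?' frames."""
    names = []
    for line in log_lines:
        m = FRAME_RE.match(line.strip())
        if m and m.group(1) is None:
            names.append(m.group(2))
    return names
```

On this report's `Code:` line, `parse_code_line` returns 64 bytes with the marker at index 42, and the three bytes starting there are `48 8b 07`. For a full disassembly, the `scripts/decodecode` helper in the kernel source tree is the usual tool.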
Edited by Sefa Eyeoglu