general protection fault in dc_plane_state_release when releasing display from standby
Brief summary of problem:
I am experiencing an intermittent kernel general protection fault when waking the display from the power-save "standby" (DPMS?) state. When this happens, the display remains blanked and Xorg no longer responds. I am still able to log in to the system remotely over ssh (I will attach the kernel call trace to this bug, from both kernel 6.1.19 and 6.1.31, compiled using Gentoo sys-kernel/gentoo-sources). The sequence is: the display blanks (normal X/DPMS-related blanking), and when the keyboard or mouse is used to "wake up" the display, the general protection fault follows, the display does not recover, and the X process hangs indefinitely and cannot be stopped or restarted.
Each time the problem occurs, the fault is in amdgpu's dc_plane_state_release.
Bugs #1585 and #1878 are similar, but they are not the same problem as the one I am experiencing.
Kernel dmesg at time of issue is as follows (abbreviated, full dmesg below):
[143238.419340] general protection fault, probably for non-canonical address 0xff918bb4f80798: 0000 [#1] PREEMPT SMP NOPTI
[143238.419344] CPU: 3 PID: 19667 Comm: X Tainted: G O 6.1.31-gentoo-x86_64 #1
[143238.419346] Hardware name: AZW GTR/GTR, BIOS GTR_V1.24_bLink_P4C5M43 06/08/2022
[143238.419347] RIP: 0010:dc_plane_state_release+0xb/0x1a0 [amdgpu]
[143238.419395] Code: cc cc cc be 01 00 00 00 48 89 cf e9 8f 0a ae d4 be 02 00 00 00 48 89 cf e9 82 0a ae d4 66 90 ba ff ff ff ff 53 48 89 fb 89 d0 <f0> 0f c1 87 98 03 00 00 83 f8 01 74 0e 85 c0 0f 8e f3 00 00 00 5b
[143238.419396] RSP: 0018:ffffa34607e7fba0 EFLAGS: 00010206
[143238.419398] RAX: 00000000ffffffff RBX: 00ff918bb4f80400 RCX: 000000000169fd03
[143238.419399] RDX: 00000000ffffffff RSI: ffff918acebfef00 RDI: 00ff918bb4f80400
[143238.419399] RBP: ffff918a497fe000 R08: 0000000000000002 R09: ffffa34607e7f6f0
[143238.419400] R10: 0000000000000001 R11: 0000000000000000 R12: ffff918a49f00010
[143238.419401] R13: 0000000000000003 R14: 00000000ffffffff R15: 0000000000000004
[143238.419401] FS: 00007f34bfdc78c0(0000) GS:ffff91912e6c0000(0000) knlGS:0000000000000000
[143238.419402] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[143238.419403] CR2: 00007f34a6091000 CR3: 0000000106b32000 CR4: 0000000000750ee0
[143238.419404] PKRU: 55555554
[143238.419404] Call Trace:
[143238.419406] <TASK>
[143238.419407] ? die_addr.cold+0x8/0xd
[143238.419410] ? exc_general_protection+0x208/0x470
[143238.419413] ? asm_exc_general_protection+0x22/0x30
[143238.419415] ? dc_plane_state_release+0xb/0x1a0 [amdgpu]
[143238.419451] dm_drm_plane_destroy_state+0x19/0x30 [amdgpu]
[143238.419499] drm_atomic_state_default_clear+0x1c0/0x2e0 [drm]
[143238.419506] __drm_atomic_state_free+0x60/0x90 [drm]
[143238.419512] drm_atomic_helper_update_plane+0xfc/0x150 [drm_kms_helper]
[143238.419516] drm_mode_cursor_universal+0x115/0x210 [drm]
[143238.419523] drm_mode_cursor_common+0xc2/0x1f0 [drm]
[143238.419527] ? drm_mode_cursor_ioctl+0x50/0x50 [drm]
[143238.419532] drm_ioctl_kernel+0x9c/0x150 [drm]
[143238.419537] drm_ioctl+0x21c/0x420 [drm]
[143238.419542] ? drm_mode_cursor_ioctl+0x50/0x50 [drm]
[143238.419546] ? __fget_light+0x98/0x100
[143238.419549] amdgpu_drm_ioctl+0x45/0x80 [amdgpu]
[143238.419578] __x64_sys_ioctl+0x8b/0xc0
[143238.419582] do_syscall_64+0x3a/0x90
[143238.419584] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[143238.419585] RIP: 0033:0x7f34bff0ec2b
[143238.419586] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 64 48 2b 04 25 28 00 00
[143238.419587] RSP: 002b:00007fff075fea60 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[143238.419588] RAX: ffffffffffffffda RBX: 00007fff075feaf0 RCX: 00007f34bff0ec2b
[143238.419589] RDX: 00007fff075feaf0 RSI: 00000000c02464bb RDI: 000000000000000b
[143238.419589] RBP: 00007fff075feaf0 R08: 0000000000000000 R09: 0000000000000001
[143238.419590] R10: 0000000000000780 R11: 0000000000000246 R12: 00000000c02464bb
[143238.419591] R13: 000000000000000b R14: 0000000000000008 R15: 000055f2b10c2cb0
[143238.419592] </TASK>
[143238.419592] Modules linked in: snd_seq_dummy snd_seq snd_seq_device bridge stp llc binfmt_misc snd_aloop amdgpu vboxnetadp(O) intel_rapl_msr snd_hda_codec_realtek intel_rapl_common iwlmvm snd_hda_codec_generic btusb vboxnetflt(O) edac_mce_amd drm_ttm_helper ledtrig_audio btrfs snd_hda_codec_hdmi btrtl btbcm ttm xor mac80211 btintel raid6_pq kvm_amd iommu_v2 btmtk libarc4 zstd_compress vboxdrv(O) joydev wmi_bmof gpu_sched bluetooth snd_hda_intel i2c_algo_bit kvm snd_intel_dspcfg snd_intel_sdw_acpi drm_buddy irqbypass snd_hda_codec iwlwifi ecdh_generic drm_display_helper crct10dif_pclmul snd_hda_core ghash_clmulni_intel cec sha512_ssse3 snd_hwdep drm_kms_helper snd_pcm rapl snd_timer cfg80211 snd_pci_acp5x sp5100_tco drm snd_rn_pci_acp3x snd rfkill snd_acp_config snd_soc_acpi soundcore pcspkr snd_pci_acp3x i2c_piix4 ccp k10temp video wmi acpi_cpufreq dm_crypt trusted asn1_encoder uas nvme crc32_pclmul crc32c_intel nvme_core sd_mod t10_pi crc64_rocksoft crc64 r8169
[143238.419641] ---[ end trace 0000000000000000 ]---
[143238.419642] RIP: 0010:dc_plane_state_release+0xb/0x1a0 [amdgpu]
[143238.419678] Code: cc cc cc be 01 00 00 00 48 89 cf e9 8f 0a ae d4 be 02 00 00 00 48 89 cf e9 82 0a ae d4 66 90 ba ff ff ff ff 53 48 89 fb 89 d0 <f0> 0f c1 87 98 03 00 00 83 f8 01 74 0e 85 c0 0f 8e f3 00 00 00 5b
[143238.419679] RSP: 0018:ffffa34607e7fba0 EFLAGS: 00010206
[143238.419680] RAX: 00000000ffffffff RBX: 00ff918bb4f80400 RCX: 000000000169fd03
[143238.419681] RDX: 00000000ffffffff RSI: ffff918acebfef00 RDI: 00ff918bb4f80400
[143238.419681] RBP: ffff918a497fe000 R08: 0000000000000002 R09: ffffa34607e7f6f0
[143238.419682] R10: 0000000000000001 R11: 0000000000000000 R12: ffff918a49f00010
[143238.419683] R13: 0000000000000003 R14: 00000000ffffffff R15: 0000000000000004
[143238.419683] FS: 00007f34bfdc78c0(0000) GS:ffff91912e6c0000(0000) knlGS:0000000000000000
[143238.419684] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[143238.419685] CR2: 00007f34a6091000 CR3: 0000000106b32000 CR4: 0000000000750ee0
[143238.419685] PKRU: 55555554
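A note on reading the register dump above, offered as a rough sketch rather than a diagnosis. The faulting instruction bytes `<f0> 0f c1 87 98 03 00 00` decode as `lock xadd %eax,0x398(%rdi)`, so the address touched is RDI + 0x398. The small Python check below reproduces the address the kernel reported and shows that it is non-canonical on a 48-bit virtual address space (CR4 above does not have LA57 set). The helper name `is_canonical` is made up for illustration, and treating offset 0x398 as the plane state's reference count is an assumption based on the function name, not something confirmed from the source.

```python
# Values copied from the oops above.
RDI = 0x00FF918BB4F80400   # pointer passed to dc_plane_state_release
RSI = 0xFFFF918ACEBFEF00   # an ordinary-looking kernel pointer, for comparison

# Displacement taken from the faulting instruction bytes (98 03 00 00).
# Assumption: this is the refcount field inside the plane state object.
REFCOUNT_OFFSET = 0x398


def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """Return True if bits 63..(va_bits-1) are all zeros or all ones."""
    top = addr >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return top == 0 or top == all_ones


fault_addr = RDI + REFCOUNT_OFFSET

print(hex(fault_addr))            # 0xff918bb4f80798, as reported by the kernel
print(is_canonical(fault_addr))   # False
print(is_canonical(RSI))          # True
```

The bad pointer in RDI differs from a plausible kernel address (0xffff918bb4f80400) only in its top byte, which may point to a corrupted or stale plane state pointer, but that is speculation on my part.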
Hardware configuration:
CPU: AMD Ryzen 9 5900HX with Radeon Graphics (from /proc/cpuinfo)
GPU: (integrated graphics) 05:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Cezanne [Radeon Vega Series / Radeon Vega Mobile Series] [1002:1638] (rev c4)
System Memory: 32G (DDR4)
Display: ViewSonic VS15964 (1920x1200 59.95 Hz)
Display Connection in use: DisplayPort-0
System information:
Distribution: Gentoo
Kernel: 6.1.31-gentoo-x86_64 #1 SMP PREEMPT_DYNAMIC Mon Jun 5 23:28:49 CDT 2023 x86_64 AMD Ryzen 9 5900HX with Radeon Graphics AuthenticAMD GNU/Linux
Kernel source: Gentoo sys-kernel/gentoo-sources v6.1.19, v6.1.31
AMD Official Driver Version: N/A, included as part of Linux kernel sources
Additional information:
- The output I am using is DisplayPort-0, which is a "plain old" DisplayPort output. There are multiple outputs, including HDMI, available on the system.
- I am NOT using a display manager (xdm). Basically, I am "old school": I start X via 'startx' each time I log in via the text console, and bring down X before logging off for the evening.
- I am using fluxbox as my window manager. Again, fairly "old school." I mostly run X terminals (classic xterm) for text-based applications, Firefox, and a few applications under Wine. The applications being run do not appear to affect the crashes.
- I use xrandr to turn on TearFree mode (xrandr --output DisplayPort-0 --set TearFree on) to prevent some "tearing" of window contents while moving windows. I have experienced the problem when TearFree was off as well.
- I have not yet tried turning off DPMS power saving on the output to see if that improves the situation. I will await further suggestions before trying this because of the intermittent nature of the problem.
- It is possible that this happens when the display is still in the middle of being blanked and I try to wake it up before blanking has completed.
Attached files:
Log files
kernel logs: dmesg-6.1.19.txt dmesg-6.1.31.txt
kernel config: config.gz