amdgpu crashes on Ryzen 9 7940HS
I'd like to know if there are plans to support the AMD Ryzen 9 7940HS APU and its Radeon 780M GPU. I have an ASUS laptop that uses this APU, the ROG Flow X13 GV302XA. I am getting visual corruption under both Xorg and Wayland, in both GNOME and KDE Plasma. Also, libdrm functions don't seem to work reliably; for example, I think drmModePageFlip() blocks indefinitely in certain circumstances, depending on the refresh rate of the display (still testing this). I am also getting intermittent sluggish behavior (like a glitchy cursor) and random amdgpu driver crashes. Switching from the X or Wayland session to a framebuffer console sometimes crashes the driver. When it does work, OpenGL rendering in games performs well at 1920x1200, apart from the occasional random lockup.
I am also having problems with waking from suspend. This laptop reports no ACPI S3 state, and once it suspends (breathing power LED), no button or input of any kind will restore the system or have any effect at all, requiring a reboot. However, if I blacklist amdgpu, pressing the power button shows an attempt to wake from suspend, but the laptop screen remains off even though the power LED is solid and the keyboard backlight comes back on.
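For anyone trying to reproduce this, the suspend modes the kernel offers can be read from /sys/power/mem_sleep, where "deep" corresponds to ACPI S3 and the bracketed entry is the mode currently selected. A minimal sketch of checking it (the helper names parse_mem_sleep and describe are mine, not part of any tool):

```python
def parse_mem_sleep(text):
    """Parse the contents of /sys/power/mem_sleep.

    Returns (available_modes, active_mode). active_mode is None when no
    entry is wrapped in square brackets.
    """
    modes = []
    active = None
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        modes.append(token)
    return modes, active


def describe(modes):
    # "deep" is suspend-to-RAM (ACPI S3); "s2idle" is suspend-to-idle.
    if "deep" in modes:
        return "S3 (deep) available"
    if "s2idle" in modes:
        return "s2idle only, no S3"
    return "no suspend modes reported"


if __name__ == "__main__":
    try:
        with open("/sys/power/mem_sleep") as f:
            modes, active = parse_mem_sleep(f.read())
        print(describe(modes), "| active:", active)
    except OSError as exc:
        print("could not read /sys/power/mem_sleep:", exc)
```

On a machine with no S3 state, I would expect this to report s2idle only.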
I'm running the latest 6.6 mainline kernel (6.6.1) on Debian trixie, with DMCUB firmware version 0x08002A00. I also compiled Mesa (both 64-bit and 32-bit) and am using the latest radeonsi driver (Mesa 24.0.0-devel (git-0976dfeca2)). This didn't improve the visual corruption compared with the version of Mesa included with Debian. The laptop uses a 120 Hz variable refresh display (1920x1200) and I have the VRR property enabled on the CRTC driving the display; I also tried with it disabled, and it made no difference.
Additional Info:
I tried today's kernel 6.7-rc1 release and didn't notice any difference. I see the same glitches and the same crash when switching to a framebuffer console; I think the crash happens after a DRM-based application (like a Wayland compositor) has rendered fullscreen.
I also have some more info about the libdrm problem. I noticed SDL's kmsdrm driver doesn't work properly. Any attempt to draw using this method will lock up the laptop if the rendering frame rate is below the display's max refresh rate. For example, if the display is set to 120 Hz, launching an SDL application using kmsdrm will render a single frame, then lock up if the render loop is under 120 fps.
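When driving KMS directly, the flip-completion event requested with DRM_MODE_PAGE_FLIP_EVENT is delivered on the DRM file descriptor and is normally consumed through drmHandleEvent(). Waiting on that descriptor without a timeout hangs forever if the event never arrives. Below is a minimal sketch of a bounded wait that makes a missing event visible instead of hanging the application. It uses a pipe as a stand-in for the DRM fd, the helper name wait_readable is hypothetical, and it assumes the hang is in the event wait rather than inside the ioctl itself, which I have not confirmed:

```python
import os
import select


def wait_readable(fd, timeout_s):
    """Return True if fd becomes readable within timeout_s seconds, else False."""
    readable, _, _ = select.select([fd], [], [], timeout_s)
    return bool(readable)


if __name__ == "__main__":
    # Stand-in for the DRM device fd: a pipe that this script controls.
    r, w = os.pipe()
    try:
        # No event has been written yet, so the wait times out.
        print("event before write:", wait_readable(r, 0.05))
        # Simulate a flip-completion event arriving on the fd.
        os.write(w, b"x")
        print("event after write:", wait_readable(r, 0.05))
    finally:
        os.close(r)
        os.close(w)
```

In a real render loop the same check would sit in front of the drmHandleEvent() call, so a missed flip can be logged with the frame time that triggered it.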
UPDATE: Turning off the hardware cursor seems to fix many of the timeouts during DRM flips that the amdgpu driver reports after a crash (see below). I tested this on X11 using the modesetting driver with both "SWcursor" and "VariableRefresh" set to "On". There is still tearing under X11 even with variable refresh enabled. I still need to figure out how to prevent the timeouts when using libdrm + libgbm + Mesa directly with no window manager.
[Sun Nov 19 01:17:49 2023] amdgpu 0000:69:00.0: [drm] *ERROR* flip_done timed out
[Sun Nov 19 01:17:49 2023] amdgpu 0000:69:00.0: [drm] *ERROR* [CRTC:79:crtc-0] commit wait timed out
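To compare how often these timeouts occur with and without the hardware cursor, the kernel log can be tallied per message and per CRTC. A minimal sketch, assuming the messages keep the format shown above (the name count_timeouts and the regular expression are my own, not part of any driver tooling):

```python
import re
from collections import Counter

# Matches the two DRM timeout messages, with an optional "[CRTC:id:name]" tag.
TIMEOUT_RE = re.compile(
    r"\[drm\] \*ERROR\* (?:\[(?P<obj>[A-Z]+:\d+:[^\]]+)\] )?"
    r"(?P<what>flip_done timed out|commit wait timed out)"
)


def count_timeouts(lines):
    """Count DRM timeout messages, keyed by (message, object tag or None)."""
    counts = Counter()
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            counts[(m.group("what"), m.group("obj"))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "[Sun Nov 19 01:17:49 2023] amdgpu 0000:69:00.0: [drm] *ERROR* flip_done timed out",
        "[Sun Nov 19 01:17:49 2023] amdgpu 0000:69:00.0: [drm] *ERROR* [CRTC:79:crtc-0] commit wait timed out",
    ]
    for (what, obj), n in count_timeouts(sample).items():
        print(n, what, obj or "(no object tag)")
```

Feeding it the output of `dmesg -T` or `journalctl -k` from two otherwise identical sessions should show whether the software cursor actually reduces the count.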
Output from psr.py:
DRI device 0 DMCUB F/W version: 0x08002a00
○ PSR 2 with Y coordinates (eDP 1.4b or eDP 1.5) [4]
○ Sink OUI: 38-ec-11
○ resv_40f: 00
○ ID String: 01-07
○ PSR Status: 00-00-02
journalctl output from the driver crash when switching to a framebuffer console:
Nov 12 02:14:56 debian kernel: ------------[ cut here ]------------
Nov 12 02:14:56 debian kernel: WARNING: CPU: 13 PID: 925 at drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dmub_psr.c:225 dmub_psr_enable+0xf7/0x100 [amdgpu]
Nov 12 02:14:56 debian kernel: Modules linked in: ccm rfcomm cmac algif_hash algif_skcipher af_alg snd_seq_dummy snd_hrtimer snd_seq snd_seq_device bnep qrtr sunrpc binfmt_misc intel_rapl_msr intel_rapl_common btusb edac_mce_amd btrtl btintel btbcm btmtk kvm_amd bluetooth kvm mt7921e rtw88_882>
Nov 12 02:14:56 debian kernel: cs_dsp snd_acp_config typec_ucsi snd_timer snd_soc_cs35l41_lib sp5100_tco hid_sensor_accel_3d roles snd_soc_acpi ccp snd hid_sensor_trigger watchdog snd_pci_acp3x hid_sensor_iio_common typec soundcore industrialio_triggered_buffer ac kfifo_buf serial_multi_insta>
Nov 12 02:14:56 debian kernel: CPU: 13 PID: 925 Comm: systemd-logind Not tainted 6.6.1 #1
Nov 12 02:14:56 debian kernel: Hardware name: ASUSTeK COMPUTER INC. ROG Flow X13 GV302XA_GV302XA/GV302XA, BIOS GV302XA.311 05/29/2023
Nov 12 02:14:56 debian kernel: RIP: 0010:dmub_psr_enable+0xf7/0x100 [amdgpu]
Nov 12 02:14:56 debian kernel: Code: c0 75 cf 81 fb e8 03 00 00 74 1f 48 8b 44 24 48 65 48 2b 04 25 28 00 00 00 75 13 48 83 c4 50 5b 5d 41 5c 41 5d e9 49 58 42 e0 <0f> 0b eb dd e8 b0 d8 40 e0 90 90 90 90 90 90 90 90 90 90 90 90 90
Nov 12 02:14:56 debian kernel: RSP: 0018:ffffc90001bd35d8 EFLAGS: 00010246
Nov 12 02:14:56 debian kernel: RAX: 00000251de43c0b0 RBX: 00000000000003e9 RCX: 000000000000000d
Nov 12 02:14:56 debian kernel: RDX: 00000000001e4308 RSI: 00000000001e373c RDI: 00000251de257da8
Nov 12 02:14:56 debian kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: ffffc900402ec600
Nov 12 02:14:56 debian kernel: R10: 0000000000000000 R11: ffffc90001bd3634 R12: ffff8881334f2350
Nov 12 02:14:56 debian kernel: R13: 0000000000000000 R14: ffffc90001bd36b3 R15: ffffc90001bd36b4
Nov 12 02:14:56 debian kernel: FS: 00007fe071023580(0000) GS:ffff88840e940000(0000) knlGS:0000000000000000
Nov 12 02:14:56 debian kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 12 02:14:56 debian kernel: CR2: 000056071888ed38 CR3: 0000000109516000 CR4: 0000000000750ee0
Nov 12 02:14:56 debian kernel: PKRU: 55555554
Nov 12 02:14:56 debian kernel: Call Trace:
Nov 12 02:14:56 debian kernel: <TASK>
Nov 12 02:14:56 debian kernel: ? dmub_psr_enable+0xf7/0x100 [amdgpu]
Nov 12 02:14:56 debian kernel: ? __warn+0x81/0x130
Nov 12 02:14:56 debian kernel: ? dmub_psr_enable+0xf7/0x100 [amdgpu]
Nov 12 02:14:56 debian kernel: ? report_bug+0x191/0x1c0
Nov 12 02:14:56 debian kernel: ? handle_bug+0x3c/0x80
Nov 12 02:14:56 debian kernel: ? exc_invalid_op+0x17/0x70
Nov 12 02:14:56 debian kernel: ? asm_exc_invalid_op+0x1a/0x20
Nov 12 02:14:56 debian kernel: ? dmub_psr_enable+0xf7/0x100 [amdgpu]
Nov 12 02:14:56 debian kernel: ? dmub_psr_enable+0xac/0x100 [amdgpu]
Nov 12 02:14:56 debian kernel: edp_set_psr_allow_active+0x27b/0x3b0 [amdgpu]
Nov 12 02:14:56 debian kernel: amdgpu_dm_psr_disable+0x5b/0x80 [amdgpu]
Nov 12 02:14:56 debian kernel: amdgpu_dm_atomic_commit_tail+0x2c0b/0x39a0 [amdgpu]
Nov 12 02:14:56 debian kernel: ? srso_alias_return_thunk+0x1/0x7f
Nov 12 02:14:56 debian kernel: commit_tail+0x91/0x130 [drm_kms_helper]
Nov 12 02:14:56 debian kernel: drm_atomic_helper_commit+0x11a/0x140 [drm_kms_helper]
Nov 12 02:14:56 debian kernel: drm_atomic_commit+0x97/0xd0 [drm]
Nov 12 02:14:56 debian kernel: ? __pfx___drm_printfn_info+0x10/0x10 [drm]
Nov 12 02:14:56 debian kernel: drm_client_modeset_commit_atomic+0x203/0x250 [drm]
Nov 12 02:14:56 debian kernel: drm_client_modeset_commit_locked+0x5a/0x160 [drm]
Nov 12 02:14:56 debian kernel: __drm_fb_helper_restore_fbdev_mode_unlocked+0x5e/0xd0 [drm_kms_helper]
Nov 12 02:14:56 debian kernel: drm_fb_helper_set_par+0x2f/0x40 [drm_kms_helper]
Nov 12 02:14:56 debian kernel: fb_set_var+0x201/0x420
Nov 12 02:14:56 debian kernel: ? srso_alias_return_thunk+0x5/0x7f
Nov 12 02:14:56 debian kernel: ? srso_alias_return_thunk+0x5/0x7f
Nov 12 02:14:56 debian kernel: ? srso_alias_return_thunk+0x5/0x7f
Nov 12 02:14:56 debian kernel: ? update_load_avg+0x7e/0x780
Nov 12 02:14:56 debian kernel: fbcon_blank+0x213/0x310
Nov 12 02:14:56 debian kernel: do_unblank_screen+0xa9/0x160
Nov 12 02:14:56 debian kernel: complete_change_console+0x54/0x120
Nov 12 02:14:56 debian kernel: vt_ioctl+0xd8b/0x13f0
Nov 12 02:14:56 debian kernel: tty_ioctl+0x4ea/0x8b0
Nov 12 02:14:56 debian kernel: ? srso_alias_return_thunk+0x5/0x7f
Nov 12 02:14:56 debian kernel: ? __seccomp_filter+0x333/0x520
Nov 12 02:14:56 debian kernel: ? srso_alias_return_thunk+0x5/0x7f
Nov 12 02:14:56 debian kernel: __x64_sys_ioctl+0x94/0xd0
Nov 12 02:14:56 debian kernel: do_syscall_64+0x5d/0xc0
Nov 12 02:14:56 debian kernel: ? do_syscall_64+0x6c/0xc0
Nov 12 02:14:56 debian kernel: ? srso_alias_return_thunk+0x5/0x7f
Nov 12 02:14:56 debian kernel: ? do_syscall_64+0x6c/0xc0
Nov 12 02:14:56 debian kernel: entry_SYSCALL_64_after_hwframe+0x6e/0xd8
Nov 12 02:14:56 debian kernel: RIP: 0033:0x7fe071b1b51b
Nov 12 02:14:56 debian kernel: Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 64 48 2b 04 25 28 00 00
Nov 12 02:14:56 debian kernel: RSP: 002b:00007ffcc40814b0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Nov 12 02:14:56 debian kernel: RAX: ffffffffffffffda RBX: 0000000000000018 RCX: 00007fe071b1b51b
Nov 12 02:14:56 debian kernel: RDX: 0000000000000001 RSI: 0000000000005605 RDI: 0000000000000018
Nov 12 02:14:56 debian kernel: RBP: 0000000000000000 R08: 00007ffcc40814b0 R09: 0000558a018d6668
Nov 12 02:14:56 debian kernel: R10: 00007ffcc40814f0 R11: 0000000000000246 R12: 00007ffcc40815a0
Nov 12 02:14:56 debian kernel: R13: 00007ffcc4081598 R14: 0000558a018d6f60 R15: 0000000000000006
Nov 12 02:14:56 debian kernel: </TASK>
Nov 12 02:14:56 debian kernel: ---[ end trace 0000000000000000 ]---
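In the trace above, entries prefixed with "?" are addresses the unwinder found on the stack but could not confirm as part of the call chain. Dropping them leaves the path from the VT switch (vt_ioctl) through fbcon and the DRM fbdev helpers into amdgpu_dm_psr_disable. A minimal sketch for extracting only the confirmed frames from a journalctl dump, assuming the line format shown above (reliable_frames and the regular expression are my own helpers):

```python
import re

# "symbol+0xoff/0xsize [module]" after the "kernel: " prefix; a leading "? "
# marks a frame the unwinder could not confirm.
FRAME_RE = re.compile(
    r"kernel: (?P<guess>\? )?(?P<sym>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<mod>\w+)\])?\s*$"
)


def reliable_frames(lines):
    """Return (symbol, module or None) for confirmed frames in the first call trace."""
    frames = []
    in_trace = False
    for line in lines:
        if "Call Trace:" in line:
            in_trace = True
            continue
        if not in_trace:
            continue
        if "</TASK>" in line:
            break
        m = FRAME_RE.search(line)
        if m and not m.group("guess"):
            frames.append((m.group("sym"), m.group("mod")))
    return frames


if __name__ == "__main__":
    sample = [
        "Nov 12 02:14:56 debian kernel: Call Trace:",
        "Nov 12 02:14:56 debian kernel: <TASK>",
        "Nov 12 02:14:56 debian kernel: ? dmub_psr_enable+0xac/0x100 [amdgpu]",
        "Nov 12 02:14:56 debian kernel: edp_set_psr_allow_active+0x27b/0x3b0 [amdgpu]",
        "Nov 12 02:14:56 debian kernel: fb_set_var+0x201/0x420",
        "Nov 12 02:14:56 debian kernel: </TASK>",
    ]
    for sym, mod in reliable_frames(sample):
        print(sym, f"[{mod}]" if mod else "")
```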