amdgpu kernel null pointer dereference, RIP: 0010:amdgpu_dma_buf_move_notify+0x7c/0x170 [amdgpu]
Brief summary of the problem:
Running native Steam on swaywm, I occasionally hit a kernel oops that crashes my Wayland session. I can sometimes log in via ssh afterwards, but typically the laptop is unusable until I reboot.
- CPU: AMD Ryzen 5 3550H with Radeon Vega Mobile Gfx
- 01:00.0 Display controller : Advanced Micro Devices, Inc. [AMD/ATI] Baffin [Radeon RX 460/560D / Pro 450/455/460/555/555X/560/560X] [1002:67ef] (rev e5)
- 05:00.0 VGA compatible controller : Advanced Micro Devices, Inc. [AMD/ATI] Picasso [1002:15d8] (rev c2)
- System Memory: 31 GiB
- Display(s): ASUS TUF Gaming FX505DY internal screen and external screen
- Type of Display Connection: internal and HDMI
- Distro name and Version: Debian 11.1
- Kernel version: Linux 5.15.5
- Custom kernel: Vanilla Kernel 5.15.5
How to reproduce the issue:
Start sway session.
Run Steam. I set all the dGPU env vars on the Steam process itself because I'm too lazy to edit the command lines of individual games:
DRI_PRIME=1 \
mesa_glthread=true \
DXVK_FILTER_DEVICE_NAME="POLARIS" \
VKD3D_VULKAN_DEVICE=1 \
steam
Launch another application.
Move focus between steam and the other application.
Sometimes this takes minutes to repro.
I'm not sure whether it only happens with Steam or with other Xwayland clients as well, but Steam is definitely the most frequent/consistent repro.
The issue has been around for a while and through several kernel versions. I can't tell whether I also had it while still on X11, or when it started. I just never got round to reporting it.
Log files (for system lockups / game freezes / crashes)
faddr2line resolves the RIP (amdgpu_dma_buf_move_notify+0x7c/0x170) to line 387 in drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c:
if (bo->tbo.resource->mem_type == TTM_PL_SYSTEM)
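To illustrate why that line can fault, here is a minimal user-space sketch. It assumes, without confirmation from the trace, that bo->tbo.resource is the pointer that is NULL when the oops fires. The mock_* structs, MOCK_PL_SYSTEM and both helper functions are hypothetical stand-ins made up for this example, not the real TTM/amdgpu definitions, and this is not a proposed patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for TTM_PL_SYSTEM, used only in this sketch. */
#define MOCK_PL_SYSTEM 0u

/* Hypothetical stand-ins modelling only the fields the faulting line touches. */
struct mock_resource {
	unsigned int mem_type;
};

struct mock_tbo {
	struct mock_resource *resource; /* assumed to be NULL when the oops fires */
};

struct mock_bo {
	struct mock_tbo tbo;
};

/* Same shape as line 387: dereferences resource unconditionally, so a NULL
 * resource would fault here. Only call this with a non-NULL resource. */
bool in_system_unguarded(const struct mock_bo *bo)
{
	assert(bo != NULL);
	return bo->tbo.resource->mem_type == MOCK_PL_SYSTEM;
}

/* Guarded variant: tests the pointer before reading mem_type. Treating
 * "no resource" the same as the system placement is an assumption of this
 * sketch, not something the report establishes. */
bool in_system_guarded(const struct mock_bo *bo)
{
	assert(bo != NULL);
	return !bo->tbo.resource || bo->tbo.resource->mem_type == MOCK_PL_SYSTEM;
}
```

With resource set to NULL, in_system_guarded() returns true without touching mem_type, whereas in_system_unguarded() would dereference the NULL pointer, which matches the kind of access the oops reports.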