BUG: KFENCE: use-after-free read in amdgpu_bo_move+0x1ce/0x710 [amdgpu]
System information
System:
Host: el-ryzerino Kernel: 6.7.4-200.fc39.x86_64 arch: x86_64 bits: 64
compiler: gcc v: 2.40-14.fc39 Desktop: GNOME v: 45.3 tk: GTK v: 3.24.41
wm: gnome-shell dm: GDM Distro: Fedora release 39 (Thirty Nine)
CPU:
Info: 16-core model: AMD Ryzen 9 5950X bits: 64 type: MT MCP arch: Zen 3+
rev: 2 cache: L1: 1024 KiB L2: 8 MiB L3: 64 MiB
Speed (MHz): avg: 3400 min/max: 2200/5083 boost: enabled cores: 1: 3400
2: 3400 3: 3400 4: 3400 5: 3400 6: 3400 7: 3400 8: 3400 9: 3400 10: 3400
11: 3400 12: 3400 13: 3400 14: 3400 15: 3400 16: 3400 17: 3400 18: 3400
19: 3400 20: 3400 21: 3400 22: 3400 23: 3400 24: 3400 25: 3400 26: 3400
27: 3400 28: 3400 29: 3400 30: 3400 31: 3400 32: 3400 bogomips: 217189
Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm
Graphics:
Device-1: AMD Navi 31 [Radeon RX 7900 XT/7900 XTX] vendor: ASRock
driver: amdgpu v: kernel arch: RDNA-3 pcie: speed: 16 GT/s lanes: 16 ports:
active: DP-4,HDMI-A-1 empty: DP-1, DP-2, DP-3, DP-5 bus-ID: 0e:00.0
chip-ID: 1002:744c
Display: server: X.Org v: 1.20.14 with: Xwayland v: 23.2.4
compositor: gnome-shell driver: X: loaded: amdgpu
unloaded: fbdev,modesetting,radeon,vesa dri: radeonsi gpu: amdgpu
display-ID: :0 screens: 1
Screen-1: 0 s-res: 4480x1440 s-dpi: 96
Monitor-1: DP-4 mapped: DisplayPort-3 pos: right model: HP Z24n G2
res: 1920x1200 dpi: 94 diag: 611mm (24.1")
Monitor-2: HDMI-A-1 mapped: HDMI-A-0 pos: primary,left model: XG27WQ
res: 2560x1440 dpi: 109 diag: 703mm (27.7")
API: EGL v: 1.5 platforms: device: 0 drv: radeonsi device: 1 drv: swrast
surfaceless: drv: radeonsi x11: drv: radeonsi inactive: gbm,wayland
API: OpenGL v: 4.6 compat-v: 4.5 vendor: amd mesa v: 23.3.5 glx-v: 1.4
direct-render: yes renderer: AMD Radeon RX 7900 GRE (radeonsi navi31 LLVM
17.0.6 DRM 3.57 6.7.4-200.fc39.x86_64) device-ID: 1002:744c
API: Vulkan v: 1.3.268 surfaces: xcb,xlib device: 0 type: discrete-gpu
driver: mesa radv device-ID: 1002:744c device: 1 type: cpu
driver: mesa llvmpipe device-ID: 10005:0000
- OS:
Fedora Linux 39 (Workstation Edition)
- GPU:
0e:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 31 [Radeon RX 7900 XT/7900 XTX] [1002:744c] (rev ce)
It is an RX 7900 GRE.
- Kernel version:
Linux el-ryzerino 6.7.4-200.fc39.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Feb 5 22:21:14 UTC 2024 x86_64 GNU/Linux
- Mesa version:
OpenGL version string: 4.6 (Compatibility Profile) Mesa 23.3.5
- Desktop manager and compositor:
Gnome 45
Describe the issue
I noticed an error in dmesg after waking up from suspend. I am not entirely sure what caused it.
Log files as attachment
[34258.413097] ==================================================================
[34258.413099] BUG: KFENCE: use-after-free read in amdgpu_bo_move+0x1ce/0x710 [amdgpu]
[34258.413269] Use-after-free read at 0x000000008d0cefe0 (in kfence-#98):
[34258.413270] amdgpu_bo_move+0x1ce/0x710 [amdgpu]
[34258.413413] ttm_bo_handle_move_mem+0xbb/0x170 [ttm]
[34258.413417] ttm_bo_validate+0xe5/0x180 [ttm]
[34258.413422] amdgpu_cs_bo_validate+0x9c/0x2e0 [amdgpu]
[34258.413565] amdgpu_vm_validate_pt_bos+0xbd/0x380 [amdgpu]
[34258.413709] amdgpu_cs_parser_bos.isra.0+0x490/0x820 [amdgpu]
[34258.413845] amdgpu_cs_ioctl+0xa2d/0x1a30 [amdgpu]
[34258.413975] drm_ioctl_kernel+0xd6/0x180
[34258.413978] drm_ioctl+0x26d/0x4b0
[34258.413979] amdgpu_drm_ioctl+0x4e/0x90 [amdgpu]
[34258.414104] __x64_sys_ioctl+0x97/0xd0
[34258.414107] do_syscall_64+0x64/0xe0
[34258.414109] entry_SYSCALL_64_after_hwframe+0x6e/0x76
[34258.414112] kfence-#98: 0x00000000dfd76b32-0x00000000f369dda2, size=240, cache=kmalloc-256
[34258.414113] allocated by task 193187 on cpu 17 at 34251.126096s:
[34258.414265] __kmem_cache_alloc_node+0x2a7/0x2e0
[34258.414267] kmalloc_trace+0x2a/0xa0
[34258.414269] amdgpu_gtt_mgr_new+0x40/0x140 [amdgpu]
[34258.414403] ttm_resource_alloc+0x3b/0x80 [ttm]
[34258.414407] ttm_bo_mem_space+0x88/0x230 [ttm]
[34258.414411] ttm_mem_evict_first+0x1c6/0x530 [ttm]
[34258.414415] ttm_resource_manager_evict_all+0xa7/0x1d0 [ttm]
[34258.414419] amdgpu_device_prepare+0x4e/0xd0 [amdgpu]
[34258.414546] pci_pm_prepare+0x34/0x70
[34258.414547] dpm_prepare+0x269/0x440
[34258.414549] dpm_suspend_start+0x1e/0x90
[34258.414551] suspend_devices_and_enter+0x16a/0x970
[34258.414552] pm_suspend+0x25e/0x590
[34258.414553] state_store+0x6c/0xd0
[34258.414555] kernfs_fop_write_iter+0x136/0x1d0
[34258.414556] vfs_write+0x23d/0x400
[34258.414558] ksys_write+0x6f/0xf0
[34258.414559] do_syscall_64+0x64/0xe0
[34258.414560] entry_SYSCALL_64_after_hwframe+0x6e/0x76
[34258.414562] freed by task 53793 on cpu 27 at 34258.413092s:
[34258.414961] ttm_resource_free+0x6b/0x80 [ttm]
[34258.414965] ttm_bo_move_accel_cleanup+0xc8/0x2a0 [ttm]
[34258.414969] amdgpu_bo_move+0x5d0/0x710 [amdgpu]
[34258.415099] ttm_bo_handle_move_mem+0xbb/0x170 [ttm]
[34258.415103] ttm_bo_validate+0xe5/0x180 [ttm]
[34258.415107] amdgpu_cs_bo_validate+0x9c/0x2e0 [amdgpu]
[34258.415239] amdgpu_vm_validate_pt_bos+0xbd/0x380 [amdgpu]
[34258.415374] amdgpu_cs_parser_bos.isra.0+0x490/0x820 [amdgpu]
[34258.415505] amdgpu_cs_ioctl+0xa2d/0x1a30 [amdgpu]
[34258.415637] drm_ioctl_kernel+0xd6/0x180
[34258.415638] drm_ioctl+0x26d/0x4b0
[34258.415639] amdgpu_drm_ioctl+0x4e/0x90 [amdgpu]
[34258.415766] __x64_sys_ioctl+0x97/0xd0
[34258.415768] do_syscall_64+0x64/0xe0
[34258.415769] entry_SYSCALL_64_after_hwframe+0x6e/0x76
[34258.415771] CPU: 27 PID: 53793 Comm: firefox:cs0 Not tainted 6.7.4-200.fc39.x86_64 #1
[34258.415773] Hardware name: To Be Filled By O.E.M. B550 Taichi/B550 Taichi, BIOS P3.40 01/18/2024
[34258.415774] ==================================================================
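For anyone reading the report above, here is a minimal Python sketch that pulls the timing and task information out of a KFENCE report. It assumes the plain dmesg format with `[seconds.microseconds]` prefixes (not the journal format), and the names `to_us` and `summarize_kfence` are made up for this illustration. The embedded text is just the four relevant lines copied from the log above.

```python
import re

# Excerpt of the report above: only the lines that carry timing/task info.
REPORT = """\
[34258.413099] BUG: KFENCE: use-after-free read in amdgpu_bo_move+0x1ce/0x710 [amdgpu]
[34258.414113] allocated by task 193187 on cpu 17 at 34251.126096s:
[34258.414562] freed by task 53793 on cpu 27 at 34258.413092s:
[34258.415771] CPU: 27 PID: 53793 Comm: firefox:cs0 Not tainted 6.7.4-200.fc39.x86_64 #1
"""


def to_us(stamp: str) -> int:
    """Convert a 'seconds.microseconds' kernel timestamp to integer microseconds."""
    sec, frac = stamp.split(".")
    return int(sec) * 1_000_000 + int(frac.ljust(6, "0")[:6])


def summarize_kfence(report: str) -> dict:
    """Extract allocation/free/report timing from one KFENCE report (hypothetical helper)."""
    bug = re.search(r"\[\s*(\d+\.\d+)\] BUG: KFENCE: (.+?) in (\S+)", report)
    alloc = re.search(r"allocated by task (\d+) on cpu \d+ at (\d+\.\d+)s", report)
    freed = re.search(r"freed by task (\d+) on cpu \d+ at (\d+\.\d+)s", report)
    cpu = re.search(r"CPU: \d+ PID: (\d+) Comm: (\S+)", report)
    return {
        "kind": bug.group(2),
        "function": bug.group(3),
        # Time between allocation and free of the object.
        "object_lifetime_us": to_us(freed.group(2)) - to_us(alloc.group(2)),
        # Time between the free and the line that reports the bad read.
        "free_to_report_us": to_us(bug.group(1)) - to_us(freed.group(2)),
        # True when the task that freed the object is the one that hit the report.
        "freed_by_reporting_task": freed.group(1) == cpu.group(1),
        "comm": cpu.group(2),
    }


summary = summarize_kfence(REPORT)
print(summary)
```

For this report the object was allocated during suspend preparation, lived for about 7.29 seconds, and was freed by the same task (53793, firefox:cs0) only a few microseconds before the read was reported. That is consistent with the freed-by and read stacks both going through the same amdgpu_bo_move call, but it is only a reading of the trace, not a confirmed root cause.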
Activity
- Martin Wolf changed the description
- Eric Engestrom added radeonsi in Mesa / mesa label
- Michel Dänzer moved from mesa/mesa#10591 (moved)
- Michel Dänzer added AMDgpu TTM labels
- Alex Deucher added 7000 dGPU series label
Same problem on my 6600 XT after resuming from S3 (kernel 6.7.4-zen1-1-zen):
[11972.234444] ==================================================================
[11972.234446] BUG: KFENCE: use-after-free read in amdgpu_bo_move+0x1db/0x810 [amdgpu]
[11972.234620] Use-after-free read at 0x00000000a39fe1c5 (in kfence-#20):
[11972.234621] amdgpu_bo_move+0x1db/0x810 [amdgpu]
[11972.234769] ttm_bo_validate+0x154/0x370 [ttm]
[11972.234773] amdgpu_cs_bo_validate+0x9c/0x2e0 [amdgpu]
[11972.234931] amdgpu_vm_validate_pt_bos+0xc2/0x4a0 [amdgpu]
[11972.235082] amdgpu_cs_parser_bos.isra.0+0x496/0x820 [amdgpu]
[11972.235231] amdgpu_cs_ioctl+0xa7c/0x1cc0 [amdgpu]
[11972.235503] drm_ioctl_kernel+0xd6/0x180
[11972.235505] drm_ioctl+0x26d/0x4b0
[11972.235505] amdgpu_drm_ioctl+0x4e/0x90 [amdgpu]
[11972.235650] __x64_sys_ioctl+0x97/0xd0
[11972.235651] do_syscall_64+0x64/0xe0
[11972.235653] entry_SYSCALL_64_after_hwframe+0x6e/0x76
[11972.235655] kfence-#20: 0x000000005547c341-0x00000000e431ed11, size=240, cache=kmalloc-256
[11972.235656] allocated by task 51785 on cpu 9 at 11966.852258s:
[11972.235822] __kmem_cache_alloc_node+0x304/0x330
[11972.235823] kmalloc_trace+0x2a/0xa0
[11972.235824] amdgpu_gtt_mgr_new+0x40/0x140 [amdgpu]
[11972.235995] ttm_resource_alloc+0x45/0x190 [ttm]
[11972.235999] ttm_bo_mem_space+0x89/0x230 [ttm]
[11972.236003] ttm_mem_evict_first+0x290/0x6c0 [ttm]
[11972.236006] ttm_resource_manager_evict_all+0xa7/0x1d0 [ttm]
[11972.236010] amdgpu_device_prepare+0x4e/0xd0 [amdgpu]
[11972.236156] pci_pm_prepare+0x34/0x70
[11972.236156] dpm_prepare+0x550/0x790
[11972.236157] dpm_suspend_start+0x1e/0x2c0
[11972.236158] suspend_devices_and_enter+0x168/0xa30
[11972.236159] pm_suspend+0x2b1/0x5c0
[11972.236160] state_store+0xbc/0x140
[11972.236160] kernfs_fop_write_iter+0x122/0x200
[11972.236161] vfs_write+0x2af/0x440
[11972.236162] __x64_sys_write+0x74/0xf0
[11972.236163] do_syscall_64+0x64/0xe0
[11972.236164] entry_SYSCALL_64_after_hwframe+0x6e/0x76
[11972.236165] freed by task 1670 on cpu 0 at 11972.234441s:
[11972.236645] ttm_resource_free+0x83/0x190 [ttm]
[11972.236649] ttm_bo_move_accel_cleanup+0xc8/0x2a0 [ttm]
[11972.236653] amdgpu_bo_move+0x1a3/0x810 [amdgpu]
[11972.236839] ttm_bo_validate+0x154/0x370 [ttm]
[11972.236843] amdgpu_cs_bo_validate+0x9c/0x2e0 [amdgpu]
[11972.237022] amdgpu_vm_validate_pt_bos+0xc2/0x4a0 [amdgpu]
[11972.237284] amdgpu_cs_parser_bos.isra.0+0x496/0x820 [amdgpu]
[11972.237461] amdgpu_cs_ioctl+0xa7c/0x1cc0 [amdgpu]
[11972.237714] drm_ioctl_kernel+0xd6/0x180
[11972.237716] drm_ioctl+0x26d/0x4b0
[11972.237716] amdgpu_drm_ioctl+0x4e/0x90 [amdgpu]
[11972.237980] __x64_sys_ioctl+0x97/0xd0
[11972.237982] do_syscall_64+0x64/0xe0
[11972.237983] entry_SYSCALL_64_after_hwframe+0x6e/0x76
[11972.237985] CPU: 0 PID: 1670 Comm: kwin_wayla:cs0 Tainted: P OE 6.7.4-zen1-1-zen #1 0a055c8fa38ec9f3120a144f16f586fae1ad0e30
[11972.237988] Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS PRO/X570 AORUS PRO, BIOS F37g 09/20/2023
[11972.237989] ==================================================================
Edited by fililip
Note: this use-after-free occurs very sporadically. I haven't been able to reproduce it so far, even with a VAAPI stream and a video game running when suspending. Still, the OP and I have both seen it (the logs above are the proof), so it does happen at times.
@GeneralProbe, have you seen this happen again since then? What was going on on your PC when you got this error?
And a question to the devs: does this bug actually mean anything dangerous? Since nothing crashed when it happened, and it was caught by KFENCE, maybe it's not worth investigating after all?
Edited by fililip
- Artem S. Tashkinov mentioned in issue #3185 (closed)
Caught it with Firefox, probably while watching a YouTube VP9 stream on a Radeon 780M.
I almost never reboot or power cycle, but sometimes I do, e.g. when installing a new kernel or when the CPU gets stuck at 544 MHz (a firmware issue specific to my laptop).
I'm not sure whether the error appeared on a fresh boot or after the Nth suspend/resume cycle, but I think it was the latter.
Edited by Artem S. Tashkinov
- Author
I noticed it at standby. I will check my journal tomorrow (the PC is already off) for an error report that is not tied to standby. It does happen regularly, though.
Edited by Martin Wolf
You can look at dmesg just above that error to check whether you see messages such as "CPU is offline" etc., followed by amdgpu re-initialization.
You're correct, the error was logged right after resuming from software suspend.
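The dmesg check suggested above can be roughly automated. Below is a small Python sketch that, for each KFENCE use-after-free report, measures how long ago the last `PM: suspend exit` line was logged. Using `PM: suspend exit` as the resume marker is an assumption of this sketch (other suspend messages would work too), and `resume_to_report_gaps` is a made-up name. It expects dmesg lines with `[seconds.microseconds]` prefixes. The sample lines are the two relevant timestamps from the RX 6700 report posted further down in this thread.

```python
import re

# Matches a dmesg line such as "[24243.114426] PM: suspend exit".
LINE = re.compile(r"\[\s*(\d+)\.(\d{6})\]\s+(.*)")


def resume_to_report_gaps(log_text: str) -> list[int]:
    """Return, per KFENCE use-after-free report, the microseconds elapsed
    since the most recent 'PM: suspend exit'. Reports that have no earlier
    resume marker in the given text are skipped."""
    gaps = []
    last_resume = None
    for raw in log_text.splitlines():
        match = LINE.match(raw.strip())
        if not match:
            continue
        stamp_us = int(match.group(1)) * 1_000_000 + int(match.group(2))
        message = match.group(3)
        if message.startswith("PM: suspend exit"):
            last_resume = stamp_us
        elif message.startswith("BUG: KFENCE: use-after-free") and last_resume is not None:
            gaps.append(stamp_us - last_resume)
    return gaps


SAMPLE = """\
[24243.114426] PM: suspend exit
[24243.127955] BUG: KFENCE: use-after-free read in amdgpu_bo_move+0x1ce/0x710 [amdgpu]
"""

gaps = resume_to_report_gaps(SAMPLE)
print(gaps)
```

A gap in the millisecond range, as in this sample, points at the first command submission after resume. A gap of minutes or no resume marker at all would suggest the report is not tied to suspend.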
System: AMD Ryzen 7 PRO 6850U / Rembrandt [Radeon 680M] [1002:1681] (rev d1) - ThinkPad T14 Gen 3 AMD (21CF)
Fedora 39 (KDE Spin), kernel: 6.7.4-200.fc39.x86_64, linux-firmware: 20240115-2.fc39
Just had what appears to be the same issue, but on a Rembrandt APU. It didn't seem to happen immediately after wakeup from S0ix/s2idle, though, but a few minutes later. I couldn't confirm whether anything was actually displayed, since this was a spurious wakeup triggered by the current Qualcomm wireless firmware bug right after the sleep timeout, and I wasn't around at the time.
dmesg
Feb 18 05:02:04 hibiscus kernel: PM: resume devices took 0.327 seconds
Feb 18 05:02:04 hibiscus kernel: OOM killer enabled.
Feb 18 05:02:04 hibiscus kernel: Restarting tasks ... done.
Feb 18 05:02:04 hibiscus kernel: random: crng reseeded on system resumption
Feb 18 05:02:04 hibiscus kernel: PM: suspend exit
... (snipped; unrelated entries related to wireless)
Feb 18 05:06:33 hibiscus kernel: ==================================================================
Feb 18 05:06:33 hibiscus kernel: BUG: KFENCE: use-after-free read in amdgpu_bo_move+0x1ce/0x710 [amdgpu]
Feb 18 05:06:33 hibiscus kernel: Use-after-free read at 0x00000000fab7b49b (in kfence-#218):
Feb 18 05:06:33 hibiscus kernel: amdgpu_bo_move+0x1ce/0x710 [amdgpu]
Feb 18 05:06:33 hibiscus kernel: ttm_bo_handle_move_mem+0xbb/0x170 [ttm]
Feb 18 05:06:33 hibiscus kernel: ttm_bo_validate+0xe5/0x180 [ttm]
Feb 18 05:06:33 hibiscus kernel: amdgpu_cs_bo_validate+0x9c/0x2e0 [amdgpu]
Feb 18 05:06:33 hibiscus kernel: amdgpu_vm_validate_pt_bos+0xbd/0x380 [amdgpu]
Feb 18 05:06:33 hibiscus kernel: amdgpu_cs_parser_bos.isra.0+0x490/0x820 [amdgpu]
Feb 18 05:06:33 hibiscus kernel: amdgpu_cs_ioctl+0xa2d/0x1a30 [amdgpu]
Feb 18 05:06:33 hibiscus kernel: drm_ioctl_kernel+0xd6/0x180
Feb 18 05:06:33 hibiscus kernel: drm_ioctl+0x26d/0x4b0
Feb 18 05:06:33 hibiscus kernel: amdgpu_drm_ioctl+0x4e/0x90 [amdgpu]
Feb 18 05:06:33 hibiscus kernel: __x64_sys_ioctl+0x97/0xd0
Feb 18 05:06:33 hibiscus kernel: do_syscall_64+0x64/0xe0
Feb 18 05:06:33 hibiscus kernel: entry_SYSCALL_64_after_hwframe+0x6e/0x76
Feb 18 05:06:33 hibiscus kernel:
Feb 18 05:06:33 hibiscus kernel: kfence-#218: 0x00000000a6fa6b57-0x0000000020558e88, size=240, cache=kmalloc-256
Feb 18 05:06:33 hibiscus kernel: allocated by task 15534 on cpu 0 at 6372.590620s:
Feb 18 05:06:33 hibiscus kernel: __kmem_cache_alloc_node+0x2a7/0x2e0
Feb 18 05:06:33 hibiscus kernel: kmalloc_trace+0x2a/0xa0
Feb 18 05:06:33 hibiscus kernel: amdgpu_gtt_mgr_new+0x40/0x140 [amdgpu]
Feb 18 05:06:33 hibiscus kernel: ttm_resource_alloc+0x3b/0x80 [ttm]
Feb 18 05:06:33 hibiscus kernel: ttm_bo_mem_space+0x88/0x230 [ttm]
Feb 18 05:06:33 hibiscus kernel: ttm_mem_evict_first+0x1c6/0x530 [ttm]
Feb 18 05:06:33 hibiscus kernel: ttm_resource_manager_evict_all+0xa7/0x1d0 [ttm]
Feb 18 05:06:33 hibiscus kernel: amdgpu_device_prepare+0x4e/0xd0 [amdgpu]
Feb 18 05:06:33 hibiscus kernel: pci_pm_prepare+0x34/0x70
Feb 18 05:06:33 hibiscus kernel: dpm_prepare+0x269/0x440
Feb 18 05:06:33 hibiscus kernel: dpm_suspend_start+0x1e/0x90
Feb 18 05:06:33 hibiscus kernel: suspend_devices_and_enter+0x16a/0x970
Feb 18 05:06:33 hibiscus kernel: pm_suspend+0x25e/0x590
Feb 18 05:06:33 hibiscus kernel: state_store+0x6c/0xd0
Feb 18 05:06:33 hibiscus kernel: kernfs_fop_write_iter+0x136/0x1d0
Feb 18 05:06:33 hibiscus kernel: vfs_write+0x23d/0x400
Feb 18 05:06:33 hibiscus kernel: ksys_write+0x6f/0xf0
Feb 18 05:06:33 hibiscus kernel: do_syscall_64+0x64/0xe0
Feb 18 05:06:33 hibiscus kernel: entry_SYSCALL_64_after_hwframe+0x6e/0x76
Feb 18 05:06:33 hibiscus kernel:
Feb 18 05:06:33 hibiscus kernel: freed by task 2039 on cpu 3 at 6644.208284s:
Feb 18 05:06:33 hibiscus kernel: ttm_resource_free+0x6b/0x80 [ttm]
Feb 18 05:06:33 hibiscus kernel: ttm_bo_move_accel_cleanup+0xc8/0x2a0 [ttm]
Feb 18 05:06:33 hibiscus kernel: amdgpu_bo_move+0x5d0/0x710 [amdgpu]
Feb 18 05:06:33 hibiscus kernel: ttm_bo_handle_move_mem+0xbb/0x170 [ttm]
Feb 18 05:06:33 hibiscus kernel: ttm_bo_validate+0xe5/0x180 [ttm]
Feb 18 05:06:33 hibiscus kernel: amdgpu_cs_bo_validate+0x9c/0x2e0 [amdgpu]
Feb 18 05:06:33 hibiscus kernel: amdgpu_vm_validate_pt_bos+0xbd/0x380 [amdgpu]
Feb 18 05:06:33 hibiscus kernel: amdgpu_cs_parser_bos.isra.0+0x490/0x820 [amdgpu]
Feb 18 05:06:33 hibiscus kernel: amdgpu_cs_ioctl+0xa2d/0x1a30 [amdgpu]
Feb 18 05:06:33 hibiscus kernel: drm_ioctl_kernel+0xd6/0x180
Feb 18 05:06:33 hibiscus kernel: drm_ioctl+0x26d/0x4b0
Feb 18 05:06:33 hibiscus kernel: amdgpu_drm_ioctl+0x4e/0x90 [amdgpu]
Feb 18 05:06:33 hibiscus kernel: __x64_sys_ioctl+0x97/0xd0
Feb 18 05:06:33 hibiscus kernel: do_syscall_64+0x64/0xe0
Feb 18 05:06:33 hibiscus kernel: entry_SYSCALL_64_after_hwframe+0x6e/0x76
Feb 18 05:06:33 hibiscus kernel:
Feb 18 05:06:33 hibiscus kernel: CPU: 3 PID: 2039 Comm: plasmashel:cs0 Not tainted 6.7.4-200.fc39.x86_64 #1
Feb 18 05:06:33 hibiscus kernel: Hardware name: LENOVO 21CFA000CD/21CFA000CD, BIOS R23ET73W (1.49 ) 11/30/2023
Feb 18 05:06:33 hibiscus kernel: ==================================================================
Edited by Selene Lynn
Experienced the same problem just after resuming from display suspension: while using Firefox, the screen went black, showing only the mouse pointer.
- 09:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] (rev c7)
- CPU Ryzen 5 3600
- Fedora 39 KDE on Wayland
- 6.7.4-200.fc39.x86_64
- amd-gpu-firmware-20240115-2.fc39.noarch
- Mesa 23.3.5
feb 20 10:23:13 machine kernel: ==================================================================
feb 20 10:23:13 machine kernel: BUG: KFENCE: use-after-free read in amdgpu_bo_move+0x1ce/0x710 [amdgpu]
feb 20 10:23:13 machine kernel: Use-after-free read at 0x000000004f6bd0a1 (in kfence-#245):
feb 20 10:23:13 machine kernel: amdgpu_bo_move+0x1ce/0x710 [amdgpu]
feb 20 10:23:13 machine kernel: ttm_bo_handle_move_mem+0xbb/0x170 [ttm]
feb 20 10:23:13 machine kernel: ttm_mem_evict_first+0x201/0x530 [ttm]
feb 20 10:23:13 machine kernel: ttm_bo_mem_space+0x1cd/0x230 [ttm]
feb 20 10:23:13 machine kernel: ttm_bo_validate+0x95/0x180 [ttm]
feb 20 10:23:13 machine kernel: amdgpu_cs_bo_validate+0x9c/0x2e0 [amdgpu]
feb 20 10:23:13 machine kernel: amdgpu_cs_parser_bos.isra.0+0x4c3/0x820 [amdgpu]
feb 20 10:23:13 machine kernel: amdgpu_cs_ioctl+0xa2d/0x1a30 [amdgpu]
feb 20 10:23:13 machine kernel: drm_ioctl_kernel+0xd6/0x180
feb 20 10:23:13 machine kernel: drm_ioctl+0x26d/0x4b0
feb 20 10:23:13 machine kernel: amdgpu_drm_ioctl+0x4e/0x90 [amdgpu]
feb 20 10:23:13 machine kernel: __x64_sys_ioctl+0x97/0xd0
feb 20 10:23:13 machine kernel: do_syscall_64+0x64/0xe0
feb 20 10:23:13 machine kernel: entry_SYSCALL_64_after_hwframe+0x6e/0x76
feb 20 10:23:13 machine kernel:
feb 20 10:23:13 machine kernel: kfence-#245: 0x000000005bafffc0-0x000000007035677d, size=96, cache=kmalloc-96
feb 20 10:23:13 machine kernel: allocated by task 6129 on cpu 5 at 266.756962s:
feb 20 10:23:13 machine kernel: __kmem_cache_alloc_node+0x2a7/0x2e0
feb 20 10:23:13 machine kernel: kmalloc_trace+0x2a/0xa0
feb 20 10:23:13 machine kernel: amdgpu_vram_mgr_new+0x91/0x3a0 [amdgpu]
feb 20 10:23:13 machine kernel: ttm_resource_alloc+0x3b/0x80 [ttm]
feb 20 10:23:13 machine kernel: ttm_bo_mem_space+0x17b/0x230 [ttm]
feb 20 10:23:13 machine kernel: ttm_bo_validate+0x95/0x180 [ttm]
feb 20 10:23:13 machine kernel: amdgpu_cs_bo_validate+0x9c/0x2e0 [amdgpu]
feb 20 10:23:13 machine kernel: amdgpu_cs_parser_bos.isra.0+0x4c3/0x820 [amdgpu]
feb 20 10:23:13 machine kernel: amdgpu_cs_ioctl+0xa2d/0x1a30 [amdgpu]
feb 20 10:23:13 machine kernel: drm_ioctl_kernel+0xd6/0x180
feb 20 10:23:13 machine kernel: drm_ioctl+0x26d/0x4b0
feb 20 10:23:13 machine kernel: amdgpu_drm_ioctl+0x4e/0x90 [amdgpu]
feb 20 10:23:13 machine kernel: __x64_sys_ioctl+0x97/0xd0
feb 20 10:23:13 machine kernel: do_syscall_64+0x64/0xe0
feb 20 10:23:13 machine kernel: entry_SYSCALL_64_after_hwframe+0x6e/0x76
feb 20 10:23:13 machine kernel:
feb 20 10:23:13 machine kernel: freed by task 6129 on cpu 10 at 267.362352s:
feb 20 10:23:13 machine kernel: ttm_resource_free+0x6b/0x80 [ttm]
feb 20 10:23:13 machine kernel: ttm_bo_move_accel_cleanup+0x21d/0x2a0 [ttm]
feb 20 10:23:13 machine kernel: amdgpu_bo_move+0x19b/0x710 [amdgpu]
feb 20 10:23:13 machine kernel: ttm_bo_handle_move_mem+0xbb/0x170 [ttm]
feb 20 10:23:13 machine kernel: ttm_mem_evict_first+0x201/0x530 [ttm]
feb 20 10:23:13 machine kernel: ttm_bo_mem_space+0x1cd/0x230 [ttm]
feb 20 10:23:13 machine kernel: ttm_bo_validate+0x95/0x180 [ttm]
feb 20 10:23:13 machine kernel: amdgpu_cs_bo_validate+0x9c/0x2e0 [amdgpu]
feb 20 10:23:13 machine kernel: amdgpu_cs_parser_bos.isra.0+0x4c3/0x820 [amdgpu]
feb 20 10:23:13 machine kernel: amdgpu_cs_ioctl+0xa2d/0x1a30 [amdgpu]
feb 20 10:23:13 machine kernel: drm_ioctl_kernel+0xd6/0x180
feb 20 10:23:13 machine kernel: drm_ioctl+0x26d/0x4b0
feb 20 10:23:13 machine kernel: amdgpu_drm_ioctl+0x4e/0x90 [amdgpu]
feb 20 10:23:13 machine kernel: __x64_sys_ioctl+0x97/0xd0
feb 20 10:23:13 machine kernel: do_syscall_64+0x64/0xe0
feb 20 10:23:13 machine kernel: entry_SYSCALL_64_after_hwframe+0x6e/0x76
feb 20 10:23:13 machine kernel:
feb 20 10:23:13 machine kernel: CPU: 10 PID: 6129 Comm: firefox:cs0 Not tainted 6.7.4-200.fc39.x86_64 #1
feb 20 10:23:13 machine kernel: Hardware name: Gigabyte Technology Co., Ltd. X470 AORUS ULTRA GAMING/X470 AORUS ULTRA GAMING-CF, BIOS F64e 09/20/2023
feb 20 10:23:13 machine kernel: ==================================================================
feb 20 10:23:14 machine abrt-dump-journal-oops[3874]: abrt-dump-journal-oops: Found oopses: 1
feb 20 10:23:14 machine abrt-dump-journal-oops[3874]: abrt-dump-journal-oops: Creating problem directories
feb 20 10:23:15 machine abrt-dump-journal-oops[3874]: Reported 1 kernel oopses to Abrt
feb 20 10:23:16 machine abrt-server[7166]: Can't find a meaningful backtrace for hashing in '.'
feb 20 10:23:16 machine abrt-server[7166]: Deleting non-reportable oops '.' because DropNotReportableOopses is set to 'yes'
feb 20 10:23:16 machine abrt-server[7166]: 'post-create' on '/var/spool/abrt/oops-2024-02-20-10:23:14-3874-0' exited with 1
feb 20 10:23:16 machine abrt-server[7166]: Deleting problem directory '/var/spool/abrt/oops-2024-02-20-10:23:14-3874-0'
feb 20 10:23:16 machine abrt-server[7166]: Lock file '.lock' was locked by process 7280, but it crashed?
feb 20 10:23:27 machine systemd[1]: systemd-timedated.service: Deactivated successfully.
feb 20 10:23:27 machine audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-timedated comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
feb 20 10:23:27 machine audit: BPF prog-id=87 op=UNLOAD
feb 20 10:23:27 machine audit: BPF prog-id=86 op=UNLOAD
feb 20 10:23:27 machine audit: BPF prog-id=85 op=UNLOAD
feb 20 10:24:01 machine systemd[4858]: Starting grub-boot-success.service - Mark boot as successful...
feb 20 10:24:01 machine systemd[4858]: Finished grub-boot-success.service - Mark boot as successful.
feb 20 10:24:35 machine plasmashell[6024]: ATTENTION: default value of option mesa_glthread overridden by environment.
feb 20 10:24:38 machine kernel: ==================================================================
feb 20 10:24:38 machine kernel: BUG: KFENCE: use-after-free read in amdgpu_bo_move+0x1ce/0x710 [amdgpu]
feb 20 10:24:38 machine kernel: Use-after-free read at 0x00000000299fdded (in kfence-#202):
feb 20 10:24:38 machine kernel: amdgpu_bo_move+0x1ce/0x710 [amdgpu]
feb 20 10:24:38 machine kernel: ttm_bo_handle_move_mem+0xbb/0x170 [ttm]
feb 20 10:24:38 machine kernel: ttm_mem_evict_first+0x201/0x530 [ttm]
feb 20 10:24:38 machine kernel: ttm_bo_mem_space+0x1cd/0x230 [ttm]
feb 20 10:24:38 machine kernel: ttm_bo_validate+0x95/0x180 [ttm]
feb 20 10:24:38 machine kernel: ttm_bo_init_reserved+0x146/0x170 [ttm]
feb 20 10:24:38 machine kernel: amdgpu_bo_create+0x1ee/0x4e0 [amdgpu]
feb 20 10:24:38 machine kernel: amdgpu_bo_create_user+0x40/0x70 [amdgpu]
feb 20 10:24:38 machine kernel: amdgpu_gem_create_ioctl+0x168/0x3d0 [amdgpu]
feb 20 10:24:38 machine kernel: drm_ioctl_kernel+0xd6/0x180
feb 20 10:24:38 machine kernel: drm_ioctl+0x26d/0x4b0
feb 20 10:24:38 machine kernel: amdgpu_drm_ioctl+0x4e/0x90 [amdgpu]
feb 20 10:24:38 machine kernel: __x64_sys_ioctl+0x97/0xd0
feb 20 10:24:38 machine kernel: do_syscall_64+0x64/0xe0
feb 20 10:24:38 machine kernel: entry_SYSCALL_64_after_hwframe+0x6e/0x76
feb 20 10:24:38 machine kernel:
feb 20 10:24:38 machine kernel: kfence-#202: 0x00000000b96893df-0x000000002f05581e, size=96, cache=kmalloc-96
feb 20 10:24:38 machine kernel: allocated by task 6129 on cpu 10 at 346.937674s:
feb 20 10:24:38 machine kernel: __kmem_cache_alloc_node+0x2a7/0x2e0
feb 20 10:24:38 machine kernel: kmalloc_trace+0x2a/0xa0
feb 20 10:24:38 machine kernel: amdgpu_vram_mgr_new+0x91/0x3a0 [amdgpu]
feb 20 10:24:38 machine kernel: ttm_resource_alloc+0x3b/0x80 [ttm]
feb 20 10:24:38 machine kernel: ttm_bo_mem_space+0x17b/0x230 [ttm]
feb 20 10:24:38 machine kernel: ttm_bo_validate+0x95/0x180 [ttm]
feb 20 10:24:38 machine kernel: amdgpu_cs_bo_validate+0x9c/0x2e0 [amdgpu]
feb 20 10:24:38 machine kernel: amdgpu_cs_parser_bos.isra.0+0x4c3/0x820 [amdgpu]
feb 20 10:24:38 machine kernel: amdgpu_cs_ioctl+0xa2d/0x1a30 [amdgpu]
feb 20 10:24:38 machine kernel: drm_ioctl_kernel+0xd6/0x180
feb 20 10:24:38 machine kernel: drm_ioctl+0x26d/0x4b0
feb 20 10:24:38 machine kernel: amdgpu_drm_ioctl+0x4e/0x90 [amdgpu]
feb 20 10:24:38 machine kernel: __x64_sys_ioctl+0x97/0xd0
feb 20 10:24:38 machine kernel: do_syscall_64+0x64/0xe0
feb 20 10:24:38 machine kernel: entry_SYSCALL_64_after_hwframe+0x6e/0x76
feb 20 10:24:38 machine kernel:
feb 20 10:24:38 machine kernel: freed by task 6114 on cpu 2 at 352.275279s:
feb 20 10:24:38 machine kernel: ttm_resource_free+0x6b/0x80 [ttm]
feb 20 10:24:38 machine kernel: ttm_bo_move_accel_cleanup+0x21d/0x2a0 [ttm]
feb 20 10:24:38 machine kernel: amdgpu_bo_move+0x19b/0x710 [amdgpu]
feb 20 10:24:38 machine kernel: ttm_bo_handle_move_mem+0xbb/0x170 [ttm]
feb 20 10:24:38 machine kernel: ttm_mem_evict_first+0x201/0x530 [ttm]
feb 20 10:24:38 machine kernel: ttm_bo_mem_space+0x1cd/0x230 [ttm]
feb 20 10:24:38 machine kernel: ttm_bo_validate+0x95/0x180 [ttm]
feb 20 10:24:38 machine kernel: ttm_bo_init_reserved+0x146/0x170 [ttm]
feb 20 10:24:38 machine kernel: amdgpu_bo_create+0x1ee/0x4e0 [amdgpu]
feb 20 10:24:38 machine kernel: amdgpu_bo_create_user+0x40/0x70 [amdgpu]
feb 20 10:24:38 machine kernel: amdgpu_gem_create_ioctl+0x168/0x3d0 [amdgpu]
feb 20 10:24:38 machine kernel: drm_ioctl_kernel+0xd6/0x180
feb 20 10:24:38 machine kernel: drm_ioctl+0x26d/0x4b0
feb 20 10:24:38 machine kernel: amdgpu_drm_ioctl+0x4e/0x90 [amdgpu]
feb 20 10:24:38 machine kernel: __x64_sys_ioctl+0x97/0xd0
feb 20 10:24:38 machine kernel: do_syscall_64+0x64/0xe0
feb 20 10:24:38 machine kernel: entry_SYSCALL_64_after_hwframe+0x6e/0x76
feb 20 10:24:38 machine kernel:
feb 20 10:24:38 machine kernel: CPU: 2 PID: 6114 Comm: Renderer Tainted: G B 6.7.4-200.fc39.x86_64 #1
feb 20 10:24:38 machine kernel: Hardware name: Gigabyte Technology Co., Ltd. X470 AORUS ULTRA GAMING/X470 AORUS ULTRA GAMING-CF, BIOS F64e 09/20/2023
feb 20 10:24:38 machine kernel: ==================================================================
feb 20 10:24:38 machine plasmashell[6024]: ATTENTION: default value of option mesa_glthread overridden by environment.
feb 20 10:24:39 machine rtkit-daemon[3778]: Successfully made thread 9338 of process 8141 (/usr/lib64/firefox/firefox) owned by '1000' RT at priority 10.
feb 20 10:24:39 machine abrt-dump-journal-oops[3874]: abrt-dump-journal-oops: Found oopses: 1
feb 20 10:24:39 machine abrt-dump-journal-oops[3874]: abrt-dump-journal-oops: Creating problem directories
feb 20 10:24:40 machine abrt-dump-journal-oops[3874]: Reported 1 kernel oopses to Abrt
feb 20 10:24:41 machine plasmashell[6024]: ATTENTION: default value of option mesa_glthread overridden by environment.
feb 20 10:24:41 machine plasmashell[6024]: ATTENTION: default value of option mesa_glthread overridden by environment.
feb 20 10:24:41 machine abrt-server[9422]: Can't find a meaningful backtrace for hashing in '.'
feb 20 10:24:41 machine abrt-server[9422]: Deleting non-reportable oops '.' because DropNotReportableOopses is set to 'yes'
feb 20 10:24:41 machine abrt-server[9422]: 'post-create' on '/var/spool/abrt/oops-2024-02-20-10:24:39-3874-0' exited with 1
feb 20 10:24:41 machine abrt-server[9422]: Deleting problem directory '/var/spool/abrt/oops-2024-02-20-10:24:39-3874-0'
feb 20 10:24:41 machine abrt-server[9422]: Lock file '.lock' was locked by process 9551, but it crashed?
feb 20 10:24:42 machine plasmashell[6024]: ATTENTION: default value of option mesa_glthread overridden by environment.
feb 20 10:24:42 machine plasmashell[6024]: ATTENTION: default value of option mesa_glthread overridden by environment.
feb 20 10:24:49 machine plasmashell[6024]: ATTENTION: default value of option mesa_glthread overridden by environment.
feb 20 10:24:54 machine plasmashell[6024]: ATTENTION: default value of option mesa_glthread overridden by environment.
feb 20 10:24:55 machine plasmashell[6024]: ATTENTION: default value of option mesa_glthread overridden by environment.
Edited by Germano Massullo
- Developer
Can one of you who can reproduce this reliably bisect it?
- Author
Yes, I'm able to bisect.
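Since the bug is sporadic, a bisect step needs some rule for when a kernel counts as good. Here is a hedged Python sketch of one such rule: it looks at the kernel log of the kernel under test and returns an exit code following the `git bisect run` convention (0 = good, 1 = bad, 125 = skip/undecided). The function name `bisect_verdict` and the threshold of 20 clean resumes are arbitrary placeholders of this sketch, not anything the developers asked for, and in practice the verdict would be applied by hand after booting each candidate kernel.

```python
import re

# Signature of the report discussed in this issue.
SIGNATURE = re.compile(r"BUG: KFENCE: use-after-free read in amdgpu_bo_move\+")
# Marker used here to count completed suspend/resume cycles (an assumption).
RESUME_MARKER = "PM: suspend exit"


def bisect_verdict(log_text: str, clean_resumes_needed: int = 20) -> int:
    """Map a kernel log to a git-bisect style exit code.

    1   -> the KFENCE report is present: mark this kernel bad.
    0   -> enough resume cycles without the report: tentatively good.
    125 -> too few resume cycles to say anything: skip / keep testing.
    """
    if SIGNATURE.search(log_text):
        return 1
    if log_text.count(RESUME_MARKER) >= clean_resumes_needed:
        return 0
    return 125


# Tiny self-check with synthetic logs.
bad_log = "[1.000000] BUG: KFENCE: use-after-free read in amdgpu_bo_move+0x1ce/0x710 [amdgpu]\n"
clean_log = "[1.000000] PM: suspend exit\n" * 20
short_log = "[1.000000] PM: suspend exit\n" * 3
print(bisect_verdict(bad_log), bisect_verdict(clean_log), bisect_verdict(short_log))
```

The "undecided" outcome is the important part: with a random bug, a handful of clean resumes says very little, so a kernel should only be marked good after many cycles, and even then a good verdict can be wrong.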
Just occurred on my machine when waking from suspend overnight: Fedora 39, GNOME on Xorg, kernel-6.7.5-200.fc39.x86_64, Radeon RX 6700
[24243.114426] PM: suspend exit
[24243.127953] ==================================================================
[24243.127955] BUG: KFENCE: use-after-free read in amdgpu_bo_move+0x1ce/0x710 [amdgpu]
[24243.128291] Use-after-free read at 0x0000000041f9f67e (in kfence-#230):
[24243.128293] amdgpu_bo_move+0x1ce/0x710 [amdgpu]
[24243.128402] ttm_bo_handle_move_mem+0xbb/0x170 [ttm]
[24243.128402] ttm_bo_validate+0xe5/0x180 [ttm]
[24243.128402] amdgpu_cs_bo_validate+0x9c/0x2e0 [amdgpu]
[24243.128402] amdgpu_vm_validate_pt_bos+0xbd/0x380 [amdgpu]
[24243.128402] amdgpu_cs_parser_bos.isra.0+0x490/0x820 [amdgpu]
[24243.128402] amdgpu_cs_ioctl+0xa2d/0x1a30 [amdgpu]
[24243.129490] drm_ioctl_kernel+0xd6/0x180
[24243.129490] drm_ioctl+0x26d/0x4b0
[24243.129490] amdgpu_drm_ioctl+0x4e/0x90 [amdgpu]
[24243.129490] __x64_sys_ioctl+0x97/0xd0
[24243.129490] do_syscall_64+0x64/0xe0
[24243.129490] entry_SYSCALL_64_after_hwframe+0x6e/0x76
[24243.129490] kfence-#230: 0x00000000cc2df3da-0x00000000b22e5021, size=240, cache=kmalloc-256
[24243.129490] allocated by task 738643 on cpu 0 at 24238.862449s:
[24243.129490] __kmem_cache_alloc_node+0x2a7/0x2e0
[24243.129490] kmalloc_trace+0x2a/0xa0
[24243.129490] amdgpu_gtt_mgr_new+0x40/0x140 [amdgpu]
[24243.129490] ttm_resource_alloc+0x3b/0x80 [ttm]
[24243.129490] ttm_bo_mem_space+0x88/0x230 [ttm]
[24243.129490] ttm_mem_evict_first+0x1c6/0x530 [ttm]
[24243.129490] ttm_resource_manager_evict_all+0xa7/0x1d0 [ttm]
[24243.129490] amdgpu_device_prepare+0x54/0xf0 [amdgpu]
[24243.129490] pci_pm_prepare+0x34/0x70
[24243.129490] dpm_prepare+0x269/0x440
[24243.129490] dpm_suspend_start+0x1e/0x90
[24243.129490] suspend_devices_and_enter+0x16a/0x970
[24243.129490] pm_suspend+0x25e/0x590
[24243.129490] state_store+0x6c/0xd0
[24243.129490] kernfs_fop_write_iter+0x136/0x1d0
[24243.129490] vfs_write+0x23d/0x400
[24243.129490] ksys_write+0x6f/0xf0
[24243.129490] do_syscall_64+0x64/0xe0
[24243.129490] entry_SYSCALL_64_after_hwframe+0x6e/0x76
[24243.129490] freed by task 2645 on cpu 0 at 24243.127947s:
[24243.131442] ttm_resource_free+0x6b/0x80 [ttm]
[24243.131442] ttm_bo_move_accel_cleanup+0xc8/0x2a0 [ttm]
[24243.131442] amdgpu_bo_move+0x5d0/0x710 [amdgpu]
[24243.131829] ttm_bo_handle_move_mem+0xbb/0x170 [ttm]
[24243.131829] ttm_bo_validate+0xe5/0x180 [ttm]
[24243.131829] amdgpu_cs_bo_validate+0x9c/0x2e0 [amdgpu]
[24243.131829] amdgpu_vm_validate_pt_bos+0xbd/0x380 [amdgpu]
[24243.132213] amdgpu_cs_parser_bos.isra.0+0x490/0x820 [amdgpu]
[24243.132213] amdgpu_cs_ioctl+0xa2d/0x1a30 [amdgpu]
[24243.132213] drm_ioctl_kernel+0xd6/0x180
[24243.132213] drm_ioctl+0x26d/0x4b0
[24243.132213] amdgpu_drm_ioctl+0x4e/0x90 [amdgpu]
[24243.133156] __x64_sys_ioctl+0x97/0xd0
[24243.133156] do_syscall_64+0x64/0xe0
[24243.133156] entry_SYSCALL_64_after_hwframe+0x6e/0x76
[24243.133156] CPU: 0 PID: 2645 Comm: Xorg:cs0 Not tainted 6.7.5-200.fc39.x86_64 #1
[24243.133156] Hardware name: MSI MS-7751/Z77A-GD65 (MS-7751), BIOS V10.11 10/09/2013
[24243.133156] ==================================================================
Just adding to the count: I too am hitting this, sometimes when the displays wake, but other times entirely at random, sometimes twice in a row.
Kernel is 6.7.6-arch1-1, distro is Arch Linux if that's relevant.
I'm running two AMD Vega 56s, but only using one for display output at the moment.
[ 361.583741] ==================================================================
[ 361.583748] BUG: KFENCE: use-after-free read in amdgpu_bo_move+0x1ce/0x710 [amdgpu]
[ 361.584108] Use-after-free read at 0x0000000059f72e49 (in kfence-#243):
[ 361.584112] amdgpu_bo_move+0x1ce/0x710 [amdgpu]
[ 361.584370] ttm_bo_handle_move_mem+0xbb/0x170 [ttm]
[ 361.584379] ttm_mem_evict_first+0x201/0x530 [ttm]
[ 361.584387] ttm_bo_mem_space+0x1cd/0x230 [ttm]
[ 361.584395] ttm_bo_validate+0x95/0x180 [ttm]
[ 361.584402] ttm_bo_init_reserved+0x146/0x170 [ttm]
[ 361.584410] amdgpu_bo_create+0x1ee/0x4e0 [amdgpu]
[ 361.584666] amdgpu_bo_create_user+0x40/0x70 [amdgpu]
[ 361.584922] amdgpu_gem_create_ioctl+0x168/0x3d0 [amdgpu]
[ 361.585181] drm_ioctl_kernel+0xd6/0x180
[ 361.585186] drm_ioctl+0x26d/0x4b0
[ 361.585188] amdgpu_drm_ioctl+0x4e/0x90 [amdgpu]
[ 361.585439] __x64_sys_ioctl+0x97/0xd0
[ 361.585443] do_syscall_64+0x64/0xe0
[ 361.585448] entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 361.585454] kfence-#243: 0x00000000a7b22fba-0x000000008118403f, size=96, cache=kmalloc-96
[ 361.585457] allocated by task 12101 on cpu 1 at 325.716395s:
[ 361.586131] __kmem_cache_alloc_node+0x2a6/0x2e0
[ 361.586135] kmalloc_trace+0x2a/0xa0
[ 361.586137] amdgpu_vram_mgr_new+0x91/0x3a0 [amdgpu]
[ 361.586401] ttm_resource_alloc+0x3b/0x80 [ttm]
[ 361.586409] ttm_bo_mem_space+0x17b/0x230 [ttm]
[ 361.586417] ttm_bo_validate+0x95/0x180 [ttm]
[ 361.586424] ttm_bo_init_reserved+0x146/0x170 [ttm]
[ 361.586432] amdgpu_bo_create+0x1ee/0x4e0 [amdgpu]
[ 361.586687] amdgpu_bo_create_user+0x40/0x70 [amdgpu]
[ 361.586945] amdgpu_gem_create_ioctl+0x168/0x3d0 [amdgpu]
[ 361.587203] drm_ioctl_kernel+0xd6/0x180
[ 361.587206] drm_ioctl+0x26d/0x4b0
[ 361.587208] amdgpu_drm_ioctl+0x4e/0x90 [amdgpu]
[ 361.587459] __x64_sys_ioctl+0x97/0xd0
[ 361.587461] do_syscall_64+0x64/0xe0
[ 361.587464] entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 361.587467] freed by task 12131 on cpu 8 at 361.583731s:
[ 361.588150] ttm_resource_free+0x6b/0x80 [ttm]
[ 361.588159] ttm_bo_move_accel_cleanup+0x21d/0x2a0 [ttm]
[ 361.588167] amdgpu_bo_move+0x19b/0x710 [amdgpu]
[ 361.588423] ttm_bo_handle_move_mem+0xbb/0x170 [ttm]
[ 361.588431] ttm_mem_evict_first+0x201/0x530 [ttm]
[ 361.588438] ttm_bo_mem_space+0x1cd/0x230 [ttm]
[ 361.588446] ttm_bo_validate+0x95/0x180 [ttm]
[ 361.588453] ttm_bo_init_reserved+0x146/0x170 [ttm]
[ 361.588461] amdgpu_bo_create+0x1ee/0x4e0 [amdgpu]
[ 361.588717] amdgpu_bo_create_user+0x40/0x70 [amdgpu]
[ 361.588974] amdgpu_gem_create_ioctl+0x168/0x3d0 [amdgpu]
[ 361.589235] drm_ioctl_kernel+0xd6/0x180
[ 361.589238] drm_ioctl+0x26d/0x4b0
[ 361.589240] amdgpu_drm_ioctl+0x4e/0x90 [amdgpu]
[ 361.589492] __x64_sys_ioctl+0x97/0xd0
[ 361.589494] do_syscall_64+0x64/0xe0
[ 361.589497] entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 361.589502] CPU: 8 PID: 12131 Comm: chromium:gdrv0 Tainted: P OE 6.7.6-arch1-1 #1 92d1e939a2710641cdadd5e5b8601f67b3474c0a
[ 361.589507] Hardware name: System manufacturer System Product Name/PRIME X399-A, BIOS 1002 02/15/2019
[ 361.589509] ==================================================================
Edited by Aki Van Ness
I am also hitting this, same kfence output as above, on Fedora 39 with kernel 6.7.5-200.fc39.x86_64. The error occurs occasionally when plugging in an external monitor (through a USB-C cable).
I looked at the amd-gfx mailing-list where development happens (archives), here is what I currently understand:
- This bug and other issues were reported in February by Joonkyo Jung:
  1. not clearly related: Reporting a use-after-free in amdgpu
  2. not clearly related: Reporting a null-ptr-deref in amdgpu
  3. the current issue: Reporting a slab-use-after-free in amdgpu
- Vitaly Prosyak posted a patch to fix the use-after-free issue (1) on March 7th (yesterday), and Christian König is in the process of reviewing it: https://lists.freedesktop.org/archives/amd-gfx/2024-March/105158.html
- It is not clear to me whether fixing issue (1) will also fix issue (3) (the current issue). (Vitaly Prosyak suggests that (3) may be fixed, but Joonkyo Jung replies that it can still be reproduced.)
My current summary of the situation is that people at AMD are looking at this issue right now, and that hopefully a fix will be available in the next few days/weeks.
Journalctl:
Mar 12 13:01:20 archlinux kwin_wayland[613]: kf.coreaddons: Even a brand-new cache starts off corru>
Mar 12 13:02:03 archlinux kwin_wayland[613]: kwin_scene_opengl: Invalid framebuffer status: "GL_FR>
Mar 12 13:02:03 archlinux kwin_wayland[613]: kwin_scene_opengl: Can't enable invalid framebuffer ob>
Mar 12 13:02:03 archlinux kwin_wayland[613]: kwin_scene_opengl: Invalid framebuffer status: "GL_FR>
Mar 12 13:02:03 archlinux kwin_wayland[613]: kwin_scene_opengl: Can't enable invalid framebuffer ob>
Mar 12 13:02:03 archlinux kwin_wayland[613]: kwin_scene_opengl: Invalid framebuffer status: "GL_FR>
Mar 12 13:02:03 archlinux kwin_wayland[613]: kwin_scene_opengl: Can't enable invalid framebuffer ob>
Mar 12 13:02:04 archlinux kernel: ==================================================================
Mar 12 13:02:04 archlinux kernel: BUG: KFENCE: use-after-free read in amdgpu_bo_move+0x1db/0x810 [a>
Mar 12 13:02:04 archlinux kernel: Use-after-free read at 0x000000008524eb3c (in kfence-#240):
Mar 12 13:02:04 archlinux kernel:
Mar 12 13:02:04 archlinux kernel:
Mar 12 13:02:04 archlinux kernel: CPU: 3 PID: 1382 Comm: vlc:cs0 Not tainted 6.7.9-zen1-1-zen #1 f2>
Mar 12 13:02:04 archlinux kernel: Hardware name: ASUSTeK COMPUTER INC. X540YA/X540YA, BIOS X540YA.3>
Mar 12 13:02:04 archlinux kernel: ==================================================================
Mar 12 13:02:08 archlinux kwin_wayland[613]: kwin_scene_opengl: Invalid framebuffer status: "GL_FR>
Mar 12 13:02:08 archlinux kwin_wayland[613]: kwin_scene_opengl: Can't enable invalid framebuffer ob>
Mar 12 13:02:08 archlinux kwin_wayland[613]: kwin_scene_opengl: Invalid framebuffer status: "GL_FR>
Mar 12 13:02:08 archlinux kwin_wayland[613]: kwin_scene_opengl: Can't enable invalid framebuffer ob>
Mar 12 13:02:10 archlinux kwin_wayland[613]: kwin_scene_opengl: Invalid framebuffer status: "GL_FR>
Mar 12 13:02:10 archlinux kwin_wayland[613]: kwin_scene_opengl: Can't enable invalid framebuffer ob>
Mar 12 13:02:33 archlinux kwin_wayland[613]: kwin_scene_opengl: Invalid framebuffer status: "GL_FR>
Mar 12 13:02:33 archlinux kwin_wayland[613]: kwin_scene_opengl: Can't enable invalid framebuffer ob>
dmesg:
[ 0.560566] tpm_crb MSFT0101:00: [Firmware Bug]: ACPI region does not cover the entire command/response buffer. [mem 0x9d419000-0x9d419fff flags 0x200] vs 9d419000 4000
[ 0.560589] tpm_crb MSFT0101:00: can't request region for resource [mem 0x9d419000-0x9d419fff]
[ 12.938905] kfd kfd: amdgpu: MULLINS not supported in kfd
[ 152.716032] ==================================================================
[ 152.716051] BUG: KFENCE: use-after-free read in amdgpu_bo_move+0x1db/0x810 [amdgpu]
[ 152.717951] Use-after-free read at 0x000000008524eb3c (in kfence-#240):
[ 152.750746] CPU: 3 PID: 1382 Comm: vlc:cs0 Not tainted 6.7.9-zen1-1-zen #1 f2bc3439c8fc885fedf6af86e989d79ca23afd6a
[ 152.750756] Hardware name: ASUSTeK COMPUTER INC. X540YA/X540YA, BIOS X540YA.323 12/31/2019
[ 152.750760] ==================================================================
I noticed it in the logs by accident; I still don't understand what triggered it.
v6.8 is affected too:
0c:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 23 [Radeon RX 6600/6600 XT/6600M] (rev c7)
PM: suspend exit
==================================================================
BUG: KFENCE: use-after-free read in amdgpu_bo_move+0x1de/0x810 [amdgpu]
Use-after-free read at 0x0000000061fc678c (in kfence-#151):
amdgpu_bo_move+0x1de/0x810 [amdgpu]
ttm_bo_validate+0x154/0x370 [ttm]
amdgpu_cs_bo_validate+0x9c/0x2d0 [amdgpu]
amdgpu_vm_validate_pt_bos+0xc2/0x4a0 [amdgpu]
amdgpu_cs_parser_bos.isra.0+0x496/0x820 [amdgpu]
amdgpu_cs_ioctl+0xa7e/0x1cd0 [amdgpu]
drm_ioctl_kernel+0xb5/0x110
drm_ioctl+0x26d/0x4b0
amdgpu_drm_ioctl+0x4e/0x90 [amdgpu]
__x64_sys_ioctl+0x97/0xd0
do_syscall_64+0x89/0x170
entry_SYSCALL_64_after_hwframe+0x6e/0x76
kfence-#151: 0x00000000cef79426-0x000000000e01a8c9, size=240, cache=kmalloc-256
allocated by task 50665 on cpu 12 at 27588.259455s:
kmalloc_trace+0x237/0x490
amdgpu_gtt_mgr_new+0x40/0x140 [amdgpu]
ttm_resource_alloc+0x45/0x190 [ttm]
ttm_bo_mem_space+0x89/0x230 [ttm]
ttm_mem_evict_first+0x290/0x6c0 [ttm]
ttm_resource_manager_evict_all+0xa7/0x1d0 [ttm]
amdgpu_device_prepare+0x54/0xf0 [amdgpu]
pci_pm_prepare+0x34/0x70
dpm_prepare+0x550/0x790
dpm_suspend_start+0x1e/0x2c0
suspend_devices_and_enter+0x168/0xa30
pm_suspend+0x2b1/0x5c0
state_store+0xbc/0x140
kernfs_fop_write_iter+0x122/0x200
vfs_write+0x2d7/0x4c0
__x64_sys_write+0x74/0xf0
do_syscall_64+0x89/0x170
entry_SYSCALL_64_after_hwframe+0x6e/0x76
freed by task 2336 on cpu 30 at 27590.728676s:
ttm_resource_free+0x83/0x190 [ttm]
ttm_bo_move_accel_cleanup+0xc8/0x2a0 [ttm]
amdgpu_bo_move+0x1a6/0x810 [amdgpu]
ttm_bo_validate+0x154/0x370 [ttm]
amdgpu_cs_bo_validate+0x9c/0x2d0 [amdgpu]
amdgpu_vm_validate_pt_bos+0xc2/0x4a0 [amdgpu]
amdgpu_cs_parser_bos.isra.0+0x496/0x820 [amdgpu]
amdgpu_cs_ioctl+0xa7e/0x1cd0 [amdgpu]
drm_ioctl_kernel+0xb5/0x110
drm_ioctl+0x26d/0x4b0
amdgpu_drm_ioctl+0x4e/0x90 [amdgpu]
__x64_sys_ioctl+0x97/0xd0
do_syscall_64+0x89/0x170
entry_SYSCALL_64_after_hwframe+0x6e/0x76
CPU: 30 PID: 2336 Comm: firefox:cs0 Not tainted 6.8.0-pf1 #1 9eb776b240dc554d9fffabffd9e26a31ff7d84b0
Hardware name: ASUS System Product Name/Pro WS X570-ACE, BIOS 4702 10/20/2023
==================================================================
Edited by Oleksandr Natalenko
- Oleksandr Natalenko mentioned in issue #3259 (closed)
- Alex Deucher marked #3259 (closed) as a duplicate of this issue
- Alex Deucher marked this issue as related to #3259 (closed)
A journalctl error appeared when I opened a code editor (an Electron app) after my computer woke up from sleep two hours ago.
Only 5 text lines:
==================================================================
BUG: KFENCE: use-after-free read in amdgpu_bo_move+0x1de/0x7a0 [amdgpu]
Use-after-free read at 0x0000000092b1500e (in kfence-#97):
CPU: 4 PID: 28722 Comm: electron:cs0 Not tainted 6.7.10-273-tkg-eevdf #1 c365953fb9e9ab0017efb807da31a4987a09a94d
Hardware name: Gigabyte Technology Co., Ltd. X570S UD/X570S UD, BIOS F6 02/15/2024
==================================================================
My hardware info:
- Ryzen 5800X
- Radeon RX5700
- Two identical monitors, 1440p @ 60 FPS
Edited by Zesko
A speculative fix could be https://lists.freedesktop.org/archives/amd-gfx/2024-March/105648.html but I would need amdgpu experts to chime in.
@tursulin thanks a bunch! I don't know anything about this codebase myself but it looks fairly reasonable (in particular, unlikely to introduce regressions). It's very useful to have people step in to offer a fix.
Two comments:
- Nitpick: I would have initialized old_mem_type at the beginning to make the flow easier to read (see below).
- I think that you could include a pointer to the present bug report in the amd-gfx discussion, which has evidence that a fair number of users hit this issue on a regular basis, and may help prioritize reviewing and/or make maintainers happy about you submitting a patch.

    uint32_t old_mem_type = (old_mem ? old_mem->mem_type : 0); // or maybe -1 as 0 is the valid TTM_PL_SYSTEM value
- Maybe but then you have to check old_mem for NULL twice, and/or have the conundrum from your -1 comment to solve.
- It was there, see the Closes: tag. ;)
Note Christian has floated a different solution to the problem since: https://lists.freedesktop.org/archives/amd-gfx/2024-March/105680.html
Thanks @tursulin for investigating the issue! I confirm that Christian's fix (https://lists.freedesktop.org/archives/amd-gfx/2024-March/105680.html) is working fine.
I have created the IGT test https://patchwork.freedesktop.org/patch/584984/?series=131671&rev=1 which allows it to run on our CI.
Edited by Vitaly Prosyak
@vitalyp thank you for your work on related bugs. Do you have an idea of when Christian's fix may be made available for users? I don't see it listed in Alex Deucher's fixes for 6.9.
- Oleksandr Natalenko mentioned in issue #3310 (closed)
Thank you for your quick fixes to the bug!
As @gasche kindly mentioned above, I also reported this bug in: Reporting a slab-use-after-free in amdgpu.
Would it be possible for me, and all the reporters above, to get an acknowledgement on the patch that @ckoenig wrote for this issue?
Does this patch have a prerequisite for the v6.8 kernel? I tried applying it on top of bare v6.8, but got lots of NULL pointer dereferences in kworkers, so I reverted it.
- Michel Dänzer marked #3310 (closed) as a duplicate of this issue
- Michel Dänzer marked this issue as related to #3310 (closed)
Seen on Manjaro with kernel 6.7.12 while using VirtualBox. As I understand it, 6.6 should be OK, 6.7 and 6.8 are both affected, 6.7 won't receive a fix because it has reached end of life, and 6.8 will possibly be fixed, but in which point release is so far unknown. Is that right?
- Vitaly Prosyak mentioned in commit igt-gpu-tools@d564783b
I'm also affected; dmesg log after resuming from suspend:
==================================================…=====
BUG: KFENCE: use-after-free read in amdgpu_bo_move…dgpu]
Use-after-free read at 0x000000003ecf4900 (in kfen…#82):
amdgpu_bo_move+0x1ca/0x710 [amdgpu]
ttm_bo_handle_move_mem+0xb8/0x170 [ttm]
ttm_bo_validate+0xde/0x180 [ttm]
amdgpu_cs_bo_validate+0x98/0x2e0 [amdgpu]
amdgpu_vm_validate+0xb9/0x510 [amdgpu]
amdgpu_cs_parser_bos.isra.0+0x491/0x820 [amdgpu]
amdgpu_cs_ioctl+0x9e4/0x1940 [amdgpu]
drm_ioctl_kernel+0xae/0x100 [drm]
drm_ioctl+0x270/0x4e0 [drm]
amdgpu_drm_ioctl+0x4a/0x80 [amdgpu]
__x64_sys_ioctl+0x90/0xd0
do_syscall_64+0x6b/0x110
entry_SYSCALL_64_after_hwframe+0x76/0x7e
kfence-#82: 0x00000000b02ab617-0x0000000043e790e8,…c-256
allocated by task 10717 on cpu 1 at 14801.124124s:
kmalloc_trace+0x27d/0x330
amdgpu_gtt_mgr_new+0x3c/0x140 [amdgpu]
ttm_resource_alloc+0x34/0x80 [ttm]
ttm_bo_mem_space+0x8e/0x230 [ttm]
ttm_mem_evict_first+0x28f/0x520 [ttm]
ttm_resource_manager_evict_all+0xa3/0x1d0 [ttm]
amdgpu_device_prepare+0x52/0xf0 [amdgpu]
pci_pm_prepare+0x2d/0x70
dpm_prepare+0x25e/0x430
dpm_suspend_start+0x1a/0x60
suspend_devices_and_enter+0x13a/0x930
pm_suspend+0x1fa/0x500
state_store+0x68/0xd0
kernfs_fop_write_iter+0x12f/0x1c0
vfs_write+0x28f/0x460
ksys_write+0x6b/0xf0
do_syscall_64+0x6b/0x110
entry_SYSCALL_64_after_hwframe+0x76/0x7e
freed by task 1096 on cpu 0 at 14848.975680s:
ttm_resource_free+0x64/0x80 [ttm]
ttm_bo_move_accel_cleanup+0xc4/0x2a0 [ttm]
amdgpu_bo_move+0x5ca/0x710 [amdgpu]
ttm_bo_handle_move_mem+0xb8/0x170 [ttm]
ttm_bo_validate+0xde/0x180 [ttm]
amdgpu_cs_bo_validate+0x98/0x2e0 [amdgpu]
amdgpu_vm_validate+0xb9/0x510 [amdgpu]
amdgpu_cs_parser_bos.isra.0+0x491/0x820 [amdgpu]
amdgpu_cs_ioctl+0x9e4/0x1940 [amdgpu]
drm_ioctl_kernel+0xae/0x100 [drm]
drm_ioctl+0x270/0x4e0 [drm]
amdgpu_drm_ioctl+0x4a/0x80 [amdgpu]
__x64_sys_ioctl+0x90/0xd0
do_syscall_64+0x6b/0x110
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Graphics: AMD Navi 23 [Radeon RX 6600/6600 XT/6600M]
The patch attached by @tursulin fixes the issue. Tested on kernels 6.6.x, 6.7.x, 6.8.x and 6.9.0-rc4.
- Christian König closed
- Christian König mentioned in commit agd5f/linux@ffda7081
- Christian König mentioned in commit agd5f/linux@d3a9331a
- Alex Deucher mentioned in issue #3342 (moved)
- Christian König mentioned in commit nouveau@5c25b169
- Christian König mentioned in commit nouveau@0c7ed3ed
- Christian König mentioned in commit nouveau@9a4f6e13