5.15.128 dmesg amdgpu_sync_keep_later WARNING hundreds of times a second
Brief summary of the problem:
Booting 5.15.128 results in hundreds of logged WARNINGs every second (maximum I saw was 580 in one second):
Aug 30 21:40:37 debian kernel: WARNING: CPU: 11 PID: 1418 at include/linux/dma-fence.h:478 amdgpu_sync_keep_later+0xb1/0xf0 [amdgpu]
Aug 30 21:40:37 debian kernel: Modules linked in: overlay(E) qrtr(E) cmac(E) algif_hash(E) algif_skcipher(E) af_alg(E) bnep(E) binfmt_misc(E) nls_iso8859_1(E) intel_rapl_msr(E) rtl8192cu(E) rtl_usb(E) hp_wmi(E) rtl8192c_common(E) snd_soc_dmic(E) snd_acp3x_pdm_dma(E) rtlwifi(E) snd_acp3x_rn(E) intel_rapl_common(E) sparse_keymap(E) snd_soc_core(E) wmi_bmof(E) platform_profile(E) mac80211(E) kvm_amd(E) btusb(E) libarc4(E) btrtl(E) btbcm(E) btintel(E) cfg80211(E) snd_hda_codec_realtek(E) kvm(E) bluetooth(E) snd_hda_codec_generic(E) snd_hda_codec_hdmi(E) ledtrig_audio(E) uvcvideo(E) videobuf2_vmalloc(E) ecdh_generic(E) ecc(E) videobuf2_memops(E) videobuf2_v4l2(E) irqbypass(E) videobuf2_common(E) rapl(E) pcspkr(E) videodev(E) snd_pci_acp5x(E) snd_hda_intel(E) snd_rn_pci_acp3x(E) snd_intel_dspcfg(E) snd_usb_audio(E) snd_pci_acp3x(E) k10temp(E) snd_hda_codec(E) snd_usbmidi_lib(E) snd_hwdep(E) snd_hda_core(E) snd_rawmidi(E) snd_seq_device(E) snd_pcm(E) mc(E) snd_timer(E) snd(E) ccp(E) soundcore(E) ucsi_acpi(E)
Aug 30 21:40:37 debian kernel: typec_ucsi(E) typec(E) joydev(E) wmi(E) video(E) input_leds(E) amd_pmc(E) acpi_tad(E) serio_raw(E) evbug(E) mac_hid(E) hid_multitouch(E) parport_pc(E) ppdev(E) lp(E) parport(E) efi_pstore(E) dmi_sysfs(E) ip_tables(E) x_tables(E) autofs4(E) btrfs(E) blake2b_generic(E) libcrc32c(E) xor(E) zstd_compress(E) raid6_pq(E) usbmouse(E) hid_microsoft(E) ff_memless(E) usbkbd(E) hid_cmedia(E) r8153_ecm(E) cdc_ether(E) usbnet(E) r8152(E) mii(E) usbhid(E) dm_crypt(E) uas(E) usb_storage(E) amdgpu(E) drm_ttm_helper(E) ttm(E) iommu_v2(E) gpu_sched(E) i2c_algo_bit(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) crct10dif_pclmul(E) crc32_pclmul(E) cec(E) ghash_clmulni_intel(E) hid_generic(E) aesni_intel(E) nvme(E) rc_core(E) xhci_pci(E) crypto_simd(E) cryptd(E) drm(E) i2c_piix4(E) nvme_core(E) amd_sfh(E) xhci_pci_renesas(E) i2c_hid_acpi(E) i2c_hid(E) hid(E)
Aug 30 21:40:37 debian kernel: CPU: 11 PID: 1418 Comm: Xorg:cs0 Tainted: G W E 5.15.128 #655
Aug 30 21:40:37 debian kernel: Hardware name: HP HP Pavilion Aero Laptop 13-be0xxx/8916, BIOS F.12 04/11/2023
Aug 30 21:40:37 debian kernel: RIP: 0010:amdgpu_sync_keep_later+0xb1/0xf0 [amdgpu]
Aug 30 21:40:37 debian kernel: Code: 43 38 85 c0 74 53 8d 50 01 09 c2 78 21 49 89 1c 24 5b 41 5c 41 5d 41 5e 41 5f 5d 31 c0 31 d2 31 c9 31 f6 31 ff e9 2f 77 6d e4 <0f> 0b eb aa be 01 00 00 00 e8 41 10 b6 e3 eb d3 be 03 00 00 00 e8
Aug 30 21:40:37 debian kernel: RSP: 0018:ffffa9b2c22ffb30 EFLAGS: 00010293
Aug 30 21:40:37 debian kernel: RAX: ffffffffa6ff5b00 RBX: ffff93b28f00f300 RCX: 0000000000000052
Aug 30 21:40:37 debian kernel: RDX: 0000000000000000 RSI: ffff93b28f00f300 RDI: ffff93b28c6490f8
Aug 30 21:40:37 debian kernel: RBP: ffffa9b2c22ffb58 R08: 0000000000000000 R09: 0000000000000000
Aug 30 21:40:37 debian kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff93b28c6490f8
Aug 30 21:40:37 debian kernel: R13: 0000000000000000 R14: ffff93b28a00e238 R15: ffffa9b2c22ffbe0
Aug 30 21:40:37 debian kernel: FS: 00007ff9a36406c0(0000) GS:ffff93b58e6c0000(0000) knlGS:0000000000000000
Aug 30 21:40:37 debian kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 30 21:40:37 debian kernel: CR2: 00007ff99c000020 CR3: 000000010adc8000 CR4: 0000000000750ee0
Aug 30 21:40:37 debian kernel: PKRU: 55555554
Aug 30 21:40:37 debian kernel: Call Trace:
Aug 30 21:40:37 debian kernel: <TASK>
Aug 30 21:40:37 debian kernel: ? show_regs.cold+0x1a/0x1f
Aug 30 21:40:37 debian kernel: ? __warn+0x88/0x120
Aug 30 21:40:37 debian kernel: ? amdgpu_sync_keep_later+0xb1/0xf0 [amdgpu]
Aug 30 21:40:37 debian kernel: ? report_bug+0xb6/0xf0
Aug 30 21:40:37 debian kernel: ? handle_bug+0x38/0x90
Aug 30 21:40:37 debian kernel: ? exc_invalid_op+0x18/0x80
Aug 30 21:40:37 debian kernel: ? asm_exc_invalid_op+0x1b/0x20
Aug 30 21:40:37 debian kernel: ? amdgpu_sync_keep_later+0xb1/0xf0 [amdgpu]
Aug 30 21:40:37 debian kernel: amdgpu_sync_vm_fence+0x23/0x50 [amdgpu]
Aug 30 21:40:37 debian kernel: amdgpu_cs_ioctl+0x16c2/0x2020 [amdgpu]
Aug 30 21:40:37 debian kernel: ? futex_wait_queue_me+0xb7/0x110
Aug 30 21:40:37 debian kernel: ? amdgpu_cs_find_mapping+0x120/0x120 [amdgpu]
Aug 30 21:40:37 debian kernel: drm_ioctl_kernel+0xc2/0x110 [drm]
Aug 30 21:40:37 debian kernel: drm_ioctl+0x29d/0x4d0 [drm]
Aug 30 21:40:37 debian kernel: ? amdgpu_cs_find_mapping+0x120/0x120 [amdgpu]
Aug 30 21:40:37 debian kernel: amdgpu_drm_ioctl+0x4e/0x90 [amdgpu]
Aug 30 21:40:37 debian kernel: __x64_sys_ioctl+0xa0/0xe0
Aug 30 21:40:37 debian kernel: do_syscall_64+0x5b/0x90
Aug 30 21:40:37 debian kernel: ? exc_page_fault+0x93/0x190
Aug 30 21:40:37 debian kernel: entry_SYSCALL_64_after_hwframe+0x62/0xcc
Aug 30 21:40:37 debian kernel: RIP: 0033:0x7ff9b031cb3b
Aug 30 21:40:37 debian kernel: Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 64 48 2b 04 25 28 00 00
Aug 30 21:40:37 debian kernel: RSP: 002b:00007ff9a363f7d0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Aug 30 21:40:37 debian kernel: RAX: ffffffffffffffda RBX: 00007ff9a363f958 RCX: 00007ff9b031cb3b
Aug 30 21:40:37 debian kernel: RDX: 00007ff9a363f890 RSI: 00000000c0186444 RDI: 0000000000000010
Aug 30 21:40:37 debian kernel: RBP: 00007ff9a363f890 R08: 00007ff9a363f9b0 R09: 00007ff9a363f870
Aug 30 21:40:37 debian kernel: R10: 0000000000000003 R11: 0000000000000246 R12: 00000000c0186444
Aug 30 21:40:37 debian kernel: R13: 0000000000000010 R14: 00007ff9a363f958 R15: 00005576b7339060
Aug 30 21:40:37 debian kernel: </TASK>
Aug 30 21:40:37 debian kernel: ---[ end trace 99cd28a1a3e9e33a ]---
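The per-second rate quoted in the summary can be checked against a saved log. The following is only a minimal sketch, assuming the log uses the same syslog-style prefix as the excerpt above (`Mon DD HH:MM:SS host kernel: ...`); the script name `count_warnings.py`, the function name and the log file name are hypothetical and not part of the original report.

```python
import re
import sys
from collections import Counter

# Matches the first line of each kernel warning in syslog-style output, e.g.
#   Aug 30 21:40:37 debian kernel: WARNING: CPU: 11 PID: 1418 at ...
# Group 1 is the timestamp, which has one-second resolution.
WARNING_RE = re.compile(
    r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+WARNING:"
)


def warnings_per_second(lines):
    """Return a Counter mapping each timestamp to its number of WARNING lines."""
    counts = Counter()
    for line in lines:
        match = WARNING_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    if len(sys.argv) == 2:
        # Assumed input: a text file holding kernel log lines.
        with open(sys.argv[1], errors="replace") as log:
            counts = warnings_per_second(log)
        # Print the five busiest seconds, highest count first.
        for timestamp, n in counts.most_common(5):
            print(f"{timestamp}  {n}")
    else:
        print("usage: count_warnings.py LOGFILE", file=sys.stderr)
```

One way to use it would be to save the kernel log with `journalctl -k -o short > kernel.log` and then run `python3 count_warnings.py kernel.log`. Only the `WARNING:` header line of each trace is counted, so the module list, register dump and call trace lines do not inflate the numbers.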
Bisect result:
4921792e04f2125b5eadef9dbe9417a8354c7eff is the first bad commit
commit 4921792e04f2125b5eadef9dbe9417a8354c7eff
Author: Lang Yu <Lang.Yu@amd.com>
Date: Fri May 5 20:14:15 2023 +0800
drm/amdgpu: install stub fence into potential unused fence pointers
[ Upstream commit 187916e6ed9d0c3b3abc27429f7a5f8c936bd1f0 ]
When using cpu to update page tables, vm update fences are unused.
Install stub fence into these fence pointers instead of NULL
to avoid NULL dereference when calling dma_fence_wait() on them.
Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
Hardware description: HP Pavilion Aero Laptop 13-be0xxx/8916, BIOS F.12 04/11/2023
- CPU: AMD Ryzen 7 5800U with Radeon Graphics
- GPU: [AMD/ATI] Cezanne [Radeon Vega Series / Radeon Vega Mobile Series] [1002:1638] (rev c1)
- System Memory: 16GB
- Display(s): laptop screen 2560x1600
- Type of Display Connection: eDP
System information:
- Debian 12
- Custom kernel: 5.15.128
- AMD official driver version: N/A
How to reproduce the issue:
Boot the 5.15.128 kernel with a Debian 12 userland running Xorg and XFCE.
Edited by Chris Bainbridge