Stack trace in kernel 6.6 using OpenCL
I am getting the following stack trace on Fedora 39 using ROCm OpenCL with kernels 6.6.1 and 6.6.2. It works fine on kernel 6.5.10. Windows hung when using Mesa OpenCL. The traces are triggered by running the "clpeak" program.
[ 423.564046] amdgpu: Failed to reserve buffers in ttm.
[ 423.564319] amdgpu: Failed to reserve buffers in ttm.
[ 423.568978] amdgpu: Failed to reserve buffers in ttm.
[ 423.570432] ------------[ cut here ]------------
[ 423.570434] WARNING: CPU: 6 PID: 168 at drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c:1518 amdgpu_amdkfd_gpuvm_destroy_cb+0x116/0x120 [amdgpu]
[ 423.570694] Modules linked in: snd_seq_dummy snd_hrtimer xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nft_compat bridge stp llc rfkill vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) nft_masq nft_chain_nat nf_nat nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nf_log_syslog nft_log nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink tun it87(OE) hwmon_vid sunrpc binfmt_misc vfat fat xfs snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_usb_audio snd_hda_intel joydev uvcvideo snd_intel_dspcfg intel_rapl_msr uvc snd_intel_sdw_acpi videobuf2_vmalloc intel_rapl_common snd_usbmidi_lib videobuf2_memops snd_hda_codec videobuf2_v4l2 videobuf2_common snd_hda_core edac_mce_amd snd_ump snd_hwdep videodev snd_seq snd_rawmidi kvm_amd mc snd_seq_device snd_pcm kvm irqbypass snd_timer rapl snd gigabyte_wmi wmi_bmof pcspkr i2c_piix4 k10temp soundcore gpio_amdpt gpio_generic squashfs loop zram amdgpu i2c_algo_bit drm_ttm_helper ttm video drm_exec drm_suballoc_helper amdxcp
[ 423.570762] drm_buddy crct10dif_pclmul crc32_pclmul crc32c_intel gpu_sched polyval_clmulni polyval_generic nvme drm_display_helper ghash_clmulni_intel uas r8169 cec usb_storage sha512_ssse3 ccp nvme_core sp5100_tco nvme_common wmi scsi_dh_rdac scsi_dh_emc scsi_dh_alua ip6_tables ip_tables dm_multipath i2c_dev fuse
[ 423.570786] CPU: 6 PID: 168 Comm: kworker/6:1 Tainted: G OE 6.6.1-300.fc39.x86_64 #1
[ 423.570789] Hardware name: Gigabyte Technology Co., Ltd. B550M AORUS ELITE/B550M AORUS ELITE, BIOS FB 11/14/2022
[ 423.570791] Workqueue: events delayed_fput
[ 423.570796] RIP: 0010:amdgpu_amdkfd_gpuvm_destroy_cb+0x116/0x120 [amdgpu]
[ 423.571007] Code: df 5b 5d 41 5c e9 ea 16 cc f4 5b 5d 41 5c e9 21 39 92 f5 e8 5c b8 45 f5 eb cc be 03 00 00 00 e8 e0 c6 13 f5 eb c0 0f 0b eb 82 <0f> 0b eb 8b 0f 0b eb 94 66 90 90 90 90 90 90 90 90 90 90 90 90 90
[ 423.571010] RSP: 0018:ffffc900007ffcc0 EFLAGS: 00010206
[ 423.571013] RAX: ffff88827f18b020 RBX: ffff88827f18b000 RCX: ffff88827f18b000
[ 423.571014] RDX: ffff888338f1b648 RSI: ffff888121d7a730 RDI: ffff88827f18b040
[ 423.571016] RBP: ffff888121d7a000 R08: 0000000000000000 R09: 000000008020001e
[ 423.571018] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88827f18b040
[ 423.571019] R13: ffff88812c9e8400 R14: 0000000000000000 R15: ffff888363c00001
[ 423.571021] FS: 0000000000000000(0000) GS:ffff888c0eb80000(0000) knlGS:0000000000000000
[ 423.571023] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 423.571025] CR2: 0000046e03230000 CR3: 0000000418222000 CR4: 0000000000f50ee0
[ 423.571027] PKRU: 55555554
[ 423.571028] Call Trace:
[ 423.571031] <TASK>
[ 423.571033] ? amdgpu_amdkfd_gpuvm_destroy_cb+0x116/0x120 [amdgpu]
[ 423.571239] ? __warn+0x81/0x130
[ 423.571244] ? amdgpu_amdkfd_gpuvm_destroy_cb+0x116/0x120 [amdgpu]
[ 423.571415] ? report_bug+0x171/0x1a0
[ 423.571419] ? handle_bug+0x3c/0x80
[ 423.571422] ? exc_invalid_op+0x17/0x70
[ 423.571425] ? asm_exc_invalid_op+0x1a/0x20
[ 423.571429] ? amdgpu_amdkfd_gpuvm_destroy_cb+0x116/0x120 [amdgpu]
[ 423.571580] amdgpu_vm_fini+0x49/0x550 [amdgpu]
[ 423.571715] amdgpu_driver_postclose_kms+0x191/0x280 [amdgpu]
[ 423.571841] drm_file_free+0x21c/0x270
[ 423.571845] drm_release+0x74/0xf0
[ 423.571847] __fput+0xf5/0x290
[ 423.571850] delayed_fput+0x23/0x30
[ 423.571852] process_one_work+0x174/0x340
[ 423.571856] worker_thread+0x27b/0x3a0
[ 423.571858] ? __pfx_worker_thread+0x10/0x10
[ 423.571860] kthread+0xe8/0x120
[ 423.571864] ? __pfx_kthread+0x10/0x10
[ 423.571866] ret_from_fork+0x34/0x50
[ 423.571870] ? __pfx_kthread+0x10/0x10
[ 423.571872] ret_from_fork_asm+0x1b/0x30
[ 423.571878] </TASK>
[ 423.571879] ---[ end trace 0000000000000000 ]---
[ 423.573753] amdgpu 0000:05:00.0: amdgpu: still active bo inside vm
[ 456.830963] amdgpu: Failed to reserve buffers in ttm.
[ 456.831214] amdgpu: Failed to reserve buffers in ttm.
[ 456.835995] amdgpu: Failed to reserve buffers in ttm.
[ 456.837572] ------------[ cut here ]------------
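The pattern in the log above repeats (several "Failed to reserve buffers in ttm" lines, then a WARNING), so a small script may help when comparing longer dmesg captures between kernels. This is only a rough sketch: the function name `summarize` and the regular expressions are my own assumptions based on the line formats shown here, not part of any amdgpu or kernel tooling.

```python
import re
from collections import Counter

# Assumed format of a kernel WARNING header, based on the log in this report.
WARNING_RE = re.compile(
    r"WARNING: CPU: \d+ PID: \d+ at (?P<site>\S+:\d+) (?P<func>\S+)"
)
# Assumed dmesg timestamp prefix, e.g. "[ 423.564046] ".
TS_RE = re.compile(r"^\[\s*(?P<ts>\d+\.\d+)\]\s*(?P<msg>.*)$")


def summarize(log_text):
    """Count repeated amdgpu messages and collect WARNING sites.

    Returns (messages, warnings): a Counter keyed by message text, and a
    list of (timestamp, source_location, function) tuples.
    """
    messages = Counter()
    warnings = []
    for line in log_text.splitlines():
        m = TS_RE.match(line)
        if not m:
            continue  # skip lines without a dmesg timestamp
        msg = m.group("msg")
        w = WARNING_RE.search(msg)
        if w:
            warnings.append(
                (float(m.group("ts")), w.group("site"), w.group("func"))
            )
        elif msg.startswith("amdgpu"):
            messages[msg] += 1
    return messages, warnings
```

It could be fed the saved output of `dmesg`, for example `summarize(open("dmesg.txt").read())`, where `dmesg.txt` is a hypothetical file name.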
Hardware
MATH16-16 >> inxi -C -m -G
Memory:
System RAM: total: 48 GiB available: 46.96 GiB used: 6.46 GiB (13.8%)
RAM Report: permissions: Unable to run dmidecode. Root privileges
required.
CPU:
Info: 8-core model: AMD Ryzen 7 5800X bits: 64 type: MT MCP cache: L2: 4 MiB
Speed (MHz): avg: 3031 min/max: 550/5085 cores: 1: 3589 2: 3528 3: 550
4: 550 5: 550 6: 3590 7: 3566 8: 3496 9: 4126 10: 3603 11: 3583 12: 3582
13: 3593 14: 3593 15: 3583 16: 3426
Graphics:
Device-1: AMD Ellesmere [Radeon RX 470/480/570/570X/580/580X/590]
driver: amdgpu v: kernel
Device-2: Logitech HD Webcam C615 driver: snd-usb-audio,uvcvideo type: USB
Display: x11 server: X.Org v: 1.20.14 with: Xwayland v: 23.2.2 driver: X:
loaded: amdgpu dri: radeonsi gpu: amdgpu resolution: 1: 1920x1080~60Hz
2: N/A
API: OpenGL v: 4.6 vendor: amd mesa v: 23.2.1 renderer: AMD Radeon RX 570
Series (polaris10 LLVM 16.0.6 DRM 3.54 6.6.1-300.fc39.x86_64)
API: Vulkan v: 1.3.250 drivers: radv,llvmpipe surfaces: xcb,xlib
API: EGL Message: EGL data requires eglinfo. Check --recommends.
Edited by Mike Hedman