XE_WARN_ON triggered in xe_vm_unlock_dma_resv()
Reproduced on Alder Lake-P and Raptor Lake-P.
HEAD commit: 2bf8f260 ("drm/xe: don't auto fall back to execlist mode if guc failed to init")
void xe_vm_unlock_dma_resv(struct xe_vm *vm,
			   struct ttm_validate_buffer *tv_onstack,
			   struct ttm_validate_buffer *tv,
			   struct ww_acquire_ctx *ww,
			   struct list_head *objs)
{
	/*
	 * Nothing should've been able to enter the list while we were locked,
	 * since we've held the dma-resvs of all the vm's external objects,
	 * and holding the dma_resv of an object is required for list
	 * addition, and we shouldn't add ourselves.
	 */
	XE_WARN_ON(!list_empty(&vm->notifier.rebind_list));
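The invariant stated in the comment above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `toy_obj`, `toy_vm`, `toy_rebind_add()` and `toy_unlock_would_warn()` are hypothetical names, a plain boolean stands in for an object's dma_resv, and a counter stands in for the rebind list. None of it is the xe driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an external object; resv_held models its dma_resv. */
struct toy_obj {
	bool resv_held;	/* true while the exec path holds the reservation */
};

/* Hypothetical stand-in for the vm; rebind_count models notifier.rebind_list. */
struct toy_vm {
	size_t rebind_count;
};

/*
 * A third party may add an entry only if it can take the object's
 * reservation. If the exec path already holds it, the addition is refused.
 */
static bool toy_rebind_add(struct toy_vm *vm, const struct toy_obj *obj)
{
	assert(vm != NULL && obj != NULL);
	if (obj->resv_held)
		return false;
	vm->rebind_count++;
	return true;
}

/* Models the check at unlock time: true means the warning would fire. */
static bool toy_unlock_would_warn(const struct toy_vm *vm)
{
	assert(vm != NULL);
	return vm->rebind_count != 0;
}
```

In this toy model the unlock check can only fail when an entry was added for an object whose reservation was not held. Whether that matches what happens in the driver is not established by this report.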
[ 5613.144337] ------------[ cut here ]------------
[ 5613.149126] WARNING: CPU: 3 PID: 45883 at drivers/gpu/drm/xe/xe_vm.c:504 xe_vm_unlock_dma_resv+0x43/0x50 [xe]
[ 5613.159956] Modules linked in: xe drm_ttm_helper gpu_sched drm_suballoc_helper i2c_algo_bit drm_buddy video ttm drm_display_helper drm_kms_helper syscopyarea sysfillrect sysimgblt mei_gsc snd_hda_codec_hdmi mei_pxp mei_hdcp x86_pkg_temp_thermal pmt_telemetry pmt_class coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel kvm_intel snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hwdep snd_hda_core e1000e snd_pcm ptp i2c_i801 mei_me pps_core wmi_bmof i2c_smbus mei intel_vsec wmi fuse [last unloaded: ttm]
[ 5613.204901] CPU: 2 PID: 45883 Comm: gnome-shell Tainted: G U W 6.2.0+zeh-xe+ #906
[ 5613.213620] Hardware name: Intel Corporation Alder Lake Client Platform/AlderLake-P LP4x RVP, BIOS ADLPFWI1.R00.3031.A03.2201191550 01/19/2022
[ 5613.226398] RIP: 0010:xe_vm_unlock_dma_resv+0x43/0x50 [xe]
[ 5613.232407] Code: 05 20 05 00 00 48 39 c2 75 20 e8 88 b0 ed ff 48 85 db 74 05 48 39 eb 75 07 5b 5d c3 cc cc cc cc 48 89 df 5b 5d e9 6d d1 ee e0 <0f> 0b eb dc 66 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90
[ 5613.251197] RSP: 0018:ffffc90003cbb9e8 EFLAGS: 00010202
[ 5613.256462] RAX: ffff88813066bd20 RBX: ffffc90003cbbb00 RCX: ffffc90003cbbaa0
[ 5613.263643] RDX: ffff88835a0fcb90 RSI: ffffc90003cbba38 RDI: ffffc90003cbbaa0
[ 5613.270793] RBP: ffffc90003cbbb00 R08: ffffc90003cbba38 R09: 0000000000000000
[ 5613.277959] R10: 0000000000011aa5 R11: 0000000000000000 R12: ffffc90003cbbe50
[ 5613.285128] R13: ffff88811afe6e00 R14: 0000000000000000 R15: ffff88835626b240
[ 5613.292286] FS: 00007efc577a45c0(0000) GS:ffff88849e300000(0000) knlGS:0000000000000000
[ 5613.300406] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5613.306183] CR2: 00007f4930112000 CR3: 0000000352fb4004 CR4: 0000000000770ee0
[ 5613.313342] PKRU: 55555554
[ 5613.316098] Call Trace:
[ 5613.318595] <TASK>
[ 5613.320743] xe_exec_ioctl+0x383/0x8a0 [xe]
[ 5613.325278] ? __is_insn_slot_addr+0x8e/0x110
[ 5613.329719] ? __is_insn_slot_addr+0x8e/0x110
[ 5613.334116] ? kernel_text_address+0x75/0xf0
[ 5613.338429] ? __pfx_stack_trace_consume_entry+0x10/0x10
[ 5613.343778] ? __kernel_text_address+0x9/0x40
[ 5613.348181] ? unwind_get_return_address+0x1a/0x30
[ 5613.353013] ? __pfx_stack_trace_consume_entry+0x10/0x10
[ 5613.358362] ? arch_stack_walk+0x99/0xf0
[ 5613.362329] ? rcu_read_lock_sched_held+0xb/0x70
[ 5613.366996] ? lock_acquire+0x287/0x2f0
[ 5613.370873] ? rcu_read_lock_sched_held+0xb/0x70
[ 5613.375530] ? rcu_read_lock_sched_held+0xb/0x70
[ 5613.380181] ? lock_release+0x225/0x2e0
[ 5613.384059] ? __pfx_xe_exec_ioctl+0x10/0x10 [xe]
[ 5613.389092] drm_ioctl_kernel+0xc0/0x170
[ 5613.393068] drm_ioctl+0x1b7/0x490
[ 5613.396519] ? __pfx_xe_exec_ioctl+0x10/0x10 [xe]
[ 5613.401547] ? lock_release+0x225/0x2e0
[ 5613.405432] __x64_sys_ioctl+0x8a/0xb0
[ 5613.409232] do_syscall_64+0x37/0x90
[ 5613.412848] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 5613.417936] RIP: 0033:0x7efc5b91aaff
[ 5613.421553] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0 3d 00 f0 ff ff 77 1f 48 8b 44 24 18 64 48 2b 04 25 28 00
[ 5613.440275] RSP: 002b:00007ffe7a04c0e0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 5613.447865] RAX: ffffffffffffffda RBX: 000055dd5c1437e0 RCX: 00007efc5b91aaff
[ 5613.455024] RDX: 00007ffe7a04c200 RSI: 0000000040386448 RDI: 000000000000000d
[ 5613.462178] RBP: 00007ffe7a04c170 R08: 000055dd59685010 R09: 000055dd5bc42400
[ 5613.469359] R10: 0000000000000007 R11: 0000000000000246 R12: 000055dd5990d190
[ 5613.476513] R13: 0000000000200012 R14: 000055dd59725ee0 R15: 000055dd59795490
[ 5613.483704] </TASK>
[ 5613.485935] irq event stamp: 0
[ 5613.489030] hardirqs last enabled at (0): [<0000000000000000>] 0x0
[ 5613.495325] hardirqs last disabled at (0): [<ffffffff81138cb4>] copy_process+0x984/0x1dd0
[ 5613.503538] softirqs last enabled at (0): [<ffffffff81138cb4>] copy_process+0x984/0x1dd0
[ 5613.511748] softirqs last disabled at (0): [<0000000000000000>] 0x0
[ 5613.518047] ---[ end trace 0000000000000000 ]---
Easy way to reproduce:
- start Firefox
- open https://webglsamples.org/aquarium/aquarium.html
- select 5000 fish
- wait a couple of minutes for the warning to appear
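
While waiting for the reproduction, it can help to spot the warning automatically in the kernel log. The following is a minimal sketch, not part of any existing tool: `is_unlock_warning()` and `count_unlock_warnings()` are hypothetical helpers that match the WARNING header line shown in the trace above, assuming its format stays the same.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if a log line is the WARNING header for this issue. */
static bool is_unlock_warning(const char *line)
{
	assert(line != NULL);
	return strstr(line, "WARNING:") != NULL &&
	       strstr(line, "xe_vm_unlock_dma_resv") != NULL;
}

/* Counts matching lines in a newline-separated log buffer. */
static size_t count_unlock_warnings(const char *log)
{
	size_t hits = 0;
	char line[1024];	/* longer lines are truncated before matching */

	assert(log != NULL);
	while (*log) {
		size_t len = strcspn(log, "\n");
		size_t n = len < sizeof(line) - 1 ? len : sizeof(line) - 1;

		memcpy(line, log, n);
		line[n] = '\0';
		if (is_unlock_warning(line))
			hits++;

		log += len;
		if (*log == '\n')
			log++;
	}
	return hits;
}
```

A small `main()` that reads lines from standard input and calls `is_unlock_warning()` on each could be fed from `dmesg --follow`. Only the WARNING header is matched, so the later `RIP:` line that mentions the same function is not counted twice.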
Edited by José Roberto de Souza