Assert in 'xe_tile_assert(tile, update->qwords <= MAX_PTE_PER_SDI);' when actually doing async VM binds in Mesa
With mesa/mesa!26805 (merged) applied, I get the warning below when running some sparse tests, such as dEQP-VK.api.buffer_memory_requirements.create_sparse_binding_sparse_aliased.ext_mem_flags_excluded.method1.size_req_transfer_usage_bits.
This was reproduced on TGL; I have not tested it on DG2 yet.
From what I understand, there are more pending binds than MAX_PTE_PER_SDI allows (0x1FE, i.e. 510, according to the assertion message).
[ 58.496769] xe 0000:00:02.0: [drm:intel_power_well_disable [xe]] disabling PW_3
[ 58.496996] xe 0000:00:02.0: [drm:intel_power_well_disable [xe]] disabling PW_2
[ 78.016392] ------------[ cut here ]------------
[ 78.016404] xe 0000:00:02.0: [drm] Assertion `update->qwords <= 0x1FE` failed!
platform: 1 subplatform: 1
graphics: Xe_LP 12.00 step B0
media: Xe_M 12.00 step B0
tile: 0 VRAM 0 B
[ 78.016425] WARNING: CPU: 0 PID: 966 at drivers/gpu/drm/xe/xe_migrate.c:1110 write_pgtable+0x2de/0x300 [xe]
[ 78.016473] Modules linked in: snd_hda_codec_hdmi snd_ctl_led snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio xe drm_ttm_helper drm_suballoc_helper gpu_sched drm_gpuvm drm_exec i2c_algo_bit drm_buddy drm_display_helper ttm mei_hdcp mei_pxp x86_pkg_temp_thermal wmi_bmof coretemp snd_hda_intel snd_intel_dspcfg crct10dif_pclmul snd_hda_codec e1000e crc32_pclmul snd_hwdep snd_hda_core video ptp ghash_clmulni_intel kvm_intel pps_core mei_me snd_pcm i2c_i801 mei i2c_smbus wmi fuse
[ 78.016563] CPU: 0 PID: 966 Comm: deqp-vk Not tainted 6.7.0-rc3-zeh-xe+ #1196
[ 78.016566] Hardware name: Dell Inc. Latitude 5420/01M3M4, BIOS 1.27.0 03/17/2023
[ 78.016567] RIP: 0010:write_pgtable+0x2de/0x300 [xe]
[ 78.016599] Code: 00 00 00 48 c7 c1 7e 01 4b a0 50 41 55 44 8b 8c 24 a0 00 00 00 44 8b 84 24 a4 00 00 00 48 8b 94 24 a8 00 00 00 e8 02 06 e2 e0 <0f> 0b 48 83 c4 50 4c 8b 54 24 60 e9 5b fd ff ff e8 5d e7 9f e1 66
[ 78.016601] RSP: 0018:ffffc9000185b4a8 EFLAGS: 00010282
[ 78.016605] RAX: 0000000000000000 RBX: ffff888129677520 RCX: 0000000000000000
[ 78.016607] RDX: 0000000000000002 RSI: 0000000000000027 RDI: 00000000ffffffff
[ 78.016608] RBP: 00000000000001ff R08: 00000000fffeffff R09: 0000000000000001
[ 78.016610] R10: 00000000fffeffff R11: ffff888287080000 R12: ffffc9000185b8f8
[ 78.016612] R13: ffffffffa04b052d R14: 0000000000000000 R15: 0000000000000000
[ 78.016613] FS: 00007f7462769740(0000) GS:ffff888287800000(0000) knlGS:0000000000000000
[ 78.016615] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 78.016617] CR2: 00005569ac8bef40 CR3: 0000000138ecc006 CR4: 0000000000770ef0
[ 78.016618] PKRU: 55555554
[ 78.016620] Call Trace:
[ 78.016622] <TASK>
[ 78.016623] ? write_pgtable+0x2de/0x300 [xe]
[ 78.016659] ? __warn+0x7c/0x170
[ 78.016667] ? write_pgtable+0x2de/0x300 [xe]
[ 78.016707] ? report_bug+0x189/0x1c0
[ 78.016713] ? handle_bug+0x36/0x70
[ 78.016717] ? exc_invalid_op+0x13/0x60
[ 78.016720] ? asm_exc_invalid_op+0x16/0x20
[ 78.016726] ? write_pgtable+0x2de/0x300 [xe]
[ 78.016757] ? write_pgtable+0x2de/0x300 [xe]
[ 78.016793] xe_migrate_update_pgtables+0x3c2/0xf50 [xe]
[ 78.016831] ? __slab_alloc.isra.0+0x4d/0x90
[ 78.016836] ? __slab_alloc.isra.0+0x5a/0x90
[ 78.016842] ? __kmem_cache_alloc_node+0x14f/0x250
[ 78.016844] ? __xe_pt_bind_vma+0x425/0xe00 [xe]
[ 78.016890] ? __is_insn_slot_addr+0x91/0x120
[ 78.016899] __xe_pt_bind_vma+0x468/0xe00 [xe]
[ 78.016938] ? __thaw_task+0x60/0x60
[ 78.016941] ? arch_stack_walk+0x93/0xe0
[ 78.016957] ? __lock_acquire+0x6c9/0x2940
[ 78.016964] ? lock_acquire+0xd3/0x2d0
[ 78.016969] ? mark_held_locks+0x40/0x70
[ 78.016973] xe_vm_bind_vma+0xcf/0x3c0 [xe]
[ 78.017011] xe_vm_bind+0xa5/0x3e0 [xe]
[ 78.017047] __xe_vma_op_execute+0x319/0x760 [xe]
[ 78.017097] ? find_held_lock+0x2b/0x80
[ 78.017116] xe_vm_bind_ioctl+0x1ce1/0x2060 [xe]
[ 78.017164] ? xe_vm_destroy_ioctl+0x190/0x190 [xe]
[ 78.017196] drm_ioctl_kernel+0xa0/0x100
[ 78.017201] drm_ioctl+0x214/0x470
[ 78.017204] ? xe_vm_destroy_ioctl+0x190/0x190 [xe]
[ 78.017240] ? find_held_lock+0x2b/0x80
[ 78.017251] ? __fget_files+0xbc/0x180
[ 78.017258] __x64_sys_ioctl+0x85/0xa0
[ 78.017263] do_syscall_64+0x3c/0xe0
[ 78.017267] entry_SYSCALL_64_after_hwframe+0x46/0x4e
[ 78.017272] RIP: 0033:0x7f746251a94f
[ 78.017276] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0 3d 00 f0 ff ff 77 1f 48 8b 44 24 18 64 48 2b 04 25 28 00
[ 78.017277] RSP: 002b:00007fffcbabc470 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 78.017280] RAX: ffffffffffffffda RBX: 0000000040010000 RCX: 00007f746251a94f
[ 78.017282] RDX: 00007fffcbabc590 RSI: 0000000040886445 RDI: 0000000000000006
[ 78.017283] RBP: 00007fffcbabc500 R08: 7fffffffffffffff R09: 00005569b1b64a40
[ 78.017285] R10: 00005569b1c600d0 R11: 0000000000000246 R12: 0000000000000000
[ 78.017287] R13: 00005569a8e23e49 R14: 00005569ad4e1118 R15: 00007f7462e8c040
[ 78.017293] </TASK>
[ 78.017294] irq event stamp: 467461
[ 78.017296] hardirqs last enabled at (467467): [<ffffffff811bf676>] console_unlock+0x106/0x150
[ 78.017299] hardirqs last disabled at (467472): [<ffffffff811bf65b>] console_unlock+0xeb/0x150
[ 78.017302] softirqs last enabled at (465156): [<ffffffff811346b2>] irq_exit_rcu+0x82/0xe0
[ 78.017305] softirqs last disabled at (465147): [<ffffffff811346b2>] irq_exit_rcu+0x82/0xe0
[ 78.017308] ---[ end trace 0000000000000000 ]---
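To make the limit in the assertion concrete, here is a minimal sketch of splitting an update into pieces that each stay within MAX_PTE_PER_SDI. It is only an illustration, not the driver's code or a proposed fix: the value 0x1FE is taken from the assertion text in the log, while `split_into_sdi_chunks` and its parameters are hypothetical names made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Limit as printed in the assertion message (`update->qwords <= 0x1FE`). */
#define MAX_PTE_PER_SDI 0x1FE /* 510 */

/*
 * Hypothetical helper: split `qwords` entries into chunks of at most
 * MAX_PTE_PER_SDI each. Chunk sizes are stored in chunks[] (capacity `cap`)
 * and the number of chunks is returned.
 */
static size_t split_into_sdi_chunks(uint32_t qwords, uint32_t *chunks, size_t cap)
{
    size_t n = 0;

    while (qwords > 0) {
        uint32_t chunk = qwords < MAX_PTE_PER_SDI ? qwords : MAX_PTE_PER_SDI;

        assert(n < cap); /* caller must provide enough room */
        chunks[n++] = chunk;
        qwords -= chunk;
    }
    return n;
}
```

With this sketch, an update of 510 qwords fits in one chunk, while 511 qwords needs two (510 + 1), which is the kind of size that would trip an assertion of the form `qwords <= 0x1FE` if it were checked against the whole update instead of each chunk.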
Edited by José Roberto de Souza