Incorrect vm locking mode for vma binding
It looks like the rebind worker can trigger vma binds that are not rebinds, so the vm lock is held in the incorrect mode for them (first trace). Similarly, there seems to be a bind of an already bound vma with the vm lock held in write mode (second trace). That should be OK, I guess, but the lockdep asserts might need updating. On DG2:
./xe_evict --r evict-cm-threads-small-multi-vm
[ 106.589057] [IGT] xe_evict: starting subtest evict-cm-threads-small-multi-vm
[ 107.190360] ------------[ cut here ]------------
[ 107.190507] WARNING: CPU: 0 PID: 1168 at drivers/gpu/drm/xe/xe_pt.c:954 xe_pt_commit_locks_assert+0xe7/0x360 [xe]
[ 107.190599] Modules linked in: xe drm_ttm_helper ttm drm_suballoc_helper kunit gpu_sched nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter cmac bnep sunrpc vfat fat snd_hda_codec_realtek snd_hda_codec_generic intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal snd_hda_intel intel_powerclamp snd_intel_dspcfg btusb eeepc_wmi snd_hda_codec coretemp btrtl asus_wmi ppdev snd_hwdep btbcm ledtrig_audio intel_cstate btintel snd_hda_core sparse_keymap ee1004 iTCO_wdt bluetooth platform_profile intel_pmc_bxt iTCO_vendor_support snd_seq snd_seq_device intel_wmi_thunderbolt ecdh_generic intel_uncore ecc snd_pcm rfkill pcspkr wmi_bmof
[ 107.190797] video snd_timer mei_me snd i2c_i801 idma64 mei soundcore i2c_smbus parport_pc parport wmi acpi_tad acpi_pad drm ip_tables crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel e1000e usb_storage nvme nvme_core pinctrl_tigerlake fuse
[ 107.191239] CPU: 0 PID: 1168 Comm: kworker/u24:13 Not tainted 6.1.0-rc1+ #53
[ 107.191268] Hardware name: ASUS System Product Name/PRIME B560M-A AC, BIOS 0403 01/26/2021
[ 107.191298] Workqueue: events_unbound preempt_rebind_work_func [xe]
[ 107.191361] RIP: 0010:xe_pt_commit_locks_assert+0xe7/0x360 [xe]
[ 107.191398] Code: 6f 02 00 00 48 83 c4 08 5b 5d 41 5c 41 5d c3 cc cc cc cc 31 f6 49 8d bc 24 c0 01 00 00 e8 f1 6e 68 f4 85 c0 0f 85 93 01 00 00 <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 89 ea 48 c1 ea 03 0f b6 14
[ 107.191459] RSP: 0018:ffffc9000241f560 EFLAGS: 00010246
[ 107.191480] RAX: 0000000000000000 RBX: ffff88815af58400 RCX: 0000000000000000
[ 107.191505] RDX: 0000000000000000 RSI: ffffffffb5ea1f00 RDI: ffffffffb61128a0
[ 107.191529] RBP: ffffffffb77f3c6c R08: 0000000000000001 R09: ffff88816bcc9913
[ 107.191553] R10: ffffed102d799322 R11: 0000000000000001 R12: ffff888111631000
[ 107.191577] R13: ffff88815bc50000 R14: ffffc9000241f84c R15: ffff88815af58400
[ 107.191602] FS: 0000000000000000(0000) GS:ffff888727a00000(0000) knlGS:0000000000000000
[ 107.191630] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 107.191650] CR2: 00007f2a98adfa70 CR3: 000000014fd30004 CR4: 0000000000770ef0
[ 107.191676] PKRU: 55555554
[ 107.191688] Call Trace:
[ 107.191700] <TASK>
[ 107.191716] __xe_pt_bind_vma+0x968/0x2020 [xe]
[ 107.191757] ? preempt_rebind_work_func+0xe78/0x1a30 [xe]
[ 107.191792] ? preempt_rebind_work_func+0xe78/0x1a30 [xe]
[ 107.191842] ? orc_find.part.0+0x1ed/0x330
[ 107.191880] ? xe_pt_zap_ptes+0x2a0/0x2a0 [xe]
[ 107.191914] ? ret_from_fork+0x1f/0x30
[ 107.191938] ? preempt_rebind_work_func+0xe78/0x1a30 [xe]
[ 107.191975] ? preempt_rebind_work_func+0xe78/0x1a30 [xe]
[ 107.192017] ? orc_find.part.0+0x1ed/0x330
[ 107.192059] ? ret_from_fork+0x1f/0x30
[ 107.192082] ? __stack_depot_save+0x2a/0x4e0
[ 107.192110] ? kasan_save_stack+0x31/0x40
[ 107.192130] ? kasan_save_stack+0x1e/0x40
[ 107.192146] ? kasan_set_track+0x21/0x30
[ 107.192162] ? __kasan_kmalloc+0x7e/0x90
[ 107.192178] ? xe_preempt_fence_alloc+0x3e/0x1b0 [xe]
[ 107.192216] ? preempt_rebind_work_func+0xe79/0x1a30 [xe]
[ 107.192249] ? process_one_work+0x7e6/0x1340
[ 107.192274] ? worker_thread+0x750/0xed0
[ 107.192290] ? kthread+0x29f/0x340
[ 107.192304] ? ret_from_fork+0x1f/0x30
[ 107.192320] ? native_queued_spin_lock_slowpath+0x130/0x920
[ 107.192351] ? lock_chain_count+0x20/0x20
[ 107.192372] ? lock_acquire+0x19a/0x510
[ 107.192393] ? lock_is_held_type+0xe2/0x140
[ 107.192417] ? lock_is_held_type+0xe2/0x140
[ 107.192439] ? xe_vm_bind_vma+0x23f/0x9b0 [xe]
[ 107.192474] xe_vm_bind_vma+0x23f/0x9b0 [xe]
[ 107.192516] xe_vm_rebind+0x191/0x710 [xe]
[ 107.192556] preempt_rebind_work_func+0xffd/0x1a30 [xe]
[ 107.192603] ? xe_vm_rebind+0x710/0x710 [xe]
[ 107.192644] ? lock_acquire+0x1aa/0x510
[ 107.192671] ? lock_is_held_type+0xe2/0x140
[ 107.192694] process_one_work+0x7e6/0x1340
[ 107.192719] ? lock_release+0x6f0/0x6f0
[ 107.192736] ? pwq_dec_nr_in_flight+0x230/0x230
[ 107.192761] ? rwlock_bug.part.0+0x90/0x90
[ 107.192785] worker_thread+0x750/0xed0
[ 107.192812] ? process_one_work+0x1340/0x1340
[ 107.192840] kthread+0x29f/0x340
[ 107.192854] ? kthread_complete_and_exit+0x20/0x20
[ 107.192877] ret_from_fork+0x1f/0x30
[ 107.192908] </TASK>
[ 107.192919] irq event stamp: 6979
[ 107.192932] hardirqs last enabled at (6987): [<ffffffffb33709ee>] __up_console_sem+0x5e/0x70
[ 107.192964] hardirqs last disabled at (6994): [<ffffffffb33709d3>] __up_console_sem+0x43/0x70
[ 107.192993] softirqs last enabled at (6322): [<ffffffffb32039fe>] __irq_exit_rcu+0x1ce/0x260
[ 107.193027] softirqs last disabled at (6317): [<ffffffffb32039fe>] __irq_exit_rcu+0x1ce/0x260
[ 107.193057] ---[ end trace 0000000000000000 ]---
[ 107.195573] ------------[ cut here ]------------
[ 107.195621] WARNING: CPU: 10 PID: 9 at drivers/gpu/drm/xe/xe_pt.c:952 xe_pt_commit_locks_assert+0x20f/0x360 [xe]
[ 107.195678] Modules linked in: xe drm_ttm_helper ttm drm_suballoc_helper kunit gpu_sched nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter cmac bnep sunrpc vfat fat snd_hda_codec_realtek snd_hda_codec_generic intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal snd_hda_intel intel_powerclamp snd_intel_dspcfg btusb eeepc_wmi snd_hda_codec coretemp btrtl asus_wmi ppdev snd_hwdep btbcm ledtrig_audio intel_cstate btintel snd_hda_core sparse_keymap ee1004 iTCO_wdt bluetooth platform_profile intel_pmc_bxt iTCO_vendor_support snd_seq snd_seq_device intel_wmi_thunderbolt ecdh_generic intel_uncore ecc snd_pcm rfkill pcspkr wmi_bmof
[ 107.195845] video snd_timer mei_me snd i2c_i801 idma64 mei soundcore i2c_smbus parport_pc parport wmi acpi_tad acpi_pad drm ip_tables crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel e1000e usb_storage nvme nvme_core pinctrl_tigerlake fuse
[ 107.196243] CPU: 10 PID: 9 Comm: kworker/u24:0 Tainted: G W 6.1.0-rc1+ #53
[ 107.196271] Hardware name: ASUS System Product Name/PRIME B560M-A AC, BIOS 0403 01/26/2021
[ 107.196299] Workqueue: events_unbound async_op_work_func [xe]
[ 107.196338] RIP: 0010:xe_pt_commit_locks_assert+0x20f/0x360 [xe]
[ 107.196374] Code: 0b 48 83 c4 08 5b 5d 41 5c 41 5d c3 cc cc cc cc 49 8d bc 24 c0 01 00 00 be 01 00 00 00 e8 c9 6d 68 f4 85 c0 0f 85 aa 00 00 00 <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 89 ea 48 c1 ea 03 0f b6 14
[ 107.196433] RSP: 0018:ffffc9000013f3e0 EFLAGS: 00010246
[ 107.196453] RAX: 0000000000000000 RBX: ffff88815af58400 RCX: 0000000000000000
[ 107.196477] RDX: 0000000000000000 RSI: ffffffffb5ea1f00 RDI: ffffffffb61128a0
[ 107.196501] RBP: ffffffffb77f3c6c R08: 0000000000000001 R09: ffff8881453c5313
[ 107.196524] R10: ffffed1028a78a62 R11: 0000000000000001 R12: ffff888111631000
[ 107.196547] R13: ffff88815bc50000 R14: ffffc9000013f6cc R15: ffff88815af58400
[ 107.196571] FS: 0000000000000000(0000) GS:ffff888727f00000(0000) knlGS:0000000000000000
[ 107.196598] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 107.196618] CR2: 00007f187cbfee40 CR3: 000000081be2a005 CR4: 0000000000770ee0
[ 107.196642] PKRU: 55555554
[ 107.196653] Call Trace:
[ 107.196664] <TASK>
[ 107.196678] __xe_pt_bind_vma+0x968/0x2020 [xe]
[ 107.196711] ? ret_from_fork+0x1f/0x30
[ 107.196728] ? kernel_text_address+0x13/0xd0
[ 107.196752] ? arch_stack_walk+0x75/0xd0
[ 107.196777] ? xe_pt_zap_ptes+0x2a0/0x2a0 [xe]
[ 107.196807] ? ret_from_fork+0x1f/0x30
[ 107.196828] ? stack_trace_save+0x81/0xa0
[ 107.196877] ? __stack_depot_save+0x2a/0x4e0
[ 107.196928] ? lock_is_held_type+0xe2/0x140
[ 107.196947] ? find_held_lock+0x2c/0x110
[ 107.196967] ? lock_release+0x37d/0x6f0
[ 107.196987] ? mark_held_locks+0x9e/0xe0
[ 107.197006] ? _raw_spin_unlock_irqrestore+0x2d/0x60
[ 107.197026] ? lockdep_hardirqs_on_prepare+0x17b/0x410
[ 107.197046] ? _raw_spin_unlock_irqrestore+0x2d/0x60
[ 107.197066] ? lockdep_hardirqs_on+0x7d/0x100
[ 107.197086] ? kvfree_call_rcu+0x470/0x660
[ 107.197109] ? ttm_resource_compat+0x61/0x1a0 [ttm]
[ 107.197138] ? ttm_bo_validate+0x12e/0x380 [ttm]
[ 107.197164] ? lock_is_held_type+0xe2/0x140
[ 107.197186] ? xe_vm_bind_vma+0x23f/0x9b0 [xe]
[ 107.197218] xe_vm_bind_vma+0x23f/0x9b0 [xe]
[ 107.197257] ? lock_is_held_type+0xe2/0x140
[ 107.197278] xe_vm_bind+0xdd/0x250 [xe]
[ 107.197312] vm_bind_ioctl+0x45a/0x1690 [xe]
[ 107.197343] ? lock_release+0x37d/0x6f0
[ 107.197365] ? lock_is_held_type+0xe2/0x140
[ 107.197394] ? xe_vma_userptr_pin_pages+0x860/0x860 [xe]
[ 107.197454] ? down_write+0x1c2/0x1f0
[ 107.197476] ? async_op_work_func+0x67c/0xf60 [xe]
[ 107.197508] async_op_work_func+0x67c/0xf60 [xe]
[ 107.197549] ? lock_release+0x6f0/0x6f0
[ 107.197567] ? vm_bind_ioctl+0x1690/0x1690 [xe]
[ 107.197601] ? lock_is_held_type+0xe2/0x140
[ 107.197624] process_one_work+0x7e6/0x1340
[ 107.197648] ? lock_release+0x6f0/0x6f0
[ 107.197665] ? pwq_dec_nr_in_flight+0x230/0x230
[ 107.197689] ? rwlock_bug.part.0+0x90/0x90
[ 107.197712] worker_thread+0x5ac/0xed0
[ 107.197739] ? process_one_work+0x1340/0x1340
[ 107.197760] kthread+0x29f/0x340
[ 107.197774] ? kthread_complete_and_exit+0x20/0x20
[ 107.197796] ret_from_fork+0x1f/0x30
[ 107.197826] </TASK>
[ 107.197848] irq event stamp: 368327
[ 107.197864] hardirqs last enabled at (368335): [<ffffffffb33709ee>] __up_console_sem+0x5e/0x70
[ 107.197894] hardirqs last disabled at (368342): [<ffffffffb33709d3>] __up_console_sem+0x43/0x70
[ 107.197923] softirqs last enabled at (368022): [<ffffffffb32039fe>] __irq_exit_rcu+0x1ce/0x260
[ 107.197953] softirqs last disabled at (368017): [<ffffffffb32039fe>] __irq_exit_rcu+0x1ce/0x260
[ 107.197983] ---[ end trace 0000000000000000 ]---
[ 108.403195] [IGT] xe_evict: exiting, ret=0
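
For reference, the locking expectation that the two warnings seem to imply can be modeled in a few lines of user-space C. This is only a sketch of my reading of the traces, not code from the xe driver: the enum, `expected_mode()` and `bind_lock_ok()` are hypothetical names, and the rule itself (a rebind expects the vm lock in read mode, a first bind expects it in write mode) is an assumption inferred from which assert fired in which worker.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of how the caller holds the vm lock. */
enum vm_lock_mode { VM_LOCK_NONE, VM_LOCK_READ, VM_LOCK_WRITE };

/*
 * Assumed rule, inferred from the two warnings: a rebind of an already
 * bound vma expects read mode, a first bind expects write mode.
 */
static enum vm_lock_mode expected_mode(bool rebind)
{
	return rebind ? VM_LOCK_READ : VM_LOCK_WRITE;
}

/* True when the held mode matches what the assert would expect. */
static bool bind_lock_ok(enum vm_lock_mode held, bool rebind)
{
	return held == expected_mode(rebind);
}

/* Both combinations reported above fail the modeled check. */
static void model_self_check(void)
{
	/* First trace: rebind worker (read mode) doing a non-rebind bind. */
	assert(!bind_lock_ok(VM_LOCK_READ, false));
	/* Second trace: async op worker (write mode) binding a bound vma. */
	assert(!bind_lock_ok(VM_LOCK_WRITE, true));
	/* The combinations the asserts appear to be written for. */
	assert(bind_lock_ok(VM_LOCK_READ, true));
	assert(bind_lock_ok(VM_LOCK_WRITE, false));
}
```

If the second case really is OK, the modeled check would have to accept write mode for rebinds as well, which is what updating the lockdep asserts would amount to.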