Scheduler Use-After-Free
With KASAN enabled, running xe_exec_basic triggers the following use-after-free report in drm_sched_entity_push_job():
[ 138.211070] ==================================================================
[ 138.211081] BUG: KASAN: use-after-free in drm_sched_entity_push_job+0x512/0x5a0 [gpu_sched]
[ 138.211093] Write of size 8 at addr ffff8881317c49f8 by task xe_exec_basic/1156
[ 138.211103] CPU: 6 PID: 1156 Comm: xe_exec_basic Not tainted 6.1.0-rc1+ #52
[ 138.211109] Hardware name: ASUS System Product Name/PRIME B560M-A AC, BIOS 0403 01/26/2021
[ 138.211115] Call Trace:
[ 138.211118] <TASK>
[ 138.211121] dump_stack_lvl+0x56/0x73
[ 138.211129] print_report+0x172/0x475
[ 138.211135] ? drm_sched_entity_push_job+0x512/0x5a0 [gpu_sched]
[ 138.211144] ? drm_sched_entity_push_job+0x512/0x5a0 [gpu_sched]
[ 138.211152] ? drm_sched_entity_push_job+0x512/0x5a0 [gpu_sched]
[ 138.211161] kasan_report+0xbc/0xf0
[ 138.211168] ? drm_sched_entity_push_job+0x512/0x5a0 [gpu_sched]
[ 138.211177] drm_sched_entity_push_job+0x512/0x5a0 [gpu_sched]
[ 138.211186] xe_exec_ioctl+0x857/0x17e0 [xe]
[ 138.211208] ? xe_engine_set_property_ioctl+0x220/0x220 [xe]
[ 138.211226] ? lockdep_hardirqs_on_prepare+0x410/0x410
[ 138.211234] ? lock_is_held_type+0xe2/0x140
[ 138.211239] ? find_held_lock+0x2c/0x110
[ 138.211244] ? lock_release+0x37d/0x6f0
[ 138.211251] drm_ioctl_kernel+0x1ca/0x3a0 [drm]
[ 138.211283] ? xe_engine_set_property_ioctl+0x220/0x220 [xe]
[ 138.211300] ? drm_setversion+0x800/0x800 [drm]
[ 138.211325] drm_ioctl+0x44d/0x8d0 [drm]
[ 138.211349] ? xe_engine_set_property_ioctl+0x220/0x220 [xe]
[ 138.211366] ? drm_ioctl_kernel+0x3a0/0x3a0 [drm]
[ 138.211390] ? selinux_inode_getsecctx+0x80/0x80
[ 138.211397] ? io_schedule_timeout+0x160/0x160
[ 138.211402] ? var_wake_function+0x270/0x270
[ 138.211408] ? security_file_ioctl+0x4d/0x90
[ 138.211415] __x64_sys_ioctl+0x128/0x1a0
[ 138.211421] do_syscall_64+0x38/0x90
[ 138.211426] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 138.211432] RIP: 0033:0x7f4eafcd707b
[ 138.211437] Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d cd bd 0c 00 f7 d8 64 89 01 48
[ 138.211449] RSP: 002b:00007fff7fabc0c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 138.211457] RAX: ffffffffffffffda RBX: 00007fff7fabc170 RCX: 00007f4eafcd707b
[ 138.211462] RDX: 00007fff7fabc170 RSI: 0000000040386448 RDI: 0000000000000003
[ 138.211468] RBP: 0000000040386448 R08: 0000000000000000 R09: 00000000001a0000
[ 138.211473] R10: 00000000000000f8 R11: 0000000000000246 R12: 00007f4eae29dd80
[ 138.211478] R13: 0000000000000003 R14: 00000000001a0000 R15: 00007f4eae299000
[ 138.211486] </TASK>
[ 138.211491] Allocated by task 1156:
[ 138.211495] kasan_save_stack+0x1e/0x40
[ 138.211500] kasan_set_track+0x21/0x30
[ 138.211504] __kasan_slab_alloc+0x58/0x70
[ 138.211509] kmem_cache_alloc+0x1b4/0x3c0
[ 138.211513] xe_sched_job_create+0x158/0xbc0 [xe]
[ 138.211533] xe_exec_ioctl+0x596/0x17e0 [xe]
[ 138.211549] drm_ioctl_kernel+0x1ca/0x3a0 [drm]
[ 138.211572] drm_ioctl+0x44d/0x8d0 [drm]
[ 138.211593] __x64_sys_ioctl+0x128/0x1a0
[ 138.211600] do_syscall_64+0x38/0x90
[ 138.211608] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 138.211620] Freed by task 59:
[ 138.211623] kasan_save_stack+0x1e/0x40
[ 138.211628] kasan_set_track+0x21/0x30
[ 138.211633] kasan_save_free_info+0x2a/0x50
[ 138.211638] __kasan_slab_free+0x106/0x190
[ 138.211643] slab_free_freelist_hook+0xb6/0x190
[ 138.211648] kmem_cache_free+0xda/0x440
[ 138.211652] guc_engine_free_job+0x75/0x220 [xe]
[ 138.211672] drm_sched_main+0xa56/0x11f0 [gpu_sched]
[ 138.211680] process_one_work+0x7e6/0x1340
[ 138.211686] worker_thread+0x5ac/0xed0
[ 138.211690] kthread+0x29f/0x340
[ 138.211695] ret_from_fork+0x1f/0x30
[ 138.211702] The buggy address belongs to the object at ffff8881317c4900
which belongs to the cache xe_sched_job of size 312
[ 138.211712] The buggy address is located 248 bytes inside of
312-byte region [ffff8881317c4900, ffff8881317c4a38)
[ 138.211723] The buggy address belongs to the physical page:
[ 138.211728] page:0000000037c252c7 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1317c4
[ 138.211736] head:0000000037c252c7 order:1 compound_mapcount:0 compound_pincount:0
[ 138.211743] flags: 0x17ffffc0010200(slab|head|node=0|zone=2|lastcpupid=0x1fffff)
[ 138.211752] raw: 0017ffffc0010200 0000000000000000 dead000000000122 ffff8881527c5b80
[ 138.211758] raw: 0000000000000000 0000000080150015 00000001ffffffff 0000000000000000
[ 138.211764] page dumped because: kasan: bad access detected
[ 138.211771] Memory state around the buggy address:
[ 138.211776] ffff8881317c4880: fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc
[ 138.211782] ffff8881317c4900: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 138.211788] >ffff8881317c4980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 138.211794]                                                                 ^
[ 138.211799] ffff8881317c4a00: fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc
[ 138.211805] ffff8881317c4a80: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 138.211811] ==================================================================
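The numbers in the report can be cross-checked by hand. The sketch below only redoes the arithmetic from the splat: the addresses and the object size are copied from the log, and the 8-byte granule is the generic KASAN shadow scale (one shadow byte per 8 bytes of memory). The variable names are mine and appear nowhere in the kernel.

```python
# Values copied verbatim from the KASAN report above.
BUGGY_ADDR = 0xffff8881317c49f8   # "Write of size 8 at addr ..."
OBJ_START = 0xffff8881317c4900    # start of the xe_sched_job object
OBJ_SIZE = 312                    # size of the xe_sched_job cache object
ROW_START = 0xffff8881317c4980    # dump row marked with '>'
ACCESS_SIZE = 8                   # size of the faulting write
KASAN_GRANULE = 8                 # bytes covered by one shadow byte (generic KASAN)

# Offset of the faulting write inside the freed object.
offset = BUGGY_ADDR - OBJ_START

# Exclusive end of the object, as printed in the "312-byte region" line.
obj_end = OBJ_START + OBJ_SIZE

# Which shadow byte of the '>' row covers the faulting address (0-based).
shadow_index = (BUGGY_ADDR - ROW_START) // KASAN_GRANULE

print(f"offset into object: {offset}")
print(f"object end: {obj_end:#x}")
print(f"write within object bounds: {offset + ACCESS_SIZE <= OBJ_SIZE}")
print(f"shadow byte index in row: {shadow_index}")
```

The write lands at offset 248 of a 312-byte object, so it is inside the object's bounds. This is a write to a job that has already been freed, not an overflow. The job is allocated in xe_exec_ioctl() (task 1156) and freed by guc_engine_free_job() from drm_sched_main() (task 59) before drm_sched_entity_push_job() finishes touching it. To map offset 248 to a struct member, something like `pahole -C xe_sched_job` on a debug build of the module should work, though I have not included that output here.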