current xe doesn't even probe on DG2
After rebasing onto a later xe, I can no longer load xe on a DG2 machine; it reboots constantly. I finally managed to capture the log and run a bisect.
The bisect points to commit c9aa6a30 ("ttm: Allow the user to control which fences BO wait on").
Log:
[ 83.866046] [drm] Initialized xe 1.1.0 20201103 for 0000:03:00.0 on minor 0
[ 106.833196] systemd-journald[335]: Sent WATCHDOG=1 notification.
[ 110.622822] systemd-journald[335]: Successfully sent stream file descriptor to service manager.
[ 112.006358] xe 0000:03:00.0: vgaarb: deactivate vga console
[ 112.011952] xe 0000:03:00.0: [drm:xe_pci_probe [xe]] XE_DG2 56a1:0008 dgfx:1 gfx100:1255 dma_m_s:46 tc:0
[ 112.021446] xe 0000:03:00.0: [drm:xe_mmio_init [xe]] VRAM: 0x00000001fe000000
[ 112.028632] BUG: kernel NULL pointer dereference, address: 0000000000000002
[ 112.035554] #PF: supervisor read access in kernel mode
[ 112.040666] #PF: error_code(0x0000) - not-present page
[ 112.045778] PGD 0 P4D 0
[ 112.048304] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 112.052641] CPU: 2 PID: 1748 Comm: insmod Kdump: loaded Tainted: G OE 5.18.0+ #4
[ 112.061204] Hardware name: Intel Corporation CoffeeLake Client Platform/CoffeeLake S UDIMM RVP, BIOS CNLSFWR1.R00.X220.B00.2103302221 03/30/2021
[ 112.074074] RIP: 0010:ttm_bo_validate+0x1b/0x100 [ttm]
[ 112.079189] Code: 45 bc eb bb e8 e6 ec 48 ea 66 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 41 55 49 89 f5 41 54 49 89 fc 53 48 89 d3 48 83 ec 28 <8b> 36 65 48 8b 04 25 28 00 00 00 48 89 45 e0 31 c0 85 f6 75 0c 41
[ 112.097837] RSP: 0018:ffffaa8c407b7858 EFLAGS: 00010292
[ 112.103036] RAX: 0000000000000001 RBX: 0000000000000010 RCX: 0000000000000000
[ 112.110132] RDX: 0000000000000010 RSI: 0000000000000002 RDI: ffff973fd481bc00
[ 112.117228] RBP: ffffaa8c407b7898 R08: ffff973fc93775d8 R09: ffff973fc93775a0
[ 112.124321] R10: 0000000000010000 R11: 0000000090000000 R12: ffff973fd481bc00
[ 112.131417] R13: 0000000000000002 R14: 0000000000000000 R15: 0000000000000002
[ 112.138509] FS: 00007f157d987c40(0000) GS:ffff97431dc80000(0000) knlGS:0000000000000000
[ 112.146551] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 112.152264] CR2: 0000000000000002 CR3: 00000001140d0004 CR4: 00000000003706e0
[ 112.159359] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 112.166454] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 112.173571] Call Trace:
[ 112.176009]
[ 112.178102] ? ttm_sys_man_alloc+0x43/0x60 [ttm]
[ 112.182702] ttm_bo_init_reserved+0x16b/0x1f0 [ttm]
[ 112.187558] __xe_bo_create_locked+0x19e/0x230 [xe]
[ 112.192425] ? xe_ttm_io_mem_pfn+0x90/0x90 [xe]
[ 112.196947] xe_bo_create_locked+0x7e/0x100 [xe]
[ 112.201568] xe_ggtt_init+0xdb/0x300 [xe]
[ 112.205582] xe_gt_init+0xd8/0x1f0 [xe]
[ 112.209417] xe_device_probe+0x51/0x90 [xe]
[ 112.213593] xe_pci_probe+0x137/0x1c0 [xe]
[ 112.217687] ? __pm_runtime_resume+0x60/0x80
[ 112.221937] local_pci_probe+0x48/0x90
[ 112.225670] ? pci_match_device+0xde/0x130
[ 112.229750] pci_device_probe+0xc8/0x240
[ 112.233655] really_probe+0x1a0/0x380
[ 112.237301] __driver_probe_device+0x109/0x180
[ 112.241725] driver_probe_device+0x23/0x90
[ 112.245799] __driver_attach+0xac/0x1b0
[ 112.249618] ? __device_attach_driver+0xe0/0xe0
[ 112.254124] bus_for_each_dev+0x7c/0xc0
[ 112.257942] driver_attach+0x1e/0x20
[ 112.261529] bus_add_driver+0x152/0x1f0
[ 112.265380] driver_register+0x95/0xf0
[ 112.269115] __pci_register_driver+0x68/0x70
[ 112.273366] xe_register_pci_driver+0x23/0x30 [xe]
[ 112.278150] xe_init+0x28/0x39 [xe]
[ 112.281635] ? xe_hw_fence_module_init+0x36/0x36 [xe]
[ 112.286675] do_one_initcall+0x46/0x210
[ 112.290496] ? kmem_cache_alloc_trace+0x186/0x2c0
[ 112.295175] do_init_module+0x52/0x270
[ 112.298909] load_module+0x23c9/0x26e0
[ 112.302645] __do_sys_finit_module+0xc5/0x130
[ 112.306979] ? __do_sys_finit_module+0xc5/0x130
[ 112.311489] __x64_sys_finit_module+0x18/0x20
[ 112.315822] do_syscall_64+0x38/0x90
[ 112.319386] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 112.324409] RIP: 0033:0x7f157d11ea3d
[ 112.327969] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c3 a3 0f 00 f7 d8 64 89 01 48
[ 112.346614] RSP: 002b:00007ffe0ca66768 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[ 112.354141] RAX: ffffffffffffffda RBX: 0000561cf5813770 RCX: 00007f157d11ea3d
[ 112.361237] RDX: 0000000000000000 RSI: 0000561cf42c8cd2 RDI: 0000000000000003
[ 112.368333] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ 112.375429] R10: 0000000000000003 R11: 0000000000000246 R12: 0000561cf42c8cd2
[ 112.382523] R13: 0000561cf5815860 R14: 0000561cf42c7888 R15: 0000561cf5813880
[ 112.389620]
[ 112.391798] Modules linked in: xe(OE+) binfmt_misc intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel nls_iso8859_1 aesni_intel crypto_simd cryptd rapl intel_cstate intel_wmi_thunderbolt serio_raw 8250_dw mei_me joydev mei drm_ttm_helper ttm input_leds drm_suballoc_helper gpu_sched intel_pch_thermal wmi video mac_hid acpi_pad sch_fq_codel msr parport_pc ppdev parport drm ip_tables x_tables autofs4 hid_generic e1000e i2c_i801 i2c_smbus nvme usbhid nvme_core hid intel_lpss_pci ahci intel_lpss idma64 libahci virt_dma [last unloaded: xe]
[ 112.446768] CR2: 0000000000000002