Missing HuC FW crashes system
Observed on BMG (Battlemage); see the dmesg below. The HuC firmware may be missing, but the system should not crash.
[ 7238.473255] xe: loading out-of-tree module taints kernel.
[ 7238.480259] Setting dangerous option force_probe - tainting kernel
[ 7238.494619] Console: switching to colour dummy device 80x25
[ 7238.495301] xe 0000:03:00.0: vgaarb: deactivate vga console
[ 7238.499153] xe 0000:03:00.0: [drm:xe_pci_probe [xe]] BATTLEMAGE e20b:0000 dgfx:1 gfx:Xe2_LPG / Xe2_HPG (20.01) media:Xe2_LPM / Xe2_HPM (13.01) display:no dma_m_s:46 tc:1 gscfi:0 cscfi:1
[ 7238.499202] xe 0000:03:00.0: [drm:xe_pci_probe [xe]] Stepping = (G:A0, M:A1, D:**, B:**)
[ 7238.499226] xe 0000:03:00.0: [drm:xe_pci_probe [xe]] SR-IOV support: no (mode: none)
[ 7238.504068] xe 0000:03:00.0: [drm] Using GuC firmware from intel-ci/xe/bmg_guc_70.bin version 70.29.1
[ 7238.507749] xe 0000:03:00.0: [drm:guc_print_params [xe]] GT0: GuC param[ 0] = 0x004645cf
[ 7238.507787] xe 0000:03:00.0: [drm:guc_print_params [xe]] GT0: GuC param[ 1] = 0x00000000
[ 7238.507807] xe 0000:03:00.0: [drm:guc_print_params [xe]] GT0: GuC param[ 2] = 0x00000000
[ 7238.507825] xe 0000:03:00.0: [drm:guc_print_params [xe]] GT0: GuC param[ 3] = 0x00000003
[ 7238.507842] xe 0000:03:00.0: [drm:guc_print_params [xe]] GT0: GuC param[ 4] = 0x00001eca
[ 7238.507858] xe 0000:03:00.0: [drm:guc_print_params [xe]] GT0: GuC param[ 5] = 0xe20b0000
[ 7238.507874] xe 0000:03:00.0: [drm:guc_print_params [xe]] GT0: GuC param[ 6] = 0x00000000
[ 7238.507890] xe 0000:03:00.0: [drm:guc_print_params [xe]] GT0: GuC param[ 7] = 0x00000000
[ 7238.507906] xe 0000:03:00.0: [drm:guc_print_params [xe]] GT0: GuC param[ 8] = 0x00000000
[ 7238.507921] xe 0000:03:00.0: [drm:guc_print_params [xe]] GT0: GuC param[ 9] = 0x00000000
[ 7238.507937] xe 0000:03:00.0: [drm:guc_print_params [xe]] GT0: GuC param[10] = 0x00000000
[ 7238.507953] xe 0000:03:00.0: [drm:guc_print_params [xe]] GT0: GuC param[11] = 0x00000000
[ 7238.507968] xe 0000:03:00.0: [drm:guc_print_params [xe]] GT0: GuC param[12] = 0x00000000
[ 7238.507984] xe 0000:03:00.0: [drm:guc_print_params [xe]] GT0: GuC param[13] = 0x00000000
[ 7238.508001] xe 0000:03:00.0: [drm:xe_wopcm_init [xe]] WOPCM: 4096K
[ 7238.508037] xe 0000:03:00.0: [drm:xe_wopcm_init [xe]] GuC WOPCM is already locked [6144K, 832K)
[ 7238.509587] xe 0000:03:00.0: [drm:__xe_guc_upload [xe]] GT0: load still in progress, timeouts = 0, freq = 2150MHz (req 2133MHz), status = 0x00000072 [0x39/00]
[ 7238.514307] xe 0000:03:00.0: [drm:__xe_guc_upload [xe]] GT0: load still in progress, timeouts = 0, freq = 2150MHz (req 2133MHz), status = 0x80000534 [0x1A/05]
[ 7238.515645] xe 0000:03:00.0: [drm:__xe_guc_upload [xe]] GT0: init took 6ms, freq = 2150MHz (req = 2133MHz), before = 2150MHz, status = 0x8002F034, timeouts = 0
[ 7238.516296] xe 0000:03:00.0: [drm:xe_guc_ct_enable [xe]] GT0: GuC CT communication channel enabled
[ 7238.516337] xe 0000:03:00.0: [drm:xe_guc_ct_enable [xe]] GT0: GuC CT safe-mode enabled
[ 7238.516370] xe 0000:03:00.0: [drm] H\xc7D$\x08 dss mask (geometry): 00000000,00000000,000fffff
[ 7238.516373] xe 0000:03:00.0: [drm] H\xc7D$\x08 dss mask (compute): 00000000,00000000,000fffff
[ 7238.516378] xe 0000:03:00.0: [drm] H\xc7D$\x08 EU mask per DSS: 000000ff
[ 7238.516380] xe 0000:03:00.0: [drm] H\xc7D$\x08 EU type: simd16
[ 7238.516382] xe 0000:03:00.0: [drm] H\xc7D$\x08 L3 bank mask: 00000000,00ffffff
[ 7238.518553] xe 0000:03:00.0: [drm] Using GuC firmware from intel-ci/xe/bmg_guc_70.bin version 70.29.1
[ 7238.520914] xe 0000:03:00.0: [drm:guc_print_params [xe]] GT1: GuC param[ 0] = 0x0184a5cf
[ 7238.520946] xe 0000:03:00.0: [drm:guc_print_params [xe]] GT1: GuC param[ 1] = 0x00000000
[ 7238.520966] xe 0000:03:00.0: [drm:guc_print_params [xe]] GT1: GuC param[ 2] = 0x00000000
[ 7238.520984] xe 0000:03:00.0: [drm:guc_print_params [xe]] GT1: GuC param[ 3] = 0x00000003
[ 7238.521001] xe 0000:03:00.0: [drm:guc_print_params [xe]] GT1: GuC param[ 4] = 0x00004696
[ 7238.521018] xe 0000:03:00.0: [drm:guc_print_params [xe]] GT1: GuC param[ 5] = 0xe20b0000
[ 7238.521035] xe 0000:03:00.0: [drm:guc_print_params [xe]] GT1: GuC param[ 6] = 0x00000000
[ 7238.521052] xe 0000:03:00.0: [drm:guc_print_params [xe]] GT1: GuC param[ 7] = 0x00000000
[ 7238.521069] xe 0000:03:00.0: [drm:guc_print_params [xe]] GT1: GuC param[ 8] = 0x00000000
[ 7238.521085] xe 0000:03:00.0: [drm:guc_print_params [xe]] GT1: GuC param[ 9] = 0x00000000
[ 7238.521102] xe 0000:03:00.0: [drm:guc_print_params [xe]] GT1: GuC param[10] = 0x00000000
[ 7238.521118] xe 0000:03:00.0: [drm:guc_print_params [xe]] GT1: GuC param[11] = 0x00000000
[ 7238.521135] xe 0000:03:00.0: [drm:guc_print_params [xe]] GT1: GuC param[12] = 0x00000000
[ 7238.521151] xe 0000:03:00.0: [drm:guc_print_params [xe]] GT1: GuC param[13] = 0x00000000
[ 7238.531245] xe 0000:03:00.0: Direct firmware load for xe/bmg_huc.bin failed with error -2
[ 7238.531264] xe 0000:03:00.0: [drm] HuC firmware xe/bmg_huc.bin: fetch failed with error -2
[ 7238.531266] xe 0000:03:00.0: [drm] HuC firmware(s) can be downloaded from https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git
[ 7238.531267] xe 0000:03:00.0: [drm] *ERROR* GT1: HuC: initialization failed: -ENOENT
[ 7238.538897] xe 0000:03:00.0: [drm] *ERROR* GT1: Failed to initialize uC (-ENOENT)
[ 7238.546362] xe 0000:03:00.0: probe with driver xe failed with error -2
[ 7238.554512] xe 0000:03:00.0: [drm:xe_guc_ct_disable [xe]] GT0: GuC CT safe-mode disabled
[ 7238.558058] BUG: unable to handle page fault for address: ffffc90009815e28
[ 7238.564883] #PF: supervisor write access in kernel mode
[ 7238.570067] #PF: error_code(0x0002) - not-present page
[ 7238.575164] PGD 100000067 P4D 100000067 PUD 1002c8067 PMD 0
[ 7238.580780] Oops: Oops: 0002 [#1] PREEMPT SMP NOPTI
[ 7238.585621] CPU: 10 PID: 8426 Comm: modprobe Tainted: G U O 6.10.0-rc3+ #1
[ 7238.593641] Hardware name: Intel Corporation Raptor Lake Client Platform/RPL-S ADP-S DDR5 UDIMM CRB, BIOS RPLSFWI1.R00.3492.A00.2211291114 11/29/2022
[ 7238.606886] RIP: 0010:xe_ggtt_set_pte_and_flush+0x1a/0xc0 [xe]
[ 7238.612710] Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 41 54 48 c1 ee 0c 55 53 48 8b 87 b0 00 00 00 48 89 fb 48 8d 04 f0 <48> 89 10 48 8b 17 4c 8b 62 10 49 8b 84 24 d0 6a 00 00 48 8b 08 f7
[ 7238.631275] RSP: 0018:ffffc9000646fa68 EFLAGS: 00010206
[ 7238.636460] RAX: ffffc90009815e28 RBX: ffff88817ad1e228 RCX: 0000000000000000
[ 7238.643533] RDX: 0000000000000000 RSI: 0000000000002bc5 RDI: ffff88817ad1e228
[ 7238.650605] RBP: 0000000000000000 R08: 00000000867d94c1 R09: ffffffffa03f0db7
[ 7238.657677] R10: ffffc9000646fab0 R11: 0000000000000000 R12: 0000000002bcafff
[ 7238.664749] R13: ffff88817ad1e228 R14: ffff888121d44000 R15: dead000000000100
[ 7238.671823] FS: 00007fd758f88c40(0000) GS:ffff88888d300000(0000) knlGS:0000000000000000
[ 7238.679841] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7238.685544] CR2: ffffc90009815e28 CR3: 000000016b4ac000 CR4: 0000000000f50ef0
[ 7238.692616] PKRU: 55555554
[ 7238.695314] Call Trace:
[ 7238.697752] <TASK>
[ 7238.699845] ? __die+0x1f/0x70
[ 7238.702890] ? page_fault_oops+0x155/0x470
[ 7238.706959] ? search_extable+0x26/0x30
[ 7238.710777] ? xe_ggtt_set_pte_and_flush+0x1a/0xc0 [xe]
[ 7238.715993] ? search_module_extables+0x32/0x90
[ 7238.720491] ? exc_page_fault+0x109/0x200
[ 7238.724478] ? asm_exc_page_fault+0x26/0x30
[ 7238.728641] ? xe_ggtt_remove_node+0x97/0xf0 [xe]
[ 7238.733336] ? xe_ggtt_set_pte_and_flush+0x1a/0xc0 [xe]
[ 7238.738542] xe_ggtt_clear+0x60/0x80 [xe]
[ 7238.742547] xe_ggtt_remove_node+0xa7/0xf0 [xe]
[ 7238.747066] xe_ttm_bo_destroy+0xea/0xf0 [xe]
[ 7238.751422] drm_managed_release+0xb0/0x160 [drm]
[ 7238.756127] devm_drm_dev_init_release+0x54/0x70 [drm]
[ 7238.761248] release_nodes+0x2e/0xf0
[ 7238.764806] devres_release_all+0x8a/0xc0
[ 7238.768792] device_unbind_cleanup+0x9/0x70
[ 7238.772947] really_probe+0x1a0/0x380
[ 7238.776588] __driver_probe_device+0x73/0x150
[ 7238.780917] driver_probe_device+0x19/0x90
[ 7238.784986] __driver_attach+0xd5/0x1d0
[ 7238.788795] ? __pfx___driver_attach+0x10/0x10
[ 7238.793206] bus_for_each_dev+0x77/0xd0
[ 7238.797018] bus_add_driver+0x110/0x240
[ 7238.800833] driver_register+0x5b/0x110
[ 7238.804645] xe_init+0x3b/0x80 [xe]
[ 7238.808142] ? __pfx_xe_init+0x10/0x10 [xe]
[ 7238.812320] do_one_initcall+0x5e/0x2b0
[ 7238.816133] ? kmalloc_trace_noprof+0x27a/0x320
[ 7238.820639] do_init_module+0x5f/0x210
[ 7238.824370] init_module_from_file+0x86/0xd0
[ 7238.828615] idempotent_init_module+0x17c/0x230
[ 7238.833113] __x64_sys_finit_module+0x59/0xb0
[ 7238.837441] do_syscall_64+0x68/0x140
[ 7238.841080] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 7238.846091] RIP: 0033:0x7fd75871e88d
[ 7238.849649] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 b5 0f 00 f7 d8 64 89 01 48
[ 7238.868212] RSP: 002b:00007ffc205d67c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[ 7238.875715] RAX: ffffffffffffffda RBX: 00005591fa058040 RCX: 00007fd75871e88d
[ 7238.882791] RDX: 0000000000000000 RSI: 00005591fa05e470 RDI: 000000000000000c
[ 7238.889863] RBP: 0000000000040000 R08: 0000000000000000 R09: 0000000000000002
[ 7238.896932] R10: 000000000000000c R11: 0000000000000246 R12: 00005591fa05e470
[ 7238.904006] R13: 00005591fa057e30 R14: 0000000000000000 R15: 00005591fa05e9b0
[ 7238.911081] </TASK>
[ 7238.913265] Modules linked in: xe(O+) drm_kunit_helpers drm_ttm_helper ttm drm_suballoc_helper drm_buddy gpu_sched drm_gpuvm drm_exec drm_kms_helper nfnetlink br_netfilter overlay x86_pkg_temp_thermal mei_hdcp coretemp wmi_bmof kvm_intel mei_me mei video intel_pmc_core wmi intel_vsec pmt_telemetry pmt_class fuse drm ip_tables x_tables crct10dif_pclmul crc32_pclmul e1000e i2c_i801 ptp i2c_mux ghash_clmulni_intel i2c_smbus pps_core
[ 7238.950885] CR2: ffffc90009815e28
[ 7238.954197] ---[ end trace 0000000000000000 ]---
[ 7239.464042] RIP: 0010:xe_ggtt_set_pte_and_flush+0x1a/0xc0 [xe]
[ 7239.469885] Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 41 54 48 c1 ee 0c 55 53 48 8b 87 b0 00 00 00 48 89 fb 48 8d 04 f0 <48> 89 10 48 8b 17 4c 8b 62 10 49 8b 84 24 d0 6a 00 00 48 8b 08 f7
[ 7239.488452] RSP: 0018:ffffc9000646fa68 EFLAGS: 00010206
[ 7239.493640] RAX: ffffc90009815e28 RBX: ffff88817ad1e228 RCX: 0000000000000000
[ 7239.500713] RDX: 0000000000000000 RSI: 0000000000002bc5 RDI: ffff88817ad1e228
[ 7239.507786] RBP: 0000000000000000 R08: 00000000867d94c1 R09: ffffffffa03f0db7
[ 7239.514859] R10: ffffc9000646fab0 R11: 0000000000000000 R12: 0000000002bcafff
[ 7239.521933] R13: ffff88817ad1e228 R14: ffff888121d44000 R15: dead000000000100
[ 7239.529006] FS: 00007fd758f88c40(0000) GS:ffff88888d300000(0000) knlGS:0000000000000000
[ 7239.537020] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7239.542716] CR2: ffffc90009815e28 CR3: 000000016b4ac000 CR4: 0000000000f50ef0
[ 7239.549789] PKRU: 55555554
[ 7239.552488] note: modprobe[8426] exited with irqs disabled
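One plausible reading of the call trace is a teardown-ordering problem. It is an assumption, not a confirmed root cause. After the HuC fetch fails with -ENOENT, really_probe() calls devres_release_all(), which runs device-managed actions in LIFO order. devm_drm_dev_init_release() then triggers drm_managed_release(), whose BO teardown (xe_ttm_bo_destroy -> xe_ggtt_remove_node -> xe_ggtt_clear) writes GGTT PTEs. The fault address (ffffc90009815e28, not-present, in the ioremap/vmalloc range) would fit a GGTT mapping that was already unmapped by an action released earlier.

The simplified user-space model below illustrates only that ordering hazard. Every name in it (register_action, managed_release, unmap_gsm, model_probe) is hypothetical and is not xe or devres code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: per-device cleanup actions released in LIFO order,
 * similar in spirit to devres unwinding after a failed probe. */
#define MAX_ACTIONS 8

typedef void (*cleanup_fn)(void);

static cleanup_fn actions[MAX_ACTIONS];
static int n_actions;

static bool gsm_mapped;         /* models the mapped GGTT page-table memory */
static bool bo_destroy_faulted; /* set if BO teardown touches an unmapped GGTT */

static void register_action(cleanup_fn fn)
{
	assert(n_actions < MAX_ACTIONS);
	actions[n_actions++] = fn;
}

/* Run every registered action, most recently registered first. */
static void release_all(void)
{
	while (n_actions > 0)
		actions[--n_actions]();
}

/* Models managed release -> BO destroy -> GGTT clear (a PTE write). */
static void managed_release(void)
{
	if (!gsm_mapped)
		bo_destroy_faulted = true; /* in the kernel this is the page fault */
}

/* Models the unmap of the GGTT mapping. */
static void unmap_gsm(void)
{
	gsm_mapped = false;
}

/* Simulates a probe that sets things up, then fails (e.g. missing HuC)
 * and unwinds. Returns true if teardown touched the unmapped GGTT. */
static bool model_probe(bool managed_registered_first)
{
	n_actions = 0;
	bo_destroy_faulted = false;

	if (managed_registered_first)
		register_action(managed_release); /* registered first: runs last */
	gsm_mapped = true;
	register_action(unmap_gsm);
	if (!managed_registered_first)
		register_action(managed_release); /* registered last: runs first */

	release_all();
	return bo_destroy_faulted;
}
```

With `model_probe(true)`, the unmap runs before the BO teardown and the model records a fault. With `model_probe(false)`, the teardown runs while the mapping still exists. Under this assumption, tying the GGTT teardown's lifetime to the mapping it writes through would avoid the crash on the -ENOENT path.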