Supermicro X13SAE-F BIOS v2.1 - no DVI-D/HDMI video signal from i7-13700K iGPU when ASPEED BMC VGA is enabled
Hi, I have a system with what I assume is a broken BIOS. Either way, the kernel messages contain some other errors and warnings that might be worth checking as well. I posted this at https://bugzilla.kernel.org/show_bug.cgi?id=217646 (see the ZIP file with logs there), but as nobody seems to be subscribed to those "Other" reports, I am looking for someone to take a look here.
1. BMC VGA masks the iGPU of the i7-13700K CPU
In its default configuration, the motherboard has the BMC controller's VGA enabled, but that somehow masks the iGPU of the i7-13700K CPU, so its DVI-D, HDMI and DP ports do not work (the PCI devices do appear in the logs, but the physical ports give no signal). Linux picks up fb1, which is the BMC VGA device, and the local Wayland desktop session runs on the BMC VGA.
$ grep -e i915 -e ' ast' -e fbcon -e fb1 dmesg-6.1.38_with_default_jumper_settings.txt
[ 0.486691] fbcon: Taking over console
[ 8.951704] i915 0000:00:02.0: [drm] Using Transparent Hugepages
[ 8.952277] i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
[ 8.952295] Loading firmware: i915/adls_dmc_ver2_01.bin
[ 8.952656] mei_hdcp 0000:00:16.0-b638ab7e-94e2-4ea2-a552-d1c54b627f04: bound 0000:00:02.0 (ops i9xx_always_on_power_well_ops [i915])
[ 8.956209] Loading firmware: i915/tgl_guc_70.bin
[ 9.271740] i915 0000:00:02.0: [drm] Finished loading DMC firmware i915/adls_dmc_ver2_01.bin (v2.1)
[ 9.631367] Loading firmware: i915/tgl_huc.bin
[ 9.720861] i915 0000:00:02.0: [drm] GuC firmware i915/tgl_guc_70.bin version 70.5.1
[ 9.720867] i915 0000:00:02.0: [drm] HuC firmware i915/tgl_huc.bin version 7.9.3
[ 9.723982] i915 0000:00:02.0: [drm] HuC authenticated
[ 9.724220] i915 0000:00:02.0: [drm] GuC submission enabled
[ 9.724222] i915 0000:00:02.0: [drm] GuC SLPC enabled
[ 9.724629] i915 0000:00:02.0: [drm] GuC RC: enabled
[ 9.725298] mei_pxp 0000:00:16.0-fbf6fcf1-96cf-4e2e-a6a6-1bab8cbe36b1: bound 0000:00:02.0 (ops tfp410_ops [i915])
[ 9.725366] i915 0000:00:02.0: [drm] Protected Xe Path (PXP) protected content support initialized
[ 9.766126] [drm] Initialized i915 1.6.0 20201103 for 0000:00:02.0 on minor 0
[ 9.862817] i915 0000:00:02.0: [drm] fb0: i915drmfb frame buffer device
[ 10.502116] snd_hda_intel 0000:00:1f.3: bound 0000:00:02.0 (ops __SCT__tp_func_intel_frontbuffer_flush [i915])
[ 11.349856] ast 0000:09:00.0: [drm] P2A bridge disabled, using default configuration
[ 11.349941] ast 0000:09:00.0: [drm] AST 2600 detected
[ 11.456634] ast 0000:09:00.0: [drm] Using analog VGA
[ 11.456678] ast 0000:09:00.0: [drm] dram MCLK=396 Mhz type=1 bus_width=16
[ 11.457148] [drm] Initialized ast 0.1.0 20120228 for 0000:09:00.0 on minor 1
[ 11.461303] fbcon: astdrmfb (fb1) is primary device
[ 11.461307] fbcon: Remapping primary device, fb1, to tty 1-63
[ 11.630581] ast 0000:09:00.0: [drm] fb1: astdrmfb frame buffer device
Here is an attempt to keep the console from binding to the BMC VGA with fbcon=map:0 (still no picture on the iGPU connectors, and now also no GRUB menu or boot console log on the BMC VGA, and no gdm login screen visible anywhere):
# dmesg | grep -e i915 -e ' ast' -e fbcon -e fb1
[ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-6.1.38-gentoo-dist root=UUID=5c5020ce-1d72-40f7-a136-7d3d130f817d ro fbcon=map:0
[ 0.180078] Kernel command line: BOOT_IMAGE=/vmlinuz-6.1.38-gentoo-dist root=UUID=5c5020ce-1d72-40f7-a136-7d3d130f817d ro fbcon=map:0
[ 0.486291] fbcon: Taking over console
[ 8.095507] i915 0000:00:02.0: vgaarb: deactivate vga console
[ 8.095543] i915 0000:00:02.0: [drm] Using Transparent Hugepages
[ 8.096105] i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
[ 8.096123] Loading firmware: i915/adls_dmc_ver2_01.bin
[ 8.096487] mei_hdcp 0000:00:16.0-b638ab7e-94e2-4ea2-a552-d1c54b627f04: bound 0000:00:02.0 (ops i9xx_always_on_power_well_ops [i915])
[ 8.096706] Loading firmware: i915/tgl_guc_70.bin
[ 8.205980] i915 0000:00:02.0: [drm] Finished loading DMC firmware i915/adls_dmc_ver2_01.bin (v2.1)
[ 8.215638] Loading firmware: i915/tgl_huc.bin
[ 8.290782] i915 0000:00:02.0: [drm] GuC firmware i915/tgl_guc_70.bin version 70.5.1
[ 8.290798] i915 0000:00:02.0: [drm] HuC firmware i915/tgl_huc.bin version 7.9.3
[ 8.294681] i915 0000:00:02.0: [drm] HuC authenticated
[ 8.295049] i915 0000:00:02.0: [drm] GuC submission enabled
[ 8.295053] i915 0000:00:02.0: [drm] GuC SLPC enabled
[ 8.295454] i915 0000:00:02.0: [drm] GuC RC: enabled
[ 8.296298] mei_pxp 0000:00:16.0-fbf6fcf1-96cf-4e2e-a6a6-1bab8cbe36b1: bound 0000:00:02.0 (ops tfp410_ops [i915])
[ 8.296402] i915 0000:00:02.0: [drm] Protected Xe Path (PXP) protected content support initialized
[ 8.349250] [drm] Initialized i915 1.6.0 20201103 for 0000:00:02.0 on minor 0
[ 8.350614] snd_hda_intel 0000:00:1f.3: bound 0000:00:02.0 (ops i915_fence_ops [i915])
[ 8.533978] i915 0000:00:02.0: [drm] fb0: i915drmfb frame buffer device
[ 8.594476] ast 0000:09:00.0: [drm] P2A bridge disabled, using default configuration
[ 8.594501] ast 0000:09:00.0: [drm] AST 2600 detected
[ 8.699943] ast 0000:09:00.0: [drm] Using analog VGA
[ 8.700022] ast 0000:09:00.0: [drm] dram MCLK=396 Mhz type=1 bus_width=16
[ 8.700258] [drm] Initialized ast 0.1.0 20120228 for 0000:09:00.0 on minor 1
[ 8.703847] ast 0000:09:00.0: [drm] fb1: astdrmfb frame buffer device
[Note: Sorry, it seems I forgot to back up that dmesg file; here is what I believe is essentially the same attempt with fbcon=map:0 and SR-IOV/DMAR/ASPM enabled in BIOS: dmesg-6.1.38.txt]
Likewise with fbcon=map:1: the gdm session is running, but I see it neither on the BMC VGA screen nor on the DVI-D/HDMI/DP ports of the iGPU (the IPMI/BMC console has no screen signal and, as always, the iGPU connectors provide no signal either):
# dmesg | grep -e i915 -e ' ast' -e fbcon -e fb1
[ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-6.1.38-gentoo-dist root=UUID=5c5020ce-1d72-40f7-a136-7d3d130f817d ro fbcon=map:1
[ 0.179960] Kernel command line: BOOT_IMAGE=/vmlinuz-6.1.38-gentoo-dist root=UUID=5c5020ce-1d72-40f7-a136-7d3d130f817d ro fbcon=map:1
[ 0.486118] fbcon: Taking over console
[ 8.336084] i915 0000:00:02.0: vgaarb: deactivate vga console
[ 8.336133] i915 0000:00:02.0: [drm] Using Transparent Hugepages
[ 8.336741] i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
[ 8.336760] Loading firmware: i915/adls_dmc_ver2_01.bin
[ 8.337304] mei_hdcp 0000:00:16.0-b638ab7e-94e2-4ea2-a552-d1c54b627f04: bound 0000:00:02.0 (ops i9xx_always_on_power_well_ops [i915])
[ 8.337554] Loading firmware: i915/tgl_guc_70.bin
[ 8.391201] i915 0000:00:02.0: [drm] Finished loading DMC firmware i915/adls_dmc_ver2_01.bin (v2.1)
[ 8.467600] Loading firmware: i915/tgl_huc.bin
[ 8.516118] i915 0000:00:02.0: [drm] GuC firmware i915/tgl_guc_70.bin version 70.5.1
[ 8.516124] i915 0000:00:02.0: [drm] HuC firmware i915/tgl_huc.bin version 7.9.3
[ 8.518582] i915 0000:00:02.0: [drm] HuC authenticated
[ 8.518834] i915 0000:00:02.0: [drm] GuC submission enabled
[ 8.518836] i915 0000:00:02.0: [drm] GuC SLPC enabled
[ 8.519194] i915 0000:00:02.0: [drm] GuC RC: enabled
[ 8.519799] mei_pxp 0000:00:16.0-fbf6fcf1-96cf-4e2e-a6a6-1bab8cbe36b1: bound 0000:00:02.0 (ops tfp410_ops [i915])
[ 8.519839] i915 0000:00:02.0: [drm] Protected Xe Path (PXP) protected content support initialized
[ 8.563311] [drm] Initialized i915 1.6.0 20201103 for 0000:00:02.0 on minor 0
[ 8.564362] snd_hda_intel 0000:00:1f.3: bound 0000:00:02.0 (ops i915_fence_ops [i915])
[ 8.607535] i915 0000:00:02.0: [drm] fb0: i915drmfb frame buffer device
[ 8.696779] ast 0000:09:00.0: [drm] P2A bridge disabled, using default configuration
[ 8.696784] ast 0000:09:00.0: [drm] AST 2600 detected
[ 8.803397] ast 0000:09:00.0: [drm] Using analog VGA
[ 8.803411] ast 0000:09:00.0: [drm] dram MCLK=396 Mhz type=1 bus_width=16
[ 8.803726] [drm] Initialized ast 0.1.0 20120228 for 0000:09:00.0 on minor 1
[ 8.820104] ast 0000:09:00.0: [drm] fb1: astdrmfb frame buffer device
[Note: Again, I lost the matching full dmesg, but here is a similar one with fbcon=map:1 and SR-IOV/DMAR/ASPM enabled in BIOS: dmesg-6.1.38-startx-came_up_on_iGPU_HDMI.txt]
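For reference, as far as I understand Documentation/fb/fbcon.rst, the map: option gives a framebuffer number per console and the digit string is repeated until every console is covered. A minimal Python sketch of that expansion (it mirrors the documented behaviour, not the kernel code line by line; the 63-console limit is taken from the "to tty 1-63" message above, and FB_MAX = 32 is the kernel's framebuffer limit):

```python
MAX_NR_CONSOLES = 63  # tty1..tty63, cf. "Remapping primary device, fb1, to tty 1-63"
FB_MAX = 32           # maximum number of framebuffer devices in the kernel


def expand_fbcon_map(spec: str) -> dict[int, int]:
    """Expand an fbcon=map:<spec> value into {tty_number: fb_index}.

    The digit string is repeated until every console has an entry,
    so map:0 sends all ttys to fb0 and map:01 alternates fb0/fb1.
    """
    if not spec or not (spec.isascii() and spec.isdigit()):
        raise ValueError("expected a non-empty string of digits, e.g. '0' or '0123'")
    return {
        tty: int(spec[(tty - 1) % len(spec)]) % FB_MAX
        for tty in range(1, MAX_NR_CONSOLES + 1)
    }


if __name__ == "__main__":
    for spec in ("0", "1"):
        targets = sorted(set(expand_fbcon_map(spec).values()))
        print(f"fbcon=map:{spec} -> framebuffer(s) {targets}")
```

With the numbering in these logs, fbcon=map:0 should put every tty on fb0 (i915drmfb) and fbcon=map:1 on fb1 (astdrmfb).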
Below is the difference in lspci output when the BMC VGA is disabled with the JPG1 jumper. Moving the jumper to pins 2-3 enables the HDMI and DP ports, but the DVI-D port is still dead.
Could the Linux kernel or the i915 driver do something better here? Further below I also show a memory address range conflict that overlaps the iGPU.
$ diff -u -w lspci-6.1.38_with_default_jumper_settings.txt JPG1_pins_2-3_shortcut/lspci.txt
--- lspci-6.1.38_with_default_jumper_settings.txt 2023-07-09 21:16:10.633685053 +0200
+++ JPG1_pins_2-3_shortcut/lspci.txt 2023-07-08 10:00:38.000000000 +0200
@@ -26,4 +26,3 @@
05:00.0 PCI bridge: Integrated Technology Express, Inc. IT8893E PCIe to PCI Bridge (rev 41)
07:00.0 Ethernet controller: Intel Corporation Ethernet Controller I225-LM (rev 03)
08:00.0 PCI bridge: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge (rev 06)
-09:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 52)
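Besides diffing lspci output, the PCI sysfs attributes show which display device the kernel treats as the boot VGA device. A minimal sketch, assuming the standard /sys/bus/pci/devices/<addr>/class and boot_vga attributes (boot_vga is only exported for VGA-class devices and reads 1 for the default VGA device); the function name is hypothetical:

```python
from pathlib import Path

PCI_BASE_CLASS_DISPLAY = 0x03  # 0x0300xx = VGA compatible, 0x0380xx = other display controller


def list_display_devices(root: str = "/sys/bus/pci/devices") -> list[tuple[str, int, bool]]:
    """Return (pci_address, class_code, is_boot_vga) for each display-class PCI device."""
    devices = []
    for dev in sorted(Path(root).iterdir()):
        try:
            class_code = int((dev / "class").read_text().strip(), 16)
        except (OSError, ValueError):
            continue  # not a readable PCI device directory
        if class_code >> 16 != PCI_BASE_CLASS_DISPLAY:
            continue
        # boot_vga exists only for VGA-class devices; a missing file means "not boot VGA".
        boot_vga = dev / "boot_vga"
        is_boot = boot_vga.is_file() and boot_vga.read_text().strip() == "1"
        devices.append((dev.name, class_code, is_boot))
    return devices


if __name__ == "__main__":
    for addr, class_code, is_boot in list_display_devices():
        print(f"{addr} class=0x{class_code:06x} boot_vga={int(is_boot)}")
```

Running it with both JPG1 jumper positions would show whether the boot VGA role goes to the ASPEED device (09:00.0) or to the iGPU (00:02.0).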
In the default setup, when the BMC VGA port is enabled (and primary), the kernel logs this warning; maybe it is a hint about the iGPU being masked?
Below is a snippet from the 6.3.9 kernel (dmesg-6.3.9.txt):
[ 7.638492] i915 0000:00:02.0: [drm] Using Transparent Hugepages
[ 7.638809] ------------[ cut here ]------------
[ 7.638810] i915 0000:00:02.0: Port A asks to use VBT vswing/preemph tables
[ 7.638850] WARNING: CPU: 8 PID: 241 at drivers/gpu/drm/i915/display/intel_bios.c:2709 intel_bios_init+0x19c6/0x2090 [i915]
[ 7.639041] Modules linked in: i915(+) drm_buddy crc32_pclmul crc32c_intel sha512_ssse3 aesni_intel crypto_simd ttm ast(+) cryptd spi_intel_pci nvme i2c_algo_bit drm_display_helper igc e1000e spi_intel nvme_core xhci_pci cec xhci_pci_renesas nvme_common intel_gtt video wmi
[ 7.639058] CPU: 8 PID: 241 Comm: (udev-worker) Not tainted 6.3.9-arch1-1 #1 124dc55df4f5272ccb409f39ef4872fc2b3376a2
[ 7.639061] Hardware name: Supermicro Super Server/X13SAE-F, BIOS 2.1 04/06/2023
[ 7.639062] RIP: 0010:intel_bios_init+0x19c6/0x2090 [i915]
[ 7.639174] Code: 48 8b 7d 08 48 8b 5f 50 48 85 db 75 03 48 8b 1f e8 0f fa 09 ce 44 89 e1 48 89 da 48 c7 c7 88 1c b9 c0 48 89 c6 e8 3a 5b 86 cd <0f> 0b e9 7c f8 ff ff 80 fa 01 45 19 d2 41 81 e2 15 b7 ff ff 41 81
[ 7.639175] RSP: 0018:ffffbbfb828079d0 EFLAGS: 00010282
[ 7.639177] RAX: 0000000000000000 RBX: ffff97b941fab260 RCX: c0000000ffffefff
[ 7.639178] RDX: 0000000000000000 RSI: 00000000ffffefff RDI: 0000000000000001
[ 7.639179] RBP: ffff97b95d470000 R08: 0000000000000000 R09: ffffbbfb82807860
[ 7.639180] R10: 0000000000000003 R11: ffffffff900ca1e8 R12: 0000000000000041
[ 7.639181] R13: 0000000000000000 R14: ffff97b95d470000 R15: 0000000000000000
[ 7.639182] FS: 00007f39c4c1e200(0000) GS:ffff97d83fa00000(0000) knlGS:0000000000000000
[ 7.639183] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7.639184] CR2: 000055f285a86708 CR3: 000000011a6b4000 CR4: 0000000000f50ee0
[ 7.639185] PKRU: 55555554
[ 7.639187] Call Trace:
[ 7.639195] <TASK>
[ 7.639196] ? intel_bios_init+0x19c6/0x2090 [i915 e4cc587302673c22b84ac3f9158a6e29e0aeefc7]
[ 7.639313] ? __warn+0x81/0x130
[ 7.639320] ? intel_bios_init+0x19c6/0x2090 [i915 e4cc587302673c22b84ac3f9158a6e29e0aeefc7]
[ 7.639417] ? report_bug+0x171/0x1a0
[ 7.639423] ? prb_read_valid+0x1b/0x30
[ 7.639426] ? handle_bug+0x3c/0x80
[ 7.639429] ? exc_invalid_op+0x17/0x70
[ 7.639430] ? asm_exc_invalid_op+0x1a/0x20
[ 7.639440] ? intel_bios_init+0x19c6/0x2090 [i915 e4cc587302673c22b84ac3f9158a6e29e0aeefc7]
[ 7.639536] ? drm_vblank_worker_init+0x6b/0x80
[ 7.639540] intel_modeset_init_noirq+0x39/0x260 [i915 e4cc587302673c22b84ac3f9158a6e29e0aeefc7]
[ 7.639649] i915_driver_probe+0x63a/0xc50 [i915 e4cc587302673c22b84ac3f9158a6e29e0aeefc7]
[ 7.639740] local_pci_probe+0x42/0xa0
[ 7.639744] pci_device_probe+0xc1/0x260
[ 7.639746] ? sysfs_do_create_link_sd+0x6e/0xe0
[ 7.639749] really_probe+0x19b/0x3e0
[ 7.639753] ? __pfx___driver_attach+0x10/0x10
[ 7.639754] __driver_probe_device+0x78/0x160
[ 7.639755] driver_probe_device+0x1f/0x90
[ 7.639756] __driver_attach+0xd2/0x1c0
[ 7.639759] bus_for_each_dev+0x85/0xd0
[ 7.639765] bus_add_driver+0x116/0x220
[ 7.639768] driver_register+0x59/0x100
[ 7.639769] i915_init+0x22/0xc0 [i915 e4cc587302673c22b84ac3f9158a6e29e0aeefc7]
[ 7.639853] ? __pfx_init_module+0x10/0x10 [i915 e4cc587302673c22b84ac3f9158a6e29e0aeefc7]
[ 7.639948] do_one_initcall+0x5a/0x240
[ 7.639952] do_init_module+0x4a/0x200
[ 7.639955] __do_sys_finit_module+0xad/0x130
[ 7.639959] do_syscall_64+0x5d/0x90
[ 7.639971] ? ksys_mmap_pgoff+0xec/0x1f0
[ 7.639974] ? syscall_exit_to_user_mode+0x1b/0x40
[ 7.639977] ? do_syscall_64+0x6c/0x90
[ 7.639979] ? syscall_exit_to_user_mode+0x1b/0x40
[ 7.639980] ? do_syscall_64+0x6c/0x90
[ 7.639982] ? syscall_exit_to_user_mode+0x1b/0x40
[ 7.639983] ? do_syscall_64+0x6c/0x90
[ 7.639985] ? syscall_exit_to_user_mode+0x1b/0x40
[ 7.639987] ? do_syscall_64+0x6c/0x90
[ 7.639988] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 7.639990] RIP: 0033:0x7f39c56c72ed
[ 7.639994] Code: 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3b 7a 0d 00 f7 d8 64 89 01 48
[ 7.639996] RSP: 002b:00007fffa2072c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[ 7.639997] RAX: ffffffffffffffda RBX: 000055f285a68b60 RCX: 00007f39c56c72ed
[ 7.639998] RDX: 0000000000000000 RSI: 00007f39c5823343 RDI: 000000000000000f
[ 7.639999] RBP: 00007f39c5823343 R08: 0000000000000000 R09: 00007fffa2072da0
[ 7.640000] R10: 000000000000000f R11: 0000000000000246 R12: 0000000000020000
[ 7.640000] R13: 000055f285a6b7c0 R14: 000055f285a68b60 R15: 000055f285a6b570
[ 7.640003] </TASK>
[ 7.640003] ---[ end trace 0000000000000000 ]---
[ 7.641027] i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
[ 7.642789] i915 0000:00:02.0: [drm] Finished loading DMC firmware i915/adls_dmc_ver2_01.bin (v2.1)
[ 7.656787] i915 0000:00:02.0: [drm] GT0: GuC firmware i915/tgl_guc_70.bin version 70.5.1
[ 7.656798] i915 0000:00:02.0: [drm] GT0: HuC firmware i915/tgl_huc.bin version 7.9.3
[ 7.661265] i915 0000:00:02.0: [drm] HuC authenticated
[ 7.662118] i915 0000:00:02.0: [drm] GT0: GUC: submission enabled
[ 7.662119] i915 0000:00:02.0: [drm] GT0: GUC: SLPC enabled
[ 7.663485] i915 0000:00:02.0: [drm] GuC RC: enabled
[ 7.664817] i915 0000:00:02.0: [drm] Protected Xe Path (PXP) protected content support initialized
[ 7.666178] [drm] Initialized i915 1.6.0 20201103 for 0000:00:02.0 on minor 1
[ 7.667082] ACPI: video: Video Device [GFX0] (multi-head: yes rom: no post: no)
[ 7.667458] input: Video Bus as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/LNXVIDEO:00/input/input3
[ 7.667999] i915 0000:00:02.0: [drm] Cannot find any crtc or sizes
[ 7.668228] i915 0000:00:02.0: [drm] Cannot find any crtc or sizes
If the BMC VGA port is disabled, this does not happen with the 6.1.0 and 6.1.38 kernels.
Here is the log from the 6.1.0 kernel (dmesg-6.1.0.txt):
[ 222.010258] i915 0000:00:02.0: vgaarb: deactivate vga console
[ 222.010289] i915 0000:00:02.0: [drm] Using Transparent Hugepages
[ 222.010853] i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=io+mem
[ 222.010869] Loading firmware: i915/adls_dmc_ver2_01.bin
[ 222.015724] i915 0000:00:02.0: [drm] Finished loading DMC firmware i915/adls_dmc_ver2_01.bin (v2.1)
[ 222.015901] mei_hdcp 0000:00:16.0-b638ab7e-94e2-4ea2-a552-d1c54b627f04: bound 0000:00:02.0 (ops i915_hdcp_component_ops [i915])
[ 222.016142] Loading firmware: i915/tgl_guc_70.bin
[ 222.024883] Loading firmware: i915/tgl_huc.bin
[ 222.026479] ipmi_si IPI0001:00: The BMC does not support clearing the recv irq bit, compensating, but the BMC needs to be fixed.
[ 222.038785] i915 0000:00:02.0: [drm] GuC firmware i915/tgl_guc_70.bin version 70.5.1
[ 222.038788] i915 0000:00:02.0: [drm] HuC firmware i915/tgl_huc.bin version 7.9.3
[ 222.041866] i915 0000:00:02.0: [drm] HuC authenticated
[ 222.042116] i915 0000:00:02.0: [drm] GuC submission enabled
[ 222.042117] i915 0000:00:02.0: [drm] GuC SLPC enabled
[ 222.042556] i915 0000:00:02.0: [drm] GuC RC: enabled
[ 222.057682] intel_tcc_cooling: Programmable TCC Offset detected
[ 222.071241] intel_rapl_msr: PL4 support detected.
[ 222.071256] intel_rapl_common: Found RAPL domain package
[ 222.071258] intel_rapl_common: Found RAPL domain core
[ 222.071258] intel_rapl_common: Found RAPL domain uncore
[ 222.099965] [drm] Initialized i915 1.6.0 20201103 for 0000:00:02.0 on minor 0
[ 222.100608] ACPI: video: Video Device [GFX0] (multi-head: yes rom: no post: no)
[ 222.100939] input: Video Bus as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/LNXVIDEO:00/input/input9
[ 222.100988] snd_hda_intel 0000:00:1f.3: bound 0000:00:02.0 (ops i915_audio_component_bind_ops [i915])
[ 222.144410] fbcon: i915drmfb (fb0) is primary device
[ 222.152095] ipmi_si IPI0001:00: IPMI message handler: Found new BMC (man_id: 0x002a7c, prod_id: 0x1c48, dev_id: 0x20)
[ 222.193961] ipmi_si IPI0001:00: IPMI kcs interface initialized
[ 222.243915] Console: switching to colour frame buffer device 240x67
[ 222.283925] i915 0000:00:02.0: [drm] fb0: i915drmfb frame buffer device
Here is the log from the 6.1.38 kernel (dmesg-6.1.38.txt):
[ 7.447157] i915 0000:00:02.0: vgaarb: deactivate vga console
[ 7.447190] i915 0000:00:02.0: [drm] Using Transparent Hugepages
[ 7.447759] i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=io+mem
[ 7.447777] Loading firmware: i915/adls_dmc_ver2_01.bin
[ 7.448153] mei_hdcp 0000:00:16.0-b638ab7e-94e2-4ea2-a552-d1c54b627f04: bound 0000:00:02.0 (ops i9xx_always_on_power_well_ops [i915])
[ 7.451741] Loading firmware: i915/tgl_guc_70.bin
[ 7.505734] Loading firmware: regulatory.db.p7s
[ 7.541314] i915 0000:00:02.0: [drm] Finished loading DMC firmware i915/adls_dmc_ver2_01.bin (v2.1)
[ 7.580066] ipmi_si IPI0001:00: The BMC does not support clearing the recv irq bit, compensating, but the BMC needs to be fixed.
[ 7.648916] ipmi_si IPI0001:00: IPMI message handler: Found new BMC (man_id: 0x002a7c, prod_id: 0x1c48, dev_id: 0x20)
[ 7.690249] ipmi_si IPI0001:00: IPMI kcs interface initialized
[ 7.743002] Loading firmware: i915/tgl_huc.bin
[ 7.747996] snd_hda_intel 0000:00:1f.3: enabling device (0000 -> 0002)
[ 7.748935] ipmi_ssif: IPMI SSIF Interface driver
[ 7.832168] i915 0000:00:02.0: [drm] GuC firmware i915/tgl_guc_70.bin version 70.5.1
[ 7.832173] i915 0000:00:02.0: [drm] HuC firmware i915/tgl_huc.bin version 7.9.3
[ 7.835269] i915 0000:00:02.0: [drm] HuC authenticated
[ 7.835727] i915 0000:00:02.0: [drm] GuC submission enabled
[ 7.835728] i915 0000:00:02.0: [drm] GuC SLPC enabled
[ 7.836091] i915 0000:00:02.0: [drm] GuC RC: enabled
[ 7.836704] mei_pxp 0000:00:16.0-fbf6fcf1-96cf-4e2e-a6a6-1bab8cbe36b1: bound 0000:00:02.0 (ops tfp410_ops [i915])
[ 7.836741] i915 0000:00:02.0: [drm] Protected Xe Path (PXP) protected content support initialized
[ 7.869400] [drm] Initialized i915 1.6.0 20201103 for 0000:00:02.0 on minor 0
[ 7.870563] ACPI: video: Video Device [GFX0] (multi-head: yes rom: no post: no)
[ 7.871156] input: Video Bus as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/LNXVIDEO:00/input/input9
[ 7.871239] snd_hda_intel 0000:00:1f.3: bound 0000:00:02.0 (ops i915_fence_ops [i915])
[ 7.885949] fbcon: i915drmfb (fb0) is primary device
[ 7.946236] Console: switching to colour frame buffer device 240x67
[ 7.966306] i915 0000:00:02.0: [drm] fb0: i915drmfb frame buffer device
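One visible difference between the logs above is the vgaarb line for 00:02.0: with the BMC VGA enabled, i915 reports decodes=none, and with it disabled decodes=io+mem. A small sketch to pull those lines out of several dmesg files for side-by-side comparison; the regex is an assumption based only on the line format shown in this post:

```python
import re
import sys

VGAARB_DECODES_RE = re.compile(
    r"(?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): vgaarb: changed VGA decodes: "
    r"olddecodes=(?P<old>[a-z+]+),decodes=(?P<new>[a-z+]+):owns=(?P<owns>[a-z+]+)"
)


def parse_vgaarb_decodes(log_text: str) -> list[dict[str, str]]:
    """Extract the device and old/new/owned decode state from 'changed VGA decodes' lines."""
    return [m.groupdict() for m in VGAARB_DECODES_RE.finditer(log_text)]


if __name__ == "__main__":
    # Usage: python vgaarb_decodes.py dmesg-A.txt dmesg-B.txt ...
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as f:
            for e in parse_vgaarb_decodes(f.read()):
                print(f"{path}: {e['dev']} decodes {e['old']} -> {e['new']} (owns {e['owns']})")
```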
Could somebody tell me whether the dead DVI-D/HDMI ports of the iGPU are caused by a BIOS bug, a design defect, or a CPU/motherboard hardware error?
2. Hardware Error:
A few times I observed a Hardware Error reported in dmesg, but mostly I missed the output. It would help if those lines were repeated later during kernel startup: I cannot capture them before the switch to the framebuffer console clears the screen, and I also cannot really scroll back on a Live USB distro kernel. You can grep the logs I collected for the one case I caught, more or less by chance, during my tests.
The snippet below is from the 6.1.0 kernel (dmesg-6.1.0.txt):
[ 22.582835] BERT: Error records from previous boot:
[ 22.583067] [Hardware Error]: event severity: fatal
[ 22.583302] [Hardware Error]: Error 0, type: fatal
[ 22.583531] [Hardware Error]: section_type: Firmware Error Record Reference
[ 22.583763] [Hardware Error]: Firmware Error Record Type: SOC Firmware Error Record Type2
[ 22.583994] [Hardware Error]: Revision: 2
[ 22.584226] [Hardware Error]: Record Identifier: 8f87f311-c998-4d9e-a0c4-6065518c4f6d
[ 22.584459] [Hardware Error]: 00000000: 0a98d977 735b3069 a22943b3 702e9bd6 w...i0[s.C)....p
[ 22.584695] [Hardware Error]: 00000010: 3ad7f95c b0e932ff 098c0203 ce8e89b8 \..:.2..........
[ 22.584929] [Hardware Error]: 00000020: 651ab7df c67fb557 953abdfa e2625e67 ...eW.....:.g^b.
[ 22.585167] [Hardware Error]: 00000030: 20973cf8 2bff02b1 6a93b637 fef970e3 .<. ...+7..j.p..
[ 22.585407] [Hardware Error]: 00000040: 9a9fe802 c6f9f2d6 20c2068b 1bee2df4 ........... .-..
[ 22.585646] [Hardware Error]: 00000050: 524af6ac d19544dc b0ce7fff bc793a6f ..JR.D......o:y.
[ 22.585883] [Hardware Error]: 00000060: 98bca064 6d89267c 00056460 08e189ca d...|&.m`d......
[ 22.586121] [Hardware Error]: 00000070: 307e697d 4a911584 072bc34e ba295624 }i~0...JN.+.$V).
[ 22.586362] [Hardware Error]: 00000080: 7c47fb18 1bea011d c702ef20 37eb94c2 ..G|.... ......7
[ 22.586609] [Hardware Error]: 00000090: 0b757561 ad60a070 2a82daaa 4ad5be70 auu.p.`....*p..J
[ 22.586852] [Hardware Error]: 000000a0: 910f998c 6df509db 0b897ceb 3c5b9d09 .......m.|....[<
[ 22.587098] [Hardware Error]: 000000b0: 023a755f a3cbb5d7 204fc1fa e1f9992c _u:.......O ,...
[ 22.587341] [Hardware Error]: 000000c0: 73c2ec07 62fd59f3 6aca21bf c01191d7 ...s.Y.b.!.j....
[ 22.587591] [Hardware Error]: 000000d0: 508f4ed5 f6f9b297 ff1ce921 393e3eb6 .N.P....!....>>9
[ 22.587839] [Hardware Error]: 000000e0: f410cbe6 c29ed832 42a9ff9d 7d752594 ....2......B.%u}
[ 22.588088] [Hardware Error]: 000000f0: 0855feb8 e1d012dc 338c2904 48b97e57 ..U......).3W~.H
[ 22.588339] [Hardware Error]: 00000100: cbcf7e6a 9d8de265 14cdeddf d8306995 j~..e........i0.
[ 22.588589] [Hardware Error]: 00000110: 7c9493ce fb5194ef 814ff54d 36041032 ...|..Q.M.O.2..6
[ 22.588838] [Hardware Error]: 00000120: 58423378 63a12000 dc3972ee 777070d9 x3BX. .c.r9..ppw
[ 22.589088] [Hardware Error]: 00000130: f89b9b3f ccb9db28 f1916573 63c62655 ?...(...se..U&.c
[ 22.589339] [Hardware Error]: 00000140: e0cf73fb 47db601b bf26ef30 0a64a079 .s...`.G0.&.y.d.
[ 22.589589] [Hardware Error]: 00000150: be032c99 f8caa378 04acb9f7 7ced2926 .,..x.......&).|
[ 22.589843] [Hardware Error]: 00000160: b27edd7f 77ecceea d3e8eade be7b803e ..~....w....>.{.
[ 22.590099] [Hardware Error]: 00000170: c8f532b9 faf6c35b 600d7ebe 6df98582 .2..[....~.`...m
[ 22.590353] [Hardware Error]: 00000180: 0a2a63d9 fa9b0c2f a14aa371 c7baac80 .c*./...q.J.....
[ 22.590604] [Hardware Error]: 00000190: 86bddb56 e6478b0a 83293f3b 4342143b V.....G.;?).;.BC
[ 22.590856] [Hardware Error]: 000001a0: a198cc64 171f0ef2 14dfdaa9 3be6b6e2 d..............;
[ 22.591113] [Hardware Error]: 000001b0: ca51ef54 ba7c8b92 3d136f35 f35a5284 T.Q...|.5o.=.RZ.
[ 22.591370] [Hardware Error]: 000001c0: d6578b9f ffb9bb40 3a5b20fa 3f30494a ..W.@.... [:JI0?
[ 22.591628] [Hardware Error]: 000001d0: 040ff8fb ffd86c8c 930afe8d ff53be45 .....l......E.S.
[ 22.591885] [Hardware Error]: 000001e0: a2043792 1ef2f00c a74d3558 92a989cc .7......X5M.....
[ 22.592145] [Hardware Error]: 000001f0: e213a2fd 6f3db716 4806ba39 84e3f43c ......=o9..H<...
[ 22.592401] [Hardware Error]: Error 1, type: fatal
[ 22.592657] [Hardware Error]: section_type: Firmware Error Record Reference
[ 22.592917] [Hardware Error]: Firmware Error Record Type: SOC Firmware Error Record Type2
[ 22.593182] [Hardware Error]: Revision: 2
[ 22.593443] [Hardware Error]: Record Identifier: 8f87f311-c998-4d9e-a0c4-6065518c4f6d
[ 22.593710] [Hardware Error]: 00000000: 475fef7e 59ee379e 858cc656 5a653e9b ~._G.7.YV....>eZ
[ 22.593979] [Hardware Error]: 00000010: 27ffedba bde582c3 9c1d9364 fd675d05 ...'....d....]g.
[ 22.594252] [Hardware Error]: 00000020: 2c2a1317 787ba9b0 ..*,..{x
[ 22.594524] BERT: Total records found: 1
Here is another one from the 6.1.38 kernel (dmesg-6.1.38_with_default_jumper_settings.txt):
[ 1.501876] BERT: Error records from previous boot:
[ 1.504741] [Hardware Error]: event severity: fatal
[ 1.507611] [Hardware Error]: Error 0, type: fatal
[ 1.510437] [Hardware Error]: section_type: Firmware Error Record Reference
[ 1.513295] [Hardware Error]: Firmware Error Record Type: SOC Firmware Error Record Type2
[ 1.516203] [Hardware Error]: Revision: 2
[ 1.519047] [Hardware Error]: Record Identifier: 8f87f311-c998-4d9e-a0c4-6065518c4f6d
[ 1.521955] [Hardware Error]: 00000000: 0a98d86f 73583009 a22943b3 702e9bd6 o....0Xs.C)....p
[ 1.524907] [Hardware Error]: 00000010: 3ed7f154 b8e932fd 098c0203 0e8e9938 T..>.2......8...
[ 1.527855] [Hardware Error]: 00000020: 653ab7df c67fb557 d532bdfa e2625e67 ..:eW.....2.g^b.
[ 1.530808] [Hardware Error]: 00000030: 20973cf8 0bf70291 6b93a637 7ef97063 .<. ....7..kcp.~
[ 1.533770] [Hardware Error]: 00000040: 9a9fe802 cef9d2d6 60c2468b 1bee2df4 .........F.`.-..
[ 1.536719] [Hardware Error]: 00000050: 5f4af6ac 919544dc b0c67ffe 3c79306f ..J_.D......o0y<
[ 1.539650] [Hardware Error]: 00000060: 18bea0e4 6d89267c 00056561 0ae109ca ....|&.mae......
[ 1.542591] [Hardware Error]: 00000070: 307ce87d 4a911584 050bc34a ba295625 }.|0...JJ...%V).
[ 1.545539] [Hardware Error]: 00000080: 7c47fb18 19fa011d c702ef20 37eb94c2 ..G|.... ......7
[ 1.548508] [Hardware Error]: 00000090: 1a577760 a5e0a070 2a82da9a 4ad5fe70 `wW.p......*p..J
[ 1.551500] [Hardware Error]: 000000a0: 810b998c edf501db 0bab7ce8 3c5b9989 .........|....[<
[ 1.554516] [Hardware Error]: 000000b0: 023a755f 83cb9557 204fe5ff e1719d2c _u:.W.....O ,.q.
[ 1.557543] [Hardware Error]: 000000c0: 73c2ec07 62fd79f3 6aca21bf 40119157 ...s.y.b.!.jW..@
[ 1.560566] [Hardware Error]: 000000d0: 000f4ed5 f6f9b295 ff1ce900 397e3cb6 .N...........<~9
[ 1.563602] [Hardware Error]: 000000e0: f4104fe7 c29ed832 02a8bf9d 3df72594 .O..2........%.=
[ 1.566642] [Hardware Error]: 000000f0: 08d5f638 61d412dc 334c2944 48b95617 8......aD)L3.V.H
[ 1.569694] [Hardware Error]: 00000100: cbcf7e4a 9d8dea6d 14cdeddf d83068b5 J~..m........h0.
[ 1.572761] [Hardware Error]: 00000110: 7c9493da fb5384ef a14fb54d be451032 ...|..S.M.O.2.E.
[ 1.575792] [Hardware Error]: 00000120: 584a7378 63a02008 dc3932ee 777050d9 xsJX. .c.29..Ppw
[ 1.578794] [Hardware Error]: 00000130: f89b9f3f ccb9db60 f1916573 63c62655 ?...`...se..U&.c
[ 1.581807] [Hardware Error]: 00000140: e0cf71fb 4753601b bf26ef30 82648079 .q...`SG0.&.y.d.
[ 1.584838] [Hardware Error]: 00000150: be032d99 f8caa378 04acb9f7 7eed2926 .-..x.......&).~
[ 1.587887] [Hardware Error]: 00000160: b07edd7f 77ecceea 93e8ead6 be7b803e ..~....w....>.{.
[ 1.590954] [Hardware Error]: 00000170: c8f532b9 fad6835b 400f7ebe 6dd88583 .2..[....~.@...m
[ 1.594002] [Hardware Error]: 00000180: 0a2a63d9 fa9b4c2b a34aa371 c5baac82 .c*.+L..q.J.....
[ 1.597028] [Hardware Error]: 00000190: 8699db7e e6478b0a 83293d7b 43425423 ~.....G.{=).#TBC
[ 1.600058] [Hardware Error]: 000001a0: a198e444 179f0ef2 145f9aa9 3be6b6e2 D........._....;
[ 1.603119] [Hardware Error]: 000001b0: 8b51ef54 98cc9b92 3d136f35 f35a5284 T.Q.....5o.=.RZ.
[ 1.606177] [Hardware Error]: 000001c0: d6d78b9f dfb8bbc0 3a5b00fa 3f30485a ..........[:ZH0?
[ 1.609233] [Hardware Error]: 000001d0: 004ff8ff efd86c88 9302fe8d ff11bcc5 ..O..l..........
[ 1.612301] [Hardware Error]: 000001e0: 82043792 16f2f00c 256d3558 12a98d8c .7......X5m%....
[ 1.615372] [Hardware Error]: 000001f0: e213a2fd 6e3d7f16 4806ba39 a4e3f438 ......=n9..H8...
[ 1.618452] [Hardware Error]: Error 1, type: fatal
[ 1.621528] [Hardware Error]: section_type: Firmware Error Record Reference
[ 1.624673] [Hardware Error]: Firmware Error Record Type: SOC Firmware Error Record Type2
[ 1.627884] [Hardware Error]: Revision: 2
[ 1.631079] [Hardware Error]: Record Identifier: 8f87f311-c998-4d9e-a0c4-6065518c4f6d
[ 1.634380] [Hardware Error]: 00000000: 464dff7e 58ee379e 87acd654 5a653e9b ~.MF.7.XT....>eZ
[ 1.637762] [Hardware Error]: 00000010: 2dff6fba bde582c3 9c1d9364 dd665d14 .o.-....d....]f.
[ 1.641188] [Hardware Error]: 00000020: 2c2a1317 787aadb0 ..*,..zx
[ 1.644640] BERT: Total records found: 1
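Both records carry the same Record Identifier and look alike at a glance. To quantify how much the payloads from two boots actually differ, the hex rows can be reassembled into bytes and compared. A rough sketch, assuming the dmesg line format shown above and an x86 (little-endian) host; it does not decode the record, which the kernel itself only prints as a raw hex dump:

```python
import re

ERROR_HDR_RE = re.compile(r"\[Hardware Error\]: Error (\d+), type:")
HEX_ROW_RE = re.compile(r"\[Hardware Error\]: ([0-9a-f]{8}): (.*)$")
HEX_WORD_RE = re.compile(r"[0-9a-f]{8}")


def parse_bert_payloads(log_text: str) -> dict[int, bytes]:
    """Map 'Error N' -> payload bytes reassembled from its hex-dump rows."""
    payloads: dict[int, bytearray] = {}
    current = None
    for line in log_text.splitlines():
        header = ERROR_HDR_RE.search(line)
        if header:
            current = int(header.group(1))
            payloads[current] = bytearray()
            continue
        row = HEX_ROW_RE.search(line)
        if row is None or current is None:
            continue
        # At most four 32-bit words precede the ASCII column; stop at the first
        # non-word token (assumes the ASCII column is not itself 8 hex digits).
        for token in row.group(2).split()[:4]:
            if not HEX_WORD_RE.fullmatch(token):
                break
            # Words are printed as little-endian u32 values, so this restores
            # the in-memory byte order that the ASCII column shows.
            payloads[current] += int(token, 16).to_bytes(4, "little")
    return {n: bytes(buf) for n, buf in payloads.items()}


def count_differing_bytes(a: bytes, b: bytes) -> tuple[int, int]:
    """Return (number of differing bytes, number of bytes compared)."""
    return sum(x != y for x, y in zip(a, b)), min(len(a), len(b))
```

Feeding it the two BERT excerpts above and comparing payloads with the same Error index gives a byte-level similarity measure, nothing more.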
I also have camera pictures of what I think are two more BERT outputs, but they are mostly incomplete.
I also saved the ACPI tables and tried to decode them.
3. Other issues:
A PNP device (pnp 00:06) overlaps a memory range of the iGPU:
$ grep overlaps JPG1_pins_2-3_shortcut/dmesg-6.1.38.txt
[ 0.790137] pnp 00:06: disabling [mem 0xc0000000-0xcfffffff] because it overlaps 0000:00:02.0 BAR 9 [mem 0x00000000-0xdfffffff 64bit pref]
$
Edit: The overlap message is gone after enabling the BIOS options related to SR-IOV and DMAR, and handing ASPM control to the OS. I enabled all of these at once, so I will not speculate which of the three fixed it.
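To see exactly which addresses collide, the range arithmetic in that message can be checked mechanically. A small sketch, assuming only the message format shown above; for this line the PNP window lies entirely inside the reported BAR, so the whole 256 MiB range 0xc0000000-0xcfffffff overlaps. If I read the kernel's resource numbering correctly, resources 7-12 of a PCI device are its SR-IOV VF BARs, so "BAR 9" would be a VF BAR, which would fit the observation that enabling SR-IOV in BIOS makes the message go away.

```python
import re

# Format of the pnp quirk message as it appears in the dmesg excerpt above.
OVERLAP_RE = re.compile(
    r"disabling \[mem (0x[0-9a-f]+)-(0x[0-9a-f]+)\] because it overlaps "
    r"(\S+) BAR (\d+) \[mem (0x[0-9a-f]+)-(0x[0-9a-f]+)"
)


def parse_overlap(line: str):
    """Return (pci_dev, bar_index, start, end, size) of the overlapping range, or None."""
    m = OVERLAP_RE.search(line)
    if m is None:
        return None
    pnp_lo, pnp_hi, pci_dev, bar, bar_lo, bar_hi = m.groups()
    start = max(int(pnp_lo, 16), int(bar_lo, 16))
    end = min(int(pnp_hi, 16), int(bar_hi, 16))  # both ranges are inclusive
    if start > end:
        return None
    return pci_dev, int(bar), start, end, end - start + 1


if __name__ == "__main__":
    line = ("pnp 00:06: disabling [mem 0xc0000000-0xcfffffff] because it overlaps "
            "0000:00:02.0 BAR 9 [mem 0x00000000-0xdfffffff 64bit pref]")
    dev, bar, start, end, size = parse_overlap(line)
    print(f"{dev} BAR {bar}: 0x{start:x}-0x{end:x} ({size // 2**20} MiB)")
```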
$ grep ipmi JPG1_pins_2-3_shortcut/dmesg-6.1.38.txt
[ 7.211389] ipmi device interface
[ 7.266023] ipmi_si: IPMI System Interface driver
[ 7.266270] ipmi_si dmi-ipmi-si.0: ipmi_platform: probing via SMBIOS
[ 7.266510] ipmi_platform: ipmi_si: SMBIOS: io 0xca2 regsize 1 spacing 1 irq 0
[ 7.266771] ipmi_si: Adding SMBIOS-specified kcs state machine
[ 7.267064] ipmi_si IPI0001:00: ipmi_platform: probing via ACPI
[ 7.267363] ipmi_si IPI0001:00: ipmi_platform: [io 0x0ca2] regsize 1 spacing 1 irq 0
[ 7.298427] ipmi_si dmi-ipmi-si.0: Removing SMBIOS-specified kcs state machine in favor of ACPI
[ 7.298696] ipmi_si: Adding ACPI-specified kcs state machine
[ 7.298974] ipmi_si: Trying ACPI-specified kcs state machine at i/o address 0xca2, slave address 0x20, irq 0
[ 7.580066] ipmi_si IPI0001:00: The BMC does not support clearing the recv irq bit, compensating, but the BMC needs to be fixed.
[ 7.648916] ipmi_si IPI0001:00: IPMI message handler: Found new BMC (man_id: 0x002a7c, prod_id: 0x1c48, dev_id: 0x20)
[ 7.690249] ipmi_si IPI0001:00: IPMI kcs interface initialized
[ 7.748935] ipmi_ssif: IPMI SSIF Interface driver
Here is a full dmesg from the 6.1.38 kernel with BMC VGA enabled: dmesg-6.1.38_with_default_jumper_settings.txt
Here is a full dmesg from the 6.1.38 kernel with BMC VGA disabled: dmesg-6.1.38.txt
Thank you for your thoughts.