amdgpu_discovery_init fails when no monitor is connected
Brief summary of the problem:
There are 3 graphics devices in this machine: the integrated Intel adapter on the motherboard, a 6600XT in the PCIe x16 slot on the mainboard, and an RTX 3050 connected to the other PCIe x16 slot via a riser cable.
If I boot the machine with no display cable plugged into the 6600XT, the device is not visible from userspace, and dmesg shows these errors:
[ 4.721059] [drm] amdgpu kernel modesetting enabled.
[ 4.721134] amdgpu: CRAT table not found
[ 4.721138] amdgpu: Virtual CRAT table created for CPU
[ 4.721148] amdgpu: Topology: Add CPU node
[ 4.721244] amdgpu 0000:03:00.0: enabling device (0000 -> 0003)
[ 4.721958] amdgpu 0000:03:00.0: amdgpu: get invalid ip discovery binary signature from vram
[ 4.721960] amdgpu 0000:03:00.0: amdgpu: amdgpu_discovery is not set properly
[ 4.721961] amdgpu 0000:03:00.0: amdgpu: failed to read ip discovery binary from file
[ 4.721966] [drm:amdgpu_discovery_reg_base_init [amdgpu]] *ERROR* amdgpu_discovery_init failed
[ 4.722377] amdgpu 0000:03:00.0: amdgpu: Fatal error during GPU init
[ 4.722380] amdgpu 0000:03:00.0: amdgpu: amdgpu: finishing device.
[ 4.722414] amdgpu: probe of 0000:03:00.0 failed with error -22
If I instead boot with a DP cable plugged into the 6600XT, I see the correct initialization sequence:
[ 4.674409] [drm] amdgpu kernel modesetting enabled.
[ 4.674480] amdgpu: CRAT table not found
[ 4.674483] amdgpu: Virtual CRAT table created for CPU
[ 4.674496] amdgpu: Topology: Add CPU node
[ 4.674612] amdgpu 0000:03:00.0: enabling device (0000 -> 0003)
[ 4.703327] amdgpu 0000:03:00.0: amdgpu: Fetched VBIOS from ROM BAR
[ 4.703330] amdgpu: ATOM BIOS: 113-D534-R66E
[ 4.716623] amdgpu 0000:03:00.0: amdgpu: Trusted Memory Zone (TMZ) feature disabled as experimental (default)
[ 4.716629] amdgpu 0000:03:00.0: amdgpu: PCIE atomic ops is not supported
[ 4.716675] amdgpu 0000:03:00.0: amdgpu: VRAM: 8176M 0x0000008000000000 - 0x00000081FEFFFFFF (8176M used)
[ 4.716677] amdgpu 0000:03:00.0: amdgpu: GART: 512M 0x0000000000000000 - 0x000000001FFFFFFF
[ 4.716678] amdgpu 0000:03:00.0: amdgpu: AGP: 267894784M 0x0000008400000000 - 0x0000FFFFFFFFFFFF
[ 4.717026] [drm] amdgpu: 8176M of VRAM memory ready
[ 4.717028] [drm] amdgpu: 15509M of GTT memory ready.
[ 8.060710] amdgpu 0000:03:00.0: amdgpu: STB initialized to 2048 entries
[ 8.061802] amdgpu 0000:03:00.0: amdgpu: Will use PSP to load VCN firmware
[ 8.226462] amdgpu 0000:03:00.0: amdgpu: RAS: optional ras ta ucode is not available
[ 8.244083] amdgpu 0000:03:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[ 8.244106] amdgpu 0000:03:00.0: amdgpu: smu driver if version = 0x0000000f, smu fw if version = 0x00000013, smu fw program = 0, version = 0x003b2900 (59.41.0)
[ 8.244112] amdgpu 0000:03:00.0: amdgpu: SMU driver if version not matched
[ 8.244140] amdgpu 0000:03:00.0: amdgpu: use vbios provided pptable
[ 8.292744] amdgpu 0000:03:00.0: amdgpu: SMU is initialized successfully!
[ 8.533245] kfd kfd: amdgpu: Allocated 3969056 bytes on gart
[ 8.533392] amdgpu: sdma_bitmap: ffff
[ 8.588166] amdgpu: HMM registered 8176MB device memory
[ 8.588335] amdgpu: SRAT table not found
[ 8.588337] amdgpu: Virtual CRAT table created for GPU
[ 8.588531] amdgpu: Topology: Add dGPU node [0x73ff:0x1002]
[ 8.588536] kfd kfd: amdgpu: added device 1002:73ff
[ 8.588556] amdgpu 0000:03:00.0: amdgpu: SE 2, SH per SE 2, CU per SH 8, active_cu_number 28
[ 8.588682] amdgpu 0000:03:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[ 8.588685] amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[ 8.588687] amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[ 8.588689] amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[ 8.588690] amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[ 8.588692] amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[ 8.588694] amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[ 8.588696] amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[ 8.588698] amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[ 8.588700] amdgpu 0000:03:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
[ 8.588702] amdgpu 0000:03:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[ 8.588705] amdgpu 0000:03:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
[ 8.588706] amdgpu 0000:03:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 1
[ 8.588708] amdgpu 0000:03:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 1
[ 8.588710] amdgpu 0000:03:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 1
[ 8.588712] amdgpu 0000:03:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 1
[ 8.589591] amdgpu 0000:03:00.0: amdgpu: Using BACO for runtime pm
[ 8.590162] [drm] Initialized amdgpu 3.52.0 20150101 for 0000:03:00.0 on minor 1
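To tell the two boots apart without reading the whole log, something like the following could be used. It is only a minimal sketch: the function name `amdgpu_probe_status` is made up for illustration, and the two patterns are copied from the log lines above, so other kernel versions may word these messages differently.

```python
import re

# Patterns taken from the two logs above; the wording is an assumption
# for any kernel other than the one in this report.
PROBE_FAILED = re.compile(r"amdgpu: probe of (\S+) failed with error (-?\d+)")
PROBE_OK = re.compile(r"\[drm\] Initialized amdgpu \S+ \S+ for (\S+) on minor (\d+)")


def amdgpu_probe_status(dmesg_text):
    """Map each PCI address to ("failed", errno) or ("ok", drm_minor)."""
    status = {}
    for line in dmesg_text.splitlines():
        match = PROBE_FAILED.search(line)
        if match:
            status[match.group(1)] = ("failed", int(match.group(2)))
            continue
        match = PROBE_OK.search(line)
        if match:
            status[match.group(1)] = ("ok", int(match.group(2)))
    return status
```

Fed the saved output of `dmesg`, this should report `("failed", -22)` for 0000:03:00.0 on the boot without a cable and `("ok", 1)` on the boot with the DP cable plugged in.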
How can I use the device headless?
Hardware description:
- CPU: Intel(R) Core(TM) i7-3770S CPU @ 3.10GHz
- GPU:
*-display
description: VGA compatible controller
product: Navi 23 [Radeon RX 6600/6600 XT/6600M] [1002:73FF]
vendor: Advanced Micro Devices, Inc. [AMD/ATI] [1002]
physical id: 0
bus info: pci@0000:03:00.0
logical name: /dev/fb1
version: c7
width: 64 bits
clock: 33MHz
capabilities: pm pciexpress msi vga_controller bus_master cap_list rom fb
configuration: depth=32 driver=amdgpu latency=0 resolution=3440,1440
resources: irq:31 memory:e0000000-efffffff memory:f0000000-f01fffff ioport:e000(size=256) memory:f7800000-f78fffff memory:f7900000-f791ffff
*-display
description: VGA compatible controller
product: IvyBridge GT2 [HD Graphics 4000] [8086:162]
vendor: Intel Corporation [8086]
physical id: 2
bus info: pci@0000:00:02.0
logical name: /dev/fb0
version: 09
width: 64 bits
clock: 33MHz
capabilities: msi pm vga_controller bus_master cap_list rom fb
configuration: depth=32 driver=i915 latency=0 resolution=3440,1440
resources: irq:30 memory:f7400000-f77fffff memory:b0000000-bfffffff ioport:f000(size=64) memory:c0000-dffff
*-display
description: VGA compatible controller
product: GA106 [Geforce RTX 3050] [10DE:2507]
vendor: NVIDIA Corporation [10DE]
physical id: 0
bus info: pci@0000:04:00.0
version: a1
width: 64 bits
clock: 33MHz
capabilities: pm msi pciexpress vga_controller bus_master cap_list rom
configuration: driver=nvidia latency=0
resources: irq:35 memory:f6000000-f6ffffff memory:c0000000-cfffffff memory:d0000000-d1ffffff ioport:d000(size=128) memory:f7000000-f707ffff
- System Memory: 32GB
- Display(s): LG 34WN750-B 34" UltraWide QHD IPS Monitor with AMD FreeSync
- Type of Display Connection: DP
System information:
- Distro name and Version: Ubuntu 22.10
- Kernel version: Linux desktop 6.4.3-060403-generic #202307110536 SMP PREEMPT_DYNAMIC Tue Jul 11 05:43:58 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
- Custom kernel: N/A
- AMD official driver version: N/A
How to reproduce the issue:
Install the 6600XT in a PCIe slot, leave all of its display outputs unconnected, and boot. The amdgpu driver fails to initialize the device.
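A quick way to confirm the result after booting is to check whether a driver is bound to the device in sysfs. This is a rough sketch, assuming the standard `/sys/bus/pci/devices/<address>/driver` symlink layout; the helper name `bound_driver` and its `sysfs_root` parameter are hypothetical and exist only for this example.

```python
import os


def bound_driver(pci_addr, sysfs_root="/sys"):
    """Return the name of the driver bound to pci_addr, or None if unbound or absent."""
    # A bound PCI device exposes a "driver" symlink pointing at its driver directory.
    link = os.path.join(sysfs_root, "bus", "pci", "devices", pci_addr, "driver")
    if not os.path.islink(link):
        return None
    return os.path.basename(os.readlink(link))
```

On this machine I would expect `bound_driver("0000:03:00.0")` to return `"amdgpu"` after a boot with the DP cable connected, and `None` after a boot without it, since the probe fails with error -22.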