The screen froze when amdgpu started while booting the 6.11.0-0.rc0.20240719git720261cfc732.7.fc41 kernel
Brief summary of the problem:
I booted the Fedora Rawhide KDE live image Fedora-KDE-Live-x86_64-Rawhide-20240721.n.0.iso on bare metal on an HP laptop with an AMD A10-9620P CPU and Radeon R5 GPU. The screen froze when amdgpu started while booting the 6.11.0-0.rc0.20240719git720261cfc732.7.fc41 kernel in that image. The boot itself continued even though the screen never changed, since I heard the Plasma startup sound later. To get the kernel logs, I installed 6.11.0-0.rc0.20240719git720261cfc732.7.fc41 in a Fedora 40 KDE installation. The journal showed that amdgpu failed to initialize, with errors including "ERROR ring kiq_0.2.1.0 test failed (-110)", "ERROR hw_init of IP block <gfx_v8_0> failed -110", and "amdgpu: amdgpu_device_ip_init failed". errno 110 is ETIMEDOUT ("Connection timed out"), so the KIQ ring test timed out. Many warnings in amdgpu_irq_put were shown after that.
Jul 22 15:46:51 kernel: [drm] amdgpu kernel modesetting enabled.
Jul 22 15:46:51 kernel: amdgpu: Virtual CRAT table created for CPU
Jul 22 15:46:51 kernel: amdgpu: Topology: Add CPU node
Jul 22 15:46:51 kernel: [drm] initializing kernel modesetting (CARRIZO 0x1002:0x9874 0x103C:0x8332 0xCA).
Jul 22 15:46:51 kernel: [drm] register mmio base: 0xF0400000
Jul 22 15:46:51 kernel: [drm] register mmio size: 262144
Jul 22 15:46:51 kernel: [drm] add ip block number 0 <vi_common>
Jul 22 15:46:51 kernel: [drm] add ip block number 1 <gmc_v8_0>
Jul 22 15:46:51 kernel: [drm] add ip block number 2 <cz_ih>
Jul 22 15:46:51 kernel: [drm] add ip block number 3 <gfx_v8_0>
Jul 22 15:46:51 kernel: [drm] add ip block number 4 <sdma_v3_0>
Jul 22 15:46:51 kernel: [drm] add ip block number 5 <powerplay>
Jul 22 15:46:51 kernel: [drm] add ip block number 6 <dm>
Jul 22 15:46:51 kernel: [drm] add ip block number 7 <uvd_v6_0>
Jul 22 15:46:51 kernel: [drm] add ip block number 8 <vce_v3_0>
Jul 22 15:46:51 kernel: [drm] add ip block number 9 <acp_ip>
Jul 22 15:46:51 kernel: amdgpu 0000:00:01.0: amdgpu: Fetched VBIOS from VFCT
Jul 22 15:46:52 kernel: amdgpu: ATOM BIOS: 113-C75100-031
Jul 22 15:46:52 kernel: [drm] UVD is enabled in physical mode
Jul 22 15:46:52 kernel: [drm] VCE enabled in physical mode
Jul 22 15:46:52 kernel: Console: switching to colour dummy device 80x25
Jul 22 15:46:52 kernel: amdgpu 0000:00:01.0: vgaarb: deactivate vga console
Jul 22 15:46:52 kernel: amdgpu 0000:00:01.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
Jul 22 15:46:52 kernel: [drm] vm size is 64 GB, 2 levels, block size is 10-bit, fragment size is 9-bit
Jul 22 15:46:52 kernel: amdgpu 0000:00:01.0: amdgpu: VRAM: 512M 0x000000F400000000 - 0x000000F41FFFFFFF (512M used)
Jul 22 15:46:52 kernel: amdgpu 0000:00:01.0: amdgpu: GART: 1024M 0x000000FF00000000 - 0x000000FF3FFFFFFF
Jul 22 15:46:52 kernel: [drm] Detected VRAM RAM=512M, BAR=512M
Jul 22 15:46:52 kernel: [drm] RAM width 64bits UNKNOWN
Jul 22 15:46:52 kernel: [drm] amdgpu: 512M of VRAM memory ready
Jul 22 15:46:52 kernel: [drm] amdgpu: 3697M of GTT memory ready.
Jul 22 15:46:52 kernel: [drm] GART: num cpu pages 262144, num gpu pages 262144
Jul 22 15:46:52 kernel: [drm] PCIE GART of 1024M enabled (table at 0x000000F400600000).
Jul 22 15:46:52 kernel: amdgpu: hwmgr_sw_init smu backed is smu8_smu
Jul 22 15:46:52 kernel: [drm] Found UVD firmware Version: 1.91 Family ID: 11
Jul 22 15:46:52 kernel: [drm] UVD ENC is disabled
Jul 22 15:46:52 kernel: [drm] Found VCE firmware Version: 52.4 Binary ID: 3
Jul 22 15:46:52 kernel: amdgpu 0000:00:01.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_0.2.1.0 test failed (-110)
Jul 22 15:46:52 kernel: [drm:amdgpu_device_init.cold [amdgpu]] *ERROR* hw_init of IP block <gfx_v8_0> failed -110
Jul 22 15:46:52 kernel: amdgpu 0000:00:01.0: amdgpu: amdgpu_device_ip_init failed
Jul 22 15:46:52 kernel: amdgpu 0000:00:01.0: amdgpu: Fatal error during GPU init
Jul 22 15:46:52 kernel: amdgpu 0000:00:01.0: amdgpu: amdgpu: finishing device.
Jul 22 15:46:52 kernel: ------------[ cut here ]------------
Jul 22 15:46:52 kernel: WARNING: CPU: 1 PID: 409 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:631 amdgpu_irq_put+0x46/0x70 [amdgpu]
Jul 22 15:46:52 kernel: Modules linked in: amdgpu(+) hid_logitech_hidpp crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel amdxcp sha512_ssse3 i2c_algo_bit drm_ttm_helper sha256_ssse3 ttm sha1_ssse3 wdat_wdt drm_exec sp5100_tco gpu_sched drm_suballoc_helper drm_buddy drm_display_helper video cec wmi hid_logitech_dj serio_raw hid_multitouch scsi_dh_rdac scsi_dh_emc scsi_dh_alua ip6_tables ip_tables fuse i2c_dev
Jul 22 15:46:52 kernel: CPU: 1 PID: 409 Comm: (udev-worker) Not tainted 6.11.0-0.rc0.20240719git720261cfc732.7.fc41.x86_64 #1
Jul 22 15:46:52 kernel: Hardware name: HP HP Laptop 15-bw0xx/8332, BIOS F.52 12/03/2019
Jul 22 15:46:52 kernel: RIP: 0010:amdgpu_irq_put+0x46/0x70 [amdgpu]
Jul 22 15:46:52 kernel: Code: c0 74 33 48 8b 4e 10 48 83 39 00 74 29 89 d1 48 8d 04 88 8b 08 85 c9 74 11 f0 ff 08 74 07 31 c0 e9 0f 27 9c f0 e9 1a fd ff ff <0f> 0b b8 ea ff ff ff e9 fe 26 9c f0 b8 ea ff ff ff e9 f4 26 9c f0
Jul 22 15:46:52 kernel: RSP: 0018:ffffbc8e805c39d0 EFLAGS: 00010246
Jul 22 15:46:52 kernel: RAX: ffff9f76c9a4a3c0 RBX: ffff9f76cd898890 RCX: 0000000000000000
Jul 22 15:46:52 kernel: RDX: 0000000000000000 RSI: ffff9f76cd8a54d0 RDI: ffff9f76cd880000
Jul 22 15:46:52 kernel: RBP: 0000000000000000 R08: 0000000000000002 R09: 0720072007200720
Jul 22 15:46:52 kernel: R10: 072007200720072e R11: 0765076307690776 R12: ffff9f76cd880000
Jul 22 15:46:52 kernel: R13: ffff9f76cd880010 R14: ffff9f76cd8a54d0 R15: ffff9f76cd880010
Jul 22 15:46:52 kernel: FS: 00007f29a5b35980(0000) GS:ffff9f77b7480000(0000) knlGS:0000000000000000
Jul 22 15:46:52 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 22 15:46:52 kernel: CR2: 00007f29a5c63b00 CR3: 0000000101998000 CR4: 00000000001506f0
Jul 22 15:46:52 kernel: Call Trace:
Jul 22 15:46:52 kernel: <TASK>
Jul 22 15:46:52 kernel: ? amdgpu_irq_put+0x46/0x70 [amdgpu]
Jul 22 15:46:52 kernel: ? __warn.cold+0x8e/0xe8
Jul 22 15:46:52 kernel: ? amdgpu_irq_put+0x46/0x70 [amdgpu]
Jul 22 15:46:52 kernel: ? report_bug+0xff/0x140
Jul 22 15:46:52 kernel: ? handle_bug+0x3c/0x80
Jul 22 15:46:52 kernel: ? exc_invalid_op+0x17/0x70
Jul 22 15:46:52 kernel: ? asm_exc_invalid_op+0x1a/0x20
Jul 22 15:46:52 kernel: ? amdgpu_irq_put+0x46/0x70 [amdgpu]
Jul 22 15:46:52 kernel: amdgpu_fence_driver_hw_fini+0x116/0x160 [amdgpu]
Jul 22 15:46:52 kernel: amdgpu_device_fini_hw+0x9b/0x460 [amdgpu]
Jul 22 15:46:52 kernel: amdgpu_driver_load_kms.cold+0x18/0x2e [amdgpu]
Jul 22 15:46:52 kernel: amdgpu_pci_probe+0x1ae/0x4b0 [amdgpu]
Jul 22 15:46:52 kernel: local_pci_probe+0x45/0x90
Jul 22 15:46:52 kernel: pci_device_probe+0xc1/0x2a0
Jul 22 15:46:52 kernel: really_probe+0xde/0x340
Jul 22 15:46:52 kernel: ? pm_runtime_barrier+0x54/0x90
Jul 22 15:46:52 kernel: ? __pfx___driver_attach+0x10/0x10
Jul 22 15:46:52 kernel: __driver_probe_device+0x78/0x110
Jul 22 15:46:52 kernel: driver_probe_device+0x1f/0xa0
Jul 22 15:46:52 kernel: __driver_attach+0xba/0x1c0
Jul 22 15:46:52 kernel: bus_for_each_dev+0x8f/0xe0
Jul 22 15:46:52 kernel: bus_add_driver+0x142/0x220
Jul 22 15:46:52 kernel: driver_register+0x72/0xd0
Jul 22 15:46:52 kernel: ? __pfx_amdgpu_init+0x10/0x10 [amdgpu]
Jul 22 15:46:52 kernel: do_one_initcall+0x5b/0x310
Jul 22 15:46:52 kernel: do_init_module+0x90/0x260
Jul 22 15:46:52 kernel: __do_sys_init_module+0x17a/0x1b0
Jul 22 15:46:52 kernel: do_syscall_64+0x82/0x160
Jul 22 15:46:52 kernel: ? __do_sys_brk+0x3bc/0x410
Jul 22 15:46:52 kernel: ? syscall_exit_to_user_mode+0x10/0x220
Jul 22 15:46:52 kernel: ? do_syscall_64+0x8e/0x160
Jul 22 15:46:52 kernel: ? count_memcg_events.constprop.0+0x1a/0x30
Jul 22 15:46:52 kernel: ? handle_mm_fault+0x1f0/0x300
Jul 22 15:46:52 kernel: ? do_user_addr_fault+0x55a/0x7b0
Jul 22 15:46:52 kernel: ? exc_page_fault+0x7e/0x180
Jul 22 15:46:52 kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
Jul 22 15:46:52 kernel: RIP: 0033:0x7f29a59ce60e
Jul 22 15:46:52 kernel: Code: 48 8b 0d 0d a8 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d da a7 0c 00 f7 d8 64 89 01 48
Jul 22 15:46:52 kernel: RSP: 002b:00007fff09dc4508 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
Jul 22 15:46:52 kernel: RAX: ffffffffffffffda RBX: 000055a442eb2d00 RCX: 00007f29a59ce60e
Jul 22 15:46:52 kernel: RDX: 00007f29a5af107d RSI: 0000000002951b0e RDI: 00007f299fc00010
Jul 22 15:46:52 kernel: RBP: 00007fff09dc45c0 R08: 000055a442e67010 R09: 0000000000000007
Jul 22 15:46:52 kernel: R10: 0000000000000001 R11: 0000000000000246 R12: 00007f29a5af107d
Jul 22 15:46:52 kernel: R13: 0000000000020000 R14: 000055a442eaaac0 R15: 000055a442eec540
Jul 22 15:46:52 kernel: </TASK>
Jul 22 15:46:52 kernel: ---[ end trace 0000000000000000 ]---
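To pull the failing steps out of a long journal like the one above, a small filter can help. This is only a sketch: the function name amdgpu_errors and the regular expression are my own, written against the two message shapes seen in this log ("failed (-110)" and "failed -110"), and other amdgpu error formats may not match. The symbolic name comes from the errno table of the machine running the script, so it is only meaningful on Linux.

```python
import errno
import os
import re

# Matches the numeric code in messages such as "test failed (-110)" or "failed -110".
# Assumption: only these two shapes are handled, as they are the ones in this log.
FAILED_RE = re.compile(r"\*ERROR\*.*?failed \(?-(\d+)\)?")


def amdgpu_errors(lines):
    """Yield (errno_number, symbolic_name, message) for each amdgpu *ERROR* line."""
    for line in lines:
        if "amdgpu" not in line:
            continue
        match = FAILED_RE.search(line)
        if match is None:
            continue
        code = int(match.group(1))
        # errno.errorcode reflects the platform running this script.
        name = errno.errorcode.get(code, "unknown")
        message = line.split("*ERROR*", 1)[1].strip()
        yield code, name, message


sample = [
    "Jul 22 15:46:52 kernel: [drm] Found VCE firmware Version: 52.4 Binary ID: 3",
    "Jul 22 15:46:52 kernel: amdgpu 0000:00:01.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_0.2.1.0 test failed (-110)",
    "Jul 22 15:46:52 kernel: [drm:amdgpu_device_init.cold [amdgpu]] *ERROR* hw_init of IP block <gfx_v8_0> failed -110",
]

for code, name, message in amdgpu_errors(sample):
    print(f"{name} ({code}, {os.strerror(code)}): {message}")
```

The same function can be fed a saved journal file line by line, for example the attached log.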
The problem happened on 3 of 3 boots with 6.11.0-0.rc0.20240719git720261cfc732.7.fc41. The problem didn't happen when I booted the image on bare metal in Basic graphics mode with nomodeset on the kernel command line, which used the simpledrm kernel driver and the llvmpipe Mesa driver. It also didn't happen in QEMU/KVM VMs using the virtio-gpu kernel driver and the llvmpipe or virgl Mesa drivers.
6.10.0 and earlier kernels weren't affected. I think the problem might've started in the 6.11 merge window before 720261cfc732. I'll try to bisect.
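For anyone unfamiliar with what bisecting involves, the idea can be sketched as a binary search between a known-good and a known-bad point. This is purely illustrative: the commit names are placeholders, not real kernel commits, the history is assumed to be linear, and in practice git bisect does this work (and copes with merges). Each call to the test function stands for one build and test boot.

```python
def first_bad(commits, is_bad):
    """Return the first commit for which is_bad() is true.

    Assumes commits[0] is known good, commits[-1] is known bad, and every
    commit after the first bad one is also bad (the usual bisect assumption).
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid
        else:
            good = mid
    return commits[bad]


# Hypothetical linear history: placeholder names, not real kernel commits.
history = [f"commit-{i:02d}" for i in range(16)]
culprit = "commit-11"  # pretend regression point, chosen only for illustration
tested = []


def boots_with_frozen_screen(commit):
    """Stand-in for building the kernel at this commit and booting it."""
    tested.append(commit)
    return commit >= culprit  # string comparison works because names are zero-padded


print(first_bad(history, boots_with_frozen_screen), "found after", len(tested), "test boots")
```

The number of test boots grows with the logarithm of the number of commits in the range, which is why bisecting a whole merge window is feasible at all.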
Hardware description:
- CPU: AMD A10-9620P
- GPU: integrated Radeon R5 (lspci: 00:01.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Wani [Radeon R5/R6/R7 Graphics] (rev ca))
- System Memory: 8 GB
- Display(s): integrated Elan touchscreen
- Type of Display Connection: eDP
System information:
- Distro name and Version: Fedora Rawhide
- Kernel version: 6.11.0-0.rc0.20240719git720261cfc732.7.fc41
- Custom kernel: N/A
- AMD official driver version: N/A
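The kernel version string above encodes the upstream snapshot it was built from, which is handy when picking a "bad" endpoint for a bisect. Here is a rough sketch that splits it into fields. The field layout is an assumption inferred from this one version string, not taken from Fedora packaging documentation, and the names SNAPSHOT_RE and parse_snapshot are made up for this example.

```python
import re

# Assumed layout: <base>-0.rc<N>.<YYYYMMDD>git<commit>.<build>.<dist>[.<arch>]
SNAPSHOT_RE = re.compile(
    r"(?P<base>\d+\.\d+\.\d+)-0\.rc(?P<rc>\d+)\."
    r"(?P<date>\d{8})git(?P<commit>[0-9a-f]+)\."
    r"(?P<build>\d+)\.(?P<dist>fc\d+)"
    r"(?:\.(?P<arch>\w+))?"
)


def parse_snapshot(version):
    """Return a dict of fields for a snapshot kernel version, or None if it does not match."""
    match = SNAPSHOT_RE.fullmatch(version)
    return match.groupdict() if match else None


info = parse_snapshot("6.11.0-0.rc0.20240719git720261cfc732.7.fc41")
print(info["date"], info["commit"])
```

The commit field is the abbreviated upstream hash that the snapshot was taken at, here 720261cfc732.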
How to reproduce the issue:
- Download Fedora Rawhide KDE live image Fedora-KDE-Live-x86_64-Rawhide-20240721.n.0.iso from https://koji.fedoraproject.org/koji/buildinfo?buildID=2513237
- Install Fedora Media Writer in Fedora with sudo dnf install mediawriter
- Start Fedora Media Writer
- Write Fedora-KDE-Live-x86_64-Rawhide-20240721.n.0.iso with Fedora Media Writer to a USB flash drive
- Reboot into Fedora-KDE-Live-x86_64-Rawhide-20240721.n.0.iso on a system with an AMD GPU affected by this problem
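Before going through the steps above on another machine, it may be worth checking whether it has the same GPU. The sketch below reads PCI vendor and device IDs from sysfs and compares them with the pair from the log (CARRIZO 0x1002:0x9874). That pair is the only one known from this report; whether other AMD GPUs are affected is not established, so the AFFECTED set is an assumption to extend as needed. The function name find_affected_gpus is made up for this example.

```python
from pathlib import Path

# Vendor/device IDs taken from the log line "CARRIZO 0x1002:0x9874".
# Assumption: only this one GPU is confirmed affected in the report.
AFFECTED = {("0x1002", "0x9874")}


def find_affected_gpus(sysfs_root="/sys/bus/pci/devices"):
    """Return PCI addresses whose vendor:device pair is in AFFECTED."""
    hits = []
    root = Path(sysfs_root)
    if not root.is_dir():
        return hits
    for dev in sorted(root.iterdir()):
        try:
            vendor = (dev / "vendor").read_text().strip().lower()
            device = (dev / "device").read_text().strip().lower()
        except OSError:
            # Entry without readable ID files; skip it.
            continue
        if (vendor, device) in AFFECTED:
            hits.append(dev.name)
    return hits


if __name__ == "__main__":
    print(find_affected_gpus() or "no matching GPU found")
```

On the laptop from this report the expected match would be the device at 0000:00:01.0.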
Attached files:
Log files (for system lockups / game freezes / crashes)
I'm attaching the kernel log from a boot where the screen froze with 6.11.0-0.rc0.20240719git720261cfc732.7.fc41 in a Fedora 40 installation: journal-amdgpu-screen-froze-6.11.0-0.rc0.20240719git720261cfc732.7.fc41-1.txt