WARNING: CPU: 0 PID: 669 at drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:7391 amdgpu_dm_atomic_commit_tail+0x23af/0x2440 [amdgpu]
Brief summary of the problem:
Upgrading from Linux v5.10.145 to v5.10.146 causes WARNINGs to appear in the kernel log, along with timeouts from the graphics driver. The warnings come in pairs: both carry the same message and differ only in the source file line number and the offsets. I've bisected this regression to a single commit. A more detailed description follows.
As a baseline, this is the correct initialization I see with the good v5.10.145:
[ 3.163444] [drm] amdgpu kernel modesetting enabled.
[ 3.163531] amdgpu: Topology: Add APU node [0x0:0x0]
[ 3.163583] checking generic (e0000000 300000) vs hw (e0000000 10000000)
[ 3.163584] fb0: switching to amdgpudrmfb from EFI VGA
[ 3.163659] Console: switching to colour dummy device 80x25
[ 3.163692] amdgpu 0000:07:00.0: vgaarb: deactivate vga console
[ 3.163835] [drm] initializing kernel modesetting (RAVEN 0x1002:0x15D8 0x1043:0x876B 0xD8).
[ 3.163838] amdgpu 0000:07:00.0: amdgpu: Trusted Memory Zone (TMZ) feature disabled as experimental (default)
[ 3.163855] [drm] register mmio base: 0xFCA00000
[ 3.163856] [drm] register mmio size: 524288
[ 3.163865] [drm] add ip block number 0 <soc15_common>
[ 3.163866] [drm] add ip block number 1 <gmc_v9_0>
[ 3.163866] [drm] add ip block number 2 <vega10_ih>
[ 3.163867] [drm] add ip block number 3 <psp>
[ 3.163867] [drm] add ip block number 4 <gfx_v9_0>
[ 3.163868] [drm] add ip block number 5 <sdma_v4_0>
[ 3.163869] [drm] add ip block number 6 <powerplay>
[ 3.163870] [drm] add ip block number 7 <dm>
[ 3.163870] [drm] add ip block number 8 <vcn_v1_0>
[ 3.188472] [drm] BIOS signature incorrect 0 0
[ 3.188495] amdgpu 0000:07:00.0: amdgpu: Fetched VBIOS from ROM BAR
[ 3.188497] amdgpu: ATOM BIOS: 113-PICASSO-118
[ 3.189312] [drm] VCN decode is enabled in VM mode
[ 3.189313] [drm] VCN encode is enabled in VM mode
[ 3.189313] [drm] JPEG decode is enabled in VM mode
[ 3.189332] [drm] vm size is 262144 GB, 4 levels, block size is 9-bit, fragment size is 9-bit
[ 3.189338] amdgpu 0000:07:00.0: amdgpu: VRAM: 64M 0x000000F400000000 - 0x000000F403FFFFFF (64M used)
[ 3.189339] amdgpu 0000:07:00.0: amdgpu: GART: 1024M 0x0000000000000000 - 0x000000003FFFFFFF
[ 3.189340] amdgpu 0000:07:00.0: amdgpu: AGP: 267419648M 0x000000F800000000 - 0x0000FFFFFFFFFFFF
[ 3.189345] [drm] Detected VRAM RAM=64M, BAR=64M
[ 3.189345] [drm] RAM width 64bits DDR4
[ 3.189402] [TTM] Zone kernel: Available graphics memory: 4027350 KiB
[ 3.189403] [TTM] Zone dma32: Available graphics memory: 2097152 KiB
[ 3.189403] [TTM] Initializing pool allocator
[ 3.189406] [TTM] Initializing DMA pool allocator
[ 3.189426] [drm] amdgpu: 64M of VRAM memory ready
[ 3.189428] [drm] amdgpu: 3072M of GTT memory ready.
[ 3.189429] [drm] GART: num cpu pages 262144, num gpu pages 262144
[ 3.189568] [drm] PCIE GART of 1024M enabled (table at 0x000000F400900000).
[ 3.200970] amdgpu: hwmgr_sw_init smu backed is smu10_smu
[ 3.203532] [drm] Found VCN firmware Version ENC: 1.12 DEC: 2 VEP: 0 Revision: 1
[ 3.203536] amdgpu 0000:07:00.0: amdgpu: Will use PSP to load VCN firmware
[ 3.224374] [drm] reserve 0x400000 from 0xf403c00000 for PSP TMR
[ 3.456242] amdgpu 0000:07:00.0: amdgpu: RAS: optional ras ta ucode is not available
[ 3.484220] amdgpu 0000:07:00.0: amdgpu: RAP: optional rap ta ucode is not available
[ 3.486294] [drm] kiq ring mec 2 pipe 1 q 0
[ 3.486846] [drm] DM_PPLIB: values for F clock
[ 3.486848] [drm] DM_PPLIB: 400000 in kHz, 3099 in mV
[ 3.486850] [drm] DM_PPLIB: 933000 in kHz, 3574 in mV
[ 3.486851] [drm] DM_PPLIB: 1067000 in kHz, 4250 in mV
[ 3.486852] [drm] DM_PPLIB: 1200000 in kHz, 4399 in mV
[ 3.486854] [drm] DM_PPLIB: values for DCF clock
[ 3.486856] [drm] DM_PPLIB: 300000 in kHz, 3099 in mV
[ 3.486857] [drm] DM_PPLIB: 600000 in kHz, 3574 in mV
[ 3.486858] [drm] DM_PPLIB: 626000 in kHz, 4250 in mV
[ 3.486859] [drm] DM_PPLIB: 654000 in kHz, 4399 in mV
[ 3.487121] [drm] Display Core initialized with v3.2.104!
[ 3.605611] [drm] VCN decode and encode initialized successfully(under SPG Mode).
[ 3.607663] kfd kfd: Allocated 3969056 bytes on gart
[ 3.608889] amdgpu: Topology: Add APU node [0x15d8:0x1002]
[ 3.608894] kfd kfd: added device 1002:15d8
[ 3.608898] amdgpu 0000:07:00.0: amdgpu: SE 1, SH per SE 1, CU per SH 11, active_cu_number 11
[ 3.609808] [drm] fb mappable at 0xCCBCA000
[ 3.609810] [drm] vram apper at 0xCC000000
[ 3.609811] [drm] size 4325376
[ 3.609812] [drm] fb depth is 24
[ 3.609813] [drm] pitch is 5632
[ 3.609944] fbcon: amdgpudrmfb (fb0) is primary device
[ 3.661704] Console: switching to colour frame buffer device 170x48
[ 3.682053] amdgpu 0000:07:00.0: [drm] fb0: amdgpudrmfb frame buffer device
[ 3.708324] amdgpu 0000:07:00.0: amdgpu: ring gfx uses VM inv eng 0 on hub 0
[ 3.708327] amdgpu 0000:07:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[ 3.708329] amdgpu 0000:07:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[ 3.708331] amdgpu 0000:07:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[ 3.708332] amdgpu 0000:07:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[ 3.708334] amdgpu 0000:07:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[ 3.708336] amdgpu 0000:07:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[ 3.708337] amdgpu 0000:07:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[ 3.708339] amdgpu 0000:07:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[ 3.708341] amdgpu 0000:07:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
[ 3.708343] amdgpu 0000:07:00.0: amdgpu: ring sdma0 uses VM inv eng 0 on hub 1
[ 3.708345] amdgpu 0000:07:00.0: amdgpu: ring vcn_dec uses VM inv eng 1 on hub 1
[ 3.708347] amdgpu 0000:07:00.0: amdgpu: ring vcn_enc0 uses VM inv eng 4 on hub 1
[ 3.708349] amdgpu 0000:07:00.0: amdgpu: ring vcn_enc1 uses VM inv eng 5 on hub 1
[ 3.708350] amdgpu 0000:07:00.0: amdgpu: ring jpeg_dec uses VM inv eng 6 on hub 1
[ 3.727192] [drm] Initialized amdgpu 3.40.0 20150101 for 0000:07:00.0 on minor 0
On v5.10.146 the WARNINGs show up two or three times during boot (one pair per CPU), and then the pair repeats approximately every 30 seconds in the log. The machine is used as a server in console mode; the problem makes the system slow to boot and fills the logs. I'm not sure whether it would also affect actually using the GPU from a GUI or similar. This is the dmesg output (only the relevant drm parts):
[ 3.309007] fb0: switching to amdgpudrmfb from EFI VGA
[ 3.309110] Console: switching to colour dummy device 80x25
[ 3.309153] amdgpu 0000:07:00.0: vgaarb: deactivate vga console
[ 3.309301] [drm] initializing kernel modesetting (RAVEN 0x1002:0x15D8 0x1043:0x876B 0xD8).
[ 3.309304] amdgpu 0000:07:00.0: amdgpu: Trusted Memory Zone (TMZ) feature disabled as experimental (default)
[ 3.309325] [drm] register mmio base: 0xFCA00000
[ 3.309325] [drm] register mmio size: 524288
[ 3.309337] [drm] add ip block number 0 <soc15_common>
[ 3.309339] [drm] add ip block number 1 <gmc_v9_0>
[ 3.309340] [drm] add ip block number 2 <vega10_ih>
[ 3.309341] [drm] add ip block number 3 <psp>
[ 3.309342] [drm] add ip block number 4 <gfx_v9_0>
[ 3.309343] [drm] add ip block number 5 <sdma_v4_0>
[ 3.309344] [drm] add ip block number 6 <powerplay>
[ 3.309346] [drm] add ip block number 7 <dm>
[ 3.309347] [drm] add ip block number 8 <vcn_v1_0>
[ 3.339375] [drm] BIOS signature incorrect 0 0
[ 3.339433] amdgpu 0000:07:00.0: amdgpu: Fetched VBIOS from ROM BAR
[ 3.339437] amdgpu: ATOM BIOS: 113-PICASSO-118
[ 3.340609] [drm] VCN decode is enabled in VM mode
[ 3.340611] [drm] VCN encode is enabled in VM mode
[ 3.340612] [drm] JPEG decode is enabled in VM mode
[ 3.340669] [drm] vm size is 262144 GB, 4 levels, block size is 9-bit, fragment size is 9-bit
[ 3.340679] amdgpu 0000:07:00.0: amdgpu: VRAM: 64M 0x000000F400000000 - 0x000000F403FFFFFF (64M used)
[ 3.340682] amdgpu 0000:07:00.0: amdgpu: GART: 1024M 0x0000000000000000 - 0x000000003FFFFFFF
[ 3.340685] amdgpu 0000:07:00.0: amdgpu: AGP: 267419648M 0x000000F800000000 - 0x0000FFFFFFFFFFFF
[ 3.340694] [drm] Detected VRAM RAM=64M, BAR=64M
[ 3.340696] [drm] RAM width 64bits DDR4
[ 3.340783] [TTM] Zone kernel: Available graphics memory: 4027350 KiB
[ 3.340785] [TTM] Zone dma32: Available graphics memory: 2097152 KiB
[ 3.340786] [TTM] Initializing pool allocator
[ 3.340793] [TTM] Initializing DMA pool allocator
[ 3.340839] [drm] amdgpu: 64M of VRAM memory ready
[ 3.340843] [drm] amdgpu: 3072M of GTT memory ready.
[ 3.340847] [drm] GART: num cpu pages 262144, num gpu pages 262144
[ 3.341006] [drm] PCIE GART of 1024M enabled (table at 0x000000F400900000).
[ 3.352495] amdgpu: hwmgr_sw_init smu backed is smu10_smu
[ 3.355110] [drm] Found VCN firmware Version ENC: 1.12 DEC: 2 VEP: 0 Revision: 1
[ 3.355115] amdgpu 0000:07:00.0: amdgpu: Will use PSP to load VCN firmware
[ 3.376091] [drm] reserve 0x400000 from 0xf403c00000 for PSP TMR
[ 3.612099] amdgpu 0000:07:00.0: amdgpu: RAS: optional ras ta ucode is not available
[ 3.636356] amdgpu 0000:07:00.0: amdgpu: RAP: optional rap ta ucode is not available
[ 3.637817] [drm] kiq ring mec 2 pipe 1 q 0
[ 3.639083] [drm] DM_PPLIB: values for F clock
[ 3.639085] [drm] DM_PPLIB: 400000 in kHz, 3099 in mV
[ 3.639086] [drm] DM_PPLIB: 933000 in kHz, 3574 in mV
[ 3.639087] [drm] DM_PPLIB: 1067000 in kHz, 4250 in mV
[ 3.639088] [drm] DM_PPLIB: 1200000 in kHz, 4399 in mV
[ 3.639090] [drm] DM_PPLIB: values for DCF clock
[ 3.639091] [drm] DM_PPLIB: 300000 in kHz, 3099 in mV
[ 3.639092] [drm] DM_PPLIB: 600000 in kHz, 3574 in mV
[ 3.639093] [drm] DM_PPLIB: 626000 in kHz, 4250 in mV
[ 3.639094] [drm] DM_PPLIB: 654000 in kHz, 4399 in mV
[ 3.639354] [drm] Display Core initialized with v3.2.104!
[ 3.753515] [drm] VCN decode and encode initialized successfully(under SPG Mode).
[ 3.755475] kfd kfd: Allocated 3969056 bytes on gart
[ 3.756641] amdgpu: Topology: Add APU node [0x15d8:0x1002]
[ 3.756645] kfd kfd: added device 1002:15d8
[ 3.756650] amdgpu 0000:07:00.0: amdgpu: SE 1, SH per SE 1, CU per SH 11, active_cu_number 11
[ 4.276130] [drm] Fence fallback timer expired on ring sdma0
[ 4.276415] [drm] fb mappable at 0xCCBCA000
[ 4.276418] [drm] vram apper at 0xCC000000
[ 4.276419] [drm] size 4325376
[ 4.276421] [drm] fb depth is 24
[ 4.276422] [drm] pitch is 5632
[ 4.276574] fbcon: amdgpudrmfb (fb0) is primary device
[ 4.329282] Console: switching to colour frame buffer device 170x48
[ 14.356144] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:62:crtc-0] flip_done timed out
[ 14.360695] amdgpu 0000:07:00.0: [drm] fb0: amdgpudrmfb frame buffer device
[ 14.376242] amdgpu 0000:07:00.0: amdgpu: ring gfx uses VM inv eng 0 on hub 0
[ 14.376245] amdgpu 0000:07:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[ 14.376247] amdgpu 0000:07:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[ 14.376249] amdgpu 0000:07:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[ 14.376251] amdgpu 0000:07:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[ 14.376253] amdgpu 0000:07:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[ 14.376254] amdgpu 0000:07:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[ 14.376256] amdgpu 0000:07:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[ 14.376258] amdgpu 0000:07:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[ 14.376260] amdgpu 0000:07:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
[ 14.376262] amdgpu 0000:07:00.0: amdgpu: ring sdma0 uses VM inv eng 0 on hub 1
[ 14.376264] amdgpu 0000:07:00.0: amdgpu: ring vcn_dec uses VM inv eng 1 on hub 1
[ 14.376266] amdgpu 0000:07:00.0: amdgpu: ring vcn_enc0 uses VM inv eng 4 on hub 1
[ 14.376268] amdgpu 0000:07:00.0: amdgpu: ring vcn_enc1 uses VM inv eng 5 on hub 1
[ 14.376270] amdgpu 0000:07:00.0: amdgpu: ring jpeg_dec uses VM inv eng 6 on hub 1
[ 14.396726] [drm] Initialized amdgpu 3.40.0 20150101 for 0000:07:00.0 on minor 0
[ 16.916087] [drm] Fence fallback timer expired on ring gfx
[ 17.428092] [drm] Fence fallback timer expired on ring comp_1.0.0
[ 17.940089] [drm] Fence fallback timer expired on ring comp_1.1.0
[ 18.452108] [drm] Fence fallback timer expired on ring comp_1.2.0
[ 18.964084] [drm] Fence fallback timer expired on ring comp_1.3.0
[ 19.476093] [drm] Fence fallback timer expired on ring comp_1.0.1
[ 19.988088] [drm] Fence fallback timer expired on ring comp_1.1.1
[ 20.500097] [drm] Fence fallback timer expired on ring comp_1.2.1
[ 21.012089] [drm] Fence fallback timer expired on ring comp_1.3.1
[ 21.524131] [drm] Fence fallback timer expired on ring sdma0
[ 22.068110] [drm] Fence fallback timer expired on ring vcn_dec
[ 22.580100] [drm] Fence fallback timer expired on ring vcn_enc0
[ 23.092101] [drm] Fence fallback timer expired on ring vcn_enc1
[ 23.604099] [drm] Fence fallback timer expired on ring jpeg_dec
[ 24.596132] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:62:crtc-0] flip_done timed out
[ 34.836126] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CONNECTOR:79:DP-1] flip_done timed out
[ 45.076121] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:52:plane-3] flip_done timed out
[ 45.076172] ------------[ cut here ]------------
[ 45.076365] WARNING: CPU: 5 PID: 670 at drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:7391 amdgpu_dm_atomic_commit_tail+0x23af/0x2440 [amdgpu]
[ 45.076366] Modules linked in: amdgpu(E) edac_mce_amd(E) kvm_amd(E) mfd_core(E) iwlmvm(E) gpu_sched(E) kvm(E) ttm(E) irqbypass(E) crc32_pclmul(E) mac80211(E) drm_kms_helper(E) libarc4(E) cec(E) ghash_clmulni_intel(E) drm(E) iwlwifi(E) aesni_intel(E) wmi_bmof(E) libaes(E) xhci_pci(E) r8169(E) i2c_algo_bit(E) crypto_simd(E) xhci_hcd(E) cfg80211(E) fb_sys_fops(E) snd_pcm(E) realtek(E) syscopyarea(E) cryptd(E) sysfillrect(E) mdio_devres(E) sysimgblt(E) glue_helper(E) rfkill(E) usbcore(E) sp5100_tco(E) snd_timer(E) libphy(E) snd(E) tpm_crb(E) watchdog(E) usb_common(E) ccp(E) sg(E) soundcore(E) i2c_piix4(E) k10temp(E) pcspkr(E) efi_pstore(E) tpm_tis(E) tpm_tis_core(E) wmi(E) tpm(E) rng_core(E) acpi_cpufreq(E) video(E) button(E) ext4(E) crc32c_generic(E) crc16(E) mbcache(E) jbd2(E) sd_mod(E) t10_pi(E) crc_t10dif(E) crct10dif_generic(E) ahci(E) libahci(E) crct10dif_pclmul(E) crct10dif_common(E) crc32c_intel(E) libata(E) evdev(E) serio_raw(E) scsi_mod(E) gpio_amdpt(E) gpio_generic(E)
[ 45.076430] CPU: 5 PID: 670 Comm: setfont Tainted: G E 5.10.145-pulsar+ #14
[ 45.076431] Hardware name: System manufacturer System Product Name/PRIME A320M-K, BIOS 6042 04/28/2022
[ 45.076603] RIP: 0010:amdgpu_dm_atomic_commit_tail+0x23af/0x2440 [amdgpu]
[ 45.076606] Code: a8 fd ff ff 01 c7 85 a4 fd ff ff 37 00 00 00 c7 85 ac fd ff ff 20 00 00 00 e8 bd 39 13 00 e9 f4 fa ff ff 0f 0b e9 5f f9 ff ff <0f> 0b e9 af f9 ff ff 0f 0b 0f 0b e9 c6 f9 ff ff 49 8b 06 41 0f b6
[ 45.076608] RSP: 0018:ffffa83c8060f748 EFLAGS: 00010002
[ 45.076610] RAX: 0000000000000002 RBX: 0000000000000984 RCX: ffff95c280a3f918
[ 45.076612] RDX: 0000000000000001 RSI: 0000000000000297 RDI: ffff95c288c80188
[ 45.076613] RBP: ffffa83c8060fa40 R08: 0000000000000005 R09: 0000000000000000
[ 45.076614] R10: ffffa83c8060f6a8 R11: ffffa83c8060f6ac R12: 0000000000000297
[ 45.076615] R13: ffff95c280a3f800 R14: ffff95c283552a00 R15: ffff95c288cb0880
[ 45.076617] FS: 00007f6626b09580(0000) GS:ffff95c3a6940000(0000) knlGS:0000000000000000
[ 45.076619] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 45.076620] CR2: 00007fff41c41868 CR3: 00000001090b2000 CR4: 00000000003506e0
[ 45.076621] Call Trace:
[ 45.076648] commit_tail+0x94/0x130 [drm_kms_helper]
[ 45.076663] drm_atomic_helper_commit+0x11b/0x140 [drm_kms_helper]
[ 45.076688] drm_client_modeset_commit_atomic+0x1e8/0x230 [drm]
[ 45.076712] drm_client_modeset_commit_locked+0x56/0x160 [drm]
[ 45.076725] drm_fb_helper_pan_display+0xdc/0x210 [drm_kms_helper]
[ 45.076731] fb_pan_display+0x87/0x110
[ 45.076735] bit_update_start+0x1a/0x40
[ 45.076738] fbcon_switch+0x31c/0x490
[ 45.076744] redraw_screen+0xe5/0x250
[ 45.076748] fbcon_do_set_font+0x1d6/0x210
[ 45.076751] con_font_op+0x25e/0x3e0
[ 45.076756] ? tomoyo_init_request_info+0x97/0xc0
[ 45.076760] ? security_capable+0x36/0x60
[ 45.076764] vt_ioctl+0x38e/0x1310
[ 45.076768] tty_ioctl+0x3b2/0x940
[ 45.076774] __x64_sys_ioctl+0x8b/0xc0
[ 45.076778] do_syscall_64+0x33/0x40
[ 45.076783] entry_SYSCALL_64_after_hwframe+0x49/0xae
[ 45.076785] RIP: 0033:0x7f6626a256b7
[ 45.076789] Code: 00 00 00 48 8b 05 d9 c7 0d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a9 c7 0d 00 f7 d8 64 89 01 48
[ 45.076790] RSP: 002b:00007fff405ee318 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 45.076792] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007f6626a256b7
[ 45.076794] RDX: 00007fff405ee340 RSI: 0000000000004b72 RDI: 0000000000000003
[ 45.076795] RBP: 0000000000000010 R08: 0000000000000008 R09: 0000000000000010
[ 45.076796] R10: fffffffffffffb59 R11: 0000000000000246 R12: 0000000000000010
[ 45.076797] R13: 00005557d0998270 R14: 0000000000000100 R15: 0000000000000003
[ 45.076800] ---[ end trace 48fb08bb347d7ffc ]---
[ 45.076815] ------------[ cut here ]------------
[ 45.076986] WARNING: CPU: 5 PID: 670 at drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:6992 amdgpu_dm_atomic_commit_tail+0x23b8/0x2440 [amdgpu]
[ 45.076986] Modules linked in: amdgpu(E) edac_mce_amd(E) kvm_amd(E) mfd_core(E) iwlmvm(E) gpu_sched(E) kvm(E) ttm(E) irqbypass(E) crc32_pclmul(E) mac80211(E) drm_kms_helper(E) libarc4(E) cec(E) ghash_clmulni_intel(E) drm(E) iwlwifi(E) aesni_intel(E) wmi_bmof(E) libaes(E) xhci_pci(E) r8169(E) i2c_algo_bit(E) crypto_simd(E) xhci_hcd(E) cfg80211(E) fb_sys_fops(E) snd_pcm(E) realtek(E) syscopyarea(E) cryptd(E) sysfillrect(E) mdio_devres(E) sysimgblt(E) glue_helper(E) rfkill(E) usbcore(E) sp5100_tco(E) snd_timer(E) libphy(E) snd(E) tpm_crb(E) watchdog(E) usb_common(E) ccp(E) sg(E) soundcore(E) i2c_piix4(E) k10temp(E) pcspkr(E) efi_pstore(E) tpm_tis(E) tpm_tis_core(E) wmi(E) tpm(E) rng_core(E) acpi_cpufreq(E) video(E) button(E) ext4(E) crc32c_generic(E) crc16(E) mbcache(E) jbd2(E) sd_mod(E) t10_pi(E) crc_t10dif(E) crct10dif_generic(E) ahci(E) libahci(E) crct10dif_pclmul(E) crct10dif_common(E) crc32c_intel(E) libata(E) evdev(E) serio_raw(E) scsi_mod(E) gpio_amdpt(E) gpio_generic(E)
[ 45.077029] CPU: 5 PID: 670 Comm: setfont Tainted: G W E 5.10.145-pulsar+ #14
[ 45.077030] Hardware name: System manufacturer System Product Name/PRIME A320M-K, BIOS 6042 04/28/2022
[ 45.077199] RIP: 0010:amdgpu_dm_atomic_commit_tail+0x23b8/0x2440 [amdgpu]
[ 45.077202] Code: ff ff 37 00 00 00 c7 85 ac fd ff ff 20 00 00 00 e8 bd 39 13 00 e9 f4 fa ff ff 0f 0b e9 5f f9 ff ff 0f 0b e9 af f9 ff ff 0f 0b <0f> 0b e9 c6 f9 ff ff 49 8b 06 41 0f b6 8e 2d 01 00 00 48 c7 c6 d0
[ 45.077203] RSP: 0018:ffffa83c8060f748 EFLAGS: 00010086
[ 45.077205] RAX: 0000000000000001 RBX: 0000000000000984 RCX: ffff95c280a3f918
[ 45.077206] RDX: 0000000000000001 RSI: 0000000000000297 RDI: ffff95c288c80188
[ 45.077207] RBP: ffffa83c8060fa40 R08: 0000000000000005 R09: 0000000000000000
[ 45.077208] R10: ffffa83c8060f6a8 R11: ffffa83c8060f6ac R12: 0000000000000297
[ 45.077209] R13: ffff95c280a3f800 R14: ffff95c283552a00 R15: ffff95c288cb0880
[ 45.077211] FS: 00007f6626b09580(0000) GS:ffff95c3a6940000(0000) knlGS:0000000000000000
[ 45.077212] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 45.077213] CR2: 00007fff41c41868 CR3: 00000001090b2000 CR4: 00000000003506e0
[ 45.077214] Call Trace:
[ 45.077237] commit_tail+0x94/0x130 [drm_kms_helper]
[ 45.077251] drm_atomic_helper_commit+0x11b/0x140 [drm_kms_helper]
[ 45.077274] drm_client_modeset_commit_atomic+0x1e8/0x230 [drm]
[ 45.077297] drm_client_modeset_commit_locked+0x56/0x160 [drm]
[ 45.077310] drm_fb_helper_pan_display+0xdc/0x210 [drm_kms_helper]
[ 45.077313] fb_pan_display+0x87/0x110
[ 45.077316] bit_update_start+0x1a/0x40
[ 45.077319] fbcon_switch+0x31c/0x490
[ 45.077323] redraw_screen+0xe5/0x250
[ 45.077327] fbcon_do_set_font+0x1d6/0x210
[ 45.077329] con_font_op+0x25e/0x3e0
[ 45.077332] ? tomoyo_init_request_info+0x97/0xc0
[ 45.077336] ? security_capable+0x36/0x60
[ 45.077339] vt_ioctl+0x38e/0x1310
[ 45.077343] tty_ioctl+0x3b2/0x940
[ 45.077347] __x64_sys_ioctl+0x8b/0xc0
[ 45.077350] do_syscall_64+0x33/0x40
[ 45.077353] entry_SYSCALL_64_after_hwframe+0x49/0xae
[ 45.077354] RIP: 0033:0x7f6626a256b7
[ 45.077357] Code: 00 00 00 48 8b 05 d9 c7 0d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a9 c7 0d 00 f7 d8 64 89 01 48
[ 45.077358] RSP: 002b:00007fff405ee318 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 45.077360] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007f6626a256b7
[ 45.077361] RDX: 00007fff405ee340 RSI: 0000000000004b72 RDI: 0000000000000003
[ 45.077362] RBP: 0000000000000010 R08: 0000000000000008 R09: 0000000000000010
[ 45.077363] R10: fffffffffffffb59 R11: 0000000000000246 R12: 0000000000000010
[ 45.077364] R13: 00005557d0998270 R14: 0000000000000100 R15: 0000000000000003
[ 45.077367] ---[ end trace 48fb08bb347d7ffd ]---
[ 55.316127] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:62:crtc-0] flip_done timed out
[ 65.556123] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:62:crtc-0] flip_done timed out
[ 75.796123] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CONNECTOR:79:DP-1] flip_done timed out
[ 86.036123] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:52:plane-3] flip_done timed out
[ 86.036173] ------------[ cut here ]------------
[ 86.036366] WARNING: CPU: 3 PID: 674 at drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:7391 amdgpu_dm_atomic_commit_tail+0x23af/0x2440 [amdgpu]
[ 86.036367] Modules linked in: amdgpu(E) edac_mce_amd(E) kvm_amd(E) mfd_core(E) iwlmvm(E) gpu_sched(E) kvm(E) ttm(E) irqbypass(E) crc32_pclmul(E) mac80211(E) drm_kms_helper(E) libarc4(E) cec(E) ghash_clmulni_intel(E) drm(E) iwlwifi(E) aesni_intel(E) wmi_bmof(E) libaes(E) xhci_pci(E) r8169(E) i2c_algo_bit(E) crypto_simd(E) xhci_hcd(E) cfg80211(E) fb_sys_fops(E) snd_pcm(E) realtek(E) syscopyarea(E) cryptd(E) sysfillrect(E) mdio_devres(E) sysimgblt(E) glue_helper(E) rfkill(E) usbcore(E) sp5100_tco(E) snd_timer(E) libphy(E) snd(E) tpm_crb(E) watchdog(E) usb_common(E) ccp(E) sg(E) soundcore(E) i2c_piix4(E) k10temp(E) pcspkr(E) efi_pstore(E) tpm_tis(E) tpm_tis_core(E) wmi(E) tpm(E) rng_core(E) acpi_cpufreq(E) video(E) button(E) ext4(E) crc32c_generic(E) crc16(E) mbcache(E) jbd2(E) sd_mod(E) t10_pi(E) crc_t10dif(E) crct10dif_generic(E) ahci(E) libahci(E) crct10dif_pclmul(E) crct10dif_common(E) crc32c_intel(E) libata(E) evdev(E) serio_raw(E) scsi_mod(E) gpio_amdpt(E) gpio_generic(E)
[ 86.036431] CPU: 3 PID: 674 Comm: setfont Tainted: G W E 5.10.145-pulsar+ #14
[ 86.036433] Hardware name: System manufacturer System Product Name/PRIME A320M-K, BIOS 6042 04/28/2022
[ 86.036605] RIP: 0010:amdgpu_dm_atomic_commit_tail+0x23af/0x2440 [amdgpu]
[ 86.036609] Code: a8 fd ff ff 01 c7 85 a4 fd ff ff 37 00 00 00 c7 85 ac fd ff ff 20 00 00 00 e8 bd 39 13 00 e9 f4 fa ff ff 0f 0b e9 5f f9 ff ff <0f> 0b e9 af f9 ff ff 0f 0b 0f 0b e9 c6 f9 ff ff 49 8b 06 41 0f b6
[ 86.036610] RSP: 0018:ffffa83c8080b748 EFLAGS: 00010002
[ 86.036612] RAX: 0000000000000002 RBX: 0000000000001310 RCX: ffff95c280a3f918
[ 86.036614] RDX: 0000000000000001 RSI: 0000000000000297 RDI: ffff95c288c80188
[ 86.036615] RBP: ffffa83c8080ba40 R08: 0000000000000005 R09: 0000000000000000
[ 86.036616] R10: ffffa83c8080b6a8 R11: ffffa83c8080b6ac R12: 0000000000000297
[ 86.036617] R13: ffff95c280a3f800 R14: ffff95c284902200 R15: ffff95c282216500
[ 86.036619] FS: 00007fa8b8831580(0000) GS:ffff95c3a68c0000(0000) knlGS:0000000000000000
[ 86.036621] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 86.036622] CR2: 00005568d3c6f000 CR3: 0000000102248000 CR4: 00000000003506e0
[ 86.036623] Call Trace:
[ 86.036650] commit_tail+0x94/0x130 [drm_kms_helper]
[ 86.036665] drm_atomic_helper_commit+0x11b/0x140 [drm_kms_helper]
[ 86.036690] drm_client_modeset_commit_atomic+0x1e8/0x230 [drm]
[ 86.036715] drm_client_modeset_commit_locked+0x56/0x160 [drm]
[ 86.036728] drm_fb_helper_pan_display+0xdc/0x210 [drm_kms_helper]
[ 86.036734] fb_pan_display+0x87/0x110
[ 86.036738] bit_update_start+0x1a/0x40
[ 86.036741] fbcon_switch+0x31c/0x490
[ 86.036747] redraw_screen+0xe5/0x250
[ 86.036751] fbcon_do_set_font+0x1d6/0x210
[ 86.036754] con_font_op+0x25e/0x3e0
[ 86.036759] ? tomoyo_init_request_info+0x97/0xc0
[ 86.036763] ? security_capable+0x36/0x60
[ 86.036767] vt_ioctl+0x38e/0x1310
[ 86.036772] tty_ioctl+0x3b2/0x940
[ 86.036778] __x64_sys_ioctl+0x8b/0xc0
[ 86.036783] do_syscall_64+0x33/0x40
[ 86.036787] entry_SYSCALL_64_after_hwframe+0x49/0xae
[ 86.036790] RIP: 0033:0x7fa8b874d6b7
[ 86.036793] Code: 00 00 00 48 8b 05 d9 c7 0d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a9 c7 0d 00 f7 d8 64 89 01 48
[ 86.036794] RSP: 002b:00007ffe7c701908 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 86.036797] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa8b874d6b7
[ 86.036798] RDX: 00007ffe7c701930 RSI: 0000000000004b72 RDI: 0000000000000003
[ 86.036799] RBP: 0000000000000010 R08: 0000000000000008 R09: 0000000000000010
[ 86.036800] R10: fffffffffffffb59 R11: 0000000000000246 R12: 0000000000000010
[ 86.036801] R13: 00005568d3c6e270 R14: 0000000000000100 R15: 0000000000000003
[ 86.036804] ---[ end trace 48fb08bb347d7ffe ]---
[ 86.036819] ------------[ cut here ]------------
[ 86.036991] WARNING: CPU: 3 PID: 674 at drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:6992 amdgpu_dm_atomic_commit_tail+0x23b8/0x2440 [amdgpu]
[ 86.036991] Modules linked in: amdgpu(E) edac_mce_amd(E) kvm_amd(E) mfd_core(E) iwlmvm(E) gpu_sched(E) kvm(E) ttm(E) irqbypass(E) crc32_pclmul(E) mac80211(E) drm_kms_helper(E) libarc4(E) cec(E) ghash_clmulni_intel(E) drm(E) iwlwifi(E) aesni_intel(E) wmi_bmof(E) libaes(E) xhci_pci(E) r8169(E) i2c_algo_bit(E) crypto_simd(E) xhci_hcd(E) cfg80211(E) fb_sys_fops(E) snd_pcm(E) realtek(E) syscopyarea(E) cryptd(E) sysfillrect(E) mdio_devres(E) sysimgblt(E) glue_helper(E) rfkill(E) usbcore(E) sp5100_tco(E) snd_timer(E) libphy(E) snd(E) tpm_crb(E) watchdog(E) usb_common(E) ccp(E) sg(E) soundcore(E) i2c_piix4(E) k10temp(E) pcspkr(E) efi_pstore(E) tpm_tis(E) tpm_tis_core(E) wmi(E) tpm(E) rng_core(E) acpi_cpufreq(E) video(E) button(E) ext4(E) crc32c_generic(E) crc16(E) mbcache(E) jbd2(E) sd_mod(E) t10_pi(E) crc_t10dif(E) crct10dif_generic(E) ahci(E) libahci(E) crct10dif_pclmul(E) crct10dif_common(E) crc32c_intel(E) libata(E) evdev(E) serio_raw(E) scsi_mod(E) gpio_amdpt(E) gpio_generic(E)
[ 86.037034] CPU: 3 PID: 674 Comm: setfont Tainted: G W E 5.10.145-pulsar+ #14
[ 86.037035] Hardware name: System manufacturer System Product Name/PRIME A320M-K, BIOS 6042 04/28/2022
[ 86.037204] RIP: 0010:amdgpu_dm_atomic_commit_tail+0x23b8/0x2440 [amdgpu]
[ 86.037207] Code: ff ff 37 00 00 00 c7 85 ac fd ff ff 20 00 00 00 e8 bd 39 13 00 e9 f4 fa ff ff 0f 0b e9 5f f9 ff ff 0f 0b e9 af f9 ff ff 0f 0b <0f> 0b e9 c6 f9 ff ff 49 8b 06 41 0f b6 8e 2d 01 00 00 48 c7 c6 d0
[ 86.037208] RSP: 0018:ffffa83c8080b748 EFLAGS: 00010082
[ 86.037210] RAX: 0000000000000001 RBX: 0000000000001310 RCX: ffff95c280a3f918
[ 86.037211] RDX: 0000000000000001 RSI: 0000000000000297 RDI: ffff95c288c80188
[ 86.037212] RBP: ffffa83c8080ba40 R08: 0000000000000005 R09: 0000000000000000
[ 86.037213] R10: ffffa83c8080b6a8 R11: ffffa83c8080b6ac R12: 0000000000000297
[ 86.037214] R13: ffff95c280a3f800 R14: ffff95c284902200 R15: ffff95c282216500
[ 86.037216] FS: 00007fa8b8831580(0000) GS:ffff95c3a68c0000(0000) knlGS:0000000000000000
[ 86.037217] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 86.037218] CR2: 00005568d3c6f000 CR3: 0000000102248000 CR4: 00000000003506e0
[ 86.037219] Call Trace:
[ 86.037242] commit_tail+0x94/0x130 [drm_kms_helper]
[ 86.037256] drm_atomic_helper_commit+0x11b/0x140 [drm_kms_helper]
[ 86.037279] drm_client_modeset_commit_atomic+0x1e8/0x230 [drm]
[ 86.037303] drm_client_modeset_commit_locked+0x56/0x160 [drm]
[ 86.037315] drm_fb_helper_pan_display+0xdc/0x210 [drm_kms_helper]
[ 86.037319] fb_pan_display+0x87/0x110
[ 86.037321] bit_update_start+0x1a/0x40
[ 86.037324] fbcon_switch+0x31c/0x490
[ 86.037329] redraw_screen+0xe5/0x250
[ 86.037332] fbcon_do_set_font+0x1d6/0x210
[ 86.037335] con_font_op+0x25e/0x3e0
[ 86.037337] ? tomoyo_init_request_info+0x97/0xc0
[ 86.037341] ? security_capable+0x36/0x60
[ 86.037344] vt_ioctl+0x38e/0x1310
[ 86.037348] tty_ioctl+0x3b2/0x940
[ 86.037352] __x64_sys_ioctl+0x8b/0xc0
[ 86.037355] do_syscall_64+0x33/0x40
[ 86.037358] entry_SYSCALL_64_after_hwframe+0x49/0xae
[ 86.037360] RIP: 0033:0x7fa8b874d6b7
[ 86.037362] Code: 00 00 00 48 8b 05 d9 c7 0d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a9 c7 0d 00 f7 d8 64 89 01 48
[ 86.037364] RSP: 002b:00007ffe7c701908 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 86.037365] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fa8b874d6b7
[ 86.037367] RDX: 00007ffe7c701930 RSI: 0000000000004b72 RDI: 0000000000000003
[ 86.037368] RBP: 0000000000000010 R08: 0000000000000008 R09: 0000000000000010
[ 86.037369] R10: fffffffffffffb59 R11: 0000000000000246 R12: 0000000000000010
[ 86.037370] R13: 00005568d3c6e270 R14: 0000000000000100 R15: 0000000000000003
[ 86.037373] ---[ end trace 48fb08bb347d7fff ]---
[ 96.276139] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:62:crtc-0] flip_done timed out
[…] <the two warnings keep repeating every 30 secs approx>
I've bisected the problem to commit 9f55f36f749a7608eeef57d7d72991a9bd557341 "drm/amdgpu: move nbio sdma_doorbell_range() into sdma code for vega" (in the v5.10.y stable branch), which was rather cumbersome because this is a remote system. I then tried reverting that commit on top of v5.10.148, but new initialization errors appear, so I assume other newer commits depend on it:
[ 3.544985] [drm] kiq ring mec 2 pipe 1 q 0
[ 3.711465] amdgpu 0000:07:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring sdma0 test failed (-110)
[ 3.711604] [drm:amdgpu_device_init.cold [amdgpu]] *ERROR* hw_init of IP block <sdma_v4_0> failed -110
[ 3.711607] amdgpu 0000:07:00.0: amdgpu: amdgpu_device_ip_init failed
[ 3.711609] amdgpu 0000:07:00.0: amdgpu: Fatal error during GPU init
[ 3.711612] amdgpu 0000:07:00.0: amdgpu: amdgpu: finishing device.
While bisecting I also noticed that, at least until tags/v5.10.146~70, the two WARNINGs (with different line numbers and offsets) only appear two or three times on boot (one pair per CPU) and never repeat afterwards. I didn't bisect when they started repeating in a loop, as I was hunting for the initial regression and bisecting was already painful enough. If you need that one bisected, I could try, but if you can infer it, that'd be great. :)
Hardware description:
- CPU: AMD Ryzen 5 PRO 3400G with Radeon Vega Graphics
- GPU: 07:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Picasso [1002:15d8] (rev d8)
System information:
- Distro name and Version: Debian 11.5 (bullseye)
- Kernel version: Linux pulsar 5.10.145-pulsar+ #13 SMP Sun Oct 16 16:23:14 UTC 2022 x86_64 GNU/Linux
- Custom kernel: N/A
- AMD official driver version: N/A
How to reproduce the issue:
Compile the Linux kernel (v5.10.146 or later in the v5.10.y stable branch), boot it, and check the kernel log or dmesg output.
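When checking the dmesg output over several boots, it can help to collapse the long traces into one entry per distinct WARNING site. The following is only a minimal sketch, not part of the original report: the function name `summarize_warnings` and the `SAMPLE` constant are made up for illustration, and the regular expression assumes the `[ timestamp] WARNING: CPU: N PID: N at file:line symbol+offset` layout seen in the logs above.

```python
import re
from collections import defaultdict

# Assumed layout (taken from the WARNING lines in this report):
# [ 45.076365] WARNING: CPU: 5 PID: 670 at <path>:<line> <symbol+offset> [module]
WARNING_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) "
    r"at (?P<path>\S+):(?P<line>\d+) (?P<symbol>\S+)"
)


def summarize_warnings(dmesg_text):
    """Group kernel WARNING lines by (source file basename, line number).

    Returns a dict mapping (file, line) to a list of
    (timestamp, cpu, symbol+offset) tuples in log order.
    """
    groups = defaultdict(list)
    for raw in dmesg_text.splitlines():
        match = WARNING_RE.search(raw)
        if match is None:
            continue  # not a WARNING header line
        key = (match.group("path").rsplit("/", 1)[-1], int(match.group("line")))
        groups[key].append(
            (float(match.group("ts")), int(match.group("cpu")), match.group("symbol"))
        )
    return dict(groups)


# Two WARNING header lines copied from the v5.10.146 log above.
SAMPLE = """\
[ 45.076365] WARNING: CPU: 5 PID: 670 at drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:7391 amdgpu_dm_atomic_commit_tail+0x23af/0x2440 [amdgpu]
[ 45.076986] WARNING: CPU: 5 PID: 670 at drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:6992 amdgpu_dm_atomic_commit_tail+0x23b8/0x2440 [amdgpu]
"""

for (fname, line), hits in sorted(summarize_warnings(SAMPLE).items()):
    stamps = ", ".join(f"{ts:.2f}s on CPU {cpu}" for ts, cpu, _ in hits)
    print(f"{fname}:{line}: {len(hits)} hit(s) at {stamps}")
```

To use it on a real system, the text passed to `summarize_warnings` would be the saved output of `dmesg` instead of `SAMPLE`. On the good kernel the result should be empty; on the bad one the two amdgpu_dm.c sites should show up with a growing number of hits.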