AMD GPU: screen blanks for 1-2 seconds with a REG_WAIT timeout warning
Hello,
I have been running Fedora 40 on a ThinkPad T14 Gen 3 with an AMD Ryzen 7 PRO 6850U with Radeon Graphics. My monitor is connected through the ThinkPad dock, which is attached over USB-C.
On F40, the screen blanked about once a day, which was not often enough to be a problem.
Yesterday, I updated to F41 with the 6.11.5-300.fc41 kernel. The blanking shot up to roughly 30 times per minute, lasting 1-2 s each time, which makes the display effectively unusable.
I then booted the F41 install with the older F40 kernel (6.11.4-201.fc40, as reported in the dmesg output below). That one is mostly stable, though I still saw one blanking event in the last hour. dmesg output from this F40 kernel:
[ 80.596764] amdgpu 0000:04:00.0: [drm] REG_WAIT timeout 1us * 100 tries - dcn31_program_compbuf_size line:142
[ 80.596835] ------------[ cut here ]------------
[ 80.596837] WARNING: CPU: 1 PID: 2996 at drivers/gpu/drm/amd/amdgpu/../display/dc/hubbub/dcn31/dcn31_hubbub.c:151 dcn31_program_compbuf_size+0xd1/0x230 [amdgpu]
[ 80.597235] Modules linked in: uinput rfcomm snd_seq_dummy snd_hrtimer sunrpc nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack qrtr_mhi nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables bnep binfmt_misc vfat fat r8153_ecm cdc_ether usbnet qrtr snd_soc_acp6x_mach snd_soc_dmic snd_acp6x_pdm_dma snd_sof_amd_acp63 ath11k_pci snd_sof_amd_vangogh snd_sof_amd_rembrandt snd_sof_amd_renoir ath11k snd_sof_amd_acp snd_sof_pci snd_sof_xtensa_dsp snd_sof qmi_helpers snd_sof_utils snd_pci_ps snd_amd_sdw_acpi soundwire_amd snd_hda_codec_realtek soundwire_generic_allocation snd_hda_codec_generic soundwire_bus snd_hda_scodec_component snd_hda_codec_hdmi snd_soc_core snd_hda_intel amd_atl intel_rapl_msr snd_intel_dspcfg intel_rapl_common snd_intel_sdw_acpi snd_usb_audio snd_compress uvcvideo edac_mce_amd snd_hda_codec ac97_bus snd_pcm_dmaengine uvc snd_rpl_pci_acp6x videobuf2_vmalloc btusb
[ 80.597290] snd_acp_pci snd_usbmidi_lib videobuf2_memops kvm_amd snd_hda_core snd_ump videobuf2_v4l2 snd_acp_legacy_common btrtl mac80211 videobuf2_common snd_rawmidi snd_pci_acp6x snd_hwdep btintel kvm btbcm btmtk libarc4 spd5118 r8152 snd_seq think_lmi bluetooth snd_ctl_led mii rapl videodev pcspkr snd_seq_device firmware_attributes_class cfg80211 thinkpad_acpi mc wmi_bmof i2c_piix4 snd_pcm sparse_keymap k10temp i2c_smbus r8169 snd_pci_acp5x platform_profile snd_rn_pci_acp3x rfkill snd_timer snd_acp_config ipmi_devintf snd_soc_acpi mhi snd snd_pci_acp3x ipmi_msghandler realtek soundcore amd_pmc joydev acpi_tad loop nfnetlink zram dm_crypt hid_logitech_hidpp hid_logitech_dj amdgpu amdxcp i2c_algo_bit drm_ttm_helper ttm drm_exec gpu_sched crct10dif_pclmul crc32_pclmul nvme drm_suballoc_helper crc32c_intel polyval_clmulni drm_buddy polyval_generic drm_display_helper nvme_core ghash_clmulni_intel video ucsi_acpi sha512_ssse3 hid_multitouch sha256_ssse3 typec_ucsi sha1_ssse3 cec sp5100_tco nvme_auth typec i2c_hid_acpi
[ 80.597362] wmi i2c_hid serio_raw ip6_tables ip_tables fuse
[ 80.597369] CPU: 1 UID: 1000 PID: 2996 Comm: KMS thread Not tainted 6.11.4-201.fc40.x86_64 #1
[ 80.597372] Hardware name: LENOVO 21CGS1Q106/21CGS1Q106, BIOS R23ET78W (1.54 ) 08/05/2024
[ 80.597374] RIP: 0010:dcn31_program_compbuf_size+0xd1/0x230 [amdgpu]
[ 80.597624] Code: 00 48 8b 43 28 8b 88 b0 01 00 00 48 8b 43 20 0f b6 50 6c 48 8b 43 18 8b b0 14 01 00 00 e8 b7 28 1a 00 85 c0 0f 85 33 01 00 00 <0f> 0b 48 8b 44 24 08 65 48 2b 04 25 28 00 00 00 0f 85 35 01 00 00
[ 80.597626] RSP: 0018:ffffba75c5f47648 EFLAGS: 00010202
[ 80.597629] RAX: 0000000000000001 RBX: ffff9ca59589d400 RCX: 000000000000001f
[ 80.597631] RDX: 0000000000000000 RSI: 000000000000398b RDI: ffff9ca59f580000
[ 80.597632] RBP: 0000000000000004 R08: ffffba75c5f4764c R09: ffffba75c5f475c0
[ 80.597633] R10: 0000000000000000 R11: 0000000000000700 R12: ffff9ca635e40000
[ 80.597635] R13: ffff9ca59589d400 R14: ffff9ca5a0800000 R15: 0000000000000006
[ 80.597636] FS: 00007ff5120006c0(0000) GS:ffff9cac9ea80000(0000) knlGS:0000000000000000
[ 80.597638] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 80.597640] CR2: 00007efc7c00b028 CR3: 000000010e4b6000 CR4: 0000000000f50ef0
[ 80.597642] PKRU: 55555554
[ 80.597651] Call Trace:
[ 80.597655] <TASK>
[ 80.597657] ? dcn31_program_compbuf_size+0xd1/0x230 [amdgpu]
[ 80.597925] ? __warn.cold+0x8e/0xe8
[ 80.597931] ? dcn31_program_compbuf_size+0xd1/0x230 [amdgpu]
[ 80.598150] ? report_bug+0xff/0x140
[ 80.598154] ? handle_bug+0x3c/0x80
[ 80.598156] ? exc_invalid_op+0x17/0x70
[ 80.598158] ? asm_exc_invalid_op+0x1a/0x20
[ 80.598163] ? dcn31_program_compbuf_size+0xd1/0x230 [amdgpu]
[ 80.598378] ? dcn31_program_compbuf_size+0xc9/0x230 [amdgpu]
[ 80.598593] dcn20_optimize_bandwidth+0xf2/0x250 [amdgpu]
[ 80.598846] dc_commit_state_no_check+0x1059/0x1a60 [amdgpu]
[ 80.599059] dc_commit_streams+0x178/0x610 [amdgpu]
[ 80.599256] ? srso_alias_return_thunk+0x5/0xfbef5
[ 80.599262] amdgpu_dm_atomic_commit_tail+0x67e/0x4430 [amdgpu]
[ 80.599487] ? srso_alias_return_thunk+0x5/0xfbef5
[ 80.599489] ? dc_stream_get_scanoutpos+0x8e/0x100 [amdgpu]
[ 80.599706] ? finish_task_switch.isra.0+0x99/0x2e0
[ 80.599710] ? srso_alias_return_thunk+0x5/0xfbef5
[ 80.599712] ? dm_crtc_get_scanoutpos+0xc1/0x150 [amdgpu]
[ 80.599967] ? ktime_get+0x41/0xf0
[ 80.599970] ? amdgpu_display_get_crtc_scanoutpos+0xa9/0x240 [amdgpu]
[ 80.600131] ? __pfx_amdgpu_crtc_get_scanout_position+0x10/0x10 [amdgpu]
[ 80.600303] ? srso_alias_return_thunk+0x5/0xfbef5
[ 80.600305] ? amdgpu_crtc_get_scanout_position+0x28/0x40 [amdgpu]
[ 80.600471] ? srso_alias_return_thunk+0x5/0xfbef5
[ 80.600473] ? drm_crtc_vblank_helper_get_vblank_timestamp_internal+0x165/0x3a0
[ 80.600478] ? srso_alias_return_thunk+0x5/0xfbef5
[ 80.600480] ? wait_for_completion_timeout+0x13b/0x170
[ 80.600482] ? srso_alias_return_thunk+0x5/0xfbef5
[ 80.600484] ? drm_crtc_get_last_vbltimestamp+0x56/0x90
[ 80.600488] commit_tail+0xaf/0x160
[ 80.600493] drm_atomic_helper_commit+0x11a/0x140
[ 80.600496] drm_atomic_commit+0xa9/0xe0
[ 80.600499] ? __pfx___drm_printfn_info+0x10/0x10
[ 80.600503] drm_mode_atomic_ioctl+0xaaa/0xd00
[ 80.600508] ? __pfx_drm_mode_atomic_ioctl+0x10/0x10
[ 80.600511] drm_ioctl_kernel+0xb3/0x100
[ 80.600514] drm_ioctl+0x28b/0x540
[ 80.600517] ? __pfx_drm_mode_atomic_ioctl+0x10/0x10
[ 80.600522] amdgpu_drm_ioctl+0x4e/0x90 [amdgpu]
[ 80.600709] __x64_sys_ioctl+0x97/0xd0
[ 80.600713] do_syscall_64+0x82/0x160
[ 80.600716] ? __count_memcg_events+0x75/0x130
[ 80.600719] ? srso_alias_return_thunk+0x5/0xfbef5
[ 80.600721] ? count_memcg_events.constprop.0+0x1a/0x30
[ 80.600724] ? srso_alias_return_thunk+0x5/0xfbef5
[ 80.600725] ? handle_mm_fault+0x21b/0x330
[ 80.600728] ? srso_alias_return_thunk+0x5/0xfbef5
[ 80.600730] ? do_user_addr_fault+0x55a/0x7b0
[ 80.600734] ? srso_alias_return_thunk+0x5/0xfbef5
[ 80.600735] ? exc_page_fault+0x7e/0x180
[ 80.600738] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 80.600741] RIP: 0033:0x7ff5376fe0ad
[ 80.600765] Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00
[ 80.600767] RSP: 002b:00007ff511ffe930 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 80.600769] RAX: ffffffffffffffda RBX: 00007ff4f403b6b0 RCX: 00007ff5376fe0ad
[ 80.600770] RDX: 00007ff511ffe9d0 RSI: 00000000c03864bc RDI: 000000000000000c
[ 80.600772] RBP: 00007ff511ffe980 R08: 00000000000001c0 R09: 0000000000000001
[ 80.600773] R10: 0000000000000016 R11: 0000000000000246 R12: 00007ff511ffe9d0
[ 80.600774] R13: 00000000c03864bc R14: 000000000000000c R15: 00007ff4f403a4c0
[ 80.600778] </TASK>
[ 80.600778] ---[ end trace 0000000000000000 ]---
I will try the F41 kernel again and post its dmesg output here as well; the warning had the same signature there.
I also haven't tested without the external monitor yet.
I will add the results from both tests later.
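To compare the two kernels, here is a rough Python sketch for counting the REG_WAIT timeouts in saved dmesg output and estimating how often they occur. It assumes the log lines look like the one pasted above (a bracketed timestamp in seconds, then "REG_WAIT timeout ... - <function> line:<n>"). The function names `reg_wait_events` and `events_per_minute` are made up for this sketch and are not part of any tool.

```python
import re

# Assumed line format, taken from the paste above:
# [ 80.596764] amdgpu 0000:04:00.0: [drm] REG_WAIT timeout 1us * 100 tries - dcn31_program_compbuf_size line:142
REG_WAIT = re.compile(
    r"^\[\s*(\d+\.\d+)\]\s.*REG_WAIT timeout.* - (\w+) line:(\d+)"
)


def reg_wait_events(dmesg_text):
    """Return (timestamp_seconds, function_name, line_number) per REG_WAIT timeout."""
    events = []
    for line in dmesg_text.splitlines():
        match = REG_WAIT.match(line)
        if match:
            events.append(
                (float(match.group(1)), match.group(2), int(match.group(3)))
            )
    return events


def events_per_minute(events):
    """Average rate between the first and last event.

    Returns None when there are fewer than two events, since a single
    event gives no time span to measure a rate over.
    """
    if len(events) < 2:
        return None
    span = events[-1][0] - events[0][0]
    if span <= 0:
        return None
    return (len(events) - 1) * 60.0 / span
```

Saving the output of `dmesg` to a file on each kernel and passing the file contents to `reg_wait_events` would give a count and a rate to compare. This only counts the logged timeouts, so it assumes each visible blanking comes with one such line, which I have not verified.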