VCN ring timeout when watching accelerated video
Brief summary of the problem:
Symptoms are similar to #3909. Several different failures occur while watching hardware-accelerated video:
- The display freezes and recovers a few seconds later (most common).
- FPS drops to about 1-3 for a few minutes, then the display freezes. The system keeps running and audio still plays. I can SSH in, but no coredump is present. Sometimes this is recoverable by suspending and resuming; sometimes attempting to suspend causes a hard crash.
- Black screen and hard crash: the system doesn't respond to input, doesn't continue playing audio, and doesn't appear on the network.
Hardware description:
Lenovo Yoga Pro 7
- CPU: AMD Ryzen AI 9 365 w/ Radeon 880M
- GPU: Advanced Micro Devices, Inc. [AMD/ATI] Strix [Radeon 880M / 890M] (rev c4)
- System Memory: 32GB
- Display(s): Internal OLED
- Type of Display Connection: eDP?
System information:
- Distro name and Version: Arch
- Kernel version: 6.12.10-zen1-1-zen
- Custom kernel: The issue appears the same on the vanilla and zen kernels
- AMD official driver version: mesa 1:24.3.4-1
How to reproduce the issue:
Play hardware-accelerated video in Firefox and wait; the fault can take up to several hours to appear. It may also occur during video playback in mpv (both applications were open at the time).
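Because the fault can take hours to show up, it may help to scan the kernel log for the timeout message rather than watch the screen. The following is a minimal sketch, not part of the original report: the function name `find_ring_timeouts` is hypothetical, and the pattern is modeled only on the single `ring vcn_unified_0 timeout` line in the log attached to this report, so other kernel versions may word the message differently.

```python
import re

# Pattern modeled on the one timeout line in this report (an assumption, not
# a stable kernel interface):
#   amdgpu 0000:62:00.0: amdgpu: ring vcn_unified_0 timeout, signaled seq=132428, emitted seq=132429
RING_TIMEOUT = re.compile(
    r"amdgpu (?P<device>\S+): amdgpu: ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def find_ring_timeouts(lines):
    """Return one dict per ring-timeout line found in an iterable of log lines."""
    events = []
    for line in lines:
        match = RING_TIMEOUT.search(line)
        if match:
            events.append({
                "device": match["device"],
                "ring": match["ring"],
                "signaled": int(match["signaled"]),
                "emitted": int(match["emitted"]),
            })
    return events
```

The function accepts any iterable of strings, so it can be fed a saved log file object or `sys.stdin` when the output of `dmesg` is piped into a script that calls it.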
Attached files:
Log files (for system lockups / game freezes / crashes)
Kernel log from a fault that recovered:
[ 96.956975] ------------[ cut here ]------------
[ 96.956981] WARNING: CPU: 4 PID: 1196 at drivers/gpu/drm/amd/amdgpu/../display/dc/dc_dmub_srv.c:1590 dc_dmub_srv_apply_idle_power_optimizations+0x1d9/0x550 [amdgpu]
[ 96.957239] Modules linked in: rfcomm ccm cmac algif_hash algif_skcipher af_alg nft_masq nft_ct nft_reject_ipv4 nf_reject_ipv4 nft_reject act_csum cls_u32 sch_htb nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables libcrc32c bridge stp llc bnep btusb btrtl btintel btbcm btmtk bluetooth qrtr uvcvideo videobuf2_vmalloc uvc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev uinput snd_ctl_led mc snd_acp_legacy_mach snd_acp_mach snd_soc_nau8821 snd_acp3x_rn snd_acp70 snd_acp_i2s snd_acp_pdm snd_soc_dmic snd_acp_pcm nls_iso8859_1 snd_sof_amd_acp70 amd_atl vfat snd_sof_amd_acp63 intel_rapl_msr snd_soc_acpi_amd_match fat intel_rapl_common snd_sof_amd_vangogh snd_sof_amd_rembrandt snd_sof_amd_renoir snd_sof_amd_acp snd_sof_pci snd_sof_xtensa_dsp snd_sof amdgpu snd_sof_utils snd_pci_ps snd_amd_sdw_acpi snd_hda_codec_realtek soundwire_amd soundwire_generic_allocation kvm_amd snd_hda_codec_generic soundwire_bus joydev snd_hda_scodec_component mousedev mt7921e snd_hda_codec_hdmi snd_soc_core
[ 96.957274] mt7921_common kvm mt792x_lib amdxcp snd_hda_intel snd_compress mt76_connac_lib drm_exec ac97_bus snd_intel_dspcfg gpu_sched mt76 snd_pcm_dmaengine snd_intel_sdw_acpi drm_buddy i2c_algo_bit crct10dif_pclmul snd_hda_codec snd_rpl_pci_acp6x drm_suballoc_helper crc32_pclmul drm_ttm_helper snd_acp_pci mac80211 polyval_clmulni hid_sensor_custom polyval_generic snd_hda_core snd_acp_legacy_common ttm ghash_clmulni_intel sha512_ssse3 snd_hwdep snd_pci_acp6x spd5118 hid_multitouch sha256_ssse3 drm_display_helper hid_sensor_hub snd_pcm libarc4 snd_pci_acp5x ideapad_laptop sha1_ssse3 sp5100_tco snd_rn_pci_acp3x hid_generic wmi_bmof cec sparse_keymap amd_pmf snd_timer ucsi_acpi aesni_intel snd_acp_config cfg80211 typec_ucsi amdtee snd_soc_acpi snd gf128mul video i2c_piix4 crypto_simd cryptd rapl pcspkr typec amd_sfh thunderbolt ccp snd_pci_acp3x soundcore rfkill k10temp i2c_smbus platform_profile roles i2c_hid_acpi wmi tee i2c_hid amd_pmc mac_hid sg crypto_user loop dm_mod nfnetlink ip_tables x_tables ext4
[ 96.957316] crc32c_generic crc16 mbcache jbd2 serio_raw atkbd libps2 vivaldi_fmap crc32c_intel i8042 serio nvme nvme_core nvme_auth
[ 96.957325] CPU: 4 UID: 0 PID: 1196 Comm: kworker/u80:2 Not tainted 6.12.10-zen1-1-zen #1 d5d12d63c112106178c29e94e613eef8da6d3bf3
[ 96.957329] Hardware name: LENOVO 83HN/LNVNB161216, BIOS PSCN14WW 06/27/2024
[ 96.957330] Workqueue: dm_vblank_control_workqueue amdgpu_dm_crtc_vblank_control_worker [amdgpu]
[ 96.957575] RIP: 0010:dc_dmub_srv_apply_idle_power_optimizations+0x1d9/0x550 [amdgpu]
[ 96.957758] Code: f8 ff ff 44 0f b6 c3 48 c7 c2 40 75 f2 c1 48 c7 c6 f8 b8 fc c1 48 c7 c7 90 cd a8 c1 e8 40 99 51 d6 84 db 75 bd e9 8a fe ff ff <0f> 0b 41 c7 44 24 68 00 00 00 00 48 83 c4 08 5b 5d 41 5c 41 5d 41
[ 96.957760] RSP: 0018:ffffb052c2097db0 EFLAGS: 00010286
[ 96.957761] RAX: 00000000ffffffff RBX: ffffb053ff940000 RCX: 0000000000000000
[ 96.957763] RDX: 0000000000000001 RSI: 000000000000da8c RDI: ffff975ae2580000
[ 96.957763] RBP: 0000000000000012 R08: ffffb052c2097d5c R09: 0000000000000100
[ 96.957764] R10: 0000000000000080 R11: 000000000000001c R12: ffff975ae08f3180
[ 96.957765] R13: ffff975ae4800000 R14: 0000000000000000 R15: ffff975ae08f3180
[ 96.957766] FS: 0000000000000000(0000) GS:ffff976121800000(0000) knlGS:0000000000000000
[ 96.957767] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 96.957768] CR2: 000070be75fce2f8 CR3: 0000000604022000 CR4: 0000000000f50ef0
[ 96.957769] PKRU: 55555554
[ 96.957770] Call Trace:
[ 96.957773] <TASK>
[ 96.957774] ? dc_dmub_srv_apply_idle_power_optimizations+0x1d9/0x550 [amdgpu 49b8098b59c0454b843528b3e609eb8944e58aa7]
[ 96.957934] ? __warn.cold+0x93/0xed
[ 96.957942] ? dc_dmub_srv_apply_idle_power_optimizations+0x1d9/0x550 [amdgpu 49b8098b59c0454b843528b3e609eb8944e58aa7]
[ 96.958117] ? report_bug+0xe7/0x210
[ 96.958121] ? handle_bug+0x58/0x90
[ 96.958124] ? exc_invalid_op+0x19/0xc0
[ 96.958126] ? asm_exc_invalid_op+0x1a/0x20
[ 96.958130] ? dc_dmub_srv_apply_idle_power_optimizations+0x1d9/0x550 [amdgpu 49b8098b59c0454b843528b3e609eb8944e58aa7]
[ 96.958336] dcn35_apply_idle_power_optimizations+0xd1/0xf0 [amdgpu 49b8098b59c0454b843528b3e609eb8944e58aa7]
[ 96.958586] dc_allow_idle_optimizations_internal+0x8a/0xe0 [amdgpu 49b8098b59c0454b843528b3e609eb8944e58aa7]
[ 96.958781] amdgpu_dm_crtc_vblank_control_worker+0x11a/0x260 [amdgpu 49b8098b59c0454b843528b3e609eb8944e58aa7]
[ 96.959002] process_one_work+0x18f/0x350
[ 96.959007] worker_thread+0x24c/0x380
[ 96.959009] ? __pfx_worker_thread+0x10/0x10
[ 96.959011] kthread+0xcf/0x100
[ 96.959014] ? __pfx_kthread+0x10/0x10
[ 96.959016] ret_from_fork+0x31/0x50
[ 96.959020] ? __pfx_kthread+0x10/0x10
[ 96.959021] ret_from_fork_asm+0x1a/0x30
[ 96.959026] </TASK>
[ 96.959028] ---[ end trace 0000000000000000 ]---
[ 3476.200205] amdgpu 0000:62:00.0: amdgpu: Dumping IP State
[ 3476.204659] amdgpu 0000:62:00.0: amdgpu: Dumping IP State Completed
[ 3476.204780] amdgpu 0000:62:00.0: amdgpu: ring vcn_unified_0 timeout, signaled seq=132428, emitted seq=132429
[ 3476.204788] amdgpu 0000:62:00.0: amdgpu: Process information: process RDD Process pid 4140 thread firefox:cs0 pid 5071
[ 3476.204794] amdgpu 0000:62:00.0: amdgpu: GPU reset begin!
[ 3476.569057] amdgpu 0000:62:00.0: amdgpu: MODE2 reset
[ 3476.593104] amdgpu 0000:62:00.0: amdgpu: GPU reset succeeded, trying to resume
[ 3476.593409] [drm] PCIE GART of 512M enabled (table at 0x00000080FFB00000).
[ 3476.593466] amdgpu 0000:62:00.0: amdgpu: SMU is resuming...
[ 3476.598608] amdgpu 0000:62:00.0: amdgpu: SMU is resumed successfully!
[ 3476.611874] [drm] DMUB hardware initialized: version=0x09000D00
[ 3477.382464] amdgpu 0000:62:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[ 3477.382474] amdgpu 0000:62:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[ 3477.382476] amdgpu 0000:62:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[ 3477.382477] amdgpu 0000:62:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
[ 3477.382478] amdgpu 0000:62:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
[ 3477.382479] amdgpu 0000:62:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
[ 3477.382480] amdgpu 0000:62:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
[ 3477.382482] amdgpu 0000:62:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
[ 3477.382483] amdgpu 0000:62:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
[ 3477.382484] amdgpu 0000:62:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[ 3477.382486] amdgpu 0000:62:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8
[ 3477.382487] amdgpu 0000:62:00.0: amdgpu: ring jpeg_dec_0 uses VM inv eng 1 on hub 8
[ 3477.382488] amdgpu 0000:62:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 13 on hub 0
[ 3477.382490] amdgpu 0000:62:00.0: amdgpu: ring vpe uses VM inv eng 4 on hub 8
[ 3477.388410] amdgpu 0000:62:00.0: amdgpu: GPU reset(1) succeeded!
[ 3477.404576] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 3477.425775] firefox:cs0[5071]: segfault at 0 ip 00005f1dd6766740 sp 00007b19835fe9c0 error 6 in firefox[87740,5f1dd6700000+98000] likely on CPU 17 (core 11, socket 0)
[ 3477.425796] Code: 53 50 48 89 fb 4c 8b 35 56 25 03 00 49 8b 36 ff 15 0d 26 03 00 49 8b 36 bf 0a 00 00 00 ff 15 77 26 03 00 48 89 1d 90 5b 03 00 <c7> 04 25 00 00 00 00 23 00 00 00 e8 00 00 00 00 f3 0f 1e fa 50 48
Firmware versions:
VCE feature version: 0, firmware version: 0x00000000
UVD feature version: 0, firmware version: 0x00000000
MC feature version: 0, firmware version: 0x00000000
ME feature version: 35, firmware version: 0x0000001d
PFP feature version: 35, firmware version: 0x00000027
CE feature version: 0, firmware version: 0x00000000
RLC feature version: 1, firmware version: 0x11510440
RLC SRLC feature version: 0, firmware version: 0x00000000
RLC SRLG feature version: 0, firmware version: 0x00000000
RLC SRLS feature version: 0, firmware version: 0x00000000
RLCP feature version: 1, firmware version: 0x11510341
RLCV feature version: 0, firmware version: 0x00000000
MEC feature version: 35, firmware version: 0x00000019
IMU feature version: 0, firmware version: 0x0b331a00
SOS feature version: 0, firmware version: 0x00000000
ASD feature version: 553648359, firmware version: 0x210000e7
TA XGMI feature version: 0x00000000, firmware version: 0x00000000
TA RAS feature version: 0x00000000, firmware version: 0x00000000
TA HDCP feature version: 0x00000000, firmware version: 0x17000042
TA DTM feature version: 0x00000000, firmware version: 0x12000018
TA RAP feature version: 0x00000000, firmware version: 0x00000000
TA SECUREDISPLAY feature version: 0x00000000, firmware version: 0x00000000
SMC feature version: 0, program: 9, firmware version: 0x095d0400 (93.4.0)
SDMA0 feature version: 60, firmware version: 0x0000000b
VCN feature version: 0, firmware version: 0x0911600c
DMCU feature version: 0, firmware version: 0x00000000
DMCUB feature version: 0, firmware version: 0x09000d00
TOC feature version: 0, firmware version: 0x0000000b
MES_KIQ feature version: 6, firmware version: 0x0000006d
MES feature version: 1, firmware version: 0x0000006a
VPE feature version: 60, firmware version: 0x00000036
VBIOS version: 113-STRIXEMU-001
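Most entries in the firmware list above are zero, which makes the relevant ones (such as VCN) easy to miss. As a rough sketch, the helper below keeps only the entries with a non-zero firmware version. The function name `loaded_firmware` is hypothetical, and the line format is assumed from the dump in this report; the usage note also assumes the dump comes from the `amdgpu_firmware_info` debugfs file, which the report does not state.

```python
import re

# Assumed line format, taken from the dump in this report, e.g.:
#   VCN feature version: 0, firmware version: 0x0911600c
#   SMC feature version: 0, program: 9, firmware version: 0x095d0400 (93.4.0)
FIRMWARE_LINE = re.compile(
    r"(?P<name>.+?) feature version: .*?firmware version: (?P<fw>0x[0-9a-fA-F]+)"
)


def loaded_firmware(lines):
    """Map firmware name to version for every entry with a non-zero version."""
    versions = {}
    for line in lines:
        match = FIRMWARE_LINE.match(line.strip())
        if not match:
            continue  # e.g. the "VBIOS version: ..." line has a different shape
        version = int(match["fw"], 16)
        if version != 0:
            versions[match["name"]] = version
    return versions
```

On a typical system the input could be read from `/sys/kernel/debug/dri/0/amdgpu_firmware_info` as root, although the DRI index may differ from machine to machine.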