5.16.x kernel occasionally locks up the system when turning DPMS on/off
Brief summary of the problem:
After upgrading from 5.15 to 5.16, my entire system occasionally locks up when my monitors go to sleep via DPMS; the machine stops responding even to ping. The monitors still wake up on mouse movement, but nothing else indicates that the machine is on.
Hardware description:
- CPU: AMD Ryzen 9 3900X
- GPU: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT]
- System Memory: 32 GB
- Display(s): 2x 1440p 144 Hz monitors, running at 120 Hz
- Type of Display Connection: DP 1.4
System information:
- Distro name and Version: Arch Linux
- Kernel version: 5.16.1.arch1-1
- Custom kernel: No
- AMD official driver version: N/A
Other relevant information: I'm using Wayland with sway 1.6.1.
How to reproduce the issue:
- Boot into 5.16
- Let the monitors go to sleep via DPMS
- ????
- Try to wake monitors back up
- Your machine is hardlocked
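Since the lockup only happens occasionally, the steps above can be sketched as a loop that cycles DPMS repeatedly. This is a rough sketch, not a confirmed reproducer: it assumes it is run inside a sway session where `swaymsg output * dpms off|on` works, and the function names, cycle count, and sleep durations are arbitrary choices for illustration, not values known to trigger the lockup.

```python
import subprocess
import time


def dpms_command(state: str) -> list[str]:
    """Build the swaymsg invocation that switches DPMS for every output."""
    if state not in ("on", "off"):
        raise ValueError(f"state must be 'on' or 'off', got {state!r}")
    # Passed as an argument list, so '*' is not expanded by a shell.
    return ["swaymsg", "output", "*", "dpms", state]


def cycle_dpms(cycles: int = 20, off_seconds: float = 60.0,
               on_seconds: float = 10.0) -> None:
    """Repeatedly put all outputs to sleep and wake them up again.

    The timings are assumptions; adjust them to match how long the
    monitors usually sleep before the lockup is observed.
    """
    for i in range(1, cycles + 1):
        print(f"cycle {i}/{cycles}: dpms off", flush=True)
        subprocess.run(dpms_command("off"), check=True)
        time.sleep(off_seconds)
        print(f"cycle {i}/{cycles}: dpms on", flush=True)
        subprocess.run(dpms_command("on"), check=True)
        time.sleep(on_seconds)
```

Calling `cycle_dpms()` from a terminal inside sway starts the loop. Because each cycle number is printed and flushed before the DPMS switch, the last line visible (for example over SSH) shows roughly where the machine stopped responding.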
Attached files:
Screenshots/video files
N/A
Log files (for system lockups / game freezes / crashes)
I should mention that these logs are not from the time of the crash; they are from minutes before it. The logs end very abruptly in my journal, and I would guess that the lockup caused writes to journald to fail, so no information about the actual crash is available. That being said, amdgpu did give some feedback before the crash:
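For reference, an excerpt like the one below can be pulled from the journal of the boot that locked up once the machine has been restarted. The following is a minimal sketch under assumptions: it presumes a persistent journal, the helper names are made up for illustration, and filtering on the substrings `amdgpu` and `drm` is just one plausible way to narrow the output.

```python
import subprocess


def previous_boot_kernel_log_command(boots_back: int = 1) -> list[str]:
    """Build a journalctl call for kernel messages from an earlier boot."""
    if boots_back < 1:
        raise ValueError("boots_back must be at least 1")
    # -k: kernel messages only; --boot=-N: the Nth boot before the current one.
    return ["journalctl", "-k", f"--boot=-{boots_back}", "--no-pager"]


def gpu_lines(log_text: str) -> list[str]:
    """Keep only lines that mention amdgpu or the drm subsystem."""
    return [line for line in log_text.splitlines()
            if "amdgpu" in line or "drm" in line]


def read_previous_boot_gpu_log(boots_back: int = 1) -> list[str]:
    """Run journalctl and return the GPU-related kernel lines."""
    result = subprocess.run(previous_boot_kernel_log_command(boots_back),
                            capture_output=True, text=True, check=True)
    return gpu_lines(result.stdout)
```

If journald could not flush its last writes before the lockup, the final entries will still be missing; this only retrieves whatever made it to disk.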
Log before crash
Jan 17 18:16:00 mami kernel: ------------[ cut here ]------------
Jan 17 18:16:00 mami kernel: amdgpu 0000:0a:00.0: drm_WARN_ON(atomic_read(&vblank->refcount) == 0)
Jan 17 18:16:00 mami kernel: WARNING: CPU: 5 PID: 257 at drivers/gpu/drm/drm_vblank.c:1210 drm_vblank_put+0xee/0x100
Jan 17 18:16:00 mami kernel: Modules linked in: snd_seq_dummy snd_hrtimer snd_seq snd_seq_device rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs nct6775 hwmon_vid raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c md_mod sunrpc nls_iso8859_1 vfat fat snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi intel_rapl_msr wireguard snd_hda_intel intel_rapl_common curve25519_x86_64 snd_intel_dspcfg libchacha20poly1305 snd_intel_sdw_acpi chacha_x86_64 snd_hda_codec poly1305_x86_64 libblake2s edac_mce_amd snd_hda_core blake2s_x86_64 eeepc_wmi asus_wmi libcurve25519_generic snd_hwdep amdgpu sparse_keymap snd_pcm libchacha kvm_amd libblake2s_generic platform_profile snd_timer ip6_udp_tunnel gpu_sched kvm snd drm_ttm_helper rapl udp_tunnel video pcspkr mxm_wmi wmi_bmof k10temp i2c_piix4 ttm soundcore mousedev joydev cfg80211 tpm_crb tpm_tis rfkill tpm_tis_core mac_hid acpi_cpufreq fuse ip_tables x_tables ext4 crc32c_generic
Jan 17 18:16:00 mami kernel: crc16 mbcache jbd2 dm_crypt cbc encrypted_keys trusted asn1_encoder tee tpm hid_logitech_hidpp hid_logitech_dj usbhid uas usb_storage bridge crct10dif_pclmul crc32_pclmul stp crc32c_intel llc ghash_clmulni_intel aesni_intel crypto_simd cryptd ccp sp5100_tco rng_core sr_mod igb cdrom xhci_pci dca xhci_pci_renesas wmi pinctrl_amd vfio_pci vfio_pci_core irqbypass vfio_virqfd vfio_iommu_type1 vfio dm_mirror dm_region_hash dm_log dm_mod xpad ff_memless ipmi_devintf ipmi_msghandler sg bonding tls
Jan 17 18:16:00 mami kernel: CPU: 5 PID: 257 Comm: kworker/u64:5 Not tainted 5.16.1-arch1-1 #1 49bbb8d20d0329f70e47963ef5feb4a66c3cd442
Jan 17 18:16:00 mami kernel: Hardware name: System manufacturer System Product Name/Pro WS X570-ACE, BIOS 1201 11/18/2019
Jan 17 18:16:00 mami kernel: Workqueue: events_unbound commit_work
Jan 17 18:16:00 mami kernel: RIP: 0010:drm_vblank_put+0xee/0x100
Jan 17 18:16:00 mami kernel: Code: 8b 7f 08 4c 8b 67 50 4d 85 e4 74 22 e8 6b 86 01 00 48 c7 c1 e0 1d 51 93 4c 89 e2 48 c7 c7 9a 87 50 93 48 89 c6 e8 32 9c 3f 00 <0f> 0b eb b9 4c 8b 27 eb d9 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00
Jan 17 18:16:00 mami kernel: RSP: 0018:ffffbe0cc0d5faa0 EFLAGS: 00010246
Jan 17 18:16:00 mami kernel: RAX: 0000000000000000 RBX: ffff99f5c8dec800 RCX: 0000000000000000
Jan 17 18:16:00 mami kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Jan 17 18:16:00 mami kernel: RBP: ffffbe0cc0d5fe58 R08: 0000000000000000 R09: 0000000000000000
Jan 17 18:16:00 mami kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff99f541f3f8d0
Jan 17 18:16:00 mami kernel: R13: ffff99f546050f80 R14: 0000000000000000 R15: ffff99f5c8defe00
Jan 17 18:16:00 mami kernel: FS: 0000000000000000(0000) GS:ffff99fc5eb40000(0000) knlGS:0000000000000000
Jan 17 18:16:00 mami kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 17 18:16:00 mami kernel: CR2: 00007f578840d000 CR3: 00000001e6ecc000 CR4: 0000000000350ee0
Jan 17 18:16:00 mami kernel: Call Trace:
Jan 17 18:16:00 mami kernel: <TASK>
Jan 17 18:16:00 mami kernel: amdgpu_dm_atomic_commit_tail+0x1793/0x2690 [amdgpu f85b8a8caf867a5d5ba40878af31ffe87241aba2]
Jan 17 18:16:00 mami kernel: ? 0xffffffff92000000
Jan 17 18:16:00 mami kernel: commit_tail+0x94/0x130
Jan 17 18:16:00 mami kernel: process_one_work+0x1e8/0x3c0
Jan 17 18:16:00 mami kernel: worker_thread+0x50/0x3c0
Jan 17 18:16:00 mami kernel: ? rescuer_thread+0x380/0x380
Jan 17 18:16:00 mami kernel: kthread+0x15c/0x180
Jan 17 18:16:00 mami kernel: ? set_kthread_struct+0x50/0x50
Jan 17 18:16:00 mami kernel: ret_from_fork+0x22/0x30
Jan 17 18:16:00 mami kernel: </TASK>
Jan 17 18:16:00 mami kernel: ---[ end trace b92f0f6d1b0ff057 ]---
Jan 17 18:16:00 mami kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '5'!
Jan 17 18:16:00 mami kernel: [drm:dm_crtc_get_scanoutpos [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '5'!
Jan 17 18:16:00 mami kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '5'!
Jan 17 18:16:00 mami kernel: [drm:dm_crtc_get_scanoutpos [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '5'!
Jan 17 18:16:00 mami kernel: [drm:dm_crtc_get_scanoutpos [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '5'!
Jan 17 18:16:00 mami kernel: [drm:dm_crtc_get_scanoutpos [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '5'!
Jan 17 18:16:00 mami kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '5'!
Jan 17 18:16:00 mami kernel: [drm:dm_crtc_get_scanoutpos [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '5'!
Jan 17 18:16:00 mami kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '5'!
Jan 17 18:16:00 mami kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '4'!
Jan 17 18:16:00 mami kernel: [drm:dm_crtc_get_scanoutpos [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '4'!
Jan 17 18:16:00 mami kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '4'!
Jan 17 18:16:00 mami kernel: [drm:dm_crtc_get_scanoutpos [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '4'!
Jan 17 18:16:00 mami kernel: [drm:dm_crtc_get_scanoutpos [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '4'!
Jan 17 18:16:00 mami kernel: [drm:dm_crtc_get_scanoutpos [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '4'!
Jan 17 18:16:00 mami kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '4'!
Jan 17 18:16:00 mami kernel: [drm:dm_crtc_get_scanoutpos [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '4'!
Jan 17 18:16:00 mami kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '4'!
These errors seem very similar to errors I've had in the past, such as #1247 (closed). However, those did not lead to catastrophic lockups like this, only to monitors sometimes not waking up. I'm not sure if they're related.
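To make it easier to compare the repeated `dc_stream_state is NULL` errors in the log above with those from other reports, they can be tallied per CRTC and per reporting function. This is a small sketch; the regular expression is written against the line format shown in this log only and is an assumption about how such lines look elsewhere, and the function name is made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as:
# [drm:dm_vblank_get_counter [amdgpu]] *ERROR* dc_stream_state is NULL for crtc '5'!
NULL_STREAM = re.compile(
    r"\[drm:(?P<func>\w+) \[amdgpu\]\] \*ERROR\* "
    r"dc_stream_state is NULL for crtc '(?P<crtc>\d+)'"
)


def count_null_stream_errors(log_text: str) -> Counter:
    """Count NULL dc_stream_state errors keyed by (crtc, function)."""
    counts = Counter()
    for match in NULL_STREAM.finditer(log_text):
        counts[(match["crtc"], match["func"])] += 1
    return counts
```

Passing the saved journal text to `count_null_stream_errors()` returns a `Counter` whose keys are `(crtc, function)` pairs, so two logs can be compared by looking at which CRTCs and functions appear in each.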