[amdgpu]] ERROR ring sdma0 timeout

Same issue here with 6850u on 6.1.0-rc2+

Is this a regression from an older kernel? If so, when was it introduced?
Can you both please share a full log?
What version of linux-firmware are you using?

I don't know whether this is a regression. As I said, I used the kernel from the lukenukem COPR repository.
I attached a log from another day when the system froze. freeze-kernel-amdgpu.log
firmware:

➜  ~ sudo cat /sys/kernel/debug/dri/0/amdgpu_firmware_info
VCE feature version: 0, firmware version: 0x00000000
UVD feature version: 0, firmware version: 0x00000000
MC feature version: 0, firmware version: 0x00000000
ME feature version: 41, firmware version: 0x00000040
PFP feature version: 41, firmware version: 0x0000005f
CE feature version: 41, firmware version: 0x00000024
RLC feature version: 1, firmware version: 0x00000052
RLC SRLC feature version: 1, firmware version: 0x00000001
RLC SRLG feature version: 1, firmware version: 0x00000001
RLC SRLS feature version: 1, firmware version: 0x00000001
MEC feature version: 41, firmware version: 0x00000064
MEC2 feature version: 41, firmware version: 0x00000064
SOS feature version: 0, firmware version: 0x00000000
ASD feature version: 553648248, firmware version: 0x21000078
TA XGMI feature version: 0x00000000, firmware version: 0x00000000
TA RAS feature version: 0x00000000, firmware version: 0x00000000
TA HDCP feature version: 0x00000000, firmware version: 0x1700002b
TA DTM feature version: 0x00000000, firmware version: 0x12000010
TA RAP feature version: 0x00000000, firmware version: 0x00000000
TA SECUREDISPLAY feature version: 0x00000000, firmware version: 0x00000000
SMC feature version: 0, program: 4, firmware version: 0x04453500 (69.53.0)
SDMA0 feature version: 52, firmware version: 0x00000025
VCN feature version: 0, firmware version: 0x02117005
DMCU feature version: 0, firmware version: 0x00000000
DMCUB feature version: 0, firmware version: 0x0400002a
TOC feature version: 0, firmware version: 0x00000003
VBIOS version: 113-REMBRANDT-X37
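
To compare firmware levels between the machines in this thread, a dump like the one above can be parsed and diffed. This is only a rough sketch: `parse_firmware_info`, `diff_firmware` and `smc_version` are names made up here, and the line format is assumed to match the paste above (other kernels may print it differently).

```python
import re

# Matches lines such as "SDMA0 feature version: 52, firmware version: 0x00000025".
# The optional "program: N," part only appears on the SMC line in the dump above.
LINE_RE = re.compile(
    r"^(?P<name>.+?) feature version: (?P<feature>\S+?),"
    r"(?: program: (?P<program>\d+),)?"
    r" firmware version: (?P<fw>0x[0-9a-fA-F]+)"
)


def parse_firmware_info(text):
    """Return {block name: firmware version as int} for every matching line."""
    versions = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            versions[match.group("name")] = int(match.group("fw"), 16)
    return versions


def diff_firmware(a, b):
    """Return {name: (version_a, version_b)} for blocks whose versions differ."""
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


def smc_version(fw):
    """Split the SMC word into its three low bytes, e.g. 0x04453500 -> (69, 53, 0)."""
    return (fw >> 16) & 0xFF, (fw >> 8) & 0xFF, fw & 0xFF
```

Feeding it the text of `/sys/kernel/debug/dri/0/amdgpu_firmware_info` from two machines and calling `diff_firmware` on the results would show which blocks differ. Lines without a "feature version" field, such as the VBIOS line, are skipped.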

Same issue here. Specs:

nicolas@thinkryzen 
------------------ 
OS: Arch Linux x86_64 
Host: 21CQ000GUS ThinkPad T14s Gen 3 
Kernel: 6.1.0-rc5
DE: GNOME 43.1 
CPU: AMD Ryzen 7 PRO 6850U with Radeon
GPU: AMD ATI Radeon 680M 
Memory: 10755MiB / 30572MiB

Kernel is 6.1.0-RC5 with these patches applied from #2171 (closed) https://patchwork.freedesktop.org/series/110885/

Crashes are random and seem to happen more often during VAAPI video playback (YouTube in Firefox).

This crash happened while coding on VSCode, without any VAAPI acceleration running. Decoded logs: ring0-rc5-110885.txt

The larger stack trace @ 15:51:49 occurred after I ran sudo shutdown now remotely using SSH. Had to force power off.

I've also encountered this crash on similar hardware, and it does seem to be related to usage of va-api (with mpv for me). The screen freezes for a few seconds, and then blanks. (Audio is still playing when this happens, though.)

System specifications:

OS: Gentoo Linux
System: ThinkPad T14 Gen 3 AMD (21CFCTO1WW), BIOS version 1.30
Kernel: 6.0.9
Desktop: KDE 5.26.3 (using Wayland)
CPU: AMD Ryzen 7 PRO 6850U
GPU: Radeon 680M (iGPU)
RAM: 32 GB LPDDR5

dmesg log:

[kernel] [85371.986974] [drm:amdgpu_job_timedout] *ERROR* ring sdma0 timeout, signaled seq=114923, emitted seq=114925
[kernel] [85371.986995] [drm:amdgpu_job_timedout] *ERROR* Process information: process  pid 0 thread  pid 0
[kernel] [85371.987005] amdgpu 0000:04:00.0: amdgpu: GPU reset begin!
[kernel] [85372.848510] amdgpu 0000:04:00.0: [drm:amdgpu_ring_test_helper] *ERROR* ring kiq_2.1.0 test failed (-110)
[kernel] [85372.848517] [drm:gfx_v10_0_hw_fini] *ERROR* KGQ disable failed
[kernel] [85373.043738] [drm:gfx_v10_0_hw_fini] *ERROR* failed to halt cp gfx
[kernel] [85373.054756] [drm] free PSP TMR buffer
[kernel] [85373.086394] amdgpu 0000:04:00.0: amdgpu: MODE2 reset
[kernel] [85373.095384] amdgpu 0000:04:00.0: amdgpu: GPU reset succeeded, trying to resume
[kernel] [85373.095621] [drm] PCIE GART of 1024M enabled (table at 0x000000F43FC00000).
[kernel] [85373.095635] [drm] PSP is resuming...
[kernel] [85373.118583] [drm] reserve 0xa00000 from 0xf437000000 for PSP TMR
[kernel] [85373.432021] amdgpu 0000:04:00.0: amdgpu: RAS: optional ras ta ucode is not available
[kernel] [85379.029783] ------------[ cut here ]------------
[kernel] [85379.029790] refcount_t: underflow; use-after-free.
[kernel] [85379.029812] WARNING: CPU: 5 PID: 1068756 at lib/refcount.c:28 refcount_warn_saturate+0xd2/0x130
[kernel] [85379.029828] Modules linked in: msr vfat fat uas usb_storage isofs cdrom fuse ctr ccm michael_mic bnep bluetooth ecdh_generic ecc lz4 lz4_compress zram zsmalloc iptable_filter ip_tables bpfilter xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_comment ip6table_filter ip6_tables x_tables squashfs lz4_decompress vhost_net tun vhost vhost_iotlb tap vboxnetadp(O) vboxnetflt(O) vboxdrv(O) vhba(O) i2c_dev qrtr_mhi snd_soc_acp6x_mach snd_soc_dmic snd_acp6x_pdm_dma qrtr snd_ctl_led snd_sof_amd_renoir snd_sof_pci kvm_amd snd_hda_codec_realtek snd_sof_amd_acp joydev snd_hda_codec_generic snd_sof ath11k_pci snd_hda_codec_hdmi snd_sof_utils mhi kvm ath11k snd_soc_core snd_hda_intel uvcvideo snd_intel_dspcfg videobuf2_vmalloc qmi_helpers irqbypass hid_multitouch mac80211 polyval_clmulni snd_rpl_pci_acp6x videobuf2_memops snd_hda_codec polyval_generic videobuf2_v4l2 tpm_crb i2c_hid_acpi gf128mul i2c_hid videobuf2_common snd_acp_pci ghash_clmulni_intel snd_hwdep snd_acp_config mousedev
[kernel] [85379.029998]  cfg80211 snd_soc_acpi r8169 videodev snd_hda_core snd_pci_acp6x k10temp realtek snd_pcm mc mdio_devres ucsi_acpi snd_timer ccp thinkpad_acpi tpm_tis typec_ucsi libphy tpm_tis_core roles ledtrig_audio libarc4 platform_profile tpm typec snd soundcore i2c_designware_platform acpi_tad i2c_designware_core btrfs blake2b_generic libcrc32c raid6_pq
[kernel] [85379.030035] CPU: 5 PID: 1068756 Comm: kworker/5:0 Tainted: G           O    T  6.0.9-DEK-T14G3A #10
[kernel] [85379.030041] Hardware name: LENOVO 21CFCTO1WW/21CFCTO1WW, BIOS R23ET60W (1.30 ) 09/14/2022
[kernel] [85379.030044] Workqueue: events drm_sched_entity_kill_jobs_work
[kernel] [85379.030054] RIP: 0010:refcount_warn_saturate+0xd2/0x130
[kernel] [85379.030059] Code: 0b 31 f6 31 ff c3 cc cc cc cc 80 3d f4 8c 6d 01 00 0f 85 71 ff ff ff 48 c7 c7 00 33 dc ad c6 05 e0 8c 6d 01 01 e8 d0 89 93 00 <0f> 0b 31 f6 31 ff c3 cc cc cc cc 80 3d c7 8c 6d 01 00 0f 85 46 ff
[kernel] [85379.030063] RSP: 0018:ffffa547c194be90 EFLAGS: 00010246
[kernel] [85379.030067] RAX: 0000000000000000 RBX: ffff90bd37ae2028 RCX: 0000000000000000
[kernel] [85379.030070] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[kernel] [85379.030071] RBP: ffff90c39ef6f900 R08: 0000000000000000 R09: 0000000000000000
[kernel] [85379.030073] R10: 0000000000000000 R11: 0000000000000000 R12: ffff90c39ef74800
[kernel] [85379.030075] R13: 0000000000000000 R14: ffff90bcde0f4600 R15: ffff90bd37ae2030
[kernel] [85379.030077] FS:  0000000000000000(0000) GS:ffff90c39ef40000(0000) knlGS:0000000000000000
[kernel] [85379.030080] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[kernel] [85379.030082] CR2: 00007f63dc8defd0 CR3: 00000001ae3a6000 CR4: 0000000000750ee0
[kernel] [85379.030085] PKRU: 55555554
[kernel] [85379.030087] Call Trace:
[kernel] [85379.030089]  <TASK>
[kernel] [85379.030091]  process_one_work+0x1c2/0x380
[kernel] [85379.030100]  worker_thread+0x4f/0x3b0
[kernel] [85379.030105]  ? rescuer_thread+0x3b0/0x3b0
[kernel] [85379.030110]  ? rescuer_thread+0x3b0/0x3b0
[kernel] [85379.030114]  kthread+0xdc/0x100
[kernel] [85379.030119]  ? kthread_complete_and_exit+0x20/0x20
[kernel] [85379.030123]  ret_from_fork+0x22/0x30
[kernel] [85379.030129]  </TASK>
[kernel] [85379.030131] ---[ end trace 0000000000000000 ]---
[kernel] [85400.981449] elogind-daemon[3808]: Power key pressed.

I had another freeze as well, with exactly the same behaviour as @gerbilsoft and @nicolas.frenay. I thought it would be fixed in newer kernel versions, but it seems not.

log:

Nov 16 08:40:52 attrobit-001 kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx
Nov 16 08:40:52 attrobit-001 kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KGQ disable failed
Nov 16 08:40:52 attrobit-001 kernel: amdgpu 0000:07:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Nov 16 08:40:51 attrobit-001 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
Nov 16 08:40:51 attrobit-001 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=33112, emitted seq=33114

Just had this happen to me for the first time. My timeline:

Removed from Dock with closed lid -> Suspend
Not less than a minute later went to a meeting
Everything working fine for about 20min, without any devices attached
Display went completely black.
- Couldn't switch to another TTY
- Attaching external display did not work
- Only solution was to force shutdown with power button

System:

OS: Manjaro
DE: GNOME Shell 43.1
Model: Lenovo Thinkpad T14 Gen 3 (AMD)
CPU: AMD Ryzen 7 PRO 6850U with Radeon Graphics
GPU: Radeon 680M (Integrated in CPU)
Kernel: 6.0.8-1-MANJARO

Journal:

Nov 21 16:18:45 jens-t14 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=332465, emitted seq=332467
Nov 21 16:18:45 jens-t14 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
Nov 21 16:18:45 jens-t14 kernel: amdgpu 0000:04:00.0: amdgpu: GPU reset begin!
Nov 21 16:18:46 jens-t14 kernel: amdgpu 0000:04:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Nov 21 16:18:46 jens-t14 kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KGQ disable failed
Nov 21 16:18:46 jens-t14 kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx
Nov 21 16:18:46 jens-t14 kernel: [drm] free PSP TMR buffer
Nov 21 16:18:46 jens-t14 kernel: amdgpu 0000:04:00.0: amdgpu: MODE2 reset
Nov 21 16:18:46 jens-t14 kernel: amdgpu 0000:04:00.0: amdgpu: GPU reset succeeded, trying to resume
Nov 21 16:18:46 jens-t14 kernel: [drm] PCIE GART of 1024M enabled (table at 0x000000F400A00000).
Nov 21 16:18:46 jens-t14 kernel: [drm] VRAM is lost due to GPU reset!
Nov 21 16:18:46 jens-t14 kernel: [drm] PSP is resuming...
Nov 21 16:18:46 jens-t14 kernel: [drm] reserve 0xa00000 from 0xf409000000 for PSP TMR
Nov 21 16:18:47 jens-t14 kernel: amdgpu 0000:04:00.0: amdgpu: RAS: optional ras ta ucode is not available
Nov 21 16:18:47 jens-t14 thunderbird.desktop[31385]: amdgpu: The CS has been cancelled because the context is lost.
Nov 21 16:18:47 jens-t14 thunderbird.desktop[31385]: Crash Annotation GraphicsCriticalError: |[0][GFX1-]: GFX: RenderThread detected a device reset>
Nov 21 16:18:47 jens-t14 gnome-shell[16540]: amdgpu: The CS has been cancelled because the context is lost.
Nov 21 16:18:47 jens-t14 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Nov 21 16:18:47 jens-t14 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Nov 21 16:18:47 jens-t14 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Nov 21 16:18:47 jens-t14 gnome-shell[16540]: amdgpu: The CS has been cancelled because the context is lost.
Nov 21 16:18:47 jens-t14 gnome-shell[16540]: amdgpu: The CS has been cancelled because the context is lost.
Nov 21 16:18:47 jens-t14 gnome-shell[16540]: amdgpu: The CS has been cancelled because the context is lost.
Nov 21 16:18:47 jens-t14 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Nov 21 16:18:47 jens-t14 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Nov 21 16:18:47 jens-t14 gnome-shell[16540]: amdgpu: The CS has been cancelled because the context is lost.
Nov 21 16:18:47 jens-t14 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Nov 21 16:18:47 jens-t14 gnome-shell[16540]: amdgpu: The CS has been cancelled because the context is lost.
Nov 21 16:18:47 jens-t14 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!

OS: Fedora 37
Kernel: 6.0.8-300.fc37.x86_64 #1 SMP PREEMPT_DYNAMIC
HW: ASUSTeK COMPUTER INC. ASUS TUF Gaming A15 FA507RE
CPU: AMD Ryzen™ 7 6800H with Radeon™ Graphics × 16
Graphic: NVIDIA GeForce RTX™ 3050 Ti Laptop GPU / REMBRANDT
Gnome: 43.0
Window manager: Wayland

I use my notebook with dual screen mode with an external monitor on HDMI.

Description: My notebook froze and didn't respond to anything. I was running only a web browser with a news portal and GIMP. After I rebooted it, I found this in the logs:

Nov 19 20:06:40 fedora kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=617402, emitted seq=617404
Nov 19 20:06:40 fedora kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
Nov 19 20:06:40 fedora kernel: amdgpu 0000:05:00.0: amdgpu: GPU reset begin!
Nov 19 20:06:41 fedora kernel: amdgpu 0000:05:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Nov 19 20:06:41 fedora kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KGQ disable failed
Nov 19 20:06:41 fedora kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx
Nov 19 20:06:41 fedora kernel: [drm] free PSP TMR buffer
Nov 19 20:06:41 fedora kernel: amdgpu 0000:05:00.0: amdgpu: MODE2 reset
Nov 19 20:06:41 fedora kernel: amdgpu 0000:05:00.0: amdgpu: GPU reset succeeded, trying to resume
Nov 19 20:06:41 fedora kernel: [drm] PCIE GART of 1024M enabled (table at 0x000000F41FC00000).
Nov 19 20:06:41 fedora kernel: [drm] PSP is resuming...
Nov 19 20:06:41 fedora kernel: [drm] reserve 0xa00000 from 0xf409000000 for PSP TMR
Nov 19 20:06:41 fedora kernel: amdgpu 0000:05:00.0: amdgpu: RAS: optional ras ta ucode is not available
Nov 19 20:07:32 fedora NetworkManager[1355]: <info>  [1668884852.1422] dhcp4 (eno1): state changed new lease, address=192.168.240.241
-- Boot 82f901bc7b0a449cbd2b6f1dff9570aa --

Just had another one, while using Firefox with VAAPI accelerated video running.

crash-sdma0.txt

Looks like it tried to recover, going from frozen image -> no signal -> frozen image, but I had to force a reboot. I'm running 6.1.0-rc5 on a 6850U.

How can we debug this further? This seems like a difficult problem to diagnose, but maybe we can add some logging to identify which events cause it?

EDIT: I think #1974 might be the same issue.
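
As a first step towards the debugging question above, the timeout lines already posted in this thread can be collected and grouped by ring. The sketch below assumes the message format seen in the pasted logs. `find_timeouts`, `count_by_ring` and the `pending` key are names invented for illustration, and treating `emitted - signaled` as the number of outstanding fences is my reading, not something the driver output states.

```python
import re
from collections import Counter

# Matches e.g. "*ERROR* ring sdma0 timeout, signaled seq=33112, emitted seq=33114".
TIMEOUT_RE = re.compile(
    r"\*ERROR\* ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def find_timeouts(log_text):
    """Return one dict per ring-timeout line found in the log text."""
    events = []
    for line in log_text.splitlines():
        match = TIMEOUT_RE.search(line)
        if match:
            signaled = int(match.group("signaled"))
            emitted = int(match.group("emitted"))
            events.append(
                {
                    "ring": match.group("ring"),
                    "signaled": signaled,
                    "emitted": emitted,
                    # Assumption: the gap is the number of fences not yet signaled.
                    "pending": emitted - signaled,
                }
            )
    return events


def count_by_ring(events):
    """Count timeout events per ring name."""
    return Counter(event["ring"] for event in events)
```

Running `find_timeouts` over the output of `journalctl -k` or `dmesg` from several boots would show whether the hangs are always on `sdma0` and whether the sequence gap is always the same.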

Just had the crash happen 3 times in a row. All three times, video was playing (twice in mpv, once in Chrome; mpv was definitely using vaapi, not sure about Chrome), and I was running e4defrag on a USB 3.0 HDD. I'm thinking the e4defrag process may have had something to do with it...

Same as #2068?

Hi folks, I have also experienced this issue. My specs are:

OS: Arch Linux x86_64 
Host: 21CMCTO1WW ThinkPad X13 Gen 3 
Kernel: 6.0.12-arch1-1 
CPU: AMD Ryzen 7 PRO 6850U with Radeon Graphics (16) @ 4.768GHz 
GPU: AMD ATI Radeon 680M 
Memory: 32Gb

Besides the screen freezes, has anyone also experienced screen flickering?
In my case the screen also flickers from time to time. Interestingly, this only happens on the laptop screen: even when I am using an external monitor, only the laptop screen flickers.

Hi @jxs .

I have a similar setup (Thinkpad T14s G3).

Using the latest mesa (from git) solved these issues for me. Commit ff928d9567 specifically.

Here's my installed mesa-git version: mesa-git 23.0.0_devel.163984.ff928d9567a.5269a95f00c4d6964d487d9dbd94f62b-1

And I haven't seen any flickering on mine.

Hey @jxs

I'm experiencing the same screen flickering issue.

OS: Arch Linux x86_64
Host: 21CQCTO1WW ThinkPad T14s Gen 3
Kernel: 6.0.12-arch1-1
CPU: AMD Ryzen 7 PRO 6850U with Radeon Graphics (16) @ 4.768GHz 
GPU: AMD ATI Radeon 680M 
Memory: 32Gb

What desktop/window manager are you using? I'm using sway, and a colleague with the same hardware is using qtile and does not experience any flickering.

Thanks! @nicolas.frenay I will try that. I have since downgraded the kernel to 5.19.9 to see how it behaves; if the flickering persists I will try mesa-git.

@sylv-io exactly, I am using sway 1.7 to be precise

@jxs me too. Looks like this flickering issue is sway related. We should probably generate some debug logs and create a sway issue on GitHub.

Marcello, see Nicolas' remark above about using mesa-git with ff928d9567, and check whether it fixes it for you. I have been running kernel 5.19.9 for some hours and have had neither the screen freezes nor the flickering. Will update this as soon as (and if) either occurs.

@jxs oh right, thanks. I now installed mesa-git: 23.0.0_devel.164523.6b3f085c3cd.5269a95f00c4d6964d487d9dbd94f62b-1 I'll let you know if it solves the flickering on my setup.

My primary display is still flickering after installing mesa-git and doing a reboot :/ However it looks faster than before.

Have you tried downgrading the kernel? For me it seems to have fixed the issue (downgrading to 5.19.9), but I haven't tried with an external monitor yet.

At first I was not sure I wanted to downgrade the kernel, but this bug is really annoying. I have downgraded it now. Let's see if that solves it.

By the way, did you find any bug tracker entry addressing this issue? It would be great to know when this workaround is no longer required.

Downgrading the kernel fixes the flickering, but then Thunderbolt no longer works, and I need it for my monitor.

Hey @jxs, FYI: I have not had any screen flickering since kernel 6.1.1, even with extra/mesa 22.3.1-1.

Thanks Marcello, though it did not fix it for me weirdly enough.

hmm you're right. In my case it only happens if there is no monitor connected. Maybe pm could be related.

Hi, did you solve this problem? I still have it, and sometimes my screen also flickers. I can provide some logs. My system information:

❯ neofetch
                   -`                    tim@TimTu-Arch 
                  .o+`                   -------------- 
                 `ooo/                   OS: Arch Linux x86_64 
                `+oooo:                  Host: Code01 Ver2.0 1 
               `+oooooo:                 Kernel: 6.4.3-arch1-1 
               -+oooooo+:                Uptime: 1 hour, 1 min 
             `/:-:++oooo+:               Packages: 700 (pacman) 
            `/++++/+++++++:              Shell: bash 5.1.16 
           `/++++++++++++++:             Resolution: 2560x1600 
          `/+++ooooooooooooo/`           DE: Plasma 5.27.6 
         ./ooosssso++osssssso+`          WM: kwin 
        .oossssso-````/ossssss+`         Theme: NephriteLight [Plasma], Breeze [GTK2/3] 
       -osssssso.      :ssssssso.        Icons: [Plasma], Colloid [GTK2/3] 
      :osssssss/        osssso+++.       Terminal: konsole 
     /ossssssss/        +ssssooo/-       Terminal Font: Hack [SRC] 10 
   `/ossssso+/:-        -:/+osssso+-     CPU: AMD Ryzen 7 6800H with Radeon Graphics (16) @ 3.200GHz 
  `+sso+:-`                 `.-/+oso:    GPU: AMD ATI Radeon 680M 
 `++:.                           `-/+/   Memory: 4803MiB / 30808MiB 
 .`                                 `/
                                                                 
                                                                 


❯ glxinfo | grep "OpenGL version"
OpenGL version string: 4.6 (Compatibility Profile) Mesa 23.1.3

@ovo-Tim Oh right, I completely forgot about this thread.

Setting the kernel parameter amdgpu.dcdebugmask=0x10 solved the problem for me.

^ This solution was suggested in #2352.
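
For anyone trying this workaround, it helps to confirm that the parameter actually reached the running kernel. A minimal sketch that reads it back from a kernel command line string follows. The function names are made up here, and the comment about what bit `0x10` does is my assumption (I believe it maps to the display core's "disable PSR" debug flag), so please check it against your kernel's headers.

```python
# Assumption: bit 0x10 of amdgpu.dcdebugmask is the display core's
# "disable PSR" debug flag; verify against the headers of your kernel.
WORKAROUND_BIT = 0x10


def dcdebugmask_from_cmdline(cmdline):
    """Return amdgpu.dcdebugmask as an int, or None if it is not set.

    If the parameter appears more than once, the last occurrence is used.
    """
    mask = None
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key == "amdgpu.dcdebugmask":
            mask = int(value, 0)  # base 0 accepts "0x10" as well as "16"
    return mask


def workaround_active(cmdline):
    """True if the command line sets dcdebugmask with bit 0x10 enabled."""
    mask = dcdebugmask_from_cmdline(cmdline)
    return mask is not None and bool(mask & WORKAROUND_BIT)
```

On a running system the input would be the contents of `/proc/cmdline`, e.g. `workaround_active(open("/proc/cmdline").read())`.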

Sorry, I didn't see it until now. I will have a try, thanks.

@gerbilsoft your dmesg message contains a refcount_t error similar to the errors in #2281 (closed). Can you try if

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 919bbea2e3ac..4e684c2afc70 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -1506,7 +1509,8 @@ u64 amdgpu_bo_gpu_offset_no_check(struct amdgpu_bo *bo)
 uint32_t amdgpu_bo_get_preferred_domain(struct amdgpu_device *adev,
 					    uint32_t domain)
 {
-	if (domain == (AMDGPU_GEM_DOMAIN_VRAM | AMDGPU_GEM_DOMAIN_GTT)) {
+	if ((domain == (AMDGPU_GEM_DOMAIN_VRAM | AMDGPU_GEM_DOMAIN_GTT)) &&
+	    ((adev->asic_type == CHIP_CARRIZO) || (adev->asic_type == CHIP_STONEY))) {
 		domain = AMDGPU_GEM_DOMAIN_VRAM;
 		if (adev->gmc.real_vram_size <= AMDGPU_SG_THRESHOLD)
 			domain = AMDGPU_GEM_DOMAIN_GTT;

from commit

commit 81d0bcf9900932633d270d5bc4a54ff599c6ebdb
Author: Alex Deucher <alexander.deucher@amd.com>
Date:   Wed Dec 7 11:08:53 2022 -0500

    drm/amdgpu: make display pinning more flexible (v2)
    
    Only apply the static threshold for Stoney and Carrizo.
    This hardware has certain requirements that don't allow
    mixing of GTT and VRAM.  Newer asics do not have these
    requirements so we should be able to be more flexible
    with where buffers end up.
    
    Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2270
    Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2291
    Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2255
    Acked-by: Luben Tuikov <luben.tuikov@amd.com>
    Reviewed-by: Christian König <christian.koenig@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Cc: stable@vger.kernel.org

fixes the issue?
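
To make the effect of that hunk easier to see, here is a small Python model of the condition before and after the patch. It is only an illustration of the logic visible in the diff, not driver code: the numeric constants, the string ASIC names, the `patched` switch and the final `return domain` (which lies outside the quoted hunk) are all assumptions.

```python
# Illustrative stand-ins; the numeric values are assumptions, not taken from the diff.
DOMAIN_GTT = 0x2
DOMAIN_VRAM = 0x4
SG_THRESHOLD = 256 * 1024 * 1024
STATIC_THRESHOLD_ASICS = {"CHIP_CARRIZO", "CHIP_STONEY"}


def preferred_domain(asic_type, real_vram_size, domain, patched=True):
    """Model of the domain choice shown in the hunk.

    With patched=False the static threshold applies to every ASIC; with
    patched=True it applies only to Carrizo and Stoney, as in the diff.
    """
    applies = domain == (DOMAIN_VRAM | DOMAIN_GTT)
    if patched:
        applies = applies and asic_type in STATIC_THRESHOLD_ASICS
    if applies:
        domain = DOMAIN_VRAM
        if real_vram_size <= SG_THRESHOLD:
            domain = DOMAIN_GTT
    # Assumption: the rest of the function simply returns the result.
    return domain
```

In this model, a request for "VRAM or GTT" on any other ASIC is forced to a single domain before the patch and passes through unchanged after it, which matches the commit message's statement that newer ASICs can be more flexible about where buffers end up.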

I added the display pinning patch, and it improved things, but it's still a bit wonky.

Kernel: 6.0.15 + display pinning patch
Mesa: Mesa 23.0.0-devel (git-58e1d14edf) [2022/12/16]

The sdma lockup still happens occasionally, but after the screen blanks, it reappears, though still locked up. Pressing Ctrl+Alt+F2 does switch to a VT after 10-15 seconds, and the VT is responsive; switching back to the Wayland session results in a frozen black screen, but I can switch back to the working VT. Killing the Wayland compositor and restarting it does appear to work.

[drm:amdgpu_job_timedout] *ERROR* ring gfx_0.0.0 timeout, signaled seq=6600570, emitted seq=6600572
[drm:amdgpu_job_timedout] *ERROR* Process information: process kwin_wayland pid 5088 thread kwin_wayla:cs0 pid 5138
amdgpu 0000:04:00.0: amdgpu: GPU reset begin!
amdgpu 0000:04:00.0: [drm:amdgpu_ring_test_helper] *ERROR* ring kiq_2.1.0 test failed (-110)
[drm:gfx_v10_0_hw_fini] *ERROR* KGQ disable failed
[drm:gfx_v10_0_hw_fini] *ERROR* failed to halt cp gfx
[drm] free PSP TMR buffer
amdgpu 0000:04:00.0: amdgpu: MODE2 reset
amdgpu 0000:04:00.0: amdgpu: GPU reset succeeded, trying to resume
[drm] PCIE GART of 1024M enabled (table at 0x000000F43FC00000).
[drm] PSP is resuming...
[drm] reserve 0xa00000 from 0xf43e000000 for PSP TMR
amdgpu 0000:04:00.0: amdgpu: RAS: optional ras ta ucode is not available
amdgpu 0000:04:00.0: amdgpu: RAP: optional rap ta ucode is not available
amdgpu 0000:04:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
amdgpu 0000:04:00.0: amdgpu: SMU is resuming...
amdgpu 0000:04:00.0: amdgpu: SMU is resumed successfully!
[drm] DMUB hardware initialized: version=0x0400002E
[drm] kiq ring mec 2 pipe 1 q 0
[drm] VCN decode and encode initialized successfully(under DPG Mode).
[drm] JPEG decode initialized successfully.
amdgpu 0000:04:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
amdgpu 0000:04:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
amdgpu 0000:04:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
amdgpu 0000:04:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
amdgpu 0000:04:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
amdgpu 0000:04:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
amdgpu 0000:04:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
amdgpu 0000:04:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
amdgpu 0000:04:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
amdgpu 0000:04:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
amdgpu 0000:04:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
amdgpu 0000:04:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 1
amdgpu 0000:04:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 1
amdgpu 0000:04:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 1
amdgpu 0000:04:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 1
amdgpu 0000:04:00.0: amdgpu: recover vram bo from shadow start
amdgpu 0000:04:00.0: amdgpu: recover vram bo from shadow done
[drm] Skip scheduling IBs!
amdgpu 0000:04:00.0: amdgpu: GPU reset(2) succeeded!
[drm] Skip scheduling IBs!
[drm] Skip scheduling IBs!
[drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!

EDIT: That's a slightly different timeout (on gfx), but I did get an sdma0 timeout later, with the same symptoms wrt VT switching:

[drm:amdgpu_job_timedout] *ERROR* ring sdma0 timeout, signaled seq=200216, emitted seq=200218
[drm:amdgpu_job_timedout] *ERROR* Process information: process  pid 0 thread  pid 0
amdgpu 0000:04:00.0: amdgpu: GPU reset begin!
amdgpu 0000:04:00.0: [drm:amdgpu_ring_test_helper] *ERROR* ring kiq_2.1.0 test failed (-110)
[drm:gfx_v10_0_hw_fini] *ERROR* KGQ disable failed
[drm:gfx_v10_0_hw_fini] *ERROR* failed to halt cp gfx
[drm] free PSP TMR buffer
amdgpu 0000:04:00.0: amdgpu: MODE2 reset
amdgpu 0000:04:00.0: amdgpu: GPU reset succeeded, trying to resume
[drm] PCIE GART of 1024M enabled (table at 0x000000F43FC00000).
[drm] PSP is resuming...
[drm] reserve 0xa00000 from 0xf43e000000 for PSP TMR
amdgpu 0000:04:00.0: amdgpu: RAS: optional ras ta ucode is not available
amdgpu 0000:04:00.0: amdgpu: RAP: optional rap ta ucode is not available
amdgpu 0000:04:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
amdgpu 0000:04:00.0: amdgpu: SMU is resuming...
amdgpu 0000:04:00.0: amdgpu: SMU is resumed successfully!
[drm] DMUB hardware initialized: version=0x0400002E
[drm] kiq ring mec 2 pipe 1 q 0
[drm] VCN decode and encode initialized successfully(under DPG Mode).
[drm] JPEG decode initialized successfully.
amdgpu 0000:04:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
amdgpu 0000:04:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
amdgpu 0000:04:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
amdgpu 0000:04:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
amdgpu 0000:04:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
amdgpu 0000:04:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
amdgpu 0000:04:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
amdgpu 0000:04:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
amdgpu 0000:04:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
amdgpu 0000:04:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
amdgpu 0000:04:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
amdgpu 0000:04:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 1
amdgpu 0000:04:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 1
amdgpu 0000:04:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 1
amdgpu 0000:04:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 1
amdgpu 0000:04:00.0: amdgpu: recover vram bo from shadow start
amdgpu 0000:04:00.0: amdgpu: recover vram bo from shadow done
amdgpu 0000:04:00.0: amdgpu: GPU reset(1) succeeded!
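
One visible difference between these two logs and the earlier ones in this thread is that here each reset ends with a "GPU reset(N) succeeded!" line, while the earlier pastes stop after the RAS message. A quick way to check that across a long log is sketched below. `reset_outcomes` is a made-up name, and treating a missing "succeeded!" line as an incomplete recovery is my assumption.

```python
import re

BEGIN_RE = re.compile(r"GPU reset begin!")
# Requires the "(N)" counter, so it does not match the earlier
# "GPU reset succeeded, trying to resume" line.
DONE_RE = re.compile(r"GPU reset\((\d+)\) succeeded!")


def reset_outcomes(log_text):
    """Return one bool per "GPU reset begin!" line.

    An entry is True if a "GPU reset(N) succeeded!" line appears before
    the next "GPU reset begin!" line (or before the end of the log).
    """
    outcomes = []
    for line in log_text.splitlines():
        if BEGIN_RE.search(line):
            outcomes.append(False)
        elif outcomes and DONE_RE.search(line):
            outcomes[-1] = True
    return outcomes
```

A result such as `[False, True]` would mean the first reset in the log never reported completion and the second one did.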

I also experience this issue, mostly matching the descriptions above. I experience occasional flickering during normal use. When it crashes, the screen blanks but sound continues playing. I haven't noticed whether it is directly related to use of VA-API, although I have taken care to ensure all hardware codecs are enabled, and it tends to crash while running a lot of things and switching between apps. I'm able to restore the session and the login screen appears, but upon logging in only an unresponsive terminal appears, showing the text @^@^@^@^@^@^@^@^@^@^@^@^@^. A working console is available on another session though, so dmesg can be dumped. See the attachments for logs from two such crashes. I'm also happy to try running a patch, given instructions.

dmesg.txt dmesg2.txt

System specifications:

OS: Fedora 37
System: ThinkPad Z16 Gen1 (21D4000HUS), BIOS version 1.27
Kernel: Linux fedora 6.0.15-300.fc37.x86_64
Desktop: KDE 43.2 (using Wayland)
CPU: AMD Ryzen 7 PRO 6850H
GPU: Radeon 680M (iGPU, PCI 67:00.0)
GPU: Radeon RX 6500M (dGPU, PCI 03:00.0)
RAM: 32 GB LPDDR5

This also happens on my ThinkPad T14 Gen 3 AMD when connected to an external display (whether over USB-C or HDMI).

[  928.849617] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=10601, emitted seq=10603
[  928.849915] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
[  928.850150] amdgpu 0000:04:00.0: amdgpu: GPU reset begin!
[  929.348921] amdgpu 0000:04:00.0: amdgpu: free PSP TMR buffer
[  929.380677] amdgpu 0000:04:00.0: amdgpu: MODE2 reset
[  929.391127] amdgpu 0000:04:00.0: amdgpu: GPU reset succeeded, trying to resume
[  929.391358] [drm] PCIE GART of 1024M enabled (table at 0x000000F43FC00000).
[  929.396288] [drm] VRAM is lost due to GPU reset!
[  929.396289] [drm] PSP is resuming...
[  929.418718] [drm] reserve 0xa00000 from 0xf439000000 for PSP TMR
[  929.744375] amdgpu 0000:04:00.0: amdgpu: RAS: optional ras ta ucode is not available
[  951.327022] ------------[ cut here ]------------
[  951.327029] refcount_t: underflow; use-after-free.
[  951.327039] WARNING: CPU: 8 PID: 135 at lib/refcount.c:28 refcount_warn_saturate+0xa3/0x150
[  951.327049] Modules linked in: zstd zram tls veth nft_masq nf_conntrack_netlink xfrm_user xfrm_algo xt_addrtype br_netfilter wireguard curve25519_x86_64 libchacha20poly1305 chacha_x86_64 poly1305_x86_64 libcurve25519_generic libchacha ip6_udp_tunnel udp_tunnel vhost_vsock vmw_vsock_virtio_transport_common vhost vhost_iotlb vsock rfcomm xt_mark snd_seq_dummy snd_hrtimer vboxnetadp(OE) vboxnetflt(OE) xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 vboxdrv(OE) xt_tcpudp nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink bridge stp llc cmac algif_hash algif_skcipher af_alg bnep overlay binfmt_misc nls_iso8859_1 snd_usb_audio snd_usbmidi_lib btusb btrtl btbcm btintel btmtk bluetooth ecdh_generic ecc cdc_acm option cdc_mbim usb_wwan cdc_wdm usbserial cdc_ncm qrtr_mhi uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev mc joydev qrtr snd_soc_dmic snd_soc_acp6x_mach snd_acp6x_pdm_dma ath11k_pci
[  951.327148]  snd_sof_amd_rembrandt snd_sof_amd_renoir snd_sof_amd_acp snd_sof_pci ath11k snd_ctl_led snd_sof intel_rapl_msr snd_hda_codec_realtek qmi_helpers snd_sof_utils intel_rapl_common snd_hda_codec_generic snd_hda_codec_hdmi snd_soc_core mac80211 snd_compress ac97_bus snd_pcm_dmaengine snd_hda_intel snd_pci_ps snd_intel_dspcfg snd_rpl_pci_acp6x snd_seq_midi snd_intel_sdw_acpi edac_mce_amd snd_seq_midi_event snd_hda_codec snd_acp_pci cfg80211 snd_rawmidi snd_hda_core snd_pci_acp6x kvm_amd snd_pci_acp5x snd_hwdep snd_rn_pci_acp3x thinkpad_acpi snd_acp_config libarc4 input_leds snd_pcm snd_seq kvm think_lmi snd_soc_acpi hid_multitouch snd_seq_device nvram ccp snd_pci_acp3x ledtrig_audio efi_pstore serio_raw mhi k10temp platform_profile rapl firmware_attributes_class wmi_bmof snd_timer ucsi_acpi snd typec_ucsi typec soundcore amd_pmc mac_hid acpi_tad sch_fq_codel kyber_iosched ipmi_devintf ipmi_msghandler msr parport_pc ppdev lp parport ip_tables x_tables autofs4 zfs(POE)
[  951.327240]  zunicode(POE) zzstd(OE) zlua(OE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) btrfs blake2b_generic zstd_compress dm_crypt raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear cdc_ether usbnet r8152 mii hid_logitech_hidpp hid_logitech_dj usbhid amdgpu iommu_v2 gpu_sched drm_buddy i2c_algo_bit drm_ttm_helper ttm drm_display_helper cec crct10dif_pclmul rc_core crc32_pclmul polyval_clmulni polyval_generic drm_kms_helper ghash_clmulni_intel syscopyarea nvme sha512_ssse3 sysfillrect sysimgblt aesni_intel r8169 nvme_core video fb_sys_fops crypto_simd hid_generic cryptd psmouse drm i2c_piix4 xhci_pci xhci_pci_renesas nvme_common realtek i2c_hid_acpi i2c_hid wmi hid
[  951.327300] CPU: 8 PID: 135 Comm: kworker/8:1 Tainted: P        W  OE      6.1.0-060100rc5-generic #202211132230
[  951.327303] Hardware name: LENOVO 21CF004PGE/21CF004PGE, BIOS R23ET62W (1.32 ) 11/11/2022
[  951.327305] Workqueue: events drm_sched_entity_kill_jobs_work [gpu_sched]
[  951.327316] RIP: 0010:refcount_warn_saturate+0xa3/0x150
[  951.327320] Code: cc cc 0f b6 1d 06 ed ca 01 80 fb 01 0f 87 76 27 79 00 83 e3 01 75 dd 48 c7 c7 40 59 40 ac c6 05 ea ec ca 01 01 e8 10 01 75 00 <0f> 0b eb c6 0f b6 1d dd ec ca 01 80 fb 01 0f 87 36 27 79 00 83 e3
[  951.327322] RSP: 0018:ffffc1300060fe30 EFLAGS: 00010246
[  951.327325] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[  951.327326] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[  951.327327] RBP: ffffc1300060fe38 R08: 0000000000000000 R09: 0000000000000000
[  951.327328] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9e68c307c400
[  951.327329] R13: 0000000000000000 R14: ffff9e68c307c428 R15: ffff9e6741e02240
[  951.327331] FS:  0000000000000000(0000) GS:ffff9e6e5f000000(0000) knlGS:0000000000000000
[  951.327333] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  951.327334] CR2: 000002040112e000 CR3: 000000023b210000 CR4: 0000000000750ee0
[  951.327336] PKRU: 55555554
[  951.327337] Call Trace:
[  951.327339]  <TASK>
[  951.327342]  amdgpu_job_free_cb+0x7f/0x90 [amdgpu]
[  951.327600]  drm_sched_entity_kill_jobs_work+0x3d/0x50 [gpu_sched]
[  951.327607]  process_one_work+0x225/0x400
[  951.327612]  worker_thread+0x50/0x3e0
[  951.327615]  ? process_one_work+0x400/0x400
[  951.327617]  kthread+0xe9/0x110
[  951.327620]  ? kthread_complete_and_exit+0x20/0x20
[  951.327622]  ret_from_fork+0x22/0x30
[  951.327628]  </TASK>
[  951.327629] ---[ end trace 0000000000000000 ]---

I understand that the actual refcount crash might be fixed via the patch in #2220 (comment 1695917), but that also means that the actual cause of the issue remains unresolved / unclear?

I've tried the new amd-staging-drm-next on Arch (ThinkPad T14s Gen3 AMD 6850U), which has quite a lot of updates from yesterday's drop for the 6.3 version. Crashes usually occur while recording a Wayland screen with OBS (it uses VAAPI, which triggers it for me much faster), within 15-30 mins.

I can confirm the reset still occurs (see below), BUT it recovered. The Zoom meeting I was in continued, OBS resumed recording, everything worked - there was just a pause for that timeout period. This NEVER happened before, so if this is the new normal, it is much better at least.

Jan 12 16:14:46 arrow kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=154548, emitted seq=154550
Jan 12 16:14:46 arrow kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
Jan 12 16:14:46 arrow kernel: amdgpu 0000:33:00.0: amdgpu: GPU reset begin!
Jan 12 16:14:47 arrow kernel: amdgpu 0000:33:00.0: amdgpu: MODE2 reset
Jan 12 16:14:47 arrow kernel: amdgpu 0000:33:00.0: amdgpu: GPU reset succeeded, trying to resume
Jan 12 16:14:47 arrow kernel: [drm] PCIE GART of 1024M enabled (table at 0x000000F400A00000).
Jan 12 16:14:47 arrow kernel: [drm] PSP is resuming...
Jan 12 16:14:47 arrow kernel: [drm] reserve 0xa00000 from 0xf43e000000 for PSP TMR
Jan 12 16:14:47 arrow kernel: amdgpu 0000:33:00.0: amdgpu: RAS: optional ras ta ucode is not available
Jan 12 16:14:47 arrow kernel: amdgpu 0000:33:00.0: amdgpu: RAP: optional rap ta ucode is not available
Jan 12 16:14:47 arrow kernel: amdgpu 0000:33:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
Jan 12 16:14:47 arrow kernel: amdgpu 0000:33:00.0: amdgpu: SMU is resuming...
Jan 12 16:14:47 arrow kernel: amdgpu 0000:33:00.0: amdgpu: SMU is resumed successfully!
Jan 12 16:14:47 arrow kernel: [drm] DMUB hardware initialized: version=0x0400002E
Jan 12 16:14:47 arrow kernel: [drm] kiq ring mec 2 pipe 1 q 0
Jan 12 16:14:47 arrow kernel: [drm] VCN decode and encode initialized successfully(under DPG Mode).
Jan 12 16:14:47 arrow kernel: [drm] JPEG decode initialized successfully.
Jan 12 16:14:47 arrow kernel: amdgpu 0000:33:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
Jan 12 16:14:47 arrow kernel: amdgpu 0000:33:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
Jan 12 16:14:47 arrow kernel: amdgpu 0000:33:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
Jan 12 16:14:47 arrow kernel: amdgpu 0000:33:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
Jan 12 16:14:47 arrow kernel: amdgpu 0000:33:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
Jan 12 16:14:47 arrow kernel: amdgpu 0000:33:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
Jan 12 16:14:47 arrow kernel: amdgpu 0000:33:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
Jan 12 16:14:47 arrow kernel: amdgpu 0000:33:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
Jan 12 16:14:47 arrow kernel: amdgpu 0000:33:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
Jan 12 16:14:47 arrow kernel: amdgpu 0000:33:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
Jan 12 16:14:47 arrow kernel: amdgpu 0000:33:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
Jan 12 16:14:47 arrow kernel: amdgpu 0000:33:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 1
Jan 12 16:14:47 arrow kernel: amdgpu 0000:33:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 1
Jan 12 16:14:47 arrow kernel: amdgpu 0000:33:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 1
Jan 12 16:14:47 arrow kernel: amdgpu 0000:33:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 1
Jan 12 16:14:47 arrow kernel: amdgpu 0000:33:00.0: amdgpu: recover vram bo from shadow start
Jan 12 16:14:47 arrow kernel: amdgpu 0000:33:00.0: amdgpu: recover vram bo from shadow done
Jan 12 16:14:47 arrow kernel: amdgpu 0000:33:00.0: amdgpu: GPU reset(1) succeeded!
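
For anyone comparing logs across machines, here is a minimal Python sketch that pulls the ring name and the two sequence numbers out of the timeout line. The function name parse_timeouts and the "pending" field are my own (hypothetical), and reading "emitted minus signaled" as the number of jobs still outstanding on the ring is my interpretation of the message, not something confirmed in this thread.

```python
import re

# Matches the job-timeout line in the form it takes in the logs in this thread.
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def parse_timeouts(log_text):
    """Return one dict per timeout line found in log_text."""
    results = []
    for line in log_text.splitlines():
        match = TIMEOUT_RE.search(line)
        if match is None:
            continue  # not a timeout line
        signaled = int(match.group("signaled"))
        emitted = int(match.group("emitted"))
        results.append(
            {
                "ring": match.group("ring"),
                "signaled": signaled,
                "emitted": emitted,
                # Assumption: the gap is the number of fences not yet signaled.
                "pending": emitted - signaled,
            }
        )
    return results


sample = (
    "Jan 12 16:14:46 arrow kernel: [drm:amdgpu_job_timedout [amdgpu]] "
    "*ERROR* ring sdma0 timeout, signaled seq=154548, emitted seq=154550"
)
print(parse_timeouts(sample))
```

For the line above this reports ring sdma0 with a gap of 2 between emitted and signaled.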

With Kernel 6.2.0-rc4, more information/additional messages are logged:

[141106.268781] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=10124525, emitted seq=10124527
[141106.269274] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process alacritty.real pid 133801 thread alacritty.:cs0 pid 133811
[141106.269711] amdgpu 0000:04:00.0: amdgpu: GPU reset begin!
[141106.813196] amdgpu 0000:04:00.0: amdgpu: MODE2 reset
[141106.821875] amdgpu 0000:04:00.0: amdgpu: GPU reset succeeded, trying to resume
[141106.822200] [drm] PCIE GART of 1024M enabled (table at 0x000000F43FC00000).
[141106.822260] [drm] PSP is resuming...
[141106.844509] [drm] reserve 0xa00000 from 0xf43e000000 for PSP TMR
[141107.167966] amdgpu 0000:04:00.0: amdgpu: RAS: optional ras ta ucode is not available
[141107.180296] amdgpu 0000:04:00.0: amdgpu: RAP: optional rap ta ucode is not available
[141107.180302] amdgpu 0000:04:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[141107.180310] amdgpu 0000:04:00.0: amdgpu: SMU is resuming...
[141107.180718] amdgpu 0000:04:00.0: amdgpu: SMU is resumed successfully!
[141107.182582] [drm] DMUB hardware initialized: version=0x0400002A
[141107.187713] [drm] Watermarks table not configured properly by SMU
[141108.656494] [drm] kiq ring mec 2 pipe 1 q 0
[141108.661600] [drm] VCN decode and encode initialized successfully(under DPG Mode).
[141108.662347] [drm] JPEG decode initialized successfully.
[141108.662352] amdgpu 0000:04:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[141108.662355] amdgpu 0000:04:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[141108.662356] amdgpu 0000:04:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[141108.662357] amdgpu 0000:04:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[141108.662358] amdgpu 0000:04:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[141108.662359] amdgpu 0000:04:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[141108.662359] amdgpu 0000:04:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[141108.662360] amdgpu 0000:04:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[141108.662361] amdgpu 0000:04:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[141108.662361] amdgpu 0000:04:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
[141108.662362] amdgpu 0000:04:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[141108.662363] amdgpu 0000:04:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 1
[141108.662364] amdgpu 0000:04:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 1
[141108.662364] amdgpu 0000:04:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 1
[141108.662365] amdgpu 0000:04:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 1
[141108.670802] amdgpu 0000:04:00.0: amdgpu: recover vram bo from shadow start
[141108.670805] amdgpu 0000:04:00.0: amdgpu: recover vram bo from shadow done
[141108.670830] amdgpu 0000:04:00.0: amdgpu: GPU reset(2) succeeded!
[141108.672246] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[141111.302333] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[141114.343197] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[141117.384241] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[141118.812913] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=10124529, emitted seq=10124532
[141118.813402] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xwayland pid 12369 thread Xwayland:cs0 pid 13190
[141118.813851] amdgpu 0000:04:00.0: amdgpu: GPU reset begin!
[141118.955379] amdgpu 0000:04:00.0: amdgpu: MODE2 reset
[141118.964203] amdgpu 0000:04:00.0: amdgpu: GPU reset succeeded, trying to resume
[141118.964616] [drm] PCIE GART of 1024M enabled (table at 0x000000F43FC00000).
[141118.964637] [drm] PSP is resuming...
[141118.986961] [drm] reserve 0xa00000 from 0xf43e000000 for PSP TMR
[141119.309301] amdgpu 0000:04:00.0: amdgpu: RAS: optional ras ta ucode is not available
[141119.321711] amdgpu 0000:04:00.0: amdgpu: RAP: optional rap ta ucode is not available
[141119.321717] amdgpu 0000:04:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[141119.321726] amdgpu 0000:04:00.0: amdgpu: SMU is resuming...
[141119.322848] amdgpu 0000:04:00.0: amdgpu: SMU is resumed successfully!
[141119.324751] [drm] DMUB hardware initialized: version=0x0400002A
[141119.328586] [drm] Watermarks table not configured properly by SMU
[141120.417277] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[141120.835462] [drm] kiq ring mec 2 pipe 1 q 0
[141120.840049] [drm] VCN decode and encode initialized successfully(under DPG Mode).
[141120.840637] [drm] JPEG decode initialized successfully.
[141120.840642] amdgpu 0000:04:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[141120.840645] amdgpu 0000:04:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[141120.840647] amdgpu 0000:04:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[141120.840647] amdgpu 0000:04:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[141120.840649] amdgpu 0000:04:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[141120.840649] amdgpu 0000:04:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[141120.840650] amdgpu 0000:04:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[141120.840651] amdgpu 0000:04:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[141120.840651] amdgpu 0000:04:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[141120.840652] amdgpu 0000:04:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
[141120.840653] amdgpu 0000:04:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[141120.840653] amdgpu 0000:04:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 1
[141120.840654] amdgpu 0000:04:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 1
[141120.840655] amdgpu 0000:04:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 1
[141120.840656] amdgpu 0000:04:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 1
[141120.848771] amdgpu 0000:04:00.0: amdgpu: recover vram bo from shadow start
[141120.848774] amdgpu 0000:04:00.0: amdgpu: recover vram bo from shadow done
[141120.848881] amdgpu 0000:04:00.0: amdgpu: GPU reset(4) succeeded!
[141120.931330] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[141123.458253] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[141126.498789] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[141129.530542] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[141131.100766] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=10124534, emitted seq=10124537
[141131.101257] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process gnome-shell pid 10273 thread gnome-shel:cs0 pid 10390
[141131.101695] amdgpu 0000:04:00.0: amdgpu: GPU reset begin!
[141131.252402] amdgpu 0000:04:00.0: amdgpu: MODE2 reset
[141131.261135] amdgpu 0000:04:00.0: amdgpu: GPU reset succeeded, trying to resume
[141131.261536] [drm] PCIE GART of 1024M enabled (table at 0x000000F43FC00000).
[141131.261556] [drm] PSP is resuming...
[141131.283958] [drm] reserve 0xa00000 from 0xf43e000000 for PSP TMR
[141131.612648] amdgpu 0000:04:00.0: amdgpu: RAS: optional ras ta ucode is not available
[141131.624959] amdgpu 0000:04:00.0: amdgpu: RAP: optional rap ta ucode is not available
[141131.624963] amdgpu 0000:04:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[141131.624971] amdgpu 0000:04:00.0: amdgpu: SMU is resuming...
[141131.625920] amdgpu 0000:04:00.0: amdgpu: SMU is resumed successfully!
[141131.627774] [drm] DMUB hardware initialized: version=0x0400002A
[141131.631726] [drm] Watermarks table not configured properly by SMU
[141132.563236] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[141133.134856] [drm] kiq ring mec 2 pipe 1 q 0
[141133.138545] [drm] VCN decode and encode initialized successfully(under DPG Mode).
[141133.139151] [drm] JPEG decode initialized successfully.
[141133.139161] amdgpu 0000:04:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[141133.139164] amdgpu 0000:04:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[141133.139165] amdgpu 0000:04:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[141133.139166] amdgpu 0000:04:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[141133.139167] amdgpu 0000:04:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[141133.139167] amdgpu 0000:04:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[141133.139168] amdgpu 0000:04:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[141133.139169] amdgpu 0000:04:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[141133.139169] amdgpu 0000:04:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[141133.139170] amdgpu 0000:04:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
[141133.139171] amdgpu 0000:04:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[141133.139172] amdgpu 0000:04:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 1
[141133.139172] amdgpu 0000:04:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 1
[141133.139173] amdgpu 0000:04:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 1
[141133.139174] amdgpu 0000:04:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 1
[141133.144863] amdgpu 0000:04:00.0: amdgpu: recover vram bo from shadow start
[141133.144865] amdgpu 0000:04:00.0: amdgpu: recover vram bo from shadow done
[141133.144891] amdgpu 0000:04:00.0: amdgpu: GPU reset(6) succeeded!
[141133.144934] [drm] Skip scheduling IBs!
[141133.145799] [drm] Skip scheduling IBs!
[141133.145941] [drm] Skip scheduling IBs!
[141133.181977] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[141133.188701] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[141133.189213] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[141133.225059] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[141133.245102] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[141133.285385] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[141374.548251] INFO: task kworker/u32:7:2855204 blocked for more than 120 seconds.
[141374.548263]       Tainted: P        W  OE      6.2.0-060200rc4-generic #202301151633
[141374.548267] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[141374.548270] task:kworker/u32:7   state:D stack:0     pid:2855204 ppid:2      flags:0x00004000
[141374.548278] Workqueue: events_unbound commit_work [drm_kms_helper]
[141374.548308] Call Trace:
[141374.548311]  <TASK>
[141374.548314]  __schedule+0x293/0x610
[141374.548322]  ? check_preempt_wakeup+0x13e/0x320
[141374.548330]  schedule+0x63/0x110
[141374.548333]  schedule_timeout+0x128/0x160
[141374.548338]  dma_fence_default_wait+0x13d/0x210
[141374.548345]  ? __pfx_dma_fence_default_wait_cb+0x10/0x10
[141374.548349]  dma_fence_wait_timeout+0x116/0x140
[141374.548354]  drm_atomic_helper_wait_for_fences+0x89/0xf0 [drm_kms_helper]
[141374.548376]  commit_tail+0x3c/0x190 [drm_kms_helper]
[141374.548394]  ? __schedule+0x29b/0x610
[141374.548398]  commit_work+0x12/0x20 [drm_kms_helper]
[141374.548416]  process_one_work+0x225/0x400
[141374.548422]  worker_thread+0x50/0x3e0
[141374.548426]  ? __pfx_worker_thread+0x10/0x10
[141374.548429]  kthread+0xe9/0x110
[141374.548434]  ? __pfx_kthread+0x10/0x10
[141374.548439]  ret_from_fork+0x2c/0x50
[141374.548447]  </TASK>

As you can see from the timestamps, this time it took a while for the problem to appear. The system could not recover (but the display did recover with some green garbage pixels here and there).
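
To make a long log like the one above easier to compare, a rough Python sketch along these lines can summarize which processes were named in the timeout messages, which reset counters were reported, and how many "Failed to initialize parser" errors followed. The function name summarize and the "<none>" placeholder are hypothetical; the patterns only cover the message formats pasted in this thread. As far as I know, 125 is ECANCELED in the Linux errno table, which is how I read the -125.

```python
import re
from collections import Counter

# Patterns for the three message types seen in the log above.
PROC_RE = re.compile(
    r"Process information: process (?P<proc>.*?) pid (?P<pid>\d+) thread"
)
RESET_RE = re.compile(r"GPU reset\((?P<count>\d+)\) succeeded")
PARSER_ERR_RE = re.compile(r"Failed to initialize parser (?P<code>-\d+)")


def summarize(log_text):
    """Count blamed processes, reset counters and parser error codes."""
    blamed = Counter()
    resets = []
    parser_errors = Counter()
    for line in log_text.splitlines():
        proc = PROC_RE.search(line)
        if proc:
            # sdma0 timeouts in this thread carry an empty process name.
            blamed[proc.group("proc") or "<none>"] += 1
            continue
        reset = RESET_RE.search(line)
        if reset:
            resets.append(int(reset.group("count")))
            continue
        err = PARSER_ERR_RE.search(line)
        if err:
            parser_errors[int(err.group("code"))] += 1
    return {"blamed": blamed, "resets": resets, "parser_errors": parser_errors}


sample_log = "\n".join(
    [
        "[141106.269274] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process alacritty.real pid 133801 thread alacritty.:cs0 pid 133811",
        "[141108.670830] amdgpu 0000:04:00.0: amdgpu: GPU reset(2) succeeded!",
        "[141108.672246] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!",
        "[141111.302333] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!",
    ]
)
summary = summarize(sample_log)
print(summary)
```

Feeding it the full dmesg output (for example saved to a file first) gives one summary per log, which makes it easier to see whether the same process is blamed each time.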

Happened here too on a Minisforum UM690 (Mini PC). I was just coding (IntelliJ IDEA) and playing some music (Plexamp). I was not playing a game or watching a video.

CPU: AMD Ryzen Mobile 6900hx
GPU: Integrated GPU (680M)
System Memory: 64 GB Ram
Display(s): One HDMI connected 32" Screen with WQHD resolution
OS: Fedora Workstation 37 (Wayland)
Kernel: 6.1.6-200.fc37.x86_64
Gnome: 43.2

> rpm -qa | grep -e mesa-va-drivers -e mesa-vdpau-drivers
mesa-va-drivers-freeworld-22.3.3-1.fc37.x86_64

sudo dmesg

...
[13182.778160] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=108759, emitted seq=108761
[13182.778677] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
[13182.779184] amdgpu 0000:35:00.0: amdgpu: GPU reset begin!
[13183.221460] amdgpu 0000:35:00.0: amdgpu: free PSP TMR buffer
[13183.253019] amdgpu 0000:35:00.0: amdgpu: MODE2 reset
[13183.262864] amdgpu 0000:35:00.0: amdgpu: GPU reset succeeded, trying to resume
[13183.263014] [drm] PCIE GART of 1024M enabled (table at 0x000000F4FFC00000).
[13183.263058] [drm] PSP is resuming...
[13183.285152] [drm] reserve 0xa00000 from 0xf4fe000000 for PSP TMR
[13183.580343] amdgpu 0000:35:00.0: amdgpu: RAS: optional ras ta ucode is not available
[13183.589649] amdgpu 0000:35:00.0: amdgpu: RAP: optional rap ta ucode is not available
[13183.589652] amdgpu 0000:35:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[13183.589656] amdgpu 0000:35:00.0: amdgpu: SMU is resuming...
[13183.590502] amdgpu 0000:35:00.0: amdgpu: SMU is resumed successfully!
[13183.591868] [drm] DMUB hardware initialized: version=0x0400002E
[13183.669737] [drm] kiq ring mec 2 pipe 1 q 0
[13183.673467] [drm] VCN decode and encode initialized successfully(under DPG Mode).
[13183.674268] [drm] JPEG decode initialized successfully.
[13183.674270] amdgpu 0000:35:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[13183.674272] amdgpu 0000:35:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[13183.674272] amdgpu 0000:35:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[13183.674273] amdgpu 0000:35:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[13183.674273] amdgpu 0000:35:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[13183.674274] amdgpu 0000:35:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[13183.674274] amdgpu 0000:35:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[13183.674275] amdgpu 0000:35:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[13183.674275] amdgpu 0000:35:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[13183.674276] amdgpu 0000:35:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
[13183.674277] amdgpu 0000:35:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[13183.674277] amdgpu 0000:35:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 1
[13183.674278] amdgpu 0000:35:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 1
[13183.674278] amdgpu 0000:35:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 1
[13183.674279] amdgpu 0000:35:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 1
[13183.680699] amdgpu 0000:35:00.0: amdgpu: recover vram bo from shadow start
[13183.680701] amdgpu 0000:35:00.0: amdgpu: recover vram bo from shadow done
[13183.680724] amdgpu 0000:35:00.0: amdgpu: GPU reset(1) succeeded!

I've noticed 2 workarounds.

To recover your session, switch to another tty. Assuming your default tty is tty7 (check with the w command), press Ctrl+Alt+F6, then press Ctrl+Alt+F7 to switch back. I recovered my browser session this way.

To work around the hangs, uninstall VAAPI support (e.g. sudo pacman -R libva-mesa-driver). This has prevented these problems from occurring for now.

The workaround of removing libva-mesa-driver does not work for me. It seems to happen more often when VAAPI is used, but not only then. For example, I do not get a crash in normal usage, but if I record using OBS or share my screen on Wayland using Zoom, a crash is almost certain within an hour or so (both use VAAPI to my understanding, but as I said, even removing the VA library does not fully solve it).

I'm also running linux-amd-staging-drm-next, which is working a little bit better (less frequent crashes, though they still happen within 1-2 hours of doing the above).

Hmm, I don't have any issues like this; see the output of journalctl -b -1 -k | grep sdma below. I use CLion and Firefox heavily.

➜  ~ journalctl -b -1 -k | grep sdma
Feb 08 17:55:47 g9 kernel: [drm] add ip block number 7 <sdma_v5_2>
Feb 08 17:55:48 g9 kernel: amdgpu: sdma_bitmap: 3
Feb 08 17:55:48 g9 kernel: amdgpu 0000:64:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
Feb 09 06:47:13 g9 kernel: amdgpu 0000:64:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
Feb 09 08:16:28 g9 kernel: amdgpu 0000:64:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
Feb 09 17:07:29 g9 kernel: amdgpu 0000:64:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
Feb 09 18:55:48 g9 kernel: amdgpu 0000:64:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
Feb 10 06:49:18 g9 kernel: amdgpu 0000:64:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
➜  ~
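
Note that a plain grep for sdma also matches the "uses VM inv eng" ring setup lines, which show up after every reset in the logs earlier in this thread, so hits alone do not say whether a timeout happened. A small Python sketch to tell the two apart, assuming only the message formats pasted here (the function name classify_sdma_line and the labels are hypothetical):

```python
import re

# Message shapes taken from the logs pasted in this thread.
SDMA_TIMEOUT_RE = re.compile(r"ring sdma\d+ timeout")
SDMA_SETUP_RE = re.compile(r"ring sdma\d+ uses VM inv eng")


def classify_sdma_line(line):
    """Label a kernel log line as 'timeout', 'ring_setup' or 'other'."""
    if SDMA_TIMEOUT_RE.search(line):
        return "timeout"
    if SDMA_SETUP_RE.search(line):
        return "ring_setup"
    return "other"


lines = [
    "Feb 08 17:55:48 g9 kernel: amdgpu 0000:64:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0",
    "[13182.778160] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=108759, emitted seq=108761",
    "Feb 08 17:55:48 g9 kernel: amdgpu: sdma_bitmap: 3",
]
labels = [classify_sdma_line(line) for line in lines]
print(labels)
```

Only lines labelled "timeout" correspond to the error this issue is about; the output above would contain none of them.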

Same problem on 6.2-rc6 on Pop OS, Yoga Slim Pro X (6800HS) with dual 2K 144Hz displays.

Mostly happens while working with IntelliJ.

Feb 10 10:45:22 punk kernel: [ 7435.266748] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=115059, emitted seq=115061
Feb 10 10:45:22 punk kernel: [ 7435.267324] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
Feb 10 10:45:22 punk kernel: [ 7435.267836] amdgpu 0000:04:00.0: amdgpu: GPU reset begin!
Feb 10 10:45:23 punk kernel: [ 7436.015677] [drm] Send DSC disable to synaptics
Feb 10 10:45:23 punk kernel: [ 7436.320718] amdgpu 0000:04:00.0: amdgpu: MODE2 reset
Feb 10 10:45:23 punk kernel: [ 7436.329754] amdgpu 0000:04:00.0: amdgpu: GPU reset succeeded, trying to resume
Feb 10 10:45:23 punk kernel: [ 7436.330121] [drm] PCIE GART of 1024M enabled (table at 0x000000F4FFC00000).
Feb 10 10:45:23 punk kernel: [ 7436.330211] [drm] PSP is resuming...
Feb 10 10:45:23 punk kernel: [ 7436.352192] [drm] reserve 0xa00000 from 0xf4fe000000 for PSP TMR
Feb 10 10:45:24 punk kernel: [ 7436.668223] amdgpu 0000:04:00.0: amdgpu: RAS: optional ras ta ucode is not available
Feb 10 10:45:24 punk kernel: [ 7436.679594] amdgpu 0000:04:00.0: amdgpu: RAP: optional rap ta ucode is not available
Feb 10 10:45:24 punk kernel: [ 7436.679599] amdgpu 0000:04:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
Feb 10 10:45:24 punk kernel: [ 7436.679606] amdgpu 0000:04:00.0: amdgpu: SMU is resuming...
Feb 10 10:45:24 punk kernel: [ 7436.680036] amdgpu 0000:04:00.0: amdgpu: SMU is resumed successfully!
Feb 10 10:45:24 punk kernel: [ 7436.681968] [drm] DMUB hardware initialized: version=0x0400002A
Feb 10 10:45:24 punk kernel: [ 7436.961944] [drm:check_syncd_pipes_for_disabled_master_pipe [amdgpu]] *ERROR* DC: Failure: pipe_idx[2] syncd with disabled master pipe_idx[1]
Feb 10 10:45:24 punk kernel: [ 7437.302940] [drm] Send DSC enable to synaptics
Feb 10 10:45:25 punk kernel: [ 7437.542981] [drm] Send DSC enable to synaptics
Feb 10 10:45:25 punk kernel: [ 7437.616855] [drm] kiq ring mec 2 pipe 1 q 0
Feb 10 10:45:25 punk kernel: [ 7437.622523] [drm] VCN decode and encode initialized successfully(under DPG Mode).
Feb 10 10:45:25 punk kernel: [ 7437.622963] [drm] JPEG decode initialized successfully.
Feb 10 10:45:25 punk kernel: [ 7437.622968] amdgpu 0000:04:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
Feb 10 10:45:25 punk kernel: [ 7437.622971] amdgpu 0000:04:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
Feb 10 10:45:25 punk kernel: [ 7437.622972] amdgpu 0000:04:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
Feb 10 10:45:25 punk kernel: [ 7437.622972] amdgpu 0000:04:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
Feb 10 10:45:25 punk kernel: [ 7437.622973] amdgpu 0000:04:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
Feb 10 10:45:25 punk kernel: [ 7437.622974] amdgpu 0000:04:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
Feb 10 10:45:25 punk kernel: [ 7437.622974] amdgpu 0000:04:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
Feb 10 10:45:25 punk kernel: [ 7437.622975] amdgpu 0000:04:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
Feb 10 10:45:25 punk kernel: [ 7437.622975] amdgpu 0000:04:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
Feb 10 10:45:25 punk kernel: [ 7437.622976] amdgpu 0000:04:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
Feb 10 10:45:25 punk kernel: [ 7437.622977] amdgpu 0000:04:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
Feb 10 10:45:25 punk kernel: [ 7437.622977] amdgpu 0000:04:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 1
Feb 10 10:45:25 punk kernel: [ 7437.622978] amdgpu 0000:04:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 1
Feb 10 10:45:25 punk kernel: [ 7437.622979] amdgpu 0000:04:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 1
Feb 10 10:45:25 punk kernel: [ 7437.622979] amdgpu 0000:04:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 1
Feb 10 10:45:25 punk kernel: [ 7437.632346] amdgpu 0000:04:00.0: amdgpu: recover vram bo from shadow start
Feb 10 10:45:25 punk kernel: [ 7437.632352] amdgpu 0000:04:00.0: amdgpu: recover vram bo from shadow done
Feb 10 10:45:25 punk kernel: [ 7437.632372] amdgpu 0000:04:00.0: amdgpu: GPU reset(1) succeeded!
Feb 10 10:45:25 punk kernel: [ 7437.632550] amdgpu 0000:04:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:2 pasid:32773, for process Xwayland pid 3864 thread Xwayland:cs0 pid 3882)
Feb 10 10:45:25 punk kernel: [ 7437.632560] amdgpu 0000:04:00.0: amdgpu:   in page starting at address 0x0000800103900000 from client 0x1b (UTCL2)
Feb 10 10:45:25 punk kernel: [ 7437.632565] amdgpu 0000:04:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00241051
Feb 10 10:45:25 punk kernel: [ 7437.632568] amdgpu 0000:04:00.0: amdgpu: 	 Faulty UTCL2 client ID: TCP (0x8)
Feb 10 10:45:25 punk kernel: [ 7437.632571] amdgpu 0000:04:00.0: amdgpu: 	 MORE_FAULTS: 0x1
Feb 10 10:45:25 punk kernel: [ 7437.632573] amdgpu 0000:04:00.0: amdgpu: 	 WALKER_ERROR: 0x0
Feb 10 10:45:25 punk kernel: [ 7437.632574] amdgpu 0000:04:00.0: amdgpu: 	 PERMISSION_FAULTS: 0x5
Feb 10 10:45:25 punk kernel: [ 7437.632576] amdgpu 0000:04:00.0: amdgpu: 	 MAPPING_ERROR: 0x0
Feb 10 10:45:25 punk kernel: [ 7437.632578] amdgpu 0000:04:00.0: amdgpu: 	 RW: 0x1
Feb 10 10:45:25 punk kernel: [ 7437.632581] amdgpu 0000:04:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:2 pasid:32773, for process Xwayland pid 3864 thread Xwayland:cs0 pid 3882)
Feb 10 10:45:25 punk kernel: [ 7437.632586] amdgpu 0000:04:00.0: amdgpu:   in page starting at address 0x0000800103900000 from client 0x1b (UTCL2)
Feb 10 10:45:25 punk kernel: [ 7437.632589] amdgpu 0000:04:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
Feb 10 10:45:25 punk kernel: [ 7437.632591] amdgpu 0000:04:00.0: amdgpu: 	 Faulty UTCL2 client ID: CB/DB (0x0)
Feb 10 10:45:25 punk kernel: [ 7437.632593] amdgpu 0000:04:00.0: amdgpu: 	 MORE_FAULTS: 0x0
Feb 10 10:45:25 punk kernel: [ 7437.632595] amdgpu 0000:04:00.0: amdgpu: 	 WALKER_ERROR: 0x0
Feb 10 10:45:25 punk kernel: [ 7437.632596] amdgpu 0000:04:00.0: amdgpu: 	 PERMISSION_FAULTS: 0x0
Feb 10 10:45:25 punk kernel: [ 7437.632598] amdgpu 0000:04:00.0: amdgpu: 	 MAPPING_ERROR: 0x0
Feb 10 10:45:25 punk kernel: [ 7437.632600] amdgpu 0000:04:00.0: amdgpu: 	 RW: 0x0
Feb 10 10:45:35 punk kernel: [ 7447.810704] amdgpu 0000:04:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:2 pasid:32773, for process Xwayland pid 3864 thread Xwayland:cs0 pid 3882)
Feb 10 10:45:35 punk kernel: [ 7447.810724] amdgpu 0000:04:00.0: amdgpu:   in page starting at address 0x0000800103900000 from client 0x1b (UTCL2)
Feb 10 10:45:35 punk kernel: [ 7447.810731] amdgpu 0000:04:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00241051
Feb 10 10:45:35 punk kernel: [ 7447.810736] amdgpu 0000:04:00.0: amdgpu: 	 Faulty UTCL2 client ID: TCP (0x8)
Feb 10 10:45:35 punk kernel: [ 7447.810740] amdgpu 0000:04:00.0: amdgpu: 	 MORE_FAULTS: 0x1
Feb 10 10:45:35 punk kernel: [ 7447.810743] amdgpu 0000:04:00.0: amdgpu: 	 WALKER_ERROR: 0x0
Feb 10 10:45:35 punk kernel: [ 7447.810746] amdgpu 0000:04:00.0: amdgpu: 	 PERMISSION_FAULTS: 0x5
Feb 10 10:45:35 punk kernel: [ 7447.810749] amdgpu 0000:04:00.0: amdgpu: 	 MAPPING_ERROR: 0x0
Feb 10 10:45:35 punk kernel: [ 7447.810752] amdgpu 0000:04:00.0: amdgpu: 	 RW: 0x1
Feb 10 10:45:35 punk kernel: [ 7447.810762] amdgpu 0000:04:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:2 pasid:32773, for process Xwayland pid 3864 thread Xwayland:cs0 pid 3882)
Feb 10 10:45:35 punk kernel: [ 7447.810769] amdgpu 0000:04:00.0: amdgpu:   in page starting at address 0x0000800103900000 from client 0x1b (UTCL2)
Feb 10 10:45:35 punk kernel: [ 7447.810775] amdgpu 0000:04:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
Feb 10 10:45:35 punk kernel: [ 7447.810779] amdgpu 0000:04:00.0: amdgpu: 	 Faulty UTCL2 client ID: CB/DB (0x0)
Feb 10 10:45:35 punk kernel: [ 7447.810783] amdgpu 0000:04:00.0: amdgpu: 	 MORE_FAULTS: 0x0
Feb 10 10:45:35 punk kernel: [ 7447.810785] amdgpu 0000:04:00.0: amdgpu: 	 WALKER_ERROR: 0x0
Feb 10 10:45:35 punk kernel: [ 7447.810788] amdgpu 0000:04:00.0: amdgpu: 	 PERMISSION_FAULTS: 0x0
Feb 10 10:45:35 punk kernel: [ 7447.810792] amdgpu 0000:04:00.0: amdgpu: 	 MAPPING_ERROR: 0x0
Feb 10 10:45:35 punk kernel: [ 7447.810795] amdgpu 0000:04:00.0: amdgpu: 	 RW: 0x0
Feb 10 10:45:35 punk kernel: [ 7447.810812] amdgpu 0000:04:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:2 pasid:32773, for process Xwayland pid 3864 thread Xwayland:cs0 pid 3882)
Feb 10 10:45:35 punk kernel: [ 7447.810818] amdgpu 0000:04:00.0: amdgpu:   in page starting at address 0x00008001038e2000 from client 0x1b (UTCL2)
Feb 10 10:45:35 punk kernel: [ 7447.810503] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, but soft recovered
Feb 10 10:45:35 punk kernel: [ 7447.810823] amdgpu 0000:04:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
Feb 10 10:45:35 punk kernel: [ 7447.810825] amdgpu 0000:04:00.0: amdgpu: 	 Faulty UTCL2 client ID: CB/DB (0x0)
Feb 10 10:45:35 punk kernel: [ 7447.810827] amdgpu 0000:04:00.0: amdgpu: 	 MORE_FAULTS: 0x0
Feb 10 10:45:35 punk kernel: [ 7447.810829] amdgpu 0000:04:00.0: amdgpu: 	 WALKER_ERROR: 0x0
Feb 10 10:45:35 punk kernel: [ 7447.810831] amdgpu 0000:04:00.0: amdgpu: 	 PERMISSION_FAULTS: 0x0
Feb 10 10:45:35 punk kernel: [ 7447.810834] amdgpu 0000:04:00.0: amdgpu: 	 MAPPING_ERROR: 0x0
Feb 10 10:45:35 punk kernel: [ 7447.810836] amdgpu 0000:04:00.0: amdgpu: 	 RW: 0x0
Feb 10 10:45:35 punk kernel: [ 7447.810840] amdgpu 0000:04:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:2 pasid:32773, for process Xwayland pid 3864 thread Xwayland:cs0 pid 3882)
Feb 10 10:45:35 punk kernel: [ 7447.810846] amdgpu 0000:04:00.0: amdgpu:   in page starting at address 0x00008001038e0000 from client 0x1b (UTCL2)
Feb 10 10:45:35 punk kernel: [ 7447.810849] amdgpu 0000:04:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
Feb 10 10:45:35 punk kernel: [ 7447.810851] amdgpu 0000:04:00.0: amdgpu: 	 Faulty UTCL2 client ID: CB/DB (0x0)
Feb 10 10:45:35 punk kernel: [ 7447.810852] amdgpu 0000:04:00.0: amdgpu: 	 MORE_FAULTS: 0x0
Feb 10 10:45:35 punk kernel: [ 7447.810853] amdgpu 0000:04:00.0: amdgpu: 	 WALKER_ERROR: 0x0
Feb 10 10:45:35 punk kernel: [ 7447.810855] amdgpu 0000:04:00.0: amdgpu: 	 PERMISSION_FAULTS: 0x0
Feb 10 10:45:35 punk kernel: [ 7447.810856] amdgpu 0000:04:00.0: amdgpu: 	 MAPPING_ERROR: 0x0
Feb 10 10:45:35 punk kernel: [ 7447.810857] amdgpu 0000:04:00.0: amdgpu: 	 RW: 0x0
Feb 10 10:45:35 punk kernel: [ 7447.810860] amdgpu 0000:04:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:2 pasid:32773, for process Xwayland pid 3864 thread Xwayland:cs0 pid 3882)
Feb 10 10:45:35 punk kernel: [ 7447.810863] amdgpu 0000:04:00.0: amdgpu:   in page starting at address 0x00008001038e0000 from client 0x1b (UTCL2)
Feb 10 10:45:35 punk kernel: [ 7447.810865] amdgpu 0000:04:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
Feb 10 10:45:35 punk kernel: [ 7447.810866] amdgpu 0000:04:00.0: amdgpu: 	 Faulty UTCL2 client ID: CB/DB (0x0)
Feb 10 10:45:35 punk kernel: [ 7447.810867] amdgpu 0000:04:00.0: amdgpu: 	 MORE_FAULTS: 0x0
Feb 10 10:45:35 punk kernel: [ 7447.810868] amdgpu 0000:04:00.0: amdgpu: 	 WALKER_ERROR: 0x0
Feb 10 10:45:35 punk kernel: [ 7447.810870] amdgpu 0000:04:00.0: amdgpu: 	 PERMISSION_FAULTS: 0x0
Feb 10 10:45:35 punk kernel: [ 7447.810871] amdgpu 0000:04:00.0: amdgpu: 	 MAPPING_ERROR: 0x0
Feb 10 10:45:35 punk kernel: [ 7447.810872] amdgpu 0000:04:00.0: amdgpu: 	 RW: 0x0
Feb 10 10:45:35 punk kernel: [ 7447.810875] amdgpu 0000:04:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:2 pasid:32773, for process Xwayland pid 3864 thread Xwayland:cs0 pid 3882)
Feb 10 10:45:35 punk kernel: [ 7447.810877] amdgpu 0000:04:00.0: amdgpu:   in page starting at address 0x00008001038e2000 from client 0x1b (UTCL2)
Feb 10 10:45:35 punk kernel: [ 7447.810879] amdgpu 0000:04:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
Feb 10 10:45:35 punk kernel: [ 7447.810880] amdgpu 0000:04:00.0: amdgpu: 	 Faulty UTCL2 client ID: CB/DB (0x0)
Feb 10 10:45:35 punk kernel: [ 7447.810882] amdgpu 0000:04:00.0: amdgpu: 	 MORE_FAULTS: 0x0
Feb 10 10:45:35 punk kernel: [ 7447.810883] amdgpu 0000:04:00.0: amdgpu: 	 WALKER_ERROR: 0x0
Feb 10 10:45:35 punk kernel: [ 7447.810884] amdgpu 0000:04:00.0: amdgpu: 	 PERMISSION_FAULTS: 0x0
Feb 10 10:45:35 punk kernel: [ 7447.810885] amdgpu 0000:04:00.0: amdgpu: 	 MAPPING_ERROR: 0x0
Feb 10 10:45:35 punk kernel: [ 7447.810886] amdgpu 0000:04:00.0: amdgpu: 	 RW: 0x0
Feb 10 10:45:35 punk kernel: [ 7447.810889] amdgpu 0000:04:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:2 pasid:32773, for process Xwayland pid 3864 thread Xwayland:cs0 pid 3882)
Feb 10 10:45:35 punk kernel: [ 7447.810891] amdgpu 0000:04:00.0: amdgpu:   in page starting at address 0x00008001038f4000 from client 0x1b (UTCL2)
Feb 10 10:45:35 punk kernel: [ 7447.810893] amdgpu 0000:04:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
Feb 10 10:45:35 punk kernel: [ 7447.810894] amdgpu 0000:04:00.0: amdgpu: 	 Faulty UTCL2 client ID: CB/DB (0x0)
Feb 10 10:45:35 punk kernel: [ 7447.810896] amdgpu 0000:04:00.0: amdgpu: 	 MORE_FAULTS: 0x0
Feb 10 10:45:35 punk kernel: [ 7447.810897] amdgpu 0000:04:00.0: amdgpu: 	 WALKER_ERROR: 0x0
Feb 10 10:45:35 punk kernel: [ 7447.810898] amdgpu 0000:04:00.0: amdgpu: 	 PERMISSION_FAULTS: 0x0
Feb 10 10:45:35 punk kernel: [ 7447.810900] amdgpu 0000:04:00.0: amdgpu: 	 MAPPING_ERROR: 0x0
Feb 10 10:45:35 punk kernel: [ 7447.810901] amdgpu 0000:04:00.0: amdgpu: 	 RW: 0x0
Feb 10 10:45:35 punk kernel: [ 7447.810904] amdgpu 0000:04:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:2 pasid:32773, for process Xwayland pid 3864 thread Xwayland:cs0 pid 3882)
Feb 10 10:45:35 punk kernel: [ 7447.810906] amdgpu 0000:04:00.0: amdgpu:   in page starting at address 0x00008001038f4000 from client 0x1b (UTCL2)
Feb 10 10:45:35 punk kernel: [ 7447.810908] amdgpu 0000:04:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
Feb 10 10:45:35 punk kernel: [ 7447.810909] amdgpu 0000:04:00.0: amdgpu: 	 Faulty UTCL2 client ID: CB/DB (0x0)
Feb 10 10:45:35 punk kernel: [ 7447.810911] amdgpu 0000:04:00.0: amdgpu: 	 MORE_FAULTS: 0x0
Feb 10 10:45:35 punk kernel: [ 7447.810912] amdgpu 0000:04:00.0: amdgpu: 	 WALKER_ERROR: 0x0
Feb 10 10:45:35 punk kernel: [ 7447.810913] amdgpu 0000:04:00.0: amdgpu: 	 PERMISSION_FAULTS: 0x0
Feb 10 10:45:35 punk kernel: [ 7447.810914] amdgpu 0000:04:00.0: amdgpu: 	 MAPPING_ERROR: 0x0
Feb 10 10:45:35 punk kernel: [ 7447.810915] amdgpu 0000:04:00.0: amdgpu: 	 RW: 0x0
Feb 10 10:45:35 punk kernel: [ 7447.810918] amdgpu 0000:04:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:2 pasid:32773, for process Xwayland pid 3864 thread Xwayland:cs0 pid 3882)
Feb 10 10:45:35 punk kernel: [ 7447.810921] amdgpu 0000:04:00.0: amdgpu:   in page starting at address 0x00008001038fd000 from client 0x1b (UTCL2)
Feb 10 10:45:35 punk kernel: [ 7447.810923] amdgpu 0000:04:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
Feb 10 10:45:35 punk kernel: [ 7447.810924] amdgpu 0000:04:00.0: amdgpu: 	 Faulty UTCL2 client ID: CB/DB (0x0)
Feb 10 10:45:35 punk kernel: [ 7447.810926] amdgpu 0000:04:00.0: amdgpu: 	 MORE_FAULTS: 0x0
Feb 10 10:45:35 punk kernel: [ 7447.810927] amdgpu 0000:04:00.0: amdgpu: 	 WALKER_ERROR: 0x0
Feb 10 10:45:35 punk kernel: [ 7447.810928] amdgpu 0000:04:00.0: amdgpu: 	 PERMISSION_FAULTS: 0x0
Feb 10 10:45:35 punk kernel: [ 7447.810929] amdgpu 0000:04:00.0: amdgpu: 	 MAPPING_ERROR: 0x0
Feb 10 10:45:35 punk kernel: [ 7447.810930] amdgpu 0000:04:00.0: amdgpu: 	 RW: 0x0
Feb 10 10:45:35 punk kernel: [ 7447.810933] amdgpu 0000:04:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:2 pasid:32773, for process Xwayland pid 3864 thread Xwayland:cs0 pid 3882)
Feb 10 10:45:35 punk kernel: [ 7447.810936] amdgpu 0000:04:00.0: amdgpu:   in page starting at address 0x00008001038fd000 from client 0x1b (UTCL2)
Feb 10 10:45:35 punk kernel: [ 7447.810937] amdgpu 0000:04:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
Feb 10 10:45:35 punk kernel: [ 7447.810939] amdgpu 0000:04:00.0: amdgpu: 	 Faulty UTCL2 client ID: CB/DB (0x0)
Feb 10 10:45:35 punk kernel: [ 7447.810940] amdgpu 0000:04:00.0: amdgpu: 	 MORE_FAULTS: 0x0
Feb 10 10:45:35 punk kernel: [ 7447.810941] amdgpu 0000:04:00.0: amdgpu: 	 WALKER_ERROR: 0x0
Feb 10 10:45:35 punk kernel: [ 7447.810943] amdgpu 0000:04:00.0: amdgpu: 	 PERMISSION_FAULTS: 0x0
Feb 10 10:45:35 punk kernel: [ 7447.810944] amdgpu 0000:04:00.0: amdgpu: 	 MAPPING_ERROR: 0x0
Feb 10 10:45:35 punk kernel: [ 7447.810945] amdgpu 0000:04:00.0: amdgpu: 	 RW: 0x0
Feb 10 10:45:45 punk kernel: [ 7458.050417] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, but soft recovered
Feb 10 10:45:55 punk kernel: [ 7468.300520] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=746329, emitted seq=746332
Feb 10 10:45:55 punk kernel: [ 7468.301024] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xwayland pid 3864 thread Xwayland:cs0 pid 3882
Feb 10 10:45:55 punk kernel: [ 7468.301467] amdgpu 0000:04:00.0: amdgpu: GPU reset begin!
Feb 10 10:45:56 punk kernel: [ 7468.578513] [drm] Send DSC disable to synaptics
Feb 10 10:45:56 punk kernel: [ 7468.883096] amdgpu 0000:04:00.0: amdgpu: MODE2 reset
Feb 10 10:45:56 punk kernel: [ 7468.893106] amdgpu 0000:04:00.0: amdgpu: GPU reset succeeded, trying to resume
Feb 10 10:45:56 punk kernel: [ 7468.893230] [drm] PCIE GART of 1024M enabled (table at 0x000000F4FFC00000).
Feb 10 10:45:56 punk kernel: [ 7468.893245] [drm] PSP is resuming...
Feb 10 10:45:56 punk kernel: [ 7468.915219] [drm] reserve 0xa00000 from 0xf4fe000000 for PSP TMR
Feb 10 10:45:56 punk kernel: [ 7469.224143] amdgpu 0000:04:00.0: amdgpu: RAS: optional ras ta ucode is not available
Feb 10 10:45:56 punk kernel: [ 7469.233625] amdgpu 0000:04:00.0: amdgpu: RAP: optional rap ta ucode is not available
Feb 10 10:45:56 punk kernel: [ 7469.233631] amdgpu 0000:04:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
Feb 10 10:45:56 punk kernel: [ 7469.233636] amdgpu 0000:04:00.0: amdgpu: SMU is resuming...
Feb 10 10:45:56 punk kernel: [ 7469.234030] amdgpu 0000:04:00.0: amdgpu: SMU is resumed successfully!
Feb 10 10:45:56 punk kernel: [ 7469.235525] [drm] DMUB hardware initialized: version=0x0400002A
Feb 10 10:45:57 punk kernel: [ 7469.491313] [drm:check_syncd_pipes_for_disabled_master_pipe [amdgpu]] *ERROR* DC: Failure: pipe_idx[2] syncd with disabled master pipe_idx[1]
Feb 10 10:45:57 punk kernel: [ 7469.826695] [drm] Send DSC enable to synaptics
Feb 10 10:45:57 punk kernel: [ 7470.066713] [drm] Send DSC enable to synaptics
Feb 10 10:45:57 punk kernel: [ 7470.130821] [drm] kiq ring mec 2 pipe 1 q 0
Feb 10 10:45:57 punk kernel: [ 7470.135571] [drm] VCN decode and encode initialized successfully(under DPG Mode).
Feb 10 10:45:57 punk kernel: [ 7470.136327] [drm] JPEG decode initialized successfully.
Feb 10 10:45:57 punk kernel: [ 7470.136332] amdgpu 0000:04:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
Feb 10 10:45:57 punk kernel: [ 7470.136337] amdgpu 0000:04:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
Feb 10 10:45:57 punk kernel: [ 7470.136338] amdgpu 0000:04:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
Feb 10 10:45:57 punk kernel: [ 7470.136339] amdgpu 0000:04:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
Feb 10 10:45:57 punk kernel: [ 7470.136343] amdgpu 0000:04:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
Feb 10 10:45:57 punk kernel: [ 7470.136344] amdgpu 0000:04:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
Feb 10 10:45:57 punk kernel: [ 7470.136345] amdgpu 0000:04:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
Feb 10 10:45:57 punk kernel: [ 7470.136347] amdgpu 0000:04:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
Feb 10 10:45:57 punk kernel: [ 7470.136348] amdgpu 0000:04:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
Feb 10 10:45:57 punk kernel: [ 7470.136349] amdgpu 0000:04:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
Feb 10 10:45:57 punk kernel: [ 7470.136351] amdgpu 0000:04:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
Feb 10 10:45:57 punk kernel: [ 7470.136352] amdgpu 0000:04:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 1
Feb 10 10:45:57 punk kernel: [ 7470.136353] amdgpu 0000:04:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 1
Feb 10 10:45:57 punk kernel: [ 7470.136355] amdgpu 0000:04:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 1
Feb 10 10:45:57 punk kernel: [ 7470.136356] amdgpu 0000:04:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 1
Feb 10 10:45:57 punk kernel: [ 7470.144478] amdgpu 0000:04:00.0: amdgpu: recover vram bo from shadow start
Feb 10 10:45:57 punk kernel: [ 7470.144481] amdgpu 0000:04:00.0: amdgpu: recover vram bo from shadow done
Feb 10 10:45:57 punk kernel: [ 7470.144520] amdgpu 0000:04:00.0: amdgpu: GPU reset(5) succeeded!
Feb 10 10:45:57 punk kernel: [ 7470.144553] [drm] Skip scheduling IBs!
Feb 10 10:45:57 punk kernel: [ 7470.144567] [drm] Skip scheduling IBs!
Feb 10 10:45:57 punk kernel: [ 7470.144574] [drm] Skip scheduling IBs!
Feb 10 10:45:57 punk kernel: [ 7470.144579] [drm] Skip scheduling IBs!
Feb 10 10:45:57 punk kernel: [ 7470.144584] [drm] Skip scheduling IBs!
Feb 10 10:45:57 punk kernel: [ 7470.144589] [drm] Skip scheduling IBs!
Feb 10 10:45:57 punk kernel: [ 7470.144593] [drm] Skip scheduling IBs!
Feb 10 10:45:57 punk kernel: [ 7470.144596] [drm] Skip scheduling IBs!
Feb 10 10:45:57 punk kernel: [ 7470.144600] [drm] Skip scheduling IBs!
Feb 10 10:45:57 punk kernel: [ 7470.144604] [drm] Skip scheduling IBs!
Feb 10 10:45:57 punk kernel: [ 7470.144625] [drm] Skip scheduling IBs!
Feb 10 10:45:57 punk kernel: [ 7470.144628] [drm] Skip scheduling IBs!
Feb 10 10:45:57 punk kernel: [ 7470.144630] [drm] Skip scheduling IBs!
Feb 10 10:45:57 punk kernel: [ 7470.144643] amdgpu_cs_ioctl: 8 callbacks suppressed
Feb 10 10:45:57 punk kernel: [ 7470.144646] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:45:57 punk kernel: [ 7470.145035] [drm] Skip scheduling IBs!
Feb 10 10:45:57 punk kernel: [ 7470.145041] [drm] Skip scheduling IBs!
Feb 10 10:45:57 punk kernel: [ 7470.145046] [drm] Skip scheduling IBs!
Feb 10 10:45:57 punk kernel: [ 7470.145049] [drm] Skip scheduling IBs!
Feb 10 10:45:57 punk kernel: [ 7470.145051] [drm] Skip scheduling IBs!
Feb 10 10:45:57 punk kernel: [ 7470.145054] [drm] Skip scheduling IBs!
Feb 10 10:45:57 punk kernel: [ 7470.145056] [drm] Skip scheduling IBs!
Feb 10 10:45:57 punk kernel: [ 7470.145058] [drm] Skip scheduling IBs!
Feb 10 10:45:57 punk kernel: [ 7470.145061] [drm] Skip scheduling IBs!
Feb 10 10:45:57 punk kernel: [ 7470.145063] [drm] Skip scheduling IBs!
Feb 10 10:45:57 punk kernel: [ 7470.145130] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:45:57 punk kernel: [ 7470.145801] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:45:57 punk kernel: [ 7470.146374] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:45:57 punk kernel: [ 7470.146641] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:45:57 punk kernel: [ 7470.147314] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:45:57 punk kernel: [ 7470.147449] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:45:57 punk kernel: [ 7470.147652] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:45:57 punk kernel: [ 7470.147927] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:45:57 punk kernel: [ 7470.148985] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:03 punk kernel: [ 7475.542606] amdgpu_cs_ioctl: 130 callbacks suppressed
Feb 10 10:46:03 punk kernel: [ 7475.542612] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:03 punk kernel: [ 7475.959064] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:03 punk kernel: [ 7475.959515] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:03 punk kernel: [ 7476.043202] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:04 punk kernel: [ 7476.543891] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:04 punk kernel: [ 7476.961421] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:04 punk kernel: [ 7476.962192] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:04 punk kernel: [ 7477.044522] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:05 punk kernel: [ 7477.544937] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:05 punk kernel: [ 7477.959428] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:08 punk kernel: [ 7480.548558] amdgpu_cs_ioctl: 11 callbacks suppressed
Feb 10 10:46:08 punk kernel: [ 7480.548564] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:08 punk kernel: [ 7480.961684] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:08 punk kernel: [ 7480.962341] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:08 punk kernel: [ 7481.049078] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:09 punk kernel: [ 7481.549496] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:09 punk kernel: [ 7481.959253] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:09 punk kernel: [ 7481.960001] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:09 punk kernel: [ 7482.049765] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:10 punk kernel: [ 7482.550555] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:10 punk kernel: [ 7482.551082] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:13 punk kernel: [ 7485.554933] amdgpu_cs_ioctl: 11 callbacks suppressed
Feb 10 10:46:13 punk kernel: [ 7485.554940] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:13 punk kernel: [ 7485.959227] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:13 punk kernel: [ 7485.960004] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:13 punk kernel: [ 7486.055496] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:14 punk kernel: [ 7486.556012] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:14 punk kernel: [ 7486.957512] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:14 punk kernel: [ 7486.958006] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:14 punk kernel: [ 7487.056587] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:15 punk kernel: [ 7487.557191] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:15 punk kernel: [ 7487.957601] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:18 punk kernel: [ 7490.561253] amdgpu_cs_ioctl: 11 callbacks suppressed
Feb 10 10:46:18 punk kernel: [ 7490.561260] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:18 punk kernel: [ 7490.959342] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:18 punk kernel: [ 7490.960004] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:18 punk kernel: [ 7491.061726] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:19 punk kernel: [ 7491.562538] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:19 punk kernel: [ 7491.948388] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:19 punk kernel: [ 7491.948819] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:19 punk kernel: [ 7492.063255] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:20 punk kernel: [ 7492.563437] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:20 punk kernel: [ 7492.564007] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:23 punk kernel: [ 7495.567293] amdgpu_cs_ioctl: 11 callbacks suppressed
Feb 10 10:46:23 punk kernel: [ 7495.567301] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:23 punk kernel: [ 7495.959987] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:23 punk kernel: [ 7495.960516] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:23 punk kernel: [ 7496.067677] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:24 punk kernel: [ 7496.568402] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:24 punk kernel: [ 7496.959786] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:24 punk kernel: [ 7496.960227] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:24 punk kernel: [ 7497.069140] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:25 punk kernel: [ 7497.569850] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:25 punk kernel: [ 7497.959852] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:28 punk kernel: [ 7500.573695] amdgpu_cs_ioctl: 12 callbacks suppressed
Feb 10 10:46:28 punk kernel: [ 7500.573703] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:28 punk kernel: [ 7500.961177] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:28 punk kernel: [ 7500.962012] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:28 punk kernel: [ 7501.074148] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:29 punk kernel: [ 7501.574675] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:29 punk kernel: [ 7501.941253] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:29 punk kernel: [ 7501.942103] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:29 punk kernel: [ 7502.075196] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:30 punk kernel: [ 7502.575801] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:30 punk kernel: [ 7502.953006] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:33 punk kernel: [ 7505.579008] amdgpu_cs_ioctl: 10 callbacks suppressed
Feb 10 10:46:33 punk kernel: [ 7505.579014] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:33 punk kernel: [ 7505.960040] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:33 punk kernel: [ 7505.960572] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:33 punk kernel: [ 7506.079586] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:34 punk kernel: [ 7506.580184] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:34 punk kernel: [ 7506.940952] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:34 punk kernel: [ 7506.941424] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:34 punk kernel: [ 7507.080895] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:35 punk kernel: [ 7507.581564] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:35 punk kernel: [ 7507.960633] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:38 punk kernel: [ 7510.584886] amdgpu_cs_ioctl: 10 callbacks suppressed
Feb 10 10:46:38 punk kernel: [ 7510.584890] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:38 punk kernel: [ 7510.957670] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:38 punk kernel: [ 7510.958194] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:38 punk kernel: [ 7511.085772] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:39 punk kernel: [ 7511.586621] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:39 punk kernel: [ 7511.961809] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:39 punk kernel: [ 7511.962585] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:39 punk kernel: [ 7512.087023] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:40 punk kernel: [ 7512.587651] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:40 punk kernel: [ 7512.959955] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:43 punk kernel: [ 7515.591434] amdgpu_cs_ioctl: 10 callbacks suppressed
Feb 10 10:46:43 punk kernel: [ 7515.591443] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:43 punk kernel: [ 7515.960667] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:43 punk kernel: [ 7515.961155] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:43 punk kernel: [ 7516.092378] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:44 punk kernel: [ 7516.593254] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:44 punk kernel: [ 7516.960258] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:44 punk kernel: [ 7516.961017] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:44 punk kernel: [ 7517.094016] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:45 punk kernel: [ 7517.594583] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb 10 10:46:45 punk kernel: [ 7517.960005] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!

I experienced a problem today that was similar to the descriptions above. When I closed a tab in Firefox (I think it was a YouTube tab), my desktop session seemed to freeze (with audio still playing) before going black. I was able to get back to GDM with Ctrl+Alt+F1 and reboot in a controlled fashion, but I wasn't able to get back to my desktop session. Similar things have happened to me a few times recently.

However, when I looked at journalctl, I saw the message [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma2 timeout, signaled seq=23679, emitted seq=23681, so the error message names sdma2 rather than sdma0. Is that still the same bug, or is it something else?
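
In case it helps when comparing reports, here is a rough sketch for pulling the ring name and sequence numbers out of amdgpu_job_timedout lines. The regular expression is only modelled on the lines quoted in this thread, and the function name parse_timeouts is made up for illustration; other kernel versions may format the message differently.

```python
import re
from collections import Counter

# Pattern modelled on the amdgpu_job_timedout lines quoted in this thread;
# this is an assumption about the format, not a documented kernel interface.
TIMEOUT_RE = re.compile(
    r"\*ERROR\* ring (?P<ring>\S+) timeout"
    r"(?:, signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+))?"
    r"(?P<soft>, but soft recovered)?"
)


def parse_timeouts(lines):
    """Yield (ring, signaled, emitted, soft_recovered) for each timeout line."""
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m is None:
            continue  # not a ring timeout message
        signaled = int(m["signaled"]) if m["signaled"] else None
        emitted = int(m["emitted"]) if m["emitted"] else None
        yield m["ring"], signaled, emitted, m["soft"] is not None


# Sample lines copied from the logs posted above.
sample = [
    "[ 7458.050417] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, but soft recovered",
    "[ 7468.300520] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=746329, emitted seq=746332",
    "[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma2 timeout, signaled seq=23679, emitted seq=23681",
]

events = list(parse_timeouts(sample))
per_ring = Counter(ring for ring, *_ in events)

for ring, signaled, emitted, soft in events:
    print(ring, signaled, emitted, "soft recovered" if soft else "not soft recovered")
print(dict(per_ring))
```

To run it against a real log, save the kernel messages to a file first (for example from journalctl -k) and pass the file's lines to parse_timeouts instead of the sample list. A per-ring count like this makes it easier to see whether a report involves the gfx ring, an sdma ring, or both.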

For background:

Hardware description:

CPU: AMD Ryzen 7 5800X3D
GPU: RX 6800 XT
System Memory: 32 GB RAM
Display(s): Samsung Odyssey G7 27"
Type of Display Connection: DisplayPort

System information:

Distro name and Version: Fedora Workstation 37
Kernel version: 6.1.11-200.fc37.x86_64
AMD official driver version: OpenSource driver from kernel (amdgpu)
