I've also encountered this crash on similar hardware, and it does seem to be related to VA-API usage (with mpv, in my case). The screen freezes for a few seconds and then blanks, although audio keeps playing.
System specifications:
OS: Gentoo Linux
System: ThinkPad T14 Gen 3 AMD (21CFCTO1WW), BIOS version 1.30
I had another freeze, with exactly the same behaviour as @gerbilsoft and @nicolas.frenay described. I thought it would be fixed in newer kernel versions, but it seems not.
Log (newest entry first):
Nov 16 08:40:52 attrobit-001 kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx
Nov 16 08:40:52 attrobit-001 kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KGQ disable failed
Nov 16 08:40:52 attrobit-001 kernel: amdgpu 0000:07:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Nov 16 08:40:51 attrobit-001 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process pid 0 thread pid 0
Nov 16 08:40:51 attrobit-001 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=33112, emitted seq=33114
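The line that ties these reports together is the amdgpu_job_timedout message. It names the hung ring and gives two fence sequence numbers. The following is a minimal Python sketch that pulls those fields out of a log line. It assumes the exact message wording quoted in this thread. The names parse_timeout and RingTimeout are hypothetical helpers, not part of any tool. Reading "emitted minus signaled" as the number of fences still outstanding is our interpretation of the two counters, not something the log states.

```python
import re
from typing import NamedTuple, Optional

# Matches the timeout message as quoted in this thread, e.g.
# "*ERROR* ring sdma0 timeout, signaled seq=33112, emitted seq=33114".
# \S+ also accepts ring names such as "gfx_0.0.0" or "sdma2".
TIMEOUT_RE = re.compile(
    r"\*ERROR\* ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


class RingTimeout(NamedTuple):
    ring: str
    signaled: int
    emitted: int

    @property
    def outstanding(self) -> int:
        # Fences emitted to the ring that had not signaled when the job
        # timed out (our reading of the two counters).
        return self.emitted - self.signaled


def parse_timeout(line: str) -> Optional[RingTimeout]:
    """Return ring name and fence counters from a timeout line, else None."""
    m = TIMEOUT_RE.search(line)
    if m is None:
        return None
    return RingTimeout(m.group("ring"), int(m.group("signaled")), int(m.group("emitted")))


if __name__ == "__main__":
    line = ("Nov 16 08:40:51 attrobit-001 kernel: [drm:amdgpu_job_timedout [amdgpu]] "
            "*ERROR* ring sdma0 timeout, signaled seq=33112, emitted seq=33114")
    t = parse_timeout(line)
    print(t.ring, t.outstanding)  # sdma0 2
```

A regex is used instead of splitting on spaces because the prefix differs between journalctl output, plain dmesg output and the stripped form some posters used.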
OS: Fedora 37
Kernel: 6.0.8-300.fc37.x86_64 #1 SMP PREEMPT_DYNAMIC
HW: ASUSTeK COMPUTER INC. ASUS TUF Gaming A15 FA507RE
CPU: AMD Ryzen™ 7 6800H with Radeon™ Graphics × 16
Graphics: NVIDIA GeForce RTX™ 3050 Ti Laptop GPU / REMBRANDT
GNOME: 43.0
Window manager: Wayland
I use my notebook in dual-screen mode, with an external monitor on HDMI.
Description:
My notebook froze and didn't respond to anything. I was running only a web browser (showing a news portal) and GIMP. After I rebooted it, I found this in the logs:
Nov 19 20:06:40 fedora kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=617402, emitted seq=617404
Nov 19 20:06:40 fedora kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process pid 0 thread pid 0
Nov 19 20:06:40 fedora kernel: amdgpu 0000:05:00.0: amdgpu: GPU reset begin!
Nov 19 20:06:41 fedora kernel: amdgpu 0000:05:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Nov 19 20:06:41 fedora kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KGQ disable failed
Nov 19 20:06:41 fedora kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx
Nov 19 20:06:41 fedora kernel: [drm] free PSP TMR buffer
Nov 19 20:06:41 fedora kernel: amdgpu 0000:05:00.0: amdgpu: MODE2 reset
Nov 19 20:06:41 fedora kernel: amdgpu 0000:05:00.0: amdgpu: GPU reset succeeded, trying to resume
Nov 19 20:06:41 fedora kernel: [drm] PCIE GART of 1024M enabled (table at 0x000000F41FC00000).
Nov 19 20:06:41 fedora kernel: [drm] PSP is resuming...
Nov 19 20:06:41 fedora kernel: [drm] reserve 0xa00000 from 0xf409000000 for PSP TMR
Nov 19 20:06:41 fedora kernel: amdgpu 0000:05:00.0: amdgpu: RAS: optional ras ta ucode is not available
Nov 19 20:07:32 fedora NetworkManager[1355]: <info> [1668884852.1422] dhcp4 (eno1): state changed new lease, address=192.168.240.241
-- Boot 82f901bc7b0a449cbd2b6f1dff9570aa --
Just had the crash happen 3 times in a row. All three times, video was playing (twice in mpv, once in Chrome; mpv was definitely using vaapi, not sure about Chrome), and I was running e4defrag on a USB 3.0 HDD. I'm thinking the e4defrag process may have had something to do with it...
Hi folks, I have also experienced this issue. My specs are:
OS: Arch Linux x86_64
Host: 21CMCTO1WW ThinkPad X13 Gen 3
Kernel: 6.0.12-arch1-1
CPU: AMD Ryzen 7 PRO 6850U with Radeon Graphics (16) @ 4.768GHz
GPU: AMD ATI Radeon 680M
Memory: 32 GB
Besides the screen freezes, has anyone also experienced screen flickering?
In my case the screen also flickers from time to time. Interestingly, this only happens on the laptop screen: when I use an external monitor, only the laptop screen flickers.
Thanks @nicolas.frenay, I will try that. In the meantime I have downgraded the kernel to 5.19.9 to see how it behaves. If the flickering persists, I will try mesa-git.
@sylv-io Exactly, I am using sway 1.7, to be precise.
Marcello, see Nicolas's remark above about using mesa-git with ff928d9567, and whether it fixes it for you. I have been running kernel 5.19.9 for some hours and have had neither the screen freeze nor the flickering. I will update this if (and when) it occurs.
@jxs Oh right, thanks. I have now installed mesa-git: 23.0.0_devel.164523.6b3f085c3cd.5269a95f00c4d6964d487d9dbd94f62b-1
I'll let you know if it solves the flickering on my setup.
commit 81d0bcf9900932633d270d5bc4a54ff599c6ebdb
Author: Alex Deucher <alexander.deucher@amd.com>
Date: Wed Dec 7 11:08:53 2022 -0500

    drm/amdgpu: make display pinning more flexible (v2)

    Only apply the static threshold for Stoney and Carrizo.
    This hardware has certain requirements that don't allow mixing of GTT and VRAM. Newer asics do not have these requirements so we should be able to be more flexible with where buffers end up.

    Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2270
    Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2291
    Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2255
    Acked-by: Luben Tuikov <luben.tuikov@amd.com>
    Reviewed-by: Christian König <christian.koenig@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Cc: stable@vger.kernel.org
The sdma lockup still happens occasionally, but after the screen blanks, the picture comes back, although it is still frozen. Pressing Ctrl+Alt+F2 does switch to a VT after 10-15 seconds, and the VT is responsive. Switching back to the Wayland session results in a frozen black screen, but I can switch back to the working VT. Killing the Wayland compositor and restarting it does appear to work.
[drm:amdgpu_job_timedout] *ERROR* ring gfx_0.0.0 timeout, signaled seq=6600570, emitted seq=6600572
[drm:amdgpu_job_timedout] *ERROR* Process information: process kwin_wayland pid 5088 thread kwin_wayla:cs0 pid 5138
amdgpu 0000:04:00.0: amdgpu: GPU reset begin!
amdgpu 0000:04:00.0: [drm:amdgpu_ring_test_helper] *ERROR* ring kiq_2.1.0 test failed (-110)
[drm:gfx_v10_0_hw_fini] *ERROR* KGQ disable failed
[drm:gfx_v10_0_hw_fini] *ERROR* failed to halt cp gfx
[drm] free PSP TMR buffer
amdgpu 0000:04:00.0: amdgpu: MODE2 reset
amdgpu 0000:04:00.0: amdgpu: GPU reset succeeded, trying to resume
[drm] PCIE GART of 1024M enabled (table at 0x000000F43FC00000).
[drm] PSP is resuming...
[drm] reserve 0xa00000 from 0xf43e000000 for PSP TMR
amdgpu 0000:04:00.0: amdgpu: RAS: optional ras ta ucode is not available
amdgpu 0000:04:00.0: amdgpu: RAP: optional rap ta ucode is not available
amdgpu 0000:04:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
amdgpu 0000:04:00.0: amdgpu: SMU is resuming...
amdgpu 0000:04:00.0: amdgpu: SMU is resumed successfully!
[drm] DMUB hardware initialized: version=0x0400002E
[drm] kiq ring mec 2 pipe 1 q 0
[drm] VCN decode and encode initialized successfully(under DPG Mode).
[drm] JPEG decode initialized successfully.
amdgpu 0000:04:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
amdgpu 0000:04:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
amdgpu 0000:04:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
amdgpu 0000:04:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
amdgpu 0000:04:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
amdgpu 0000:04:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
amdgpu 0000:04:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
amdgpu 0000:04:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
amdgpu 0000:04:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
amdgpu 0000:04:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
amdgpu 0000:04:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
amdgpu 0000:04:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 1
amdgpu 0000:04:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 1
amdgpu 0000:04:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 1
amdgpu 0000:04:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 1
amdgpu 0000:04:00.0: amdgpu: recover vram bo from shadow start
amdgpu 0000:04:00.0: amdgpu: recover vram bo from shadow done
[drm] Skip scheduling IBs!
amdgpu 0000:04:00.0: amdgpu: GPU reset(2) succeeded!
[drm] Skip scheduling IBs!
[drm] Skip scheduling IBs!
[drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
EDIT: That's a slightly different timeout (on gfx), but I did get an sdma0 timeout later, with the same symptoms regarding VT switching:
[drm:amdgpu_job_timedout] *ERROR* ring sdma0 timeout, signaled seq=200216, emitted seq=200218
[drm:amdgpu_job_timedout] *ERROR* Process information: process pid 0 thread pid 0
amdgpu 0000:04:00.0: amdgpu: GPU reset begin!
amdgpu 0000:04:00.0: [drm:amdgpu_ring_test_helper] *ERROR* ring kiq_2.1.0 test failed (-110)
[drm:gfx_v10_0_hw_fini] *ERROR* KGQ disable failed
[drm:gfx_v10_0_hw_fini] *ERROR* failed to halt cp gfx
[drm] free PSP TMR buffer
amdgpu 0000:04:00.0: amdgpu: MODE2 reset
amdgpu 0000:04:00.0: amdgpu: GPU reset succeeded, trying to resume
[drm] PCIE GART of 1024M enabled (table at 0x000000F43FC00000).
[drm] PSP is resuming...
[drm] reserve 0xa00000 from 0xf43e000000 for PSP TMR
amdgpu 0000:04:00.0: amdgpu: RAS: optional ras ta ucode is not available
amdgpu 0000:04:00.0: amdgpu: RAP: optional rap ta ucode is not available
amdgpu 0000:04:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
amdgpu 0000:04:00.0: amdgpu: SMU is resuming...
amdgpu 0000:04:00.0: amdgpu: SMU is resumed successfully!
[drm] DMUB hardware initialized: version=0x0400002E
[drm] kiq ring mec 2 pipe 1 q 0
[drm] VCN decode and encode initialized successfully(under DPG Mode).
[drm] JPEG decode initialized successfully.
amdgpu 0000:04:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
amdgpu 0000:04:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
amdgpu 0000:04:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
amdgpu 0000:04:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
amdgpu 0000:04:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
amdgpu 0000:04:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
amdgpu 0000:04:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
amdgpu 0000:04:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
amdgpu 0000:04:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
amdgpu 0000:04:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
amdgpu 0000:04:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
amdgpu 0000:04:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 1
amdgpu 0000:04:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 1
amdgpu 0000:04:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 1
amdgpu 0000:04:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 1
amdgpu 0000:04:00.0: amdgpu: recover vram bo from shadow start
amdgpu 0000:04:00.0: amdgpu: recover vram bo from shadow done
amdgpu 0000:04:00.0: amdgpu: GPU reset(1) succeeded!
I also experience this issue, and it mostly matches the descriptions above. I get occasional flickering during normal use. When it crashes, the screen blanks but sound keeps playing. I haven't checked whether it is directly related to VA-API use, although I have made sure all hardware codecs are enabled. It tends to crash while I am running many things and switching between apps. I am able to restore the session and the login screen appears, but after logging in, only an unresponsive terminal appears, showing the text @^@^@^@^@^@^@^@^@^@^@^@^@^. A working console is available on another VT, though, so dmesg can be dumped. See the attachments for logs from two such crashes. I'm also happy to try a patch, given instructions.
I understand that the refcount crash itself might be fixed by the patch in #2220 (comment 1695917). Doesn't that mean the underlying cause of the issue remains unresolved / unclear?
I've tried the new amd-staging-drm-next on Arch (ThinkPad T14s Gen3 AMD 6580U), which has quite a lot of updates from yesterday's drop for 6.3. Crashes usually occur within 15-30 minutes while I am recording the Wayland screen with OBS. OBS uses VAAPI, which triggers the crash much faster for me.
I can confirm the reset still occurs (see below), BUT the system recovered. The Zoom meeting I was in continued, OBS resumed recording and everything worked; there was just a pause for the timeout period. This NEVER happened before, so if this is the new normal, it is at least much better.
Jan 12 16:14:46 arrow kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=154548, emitted seq=154550
Jan 12 16:14:46 arrow kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process pid 0 thread pid 0
Jan 12 16:14:46 arrow kernel: amdgpu 0000:33:00.0: amdgpu: GPU reset begin!
Jan 12 16:14:47 arrow kernel: amdgpu 0000:33:00.0: amdgpu: MODE2 reset
Jan 12 16:14:47 arrow kernel: amdgpu 0000:33:00.0: amdgpu: GPU reset succeeded, trying to resume
Jan 12 16:14:47 arrow kernel: [drm] PCIE GART of 1024M enabled (table at 0x000000F400A00000).
Jan 12 16:14:47 arrow kernel: [drm] PSP is resuming...
Jan 12 16:14:47 arrow kernel: [drm] reserve 0xa00000 from 0xf43e000000 for PSP TMR
Jan 12 16:14:47 arrow kernel: amdgpu 0000:33:00.0: amdgpu: RAS: optional ras ta ucode is not available
Jan 12 16:14:47 arrow kernel: amdgpu 0000:33:00.0: amdgpu: RAP: optional rap ta ucode is not available
Jan 12 16:14:47 arrow kernel: amdgpu 0000:33:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
Jan 12 16:14:47 arrow kernel: amdgpu 0000:33:00.0: amdgpu: SMU is resuming...
Jan 12 16:14:47 arrow kernel: amdgpu 0000:33:00.0: amdgpu: SMU is resumed successfully!
Jan 12 16:14:47 arrow kernel: [drm] DMUB hardware initialized: version=0x0400002E
Jan 12 16:14:47 arrow kernel: [drm] kiq ring mec 2 pipe 1 q 0
Jan 12 16:14:47 arrow kernel: [drm] VCN decode and encode initialized successfully(under DPG Mode).
Jan 12 16:14:47 arrow kernel: [drm] JPEG decode initialized successfully.
Jan 12 16:14:47 arrow kernel: amdgpu 0000:33:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
Jan 12 16:14:47 arrow kernel: amdgpu 0000:33:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
Jan 12 16:14:47 arrow kernel: amdgpu 0000:33:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
Jan 12 16:14:47 arrow kernel: amdgpu 0000:33:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
Jan 12 16:14:47 arrow kernel: amdgpu 0000:33:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
Jan 12 16:14:47 arrow kernel: amdgpu 0000:33:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
Jan 12 16:14:47 arrow kernel: amdgpu 0000:33:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
Jan 12 16:14:47 arrow kernel: amdgpu 0000:33:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
Jan 12 16:14:47 arrow kernel: amdgpu 0000:33:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
Jan 12 16:14:47 arrow kernel: amdgpu 0000:33:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
Jan 12 16:14:47 arrow kernel: amdgpu 0000:33:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
Jan 12 16:14:47 arrow kernel: amdgpu 0000:33:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 1
Jan 12 16:14:47 arrow kernel: amdgpu 0000:33:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 1
Jan 12 16:14:47 arrow kernel: amdgpu 0000:33:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 1
Jan 12 16:14:47 arrow kernel: amdgpu 0000:33:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 1
Jan 12 16:14:47 arrow kernel: amdgpu 0000:33:00.0: amdgpu: recover vram bo from shadow start
Jan 12 16:14:47 arrow kernel: amdgpu 0000:33:00.0: amdgpu: recover vram bo from shadow done
Jan 12 16:14:47 arrow kernel: amdgpu 0000:33:00.0: amdgpu: GPU reset(1) succeeded!
With kernel 6.2.0-rc4, additional messages are logged:
[141106.268781] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=10124525, emitted seq=10124527
[141106.269274] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process alacritty.real pid 133801 thread alacritty.:cs0 pid 133811
[141106.269711] amdgpu 0000:04:00.0: amdgpu: GPU reset begin!
[141106.813196] amdgpu 0000:04:00.0: amdgpu: MODE2 reset
[141106.821875] amdgpu 0000:04:00.0: amdgpu: GPU reset succeeded, trying to resume
[141106.822200] [drm] PCIE GART of 1024M enabled (table at 0x000000F43FC00000).
[141106.822260] [drm] PSP is resuming...
[141106.844509] [drm] reserve 0xa00000 from 0xf43e000000 for PSP TMR
[141107.167966] amdgpu 0000:04:00.0: amdgpu: RAS: optional ras ta ucode is not available
[141107.180296] amdgpu 0000:04:00.0: amdgpu: RAP: optional rap ta ucode is not available
[141107.180302] amdgpu 0000:04:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[141107.180310] amdgpu 0000:04:00.0: amdgpu: SMU is resuming...
[141107.180718] amdgpu 0000:04:00.0: amdgpu: SMU is resumed successfully!
[141107.182582] [drm] DMUB hardware initialized: version=0x0400002A
[141107.187713] [drm] Watermarks table not configured properly by SMU
[141108.656494] [drm] kiq ring mec 2 pipe 1 q 0
[141108.661600] [drm] VCN decode and encode initialized successfully(under DPG Mode).
[141108.662347] [drm] JPEG decode initialized successfully.
[141108.662352] amdgpu 0000:04:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[141108.662355] amdgpu 0000:04:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[141108.662356] amdgpu 0000:04:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[141108.662357] amdgpu 0000:04:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[141108.662358] amdgpu 0000:04:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[141108.662359] amdgpu 0000:04:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[141108.662359] amdgpu 0000:04:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[141108.662360] amdgpu 0000:04:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[141108.662361] amdgpu 0000:04:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[141108.662361] amdgpu 0000:04:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
[141108.662362] amdgpu 0000:04:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[141108.662363] amdgpu 0000:04:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 1
[141108.662364] amdgpu 0000:04:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 1
[141108.662364] amdgpu 0000:04:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 1
[141108.662365] amdgpu 0000:04:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 1
[141108.670802] amdgpu 0000:04:00.0: amdgpu: recover vram bo from shadow start
[141108.670805] amdgpu 0000:04:00.0: amdgpu: recover vram bo from shadow done
[141108.670830] amdgpu 0000:04:00.0: amdgpu: GPU reset(2) succeeded!
[141108.672246] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[141111.302333] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[141114.343197] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[141117.384241] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[141118.812913] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=10124529, emitted seq=10124532
[141118.813402] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xwayland pid 12369 thread Xwayland:cs0 pid 13190
[141118.813851] amdgpu 0000:04:00.0: amdgpu: GPU reset begin!
[141118.955379] amdgpu 0000:04:00.0: amdgpu: MODE2 reset
[141118.964203] amdgpu 0000:04:00.0: amdgpu: GPU reset succeeded, trying to resume
[141118.964616] [drm] PCIE GART of 1024M enabled (table at 0x000000F43FC00000).
[141118.964637] [drm] PSP is resuming...
[141118.986961] [drm] reserve 0xa00000 from 0xf43e000000 for PSP TMR
[141119.309301] amdgpu 0000:04:00.0: amdgpu: RAS: optional ras ta ucode is not available
[141119.321711] amdgpu 0000:04:00.0: amdgpu: RAP: optional rap ta ucode is not available
[141119.321717] amdgpu 0000:04:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[141119.321726] amdgpu 0000:04:00.0: amdgpu: SMU is resuming...
[141119.322848] amdgpu 0000:04:00.0: amdgpu: SMU is resumed successfully!
[141119.324751] [drm] DMUB hardware initialized: version=0x0400002A
[141119.328586] [drm] Watermarks table not configured properly by SMU
[141120.417277] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[141120.835462] [drm] kiq ring mec 2 pipe 1 q 0
[141120.840049] [drm] VCN decode and encode initialized successfully(under DPG Mode).
[141120.840637] [drm] JPEG decode initialized successfully.
[141120.840642] amdgpu 0000:04:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[141120.840645] amdgpu 0000:04:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[141120.840647] amdgpu 0000:04:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[141120.840647] amdgpu 0000:04:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[141120.840649] amdgpu 0000:04:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[141120.840649] amdgpu 0000:04:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[141120.840650] amdgpu 0000:04:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[141120.840651] amdgpu 0000:04:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[141120.840651] amdgpu 0000:04:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[141120.840652] amdgpu 0000:04:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
[141120.840653] amdgpu 0000:04:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[141120.840653] amdgpu 0000:04:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 1
[141120.840654] amdgpu 0000:04:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 1
[141120.840655] amdgpu 0000:04:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 1
[141120.840656] amdgpu 0000:04:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 1
[141120.848771] amdgpu 0000:04:00.0: amdgpu: recover vram bo from shadow start
[141120.848774] amdgpu 0000:04:00.0: amdgpu: recover vram bo from shadow done
[141120.848881] amdgpu 0000:04:00.0: amdgpu: GPU reset(4) succeeded!
[141120.931330] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[141123.458253] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[141126.498789] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[141129.530542] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[141131.100766] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=10124534, emitted seq=10124537
[141131.101257] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process gnome-shell pid 10273 thread gnome-shel:cs0 pid 10390
[141131.101695] amdgpu 0000:04:00.0: amdgpu: GPU reset begin!
[141131.252402] amdgpu 0000:04:00.0: amdgpu: MODE2 reset
[141131.261135] amdgpu 0000:04:00.0: amdgpu: GPU reset succeeded, trying to resume
[141131.261536] [drm] PCIE GART of 1024M enabled (table at 0x000000F43FC00000).
[141131.261556] [drm] PSP is resuming...
[141131.283958] [drm] reserve 0xa00000 from 0xf43e000000 for PSP TMR
[141131.612648] amdgpu 0000:04:00.0: amdgpu: RAS: optional ras ta ucode is not available
[141131.624959] amdgpu 0000:04:00.0: amdgpu: RAP: optional rap ta ucode is not available
[141131.624963] amdgpu 0000:04:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[141131.624971] amdgpu 0000:04:00.0: amdgpu: SMU is resuming...
[141131.625920] amdgpu 0000:04:00.0: amdgpu: SMU is resumed successfully!
[141131.627774] [drm] DMUB hardware initialized: version=0x0400002A
[141131.631726] [drm] Watermarks table not configured properly by SMU
[141132.563236] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[141133.134856] [drm] kiq ring mec 2 pipe 1 q 0
[141133.138545] [drm] VCN decode and encode initialized successfully(under DPG Mode).
[141133.139151] [drm] JPEG decode initialized successfully.
[141133.139161] amdgpu 0000:04:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[141133.139164] amdgpu 0000:04:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[141133.139165] amdgpu 0000:04:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[141133.139166] amdgpu 0000:04:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[141133.139167] amdgpu 0000:04:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[141133.139167] amdgpu 0000:04:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[141133.139168] amdgpu 0000:04:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[141133.139169] amdgpu 0000:04:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[141133.139169] amdgpu 0000:04:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[141133.139170] amdgpu 0000:04:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
[141133.139171] amdgpu 0000:04:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[141133.139172] amdgpu 0000:04:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 1
[141133.139172] amdgpu 0000:04:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 1
[141133.139173] amdgpu 0000:04:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 1
[141133.139174] amdgpu 0000:04:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 1
[141133.144863] amdgpu 0000:04:00.0: amdgpu: recover vram bo from shadow start
[141133.144865] amdgpu 0000:04:00.0: amdgpu: recover vram bo from shadow done
[141133.144891] amdgpu 0000:04:00.0: amdgpu: GPU reset(6) succeeded!
[141133.144934] [drm] Skip scheduling IBs!
[141133.145799] [drm] Skip scheduling IBs!
[141133.145941] [drm] Skip scheduling IBs!
[141133.181977] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[141133.188701] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[141133.189213] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[141133.225059] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[141133.245102] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[141133.285385] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[141374.548251] INFO: task kworker/u32:7:2855204 blocked for more than 120 seconds.
[141374.548263] Tainted: P W OE 6.2.0-060200rc4-generic #202301151633
[141374.548267] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[141374.548270] task:kworker/u32:7 state:D stack:0 pid:2855204 ppid:2 flags:0x00004000
[141374.548278] Workqueue: events_unbound commit_work [drm_kms_helper]
[141374.548308] Call Trace:
[141374.548311] <TASK>
[141374.548314] __schedule+0x293/0x610
[141374.548322] ? check_preempt_wakeup+0x13e/0x320
[141374.548330] schedule+0x63/0x110
[141374.548333] schedule_timeout+0x128/0x160
[141374.548338] dma_fence_default_wait+0x13d/0x210
[141374.548345] ? __pfx_dma_fence_default_wait_cb+0x10/0x10
[141374.548349] dma_fence_wait_timeout+0x116/0x140
[141374.548354] drm_atomic_helper_wait_for_fences+0x89/0xf0 [drm_kms_helper]
[141374.548376] commit_tail+0x3c/0x190 [drm_kms_helper]
[141374.548394] ? __schedule+0x29b/0x610
[141374.548398] commit_work+0x12/0x20 [drm_kms_helper]
[141374.548416] process_one_work+0x225/0x400
[141374.548422] worker_thread+0x50/0x3e0
[141374.548426] ? __pfx_worker_thread+0x10/0x10
[141374.548429] kthread+0xe9/0x110
[141374.548434] ? __pfx_kthread+0x10/0x10
[141374.548439] ret_from_fork+0x2c/0x50
[141374.548447] </TASK>
As you can see from the timestamps, this time it took a while for the problem to appear. The system could not recover, although the display did come back, with some green garbage pixels here and there.
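The timestamps in the 6.2.0-rc4 log show three gfx_0.0.0 timeouts about 12 seconds apart. Each one blames a different process: alacritty, then Xwayland, then gnome-shell. When reading the surrounding lines, note that -110 and -125 are negated Linux errno values, ETIMEDOUT and ECANCELED respectively.

For longer dmesg dumps, the following is a minimal Python sketch that extracts this timeline. It assumes the message wording quoted in this thread. The names reset_timeline and gaps_between are hypothetical helpers for illustration only.

```python
import re
from typing import List, Optional, Tuple

# dmesg prefixes each line with seconds since boot, e.g. "[141106.268781] ".
STAMP_RE = re.compile(r"^\[\s*(\d+\.\d+)\]")
# The ring timeout message, e.g. "*ERROR* ring gfx_0.0.0 timeout, signaled seq=...".
TIMEOUT_RE = re.compile(r"\*ERROR\* ring (\S+) timeout")
# The follow-up line naming the culprit. The name is empty for the sdma0
# timeouts in this thread ("process pid 0 thread pid 0"), so it is optional.
PROC_RE = re.compile(r"Process information: process (?:(\S+) )?pid \d+")

Event = Tuple[float, str, Optional[str]]


def reset_timeline(lines: List[str]) -> List[Event]:
    """Return (seconds since boot, ring, process name or None) per ring timeout."""
    events: List[list] = []
    for line in lines:
        stamp = STAMP_RE.match(line)
        if stamp is None:
            continue
        timeout = TIMEOUT_RE.search(line)
        if timeout:
            events.append([float(stamp.group(1)), timeout.group(1), None])
            continue
        proc = PROC_RE.search(line)
        if proc and events:
            # Attach the reported process to the most recent timeout.
            events[-1][2] = proc.group(1)
    return [tuple(e) for e in events]


def gaps_between(events: List[Event]) -> List[float]:
    """Seconds between consecutive timeouts, rounded to milliseconds."""
    return [round(b[0] - a[0], 3) for a, b in zip(events, events[1:])]
```

Fed the lines above, reset_timeline should yield three gfx_0.0.0 events attributed to alacritty.real, Xwayland and gnome-shell. gaps_between should give roughly 12.5 and 12.3 seconds. Lines without a dmesg timestamp are skipped, so journalctl's default output format would need a different prefix pattern.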
This happened here too, on a Minis Forum UM690 (mini PC). I was just coding (IntelliJ IDEA) and playing some music (Plexamp). I was neither playing a game nor watching a video.
CPU: AMD Ryzen Mobile 6900HX
GPU: Integrated GPU (680M)
System Memory: 64 GB RAM
Display(s): One HDMI-connected 32" screen with WQHD resolution
OS: Fedora Workstation 37 (Wayland)
Kernel: 6.1.6-200.fc37.x86_64
GNOME: 43.2
...
[13182.778160] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=108759, emitted seq=108761
[13182.778677] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process pid 0 thread pid 0
[13182.779184] amdgpu 0000:35:00.0: amdgpu: GPU reset begin!
[13183.221460] amdgpu 0000:35:00.0: amdgpu: free PSP TMR buffer
[13183.253019] amdgpu 0000:35:00.0: amdgpu: MODE2 reset
[13183.262864] amdgpu 0000:35:00.0: amdgpu: GPU reset succeeded, trying to resume
[13183.263014] [drm] PCIE GART of 1024M enabled (table at 0x000000F4FFC00000).
[13183.263058] [drm] PSP is resuming...
[13183.285152] [drm] reserve 0xa00000 from 0xf4fe000000 for PSP TMR
[13183.580343] amdgpu 0000:35:00.0: amdgpu: RAS: optional ras ta ucode is not available
[13183.589649] amdgpu 0000:35:00.0: amdgpu: RAP: optional rap ta ucode is not available
[13183.589652] amdgpu 0000:35:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[13183.589656] amdgpu 0000:35:00.0: amdgpu: SMU is resuming...
[13183.590502] amdgpu 0000:35:00.0: amdgpu: SMU is resumed successfully!
[13183.591868] [drm] DMUB hardware initialized: version=0x0400002E
[13183.669737] [drm] kiq ring mec 2 pipe 1 q 0
[13183.673467] [drm] VCN decode and encode initialized successfully(under DPG Mode).
[13183.674268] [drm] JPEG decode initialized successfully.
[13183.674270] amdgpu 0000:35:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[13183.674272] amdgpu 0000:35:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[13183.674272] amdgpu 0000:35:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[13183.674273] amdgpu 0000:35:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[13183.674273] amdgpu 0000:35:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[13183.674274] amdgpu 0000:35:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[13183.674274] amdgpu 0000:35:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[13183.674275] amdgpu 0000:35:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[13183.674275] amdgpu 0000:35:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[13183.674276] amdgpu 0000:35:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
[13183.674277] amdgpu 0000:35:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[13183.674277] amdgpu 0000:35:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 1
[13183.674278] amdgpu 0000:35:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 1
[13183.674278] amdgpu 0000:35:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 1
[13183.674279] amdgpu 0000:35:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 1
[13183.680699] amdgpu 0000:35:00.0: amdgpu: recover vram bo from shadow start
[13183.680701] amdgpu 0000:35:00.0: amdgpu: recover vram bo from shadow done
[13183.680724] amdgpu 0000:35:00.0: amdgpu: GPU reset(1) succeeded!
To recover your session, switch to another tty and back. Assuming your default tty is tty (check with the w command), press Ctrl+Alt+F6, then press Ctrl+Alt+F7 to go back. I recovered my browser session this way.
To work around the hangs, uninstall VAAPI support (e.g. sudo pacman -R libva-mesa-driver). So far this has prevented these problems for me.
The libva-mesa-driver removal workaround does not work for me. The crash seems to happen more often when VAAPI is used, but not only then. For example, I do not get crashes in normal use. If I record with OBS or share my screen on Wayland using Zoom, however, a crash is almost certain within an hour or so. As I understand it, both use VAAPI, but as I said, even removing the VA library does not fully solve it.
I'm also running linux-amd-staging-drm-next, which works a little better: crashes are less frequent, though they still happen within 1-2 hours of doing the above.
Hmm, I don't have any issues like this (checked with journalctl -b -1 -k | grep sdma). I use CLion and Firefox heavily.
➜ ~ journalctl -b -1 -k | grep sdma
Feb 08 17:55:47 g9 kernel: [drm] add ip block number 7 <sdma_v5_2>
Feb 08 17:55:48 g9 kernel: amdgpu: sdma_bitmap: 3
Feb 08 17:55:48 g9 kernel: amdgpu 0000:64:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
Feb 09 06:47:13 g9 kernel: amdgpu 0000:64:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
Feb 09 08:16:28 g9 kernel: amdgpu 0000:64:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
Feb 09 17:07:29 g9 kernel: amdgpu 0000:64:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
Feb 09 18:55:48 g9 kernel: amdgpu 0000:64:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
Feb 10 06:49:18 g9 kernel: amdgpu 0000:64:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
➜ ~
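One caveat with a plain grep sdma check: it also matches routine driver messages. These include "add ip block ... <sdma_v5_2>", "sdma_bitmap" and "ring sdma0 uses VM inv eng 12 on hub 0". In the logs quoted in this thread, that last line is also printed after every GPU reset. The symptom everyone is reporting is the amdgpu_job_timedout line instead. The grep output above contains none.

The following is a minimal Python sketch that keeps only timeout lines for any ring (sdma0, sdma2, gfx_0.0.0, ...). It assumes that message wording. The names ring_timeouts and previous_boot_kernel_log are hypothetical helpers, not part of any existing tool.

```python
import re
import subprocess
from typing import Iterable, List

# Any amdgpu ring timeout, as opposed to the routine
# "ring sdma0 uses VM inv eng ..." lines printed at init and after resets.
RING_TIMEOUT_RE = re.compile(r"amdgpu_job_timedout.*ring \S+ timeout")


def ring_timeouts(lines: Iterable[str]) -> List[str]:
    """Keep only lines that report an amdgpu ring timeout."""
    return [line for line in lines if RING_TIMEOUT_RE.search(line)]


def previous_boot_kernel_log() -> List[str]:
    """Kernel messages from the previous boot, as `journalctl -b -1 -k` shows them.

    Raises subprocess.CalledProcessError if journalctl reports an error,
    for example when no previous boot is stored.
    """
    result = subprocess.run(
        ["journalctl", "-b", "-1", "-k", "--no-pager"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()
```

After a freeze and reboot, calling ring_timeouts(previous_boot_kernel_log()) should list lines like the sdma0, sdma2 and gfx_0.0.0 timeouts quoted in this thread. On a healthy boot it should return an empty list, even though grep sdma prints several lines.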
I experienced a problem today that was similar to the descriptions above. When I closed a tab in Firefox (I think it was a YouTube tab), my desktop session seemed to freeze (with audio still playing) before going black. I was able to get back to GDM with Ctrl+Alt+F1 and reboot in a controlled fashion, but I wasn't able to get back to my desktop session. Similar things have happened to me a few times recently.
However, when I looked at journalctl, I saw the message [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma2 timeout, signaled seq=23679, emitted seq=23681, so the error message named sdma2 rather than sdma0. Is that still the same bug, or is it something else?
For background:
Hardware description:
CPU: AMD Ryzen 7 5800X3D
GPU: RX 6800 XT
System Memory: 32 GB RAM
Display(s): Samsung Odyssey G7 27"
Type of Display Connection: Displayport
System information:
Distro name and Version: Fedora Workstation 37
Kernel version: 6.1.11-200.fc37.x86_64
AMD official driver version: OpenSource driver from kernel (amdgpu)