AMD GPU crash (*ERROR* ring gfx_0.0.0 timeout) while playing modded Minecraft
Brief summary of the problem:
Hello
I'm getting frequent GPU crashes (*ERROR* ring gfx_0.0.0 timeout) while playing Minecraft. When the GPU crashes, the only way out is to use the Linux SysRq keyboard shortcuts to kill everything and return to the login manager, or to restart the PC with the power button.
Overall, Minecraft runs without issues; there is only one specific scenario in which the crash occurs.
The crash happens while playing modded Minecraft on my private server for friends. It usually occurs either within the first 60 seconds or 10-15 minutes after joining the server. Unfortunately, this is the only place where I can reproduce the crash: singleplayer is fine, and other modded Minecraft installations are fine, but this modpack on this server crashes (I haven't tried other servers, as this is the only one I have).
Issue #1974 seems to be related, but as Alex Deucher said, that thread has turned into a mess, so I'm opening a new issue. The crash message in #1974 (comment 2191016) also looks almost identical to mine.
Hardware description:
- CPU: AMD Ryzen 7 3700X
- GPU: AMD Radeon RX 5600 XT
2f:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT] [1002:731f] (rev ca)
- System Memory: 16GB
- Display: HP x27i
- Type of Display Connection: DisplayPort
- Motherboard: MPG X570 GAMING PLUS (MS-7C37)
- Protocol: Wayland
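The kernel log at the end of this report prefixes its messages with `amdgpu 0000:2f:00.0`, so the card is driven by `amdgpu`. One way to double-check which driver is bound on a running system is to resolve the PCI device's `driver` symlink in sysfs (`/sys/bus/pci/devices/<address>/driver`), as in this minimal Python sketch; the function name `bound_driver` is hypothetical. This matters for the parameters listed under "What I have already tried": as far as I know, the `radeon` driver does not support Navi 10, so `radeon.dpm=0` should have no effect on this card, whereas `amdgpu.dpm` is the corresponding amdgpu module parameter.

```python
import os
from pathlib import Path
from typing import Optional


def bound_driver(pci_addr: str, sysfs_root: str = "/sys") -> Optional[str]:
    """Return the kernel driver bound to a PCI device, or None if unbound/absent.

    Hypothetical helper; pci_addr uses the domain-prefixed sysfs form,
    e.g. "0000:2f:00.0" for the "2f:00.0" shown by lspci.
    """
    link = Path(sysfs_root) / "bus" / "pci" / "devices" / pci_addr / "driver"
    if not link.is_symlink():
        return None
    # 'driver' is a symlink into .../bus/pci/drivers/<name>; the last
    # component of the link target is the driver name.
    return os.path.basename(os.readlink(link))


if __name__ == "__main__":
    # PCI address taken from the lspci output in this report.
    print(bound_driver("0000:2f:00.0"))
```

The `sysfs_root` parameter exists only so the function can be pointed at a fake directory tree when testing; on a real system the default `/sys` is what you want.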
System information:
- Distro name and Version: Nobara Linux 38 KDE Edition (a Fedora-based distro)
- Kernel version: 6.6.3-202.fsync.fc38.x86_64
- Custom kernel: N/A
- AMD official driver version: N/A
How to reproduce the issue:
The only method I know of to reproduce the crash is joining that one server with this modded Minecraft installation; I don't have a universal method. I'm happy to test things for you, though.
What I have already tried (based on #1974):
- Adding this kernel parameter: amdgpu.ppfeaturemask=0xffffbfff
- Adding this kernel parameter: amdgpu.ppfeaturemask=0xfffd3fff
- Adding this kernel parameter: radeon.dpm=0
- Setting LCLK DPM to Off in bios (UEFI)
None of these resolved the issue. I think disabling DPM helps a bit (the system survives longer), but I still get a crash after around 10-15 minutes.
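The two `amdgpu.ppfeaturemask` values tried above differ from the all-ones mask `0xffffffff` only in a few cleared bits: `0xffffbfff` clears bit 14, and `0xfffd3fff` clears bits 14, 15 and 17. The Python sketch below does that arithmetic. The bit names in `ASSUMED_BIT_NAMES` are an assumption recalled from `enum PP_FEATURE_MASK` in amdgpu's `amd_shared.h` (overdrive, GFXOFF, stutter mode) and should be checked against the headers of the kernel actually in use. The `parse_sysfs_mask` helper is hypothetical; it is meant for the value read back from `/sys/module/amdgpu/parameters/ppfeaturemask`, which confirms what the driver really loaded, and it accepts both hex and decimal because the printed format may vary between kernel versions.

```python
ALL_ON = 0xFFFFFFFF  # ppfeaturemask is a 32-bit value; 0xffffffff sets every bit

# Assumption: names recalled from enum PP_FEATURE_MASK in amdgpu's amd_shared.h.
# Verify against the kernel headers for your version before relying on them.
ASSUMED_BIT_NAMES = {
    14: "PP_OVERDRIVE_MASK",
    15: "PP_GFXOFF_MASK",
    17: "PP_STUTTER_MODE",
}


def cleared_bits(mask: int) -> list[int]:
    """Return the bit positions that are 0 in a 32-bit ppfeaturemask."""
    off = ALL_ON & ~mask
    return [bit for bit in range(32) if (off >> bit) & 1]


def parse_sysfs_mask(text: str) -> int:
    """Parse a ppfeaturemask value as read from sysfs (hypothetical helper).

    int(..., 0) accepts both '0x...' hex and plain decimal input.
    """
    return int(text.strip(), 0)


if __name__ == "__main__":
    for value in (0xFFFFBFFF, 0xFFFD3FFF):
        bits = cleared_bits(value)
        names = [ASSUMED_BIT_NAMES.get(b, "unknown") for b in bits]
        print(f"{value:#010x}: cleared bits {bits} -> {names}")
```

Running it shows that the second mask is a superset of the first: it keeps bit 14 cleared and additionally clears bits 15 and 17.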
Attached files:
Screenshots/video files
For some reason I can't attach anything, so I uploaded the images to an external image host:
- Game, after the GPU crash: https://ibb.co/HTHTmDn
- Visible screen artifacts: https://ibb.co/F88BJR2, https://ibb.co/3SwrML9
Log files (for system lockups / game freezes / crashes)
[ 412.382862] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=2018461, emitted seq=2018463
[ 412.383143] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process java pid 4526 thread java:cs0 pid 4672
[ 412.383401] amdgpu 0000:2f:00.0: amdgpu: GPU reset begin!
[ 412.564582] amdgpu 0000:2f:00.0: amdgpu: BACO reset
[ 414.923876] amdgpu 0000:2f:00.0: amdgpu: GPU reset succeeded, trying to resume
[ 415.013985] amdgpu 0000:2f:00.0: amdgpu: RAS: optional ras ta ucode is not available
[ 415.019880] amdgpu 0000:2f:00.0: amdgpu: RAP: optional rap ta ucode is not available
[ 415.019882] amdgpu 0000:2f:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[ 415.019884] amdgpu 0000:2f:00.0: amdgpu: SMU is resuming...
[ 415.019933] amdgpu 0000:2f:00.0: amdgpu: use vbios provided pptable
[ 415.019935] amdgpu 0000:2f:00.0: amdgpu: smc_dpm_info table revision(format.content): 4.5
[ 415.022687] amdgpu 0000:2f:00.0: amdgpu: SMU is resumed successfully!
[ 415.217236] amdgpu 0000:2f:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[ 415.217238] amdgpu 0000:2f:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[ 415.217240] amdgpu 0000:2f:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[ 415.217241] amdgpu 0000:2f:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[ 415.217243] amdgpu 0000:2f:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[ 415.217244] amdgpu 0000:2f:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[ 415.217246] amdgpu 0000:2f:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[ 415.217247] amdgpu 0000:2f:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[ 415.217249] amdgpu 0000:2f:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[ 415.217251] amdgpu 0000:2f:00.0: amdgpu: ring kiq_0.2.1.0 uses VM inv eng 11 on hub 0
[ 415.217252] amdgpu 0000:2f:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[ 415.217254] amdgpu 0000:2f:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
[ 415.217255] amdgpu 0000:2f:00.0: amdgpu: ring vcn_dec uses VM inv eng 0 on hub 8
[ 415.217257] amdgpu 0000:2f:00.0: amdgpu: ring vcn_enc0 uses VM inv eng 1 on hub 8
[ 415.217258] amdgpu 0000:2f:00.0: amdgpu: ring vcn_enc1 uses VM inv eng 4 on hub 8
[ 415.217260] amdgpu 0000:2f:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 8
[ 415.219889] amdgpu 0000:2f:00.0: amdgpu: recover vram bo from shadow start
[ 415.223558] amdgpu 0000:2f:00.0: amdgpu: recover vram bo from shadow done
[ 415.223590] amdgpu 0000:2f:00.0: amdgpu: GPU reset(2) succeeded!
[ 415.223687] amdgpu_cs_ioctl: 244 callbacks suppressed
[ 415.223688] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 425.684926] amdgpu 0000:2f:00.0: [drm] *ERROR* [CRTC:85:crtc-0] flip_done timed out
[ 438.485398] amdgpu 0000:2f:00.0: [drm] *ERROR* flip_done timed out
[ 438.485403] amdgpu 0000:2f:00.0: [drm] *ERROR* [CRTC:85:crtc-0] commit wait timed out
[ 448.725296] amdgpu 0000:2f:00.0: [drm] *ERROR* flip_done timed out
[ 448.725300] amdgpu 0000:2f:00.0: [drm] *ERROR* [PLANE:70:plane-5] commit wait timed out
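Two facts in the log above are worth pulling out: how many submissions were still outstanding on the hung ring, and which errors kept appearing after the driver reported a successful reset. In amdgpu's timeout message, `emitted seq` is the last fence sequence number submitted to the ring and `signaled seq` the last one the GPU completed, so here 2018463 - 2018461 = 2 jobs on `gfx_0.0.0` had not finished. The Python sketch below extracts both; the regular expressions are an assumption about the line format, written only against the lines in this report, not a stable kernel interface. As far as I know, the `-125` in the `amdgpu_cs_ioctl` line corresponds to `-ECANCELED` on Linux.

```python
import re

# Excerpt of the dmesg output from this report, used as sample input.
SAMPLE_LOG = """\
[ 412.382862] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=2018461, emitted seq=2018463
[ 412.383143] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process java pid 4526 thread java:cs0 pid 4672
[ 412.383401] amdgpu 0000:2f:00.0: amdgpu: GPU reset begin!
[ 415.223590] amdgpu 0000:2f:00.0: amdgpu: GPU reset(2) succeeded!
[ 415.223688] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 425.684926] amdgpu 0000:2f:00.0: [drm] *ERROR* [CRTC:85:crtc-0] flip_done timed out
"""

# Assumed format of the ring timeout line, based on the log in this report.
TIMEOUT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\].*\*ERROR\* ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)
# Generic "[ timestamp] message" dmesg line.
LINE_RE = re.compile(r"\[\s*(?P<ts>\d+\.\d+)\]\s*(?P<msg>.*)")


def ring_timeouts(log: str) -> list[tuple[float, str, int]]:
    """Return (timestamp, ring, unsignaled fences) for each ring timeout line."""
    return [
        (float(m["ts"]), m["ring"], int(m["emitted"]) - int(m["signaled"]))
        for m in TIMEOUT_RE.finditer(log)
    ]


def errors_after_reset(log: str) -> list[tuple[float, str]]:
    """Return (timestamp, message) of *ERROR* lines logged after the last
    'GPU reset(N) succeeded!' line."""
    errors: list[tuple[float, str]] = []
    seen_reset = False
    for line in log.splitlines():
        m = LINE_RE.match(line)
        if m is None:
            continue
        msg = m["msg"]
        if "GPU reset(" in msg and "succeeded!" in msg:
            seen_reset = True
            errors = []  # keep only errors after the most recent reset
        elif seen_reset and "*ERROR*" in msg:
            errors.append((float(m["ts"]), msg))
    return errors


if __name__ == "__main__":
    for ts, ring, pending in ring_timeouts(SAMPLE_LOG):
        print(f"{ts:.3f}s: ring {ring} timed out with {pending} unsignaled fence(s)")
    for ts, msg in errors_after_reset(SAMPLE_LOG):
        print(f"{ts:.3f}s after reset: {msg}")
```

Fed the full log above instead of the excerpt, the post-reset errors would be the `amdgpu_cs_ioctl` rejection followed by the `flip_done` and `commit wait` timeouts, which fits the observation that the desktop stays frozen even though the reset is reported as successful.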