ring_gfx hangs/freezes on Navi GPUs

Jeremy Attali said:

Not sure if this will help anyone else, but I found a workaround in my case with DOOM. I was having the same crashes that Marko described with Starcraft II, so I tried the following:

In Steam, I disabled the In Game Steam Overlay
I switched the Graphics API from OpenGL to Vulkan

I have not had any crashes so far, but I haven't tried to isolate which of the two changes made the difference.

Packages:
linux 5.3.arch1-1
linux-firmware-agd5f-radeon-navi10 2019.09.13.18.36-1
mesa-git 1:19.3.0_devel.115574.40087ffc5b9-1
vulkan-radeon-git 1:19.3.0_devel.115574.40087ffc5b9-1
libdrm 2.4.99-1
lib32-mesa-git 1:19.3.0_devel.115574.40087ffc5b9-1
lib32-vulkan-radeon-git 1:19.3.0_devel.115574.40087ffc5b9-1
lib32-libdrm 2.4.99-1

Daniel Lu uploaded an attachment:

Attachment 145464, "dmesg output":
dmesg.log

Daniel Lu uploaded an attachment:

Attachment 145465, "output of running sudo umr -R gfx_0.0.0":
umr_gfx_000.log

Daniel Lu said:

I am seeing a similar hang in Starcraft II. Unlike Marko, I am not using D9VK; instead, I'm using wine-nine. The hang doesn't happen in all games but seems to be particularly frequent in the co-op mission "Dead of Night".

Using mesa-git 19.3.0_devel.115092.3f5b541fc8b-1.

Doug Tyrrell @DougTy said:

I've been getting this too with Minecraft:
https://bugs.freedesktop.org/show_bug.cgi?id=111669

For my particular case at least, AMD_DEBUG=nodma seems to fix it

Marko Popovic said:

(In reply to Doug Ty from comment 5)

> I've been getting this too with Minecraft:
> https://bugs.freedesktop.org/show_bug.cgi?id=111669
>
> For my particular case at least, AMD_DEBUG=nodma seems to fix it

(In reply to Marko Popovic from comment 0)
> There is another type of freeze/hang happening when playing Starcraft II via
> D9VK. This one doesn't seem to be related to either ngg or dma because I
> have them both disabled by AMD_DEBUG=nodma and AMD_DEBUG=nongg and the hangs
> occur anyway, on exactly the same place every time.

You are referring to the sdma0/sdma1 type of hang, which is tracked here: https://bugs.freedesktop.org/show_bug.cgi?id=111481

ring_gfx hangs are much more reproducible and are not affected by AMD_DEBUG=nodma or AMD_DEBUG=nongg, as I already mentioned above in the bug description.

Doug Tyrrell @DougTy said:

(In reply to Marko Popovic from comment 6)

> You are referring to the sdma0/sdma1 type of hang, which is tracked
> here: https://bugs.freedesktop.org/show_bug.cgi?id=111481
>
> ring_gfx hangs are much more reproducible and are not affected by
> AMD_DEBUG=nodma or AMD_DEBUG=nongg, as I already mentioned above in the
> bug description.

Sorry, but this is incorrect. My Minecraft hang is most definitely a ring gfx hang, *not* sdma. I've posted logs and apitraces in the linked thread if you'd like to check for yourself.

I can't explain why nodma isn't working for you; perhaps it doesn't work for that game? Have you tried putting it in /etc/environment so it's system-wide? I don't know what to tell you regarding nodma, but my hang is definitely ring gfx as well.
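
For anyone who wants to try the variable on a single game before touching /etc/environment, here is a minimal Python sketch that sets it for one child process only. The helper name `env_with_debug_flags` and the placeholder command are made up for illustration, and joining several flags with commas is an assumption about how Mesa parses these variables, so check it against the Mesa documentation for your version.

```python
import os
import subprocess
import sys


def env_with_debug_flags(flags, var="AMD_DEBUG", base=None):
    """Return a copy of the environment with `flags` merged into `var`.

    Hypothetical helper: flags that are already set are kept, and the
    comma-separated format is an assumption, not something confirmed here.
    """
    env = dict(os.environ if base is None else base)
    merged = [f for f in env.get(var, "").split(",") if f]
    for flag in flags:
        if flag not in merged:
            merged.append(flag)
    env[var] = ",".join(merged)
    return env


if __name__ == "__main__":
    # Placeholder child process that only echoes the variable back;
    # swap in the real game or launcher command.
    command = [sys.executable, "-c", "import os; print(os.environ['AMD_DEBUG'])"]
    subprocess.run(command, env=env_with_debug_flags(["nodma"]), check=True)
```

The `var` parameter is there so the same helper can be pointed at a different debug variable if the driver in use reads one. Because only the child process sees the change, this is easy to undo, unlike a system-wide entry.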

Marko Popovic said:

(In reply to Doug Ty from comment 7)

> Sorry, but this is incorrect. My Minecraft hang is most definitely a ring
> gfx hang, *not* sdma. I've posted logs and apitraces in the linked thread if
> you'd like to check for yourself.
>
> I can't explain why nodma isn't working for you; perhaps it doesn't work for
> that game? Have you tried putting it in /etc/environment so it's system-wide?
> I don't know what to tell you regarding nodma, but my hang is definitely ring
> gfx as well.

I guess we just have many different types of hangs, then. ring_gfx hangs seem more mysterious than sdma0/1 hangs, since there is no "universal" workaround for them. nodma stops the global sdma-type hangs for me and nongg stops the Citra-related ring_gfx hang, but neither variable stops the Starcraft II and RoTR ring_gfx-type hangs for me, so it's really confusing.

Marko Popovic said:

https://cgit.freedesktop.org/mesa/mesa/commit/?id=a2a68d551c1c2a4f13761ffa8f3f6f13fee7a384

This might actually fix the ring_gfx-type hangs, or even the sdma ones, at least for the Vulkan API. I'm not sure, but I will be testing the latest Mesa builds from Oibaf's PPA in the following days and will report back on the issue :)

tak..@..ios.de said:

(In reply to Marko Popovic from comment 9)

> https://cgit.freedesktop.org/mesa/mesa/commit/?id=a2a68d551c1c2a4f13761ffa8f3f6f13fee7a384
>
> This might actually fix the ring_gfx-type hangs, or even the sdma ones, at
> least for the Vulkan API. I'm not sure, but I will be testing the latest Mesa
> builds from Oibaf's PPA in the following days and will report back on the
> issue :)

Sadly, I'm still getting the ring_gfx hangs after a few minutes of playing Trackmania 2.

Marko Popovic said:

(In reply to takios+fdbugs from comment 10)

> Sadly, I'm still getting the ring_gfx hangs after a few minutes of playing
> Trackmania 2.

Oh yes I forgot to add a reply here. It didn't solve any of the hangs for me either.

shahul said:

I am working on a Navi10 RX 5700.
I am facing the issue below when I run the unigine-heaven benchmark:

[drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=5075872, emitted seq=5075874
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process heaven_x64 pid 13723 thread heaven_x64:cs0 pid 13741
[drm] GPU recovery disabled.

Is there any fix for it?

Thanks in advance.
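
Since this thread mixes ring_gfx and sdma hangs, a small script can help tell which ring timed out in a given dmesg log. The sketch below parses the `amdgpu_job_timedout` line format quoted in the comment above. The regular expression is derived only from that one sample line, so other kernel versions may print it differently, and the names `RingTimeout`, `parse_ring_timeout`, and `ring_family` are illustrative rather than part of any amdgpu tooling.

```python
import re
from typing import NamedTuple, Optional


class RingTimeout(NamedTuple):
    ring: str       # ring name as printed by the kernel, e.g. "gfx_0.0.0"
    signaled: int   # "signaled seq" value from the log line
    emitted: int    # "emitted seq" value from the log line

    @property
    def pending(self) -> int:
        # Difference between the emitted and signaled sequence numbers.
        return self.emitted - self.signaled


# Pattern based on the single sample line in this thread (an assumption).
_TIMEOUT_RE = re.compile(
    r"ring (\S+) timeout, signaled seq=(\d+), emitted seq=(\d+)"
)


def parse_ring_timeout(line: str) -> Optional[RingTimeout]:
    """Return the parsed timeout, or None if the line is something else."""
    match = _TIMEOUT_RE.search(line)
    if match is None:
        return None
    return RingTimeout(match.group(1), int(match.group(2)), int(match.group(3)))


def ring_family(ring: str) -> str:
    """Coarse grouping by name prefix: 'gfx', 'sdma', or 'other'."""
    for prefix in ("gfx", "sdma"):
        if ring.startswith(prefix):
            return prefix
    return "other"


if __name__ == "__main__":
    sample = (
        "[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, "
        "signaled seq=5075872, emitted seq=5075874"
    )
    hit = parse_ring_timeout(sample)
    if hit is not None:
        print(hit.ring, ring_family(hit.ring), hit.pending)
```

Feeding each line of a saved dmesg log through `parse_ring_timeout` and grouping the results with `ring_family` gives a quick hint about which of the linked bug reports a hang belongs to.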

Pierre-Eric Pelloux-Prayer @pepp said:

For hangs involving radv, the AMD_DEBUG options aren't relevant.
You should use RADV_DEBUG instead (it probably doesn't support the same values).

Also opening a bug in https://gitlab.freedesktop.org/mesa/mesa/issues is a good idea since gfx hangs are most likely a driver issue (radv or radeonsi, depending on the API used).