AMD GPU keeps randomly resetting when playing Breath of the Wild on Cemu
Pretty much the same situation as #7250 (closed) (Dishonored 2 under Wine/DXVK; system info is there), but this time I can't reliably produce a crash dump with RADV_DEBUG="hang", possibly because Cemu uses async shader compilation and "hang" forces the syncshaders option. Or maybe because it runs much slower.
It usually resets with something like:
Oct 30 19:35:04 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=2739158, emitted seq=2739165
Oct 30 19:35:04 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process kwin_x11 pid 3630 thread kwin_x11:cs0 pid 3662
Oct 30 19:35:04 kernel: amdgpu 0000:07:00.0: amdgpu: GPU reset begin!
But the "offending" process listed is either Cemu, the compositor (kwin), Xorg or some other GPU user that just happened to be active at the same time.
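To check how evenly the blame is spread across processes, the "Process information" lines can be tallied from the kernel log. This is a minimal sketch, assuming the log lines have the same shape as the excerpt above; the function name `blamed_processes` is made up for illustration, and process names containing spaces would not match this pattern.

```python
import re
from collections import Counter

# Pattern modelled on the "Process information" line quoted in this report.
# Assumption: other kernel versions may format this line differently.
PROC_RE = re.compile(
    r"amdgpu_job_timedout \[amdgpu\]\] \*ERROR\* Process information: "
    r"process (?P<process>\S+) pid (?P<pid>\d+) "
    r"thread (?P<thread>\S+) pid (?P<tid>\d+)"
)


def blamed_processes(lines):
    """Count how often each process name is reported after a ring timeout."""
    counts = Counter()
    for line in lines:
        match = PROC_RE.search(line)
        if match:
            counts[match.group("process")] += 1
    return counts
```

It can be fed any iterable of log lines, for example an open file holding saved `journalctl -k` output, and `blamed_processes(f).most_common()` then lists the reported processes by frequency.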
Maybe using EQAA and vkBasalt exacerbates the issue, but I doubt that. The rendering is [flawless]*, nothing like #7406 (closed). But right before the reset, geometry often starts to jump around or randomly coloured squares appear on all GPU-rendered surfaces in all apps. Or it just resets without any such prior artefacts.
It may happen at any point in the game, but I find that Shrine challenges trigger it the most, maybe because they allow 1.5-2.0 times higher fps on average. The game usually runs at 20-40 fps on an RX 580, but in Shrines it's 40-70 fps.
Similar things were reported by another radv user about yuzu, which can also run BOTW with async shaders (a different version: the slower and less customizable Switch version instead of the Wii U one). Note that Cemu has only recently gone native on Linux but is pretty much feature-complete. See my package as a reference for building for your distro.
UPDATE1: * Actually, rendering is only flawless on Vulkan and on a non-Vega card, while OpenGL rendering is affected by the old #1334 (comment 555735) even natively and requires the AMD_DEBUG=nohyperz workaround. Also, lack of async shader compilation causes massive stuttering.
The hang and crash happen in both OpenGL and Vulkan modes. In OpenGL it's especially susceptible to bugged sell menus that cause 100% GPU use.
UPDATE2: There is an opinion that those random hangs are caused by the firmware crapping out on dynamic frequency switching because it doesn't like something about the powerplay tables. The resulting workaround is disabling dynamic switching and forcing the max power state. Instead, I switched the card to the secondary VBIOS via the hardware switch and it has been stable since, despite that VBIOS having a slightly higher frequency at the same voltage.
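For anyone who wants to try the power-state workaround rather than a VBIOS switch, here is a minimal sketch. It assumes the card is `card0` (check /sys/class/drm/ on your system) and uses the amdgpu sysfs file `power_dpm_force_performance_level`; the helper name `force_performance_level` is hypothetical. Writing to the real sysfs file needs root, and I have not verified that this cures the resets described here.

```python
from pathlib import Path

# Assumption: the affected GPU is card0; adjust the index for your system.
DPM_LEVEL = Path("/sys/class/drm/card0/device/power_dpm_force_performance_level")


def force_performance_level(level="high", path=DPM_LEVEL):
    """Write an amdgpu DPM performance level and return the previous one.

    "high" pins the highest power state; "auto" restores dynamic switching.
    """
    path = Path(path)
    previous = path.read_text().strip()
    path.write_text(level + "\n")
    return previous
```

Calling `force_performance_level("high")` before starting the game and `force_performance_level("auto")` afterwards restores the default behaviour; the setting does not survive a reboot.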