radv: causes hang until GPU reset on navy_flounder
[ 18.590107] [drm:amdgpu_dm_atomic_commit_tail] *ERROR* Waiting for fences timed out!
[ 23.709953] [drm:amdgpu_job_timedout] *ERROR* ring sdma1 timeout, signaled seq=54, emitted seq=56
[ 23.709976] [drm:amdgpu_job_timedout] *ERROR* Process information: process vkcube pid 1037 thread vkcube pid 1037
[ 23.709987] amdgpu 0000:03:00.0: amdgpu: GPU reset begin!
[ 27.710013] amdgpu 0000:03:00.0: amdgpu: failed to suspend display audio
[ 27.755035] [drm] free PSP TMR buffer
[ 27.799899] amdgpu 0000:03:00.0: amdgpu: MODE1 reset
[ 27.799902] amdgpu 0000:03:00.0: amdgpu: GPU mode1 reset
[ 27.799990] amdgpu 0000:03:00.0: amdgpu: GPU smu mode1 reset
[ 27.860888] snd_hda_intel 0000:03:00.1: spurious response 0x0:0x0, last cmd=0x220037
[ 27.860903] snd_hda_intel 0000:03:00.1: spurious response 0x0:0x0, last cmd=0x220037
[ 27.860909] snd_hda_intel 0000:03:00.1: spurious response 0x0:0x0, last cmd=0x220037
[ 27.860912] snd_hda_intel 0000:03:00.1: spurious response 0x0:0x0, last cmd=0x220037
[ 27.860915] snd_hda_intel 0000:03:00.1: spurious response 0x0:0x0, last cmd=0x220037
[ 27.860918] snd_hda_intel 0000:03:00.1: spurious response 0x0:0x0, last cmd=0x220037
[ 27.860921] snd_hda_intel 0000:03:00.1: spurious response 0x0:0x0, last cmd=0x220037
[ 27.860924] snd_hda_intel 0000:03:00.1: spurious response 0x0:0x0, last cmd=0x220037
[ 28.302115] amdgpu 0000:03:00.0: amdgpu: GPU reset succeeded, trying to resume
Bisected to:
ef40f2ccc29ba7031bcb4ef100f8a9d290df9689 is the first bad commit
commit ef40f2ccc29ba7031bcb4ef100f8a9d290df9689
Author: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Date:   Fri Jan 21 00:03:14 2022 +0100

    radv/amdgpu: Fix handling of IB alignment > 4 words.

    We reserved space for chaining by subtracting 4 words from max_dw, but
    then the new alignment code in radv_amdgpu_cs_finalize ended up running
    all over that. That resulted in going over buffer size when chaining.

    When lucky you'd get a crash, and when unlucky other stuff might happen.

    This always adds the 4 words at the end, but initializes with NOP by
    default. That way we still adhere to the alignment rules.

    Fixes: 1f36f6b83f2 ("radv/winsys: use same IBs padding as the kernel")
    Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14644>

 src/amd/vulkan/winsys/amdgpu/radv_amdgpu_cs.c | 37 +++++++++++++++++++--------
 1 file changed, 26 insertions(+), 11 deletions(-)
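
For anyone trying to follow what the bisected commit describes, here is a rough sketch of the padding scheme as I read it from the commit message: pad the IB with NOPs so that exactly 4 words remain before the next alignment boundary, then always emit those 4 words as NOPs so a chain packet could later overwrite them. This is not the Mesa code. `finalize_ib`, `CHAIN_DW`, `NOP_DW` and the NOP value are hypothetical names and placeholders chosen for illustration, and the real logic in radv_amdgpu_cs_finalize may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHAIN_DW 4u          /* words kept at the end for a chain packet */
#define NOP_DW   0xffff1000u /* placeholder NOP word; assumed value, illustration only */

/*
 * Hypothetical helper, not the Mesa function. Pads buf (holding cdw words,
 * capacity cap_dw) so that the final size is a multiple of align_dw and the
 * last CHAIN_DW words are NOPs that a later chain packet could overwrite.
 * Returns the new word count, or 0 if the buffer is too small.
 */
static size_t finalize_ib(uint32_t *buf, size_t cdw, size_t cap_dw, size_t align_dw)
{
    assert(align_dw > 0);

    /* Pad until exactly CHAIN_DW words are left before the next boundary. */
    while (cdw == 0 || (cdw + CHAIN_DW) % align_dw != 0) {
        if (cdw + CHAIN_DW >= cap_dw)
            return 0; /* no room for one more pad word plus the reserved tail */
        buf[cdw++] = NOP_DW;
    }

    if (cdw + CHAIN_DW > cap_dw)
        return 0;

    /* Always emit the reserved tail, initialized to NOPs. */
    for (size_t i = 0; i < CHAIN_DW; i++)
        buf[cdw++] = NOP_DW;

    return cdw;
}
```

The point of the sketch is only that the reserved tail is counted inside the aligned size instead of being subtracted from the capacity up front, which is the overrun the commit message says it fixes. It says nothing about why sdma1 times out on navy_flounder after this change.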