anv: A fix and WA for Breaking Limit
This MR contains a fix for the anv_execbuf_add_bo_bitset: Assertion 'bo->refcount > 0' failed. assert, a workaround for the hang on load, and two tiny fixes I found while working on the issue.
The most important part, the hang, is caused by the ./resources/binaries/ShaderCache/Vulkan/ddgi-probeDistanceBlendingCS&f54270e74054da22.comp compute shader, which seems to be the default ProbeBlendingCS.hlsl from the RTXGI SDK with RTXGI_DDGI_BLEND_SCROLL_SHARED_MEMORY defined as 1.
More specifically, the hang happens on the barrier() that corresponds to line 227.
The problem is that this shader actually violates the spec. The LoadScrollSharedMemory call that contains the barrier is inside an if(!isBorderTexel) condition, and isBorderTexel is calculated in a way that is not uniform across invocations. So the barrier is in non-uniform control flow, which is not allowed in either Vulkan or DX12.
On the EU level the hang happens because in SIMD8/SIMD16 mode some threads have isBorderTexel == true uniformly across all of their channels, which means the SEND/SYNC instructions used to implement the barrier are jumped over completely and never executed, despite having WE_all. Those threads never reach the barrier, so the rest of the workgroup waits on it forever.
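As a rough illustration of the point above, here is a small Python model (not driver code). It assumes a 16x16 workgroup with one invocation per texel, a one-texel border, and lanes packed into subgroups in row-major order. None of these numbers are taken from the shader, and the function names are made up for the sketch.

```python
# Hypothetical model: 16x16 workgroup, one invocation per texel, one-texel
# border, lanes packed into subgroups in linear (row-major) order.
TILE = 16  # assumed tile edge, not taken from the actual shader


def is_border_texel(x, y, tile=TILE):
    """True for texels on the outer edge of the tile."""
    return x == 0 or y == 0 or x == tile - 1 or y == tile - 1


def subgroups_skipping_barrier(subgroup_size, tile=TILE):
    """Return indices of subgroups in which every lane is a border texel.

    Such a subgroup takes the if(!isBorderTexel) branch uniformly as
    "not taken", so it would jump over the barrier entirely.
    """
    flags = [is_border_texel(i % tile, i // tile, tile)
             for i in range(tile * tile)]
    skipped = []
    for start in range(0, len(flags), subgroup_size):
        lanes = flags[start:start + subgroup_size]
        if all(lanes):
            skipped.append(start // subgroup_size)
    return skipped


for size in (8, 16, 32):
    print(size, subgroups_skipping_barrier(size))
```

Under these assumptions, subgroup sizes 8 and 16 each produce subgroups made up entirely of border texels (the first and last rows of the tile), while size 32 produces none, because every subgroup then spans two rows and contains at least one interior texel. That would be consistent with the workaround helping only because of how the lanes happen to be laid out.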
Unfortunately I wasn't able to find a better workaround than forcing a subgroup size of 32, even though it "fixes" the issue basically by coincidence. If anyone has a better idea I would be happy to hear it :)
Closes: #11497
cc @llandwerlin