RADV: TRUNC_COORD breaks gather operations
I've noticed some breakage with vkd3d and the FidelityFX-CACAO d3d12 demo from AMD, and I think some interaction with TRUNC_COORD causes it.
The demo has a part which computes a UV coordinate as (float2(ThreadID) + 0.5) * InvResolution and uses this coordinate as part of a textureGather operation. This lands exactly on the top-left corner of a gather, and the application relies on this to work.
However, with TRUNC_COORD enabled, even a microscopic FP error causes the sampler unit to round down after applying the half-texel bilinear offset, and the resulting image is broken.
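To illustrate the failure mode, here is a minimal Python sketch. It is a model, not the hardware's actual arithmetic: it assumes the gather footprint is selected as floor(u - 0.5) in unnormalized texel units (the VK_FILTER_LINEAR rule), and it uses a one-ULP nudge as a stand-in for the FP error. The function name `gather_base_texel_trunc` and the chosen thread ID are hypothetical.

```python
import math

def gather_base_texel_trunc(u_texels):
    """Hypothetical model: apply the half-texel bilinear offset, then floor."""
    return math.floor(u_texels - 0.5)

thread_id = 5
exact = thread_id + 0.5               # (ThreadID + 0.5) in texel units
nudged = math.nextafter(exact, 0.0)   # one ULP below, standing in for FP error

print(gather_base_texel_trunc(exact))   # 5
print(gather_base_texel_trunc(nudged))  # 4
```

In this model, an error of a single ULP is enough to move the whole gather footprint one texel to the left, which is the kind of breakage seen in the demo.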
The Vulkan specification states that gather samples are selected with the rules of VK_FILTER_LINEAR, but if the sampler uses point sampling, TRUNC_COORD is still applied by the hardware (I tested on RX 470 and RX 5700 XT).
I made a simple test which aims to sample at various points: https://github.com/Themaister/Granite/commit/9230f42ab326afea9733bed610a870c87ac4384f. I'm now seeing three different driver behaviors across different devices:
NVIDIA behavior:
[INFO]: U = 0 + 2047 / 2048
[INFO]: Point: 0
[INFO]: U = 1 + 0 / 2048
[INFO]: Point: 1
[INFO]: U = 1 + 1019 / 2048
[INFO]: Gather: 0
[INFO]: U = 1 + 1020 / 2048 <-- This is where we expect RTE to round to 1 if hardware does n.8 RTE.
[INFO]: Gather: 1
[INFO]: U = 1 + 1021 / 2048
[INFO]: Gather: 1
[INFO]: U = 1 + 1022 / 2048
[INFO]: Gather: 1
[INFO]: U = 1 + 1023 / 2048
[INFO]: Gather: 1
[INFO]: U = 1 + 1024 / 2048
[INFO]: Gather: 1
I think this is how it is supposed to work. For point sampling, we get the floor() semantics that the spec wants, and for gather, we get n.8 RTE which matches linear sampling. The Vulkan spec is somewhat vague, though, on whether you're allowed to round before floor-ing. It does specify that there is subtexel precision though ...
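The NVIDIA numbers above are consistent with a simple model, sketched here in Python. The assumptions are mine: the coordinate (after the half-texel offset) is quantized to fixed point with 8 fractional bits using round-to-nearest-even, then floored, and the logged "Gather" value is the base texel index. The function names are hypothetical.

```python
import math

def point_texel(u_texels):
    """Point sampling: plain floor() of the unnormalized coordinate."""
    return math.floor(u_texels)

def gather_texel_rte(u_texels, frac_bits=8):
    """Hypothetical n.8 RTE model: quantize (u - 0.5) to fixed point, then floor."""
    scale = 1 << frac_bits
    fixed = round((u_texels - 0.5) * scale)  # Python's round() rounds ties to even
    return fixed >> frac_bits                # drop the fractional bits (floor)

for num in (1019, 1020, 1021, 1024):
    u = 1 + num / 2048
    print(f"U = 1 + {num} / 2048 -> Gather: {gather_texel_rte(u)}")
```

With 1 + 1020 / 2048, the offset coordinate is 2044 / 2048, which is exactly 255.5 in n.8 units. That is the tie case, and it rounds up to 256, so the model switches from texel 0 to texel 1 at the same point as the log. A pure floor() without quantization would only switch at 1 + 1024 / 2048.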
RADV: On RADV, gather shows the same flooring behavior as point sampling.
AMDGPU-PRO: Apparently amdgpu-pro does not enable TRUNC_COORD at all, since even point sampling gives me n.8 RTE behavior. However, this is rather curious. !3951 (merged) mentions that documents specify "n.6 rounding", but I'm only seeing n.8 RTE behavior on AMD.
ANV: Same behavior as RADV with TRUNC_COORD.