ANV: Pathological performance cliff with compute shader
I have a benchmark that runs 2-3x faster with the Windows driver on the same hardware. Tested on an Intel UHD 620 with Mesa 20.0.7.
To run the benchmark on Linux:

    git clone https://github.com/Themaister/parallel-rdp
    cd parallel-rdp
    git checkout 2b0ff05bfb49ef9eb02a7ade331fd91331ecc72c
    git submodule update --init --recursive
    mkdir build
    cd build
    cmake .. -DCMAKE_BUILD_TYPE=Release
    cmake --build . --config Release --parallel
    ./rdp-bench
My result is ~0.1 GPixels/s. With a similar build process on Windows, I get ~0.32 GPixels/s.
One caveat is that the Windows build defaults to PARALLEL_RDP_SMALL_TYPES=0, since 32-bit arithmetic was significantly faster than 8/16-bit arithmetic on the Windows driver. When I also run Windows with 8/16-bit arithmetic by setting PARALLEL_RDP_SMALL_TYPES=1, I observe ~0.22 GPixels/s.
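To compare like with like, the Linux numbers can be collected in both arithmetic modes. A minimal sketch, assuming rdp-bench reads PARALLEL_RDP_SMALL_TYPES from the environment on Linux the same way it does on Windows; the helper name bench_both_modes is hypothetical.

```shell
# Hypothetical helper: run the given benchmark command once with 32-bit
# arithmetic (PARALLEL_RDP_SMALL_TYPES=0) and once with 8/16-bit
# arithmetic (PARALLEL_RDP_SMALL_TYPES=1), labelling each run.
# Assumes the benchmark picks the variable up from its environment.
bench_both_modes() {
  for small in 0 1; do
    echo "== PARALLEL_RDP_SMALL_TYPES=$small"
    PARALLEL_RDP_SMALL_TYPES=$small "$@"
  done
}
```

From the build directory this would be invoked as `bench_both_modes ./rdp-bench`, giving Linux figures to set against the ~0.32 and ~0.22 GPixels/s Windows results.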
Attached is repro.foz, a Fossilize archive containing the shaders compiled for the benchmark. The pipeline 0771c744744c4da4 is the likely culprit, as it compiles to SIMD8 with a large amount of spilling. Unfortunately, the Windows driver does not support pipeline executable properties (VK_KHR_pipeline_executable_properties), so I cannot inspect its compilation results.
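On the ANV side, the compiled code can also be dumped without the application. A rough sketch, assuming Fossilize's fossilize-replay tool is built and on PATH and that ANV is the active Vulkan driver; INTEL_DEBUG=cs is Mesa's switch for printing compute shader assembly and per-shader statistics to stderr. The helper names are hypothetical, and the wording of the statistics line differs between Mesa versions, so the grep pattern may need adjusting.

```shell
# Hypothetical helper: replay a Fossilize archive on ANV with compute
# shader dumping enabled. $1 = archive (e.g. repro.foz), $2 = output file.
# Mesa writes the INTEL_DEBUG dump to stderr, so redirect stderr only.
dump_anv_cs() {
  INTEL_DEBUG=cs fossilize-replay "$1" 2> "$2"
}

# Hypothetical helper: keep only the per-shader statistics lines, which
# report the SIMD width, instruction count and spills:fills.
# The pattern is an assumption about the current output format.
cs_stats() {
  grep -E 'SIMD[0-9]+ shader:' "$1"
}
```

Usage would be `dump_anv_cs repro.foz anv-cs.txt && cs_stats anv-cs.txt`. Mesa names shaders by its own hashes rather than the Fossilize pipeline hash, so the SIMD8 shader with the high spill count has to be picked out by its statistics.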