ac,radeonsi: clear rework, compute/cpdma flags rework, copy shader optimizations, etc. (BIG MR)
This MR continues in !10003 (merged).
Below is the first half.
- Explicit DCC/CMASK clears are parallelized.
- HTILE is enabled for all levels where it's possible (not just level 0).
- Sync flags for CP DMA and internal compute are reworked. All callers can now specify when they want to sync (e.g. before and/or after the operation).
- The maximum variable compute shader workgroup size is decreased from 1024 to 512 threads so that the size can be packed into 10 bits per channel, which reduces user SGPR usage in internal shaders.
- Some internal compute shaders are optimized.
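
To illustrate the 10-bits-per-channel packing mentioned above, here is a minimal sketch in C. It is not the code from this MR: the helper names `pack_block_size` and `unpack_block_size` and the exact bit layout (x in bits 0-9, y in bits 10-19, z in bits 20-29) are assumptions made for illustration. The point is that a value up to 512 fits in a 10-bit field (maximum 1023), whereas 1024 would need 11 bits, so three channels would no longer fit in one 32-bit SGPR.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: 10 bits per channel, three channels in one dword. */
#define BLOCK_BITS_PER_CHANNEL 10u
#define BLOCK_CHANNEL_MASK ((1u << BLOCK_BITS_PER_CHANNEL) - 1u) /* 0x3ff */

/* Pack a 3D workgroup size into a single 32-bit value (one user SGPR). */
static uint32_t pack_block_size(uint32_t x, uint32_t y, uint32_t z)
{
   /* Each dimension must fit in 10 bits; 512 does, 1024 does not. */
   assert(x <= BLOCK_CHANNEL_MASK);
   assert(y <= BLOCK_CHANNEL_MASK);
   assert(z <= BLOCK_CHANNEL_MASK);

   return x | (y << BLOCK_BITS_PER_CHANNEL) | (z << (2u * BLOCK_BITS_PER_CHANNEL));
}

/* Recover the three dimensions from the packed value. */
static void unpack_block_size(uint32_t packed, uint32_t out[3])
{
   out[0] = packed & BLOCK_CHANNEL_MASK;
   out[1] = (packed >> BLOCK_BITS_PER_CHANNEL) & BLOCK_CHANNEL_MASK;
   out[2] = (packed >> (2u * BLOCK_BITS_PER_CHANNEL)) & BLOCK_CHANNEL_MASK;
}
```

With this layout, a 512x1x1 workgroup packs to `0x00100600` and uses 30 of the 32 available bits, leaving two bits free.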
Tested piglit/glcts/deqp:
- gfx6-7
- gfx8 (Polaris11)
- gfx9 (Vega10)
- gfx10 (Navi14)
- gfx10.3 (Sienna)
Edited by Marek Olšák