anv: Reduce tile and data cache flushes

The Gen12+ unified cache architecture reduces the need to flush the tile and data caches. The tile cache contains pixel/depth values and the data cache holds SSBO data. Combined, these make up the bulk of the data stored in L3$ and are very expensive to flush. Currently, we flush the tile/data caches at the start of every command buffer, and usually multiple times within a given command buffer. Most tile/data cache flushes can be avoided by flushing only when necessary, such as when the CPU needs access to the data or for depth clears.

Also implement INTEL_DEBUG=pc for ANV to make future flush optimization work easier.
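
As a rough illustration of the policy described above (flush tile/data caches only when the CPU needs the data or for depth clears), here is a minimal sketch. All names in it (flush_bits, flush_reason, filter_flush_bits, unified_l3) are hypothetical and chosen for illustration; they are not the identifiers used by this change or by the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flush bits, for illustration only. These are not the
 * driver's real pipe-control bit definitions. */
enum flush_bits {
   FLUSH_TILE_CACHE = 1u << 0,
   FLUSH_DATA_CACHE = 1u << 1,
   FLUSH_OTHER      = 1u << 2,
};

/* Hypothetical reasons a flush might be requested. */
enum flush_reason {
   REASON_CMD_BUFFER_BEGIN, /* start of a command buffer */
   REASON_CPU_READBACK,     /* CPU needs to see the data */
   REASON_DEPTH_CLEAR,      /* depth clear */
};

/* Drop tile/data cache flushes unless the reason is one of the cases
 * named in the commit message. Without a unified L3 (assumed here to
 * mean pre-Gen12), the request is passed through unchanged. */
static uint32_t
filter_flush_bits(uint32_t requested, enum flush_reason reason,
                  bool unified_l3)
{
   if (!unified_l3)
      return requested;

   if (reason == REASON_CPU_READBACK || reason == REASON_DEPTH_CLEAR)
      return requested;

   return requested & ~(uint32_t)(FLUSH_TILE_CACHE | FLUSH_DATA_CACHE);
}
```

In this sketch, a request made at the start of a command buffer keeps only its non-tile/data bits, while a CPU readback or depth clear keeps the full request.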

Gen12+ enables caching of vertex buffer (VB) data in L3. Previously this was not allowed. L3 caching of VB data is enabled to reduce the need for L3 flushing. Performance analysis showed no performance delta between enabling and disabling VB L3 caching, except for the speedup from reduced flushing.

When taken together with Lionel's earlier commit that moves L3 config init to device init, tile/data cache flush counts are drastically reduced in Vulkan workloads, resulting in modest performance gains.

Fallout4, 1 frame
   Tile Cache flushes: 575 -> 9
   Data Cache flushes: 383 -> 2

Shadow of the Tomb Raider, 1 frame
   Tile Cache flushes: 484 -> 30
   Data Cache flushes: 309 -> 3

Performance speedup from flush reduction:
   Dota2 Vk                   +3%
   Rise of the Tomb Raider    +3%
   Shadow of the Tomb Raider  +2%
   Aztec Ruins Vk             +2%
   Shooter Game Demo          +2%
   Witcher3                   +1%
   Fallout4                   +1%
   Dark Souls3                +1%
   Total War Warhammer        +1%