turnip: VK_EXT_host_image_copy
This extension will be used to accelerate uploads and downloads of tiled images, especially block-compressed images, by avoiding an extra staging buffer and copy on the GPU. It is already used in zink, dxvk, and vkd3d-proton.
In order to implement this we need an accelerated implementation of the Adreno tiling scheme. I've added the core tiling/untiling routines to fdl, inspired by isl_tiled_memcpy.c, and it could be useful for freedreno too. There is also documentation of the reverse-engineered scheme in a comment. Note that this doesn't implement UBWC compression/decompression, only tiling, as is expected for implementations of this extension.
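For readers unfamiliar with tiled memcpy routines, here is a minimal sketch of the general idea, assuming a toy layout. The tile size, the function names (`toy_tiled_offset`, `toy_linear_to_tiled`, `toy_tiled_to_linear`), and the plain row-major tile order are all hypothetical and are not the Adreno scheme. The real layout is documented in the fdl comment and additionally depends on the highest bank bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical tile dimensions for illustration only, NOT the Adreno values. */
#define TILE_W 16 /* bytes per row within a tile */
#define TILE_H 4  /* rows per tile */

/* Byte offset of byte (x, y) in the toy layout: tiles are stored in
 * row-major order, and bytes inside each tile are row-major as well. */
static size_t
toy_tiled_offset(uint32_t x, uint32_t y, uint32_t tiles_per_row)
{
   size_t tile_index = (size_t)(y / TILE_H) * tiles_per_row + x / TILE_W;
   return tile_index * (TILE_W * TILE_H) + (size_t)(y % TILE_H) * TILE_W +
          x % TILE_W;
}

/* Copy a linear image into the toy tiled layout, one tile row at a time.
 * width is in bytes; both dimensions must be tile-aligned in this sketch. */
static void
toy_linear_to_tiled(uint8_t *dst, const uint8_t *src, uint32_t width,
                    uint32_t height, uint32_t src_pitch)
{
   assert(width % TILE_W == 0 && height % TILE_H == 0);
   uint32_t tiles_per_row = width / TILE_W;
   for (uint32_t y = 0; y < height; y++) {
      for (uint32_t x = 0; x < width; x += TILE_W) {
         memcpy(dst + toy_tiled_offset(x, y, tiles_per_row),
                src + (size_t)y * src_pitch + x, TILE_W);
      }
   }
}

/* Inverse of toy_linear_to_tiled: read tile rows back into a linear image. */
static void
toy_tiled_to_linear(uint8_t *dst, const uint8_t *src, uint32_t width,
                    uint32_t height, uint32_t dst_pitch)
{
   assert(width % TILE_W == 0 && height % TILE_H == 0);
   uint32_t tiles_per_row = width / TILE_W;
   for (uint32_t y = 0; y < height; y++) {
      for (uint32_t x = 0; x < width; x += TILE_W) {
         memcpy(dst + (size_t)y * dst_pitch + x,
                src + toy_tiled_offset(x, y, tiles_per_row), TILE_W);
      }
   }
}
```

The point of doing this on the CPU is that each contiguous run inside a tile can be moved with a single `memcpy`, which is what makes a host-side copy competitive with a staging buffer plus a GPU copy.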
I have vkoverhead patches to test the performance of fd6_tiled_memcpy. There is also a pending VK-GL-CTS CL to more thoroughly test this.
The tiling scheme depends on a parameter called the "highest bank bit" that the kernel programs into registers, so ideally we should get the value from the kernel. This series includes a fallback that attempts to guess the value the kernel set, but there will be kernel and virgl-renderer patches to expose it to userspace. We shouldn't land this MR until it uses that uABI; otherwise the value the kernel currently programs would accidentally become uABI itself. As long as we use the new uABI here, old mesa will not care about the highest bank bit, whereas newer mesa will always get it from the kernel first, so it should be safe to change the value in the kernel if we need to (e.g. to fix the a650 bug).
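The intended lookup order can be sketched as follows. This is only an illustration of "kernel first, guess as fallback": `struct hbb_query`, its fields, and `select_highest_bank_bit` are hypothetical names, and the sketch does not model how the value is actually queried, since that uABI does not exist yet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the two possible sources of the value. */
struct hbb_query {
   bool kernel_reports_hbb; /* true if the (future) uABI returned a value */
   uint32_t kernel_hbb;     /* value reported by the kernel, if any */
   uint32_t guessed_hbb;    /* userspace guess at what the kernel programmed */
};

/* Prefer the kernel-provided value; use the guess only when the kernel
 * does not report one. is_guess is optional and may be NULL. */
static uint32_t
select_highest_bank_bit(const struct hbb_query *q, bool *is_guess)
{
   assert(q != NULL);

   if (q->kernel_reports_hbb) {
      if (is_guess)
         *is_guess = false;
      return q->kernel_hbb;
   }

   if (is_guess)
      *is_guess = true;
   return q->guessed_hbb;
}
```

Because the guess is consulted only when the kernel reports nothing, a kernel that exposes the value can later change it without userspace tiling going out of sync.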