v3d: Enable P030 format with BROADCOM_SAND128 modifier
This MR enables the P030 format at the DRI and Gallium levels. The Broadcom video decoder uses this format to export H265 10-bit decoded frames with the SAND128 modifier (SAND30).
We implement support for it in V3D with a new sand30 blit that converts P030 with the SAND128 modifier into an equivalent P010 format with the UIF layout. This allows sampling from H265 10-bit frames exported by the video decoder on Raspberry Pi 4 devices.
When an imported P030 texture uses the DRM_FORMAT_MOD_BROADCOM_SAND128 modifier, the sand30 blit converts the luma and chroma planes to a tiled P010 format without the interleaved 128-byte-wide columns, which can then be sampled using the Gallium YUV lowerings.
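To make the 128-byte-wide columns concrete, here is a minimal C sketch of SAND128 addressing. It assumes that each column is stored contiguously as rows of 128 bytes and that `col_height` is the column height in rows carried by the modifier. The helpers `sand128_byte_offset` and `sand30_luma_word_offset` are hypothetical and not taken from the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Width in bytes of one SAND128 column. */
#define SAND128_COL_WIDTH 128u

/*
 * Hypothetical helper: byte offset of byte column `x_bytes` in row `y`
 * of a SAND128 plane. Assumes each 128-byte-wide column is stored
 * contiguously, one 128-byte row after another, and that every column
 * is `col_height` rows tall.
 */
static size_t
sand128_byte_offset(uint32_t x_bytes, uint32_t y, uint32_t col_height)
{
    assert(y < col_height);
    uint32_t col = x_bytes / SAND128_COL_WIDTH;      /* which column */
    uint32_t x_in_col = x_bytes % SAND128_COL_WIDTH; /* byte inside its row */
    return (size_t)col * SAND128_COL_WIDTH * col_height +
           (size_t)y * SAND128_COL_WIDTH + x_in_col;
}

/*
 * Hypothetical helper for SAND30 luma: three 10-bit samples share one
 * 32-bit word, so a 128-byte column row holds 32 words = 96 samples.
 * Returns the byte offset of the word holding sample `x` and stores in
 * *slot which of the word's three sample slots it occupies.
 */
static size_t
sand30_luma_word_offset(uint32_t x, uint32_t y, uint32_t col_height,
                        uint32_t *slot)
{
    *slot = x % 3;
    return sand128_byte_offset((x / 3) * 4, y, col_height);
}
```

With a column height of 16 rows, sample 95 is the last one in the first column (byte 124, slot 2), while sample 96 starts the second column at byte 2048.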
We follow an approach similar to the SAND8 blit, but extract the luma and chroma components from the DRM_FORMAT_P030 format. P030 is a two-plane YCbCr 4:2:0 format in which three 10-bit components and 2 padding bits are packed into 4 bytes:
index 0 = Y plane, [31:0] x:Y2:Y1:Y0 2:10:10:10 little endian
index 1 = Cr:Cb plane, [63:0] x:Cr2:Cb2:Cr1:x:Cb1:Cr0:Cb0 [2:10:10:10:2:10:10:10] little endian
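The bit layouts above can be decoded on the CPU as follows. This is an illustrative sketch, not the shader used by the blit, and the helper names are hypothetical. `p010_from_10bit` only shows the P010 convention of keeping the 10 significant bits at the top of a 16-bit word.

```c
#include <assert.h>
#include <stdint.h>

/* Extract the 10-bit field starting at bit `shift` of a 32-bit word. */
static inline uint16_t
bits10(uint32_t word, unsigned shift)
{
    return (uint16_t)((word >> shift) & 0x3ffu);
}

/* Luma word, [31:0] x:Y2:Y1:Y0 2:10:10:10; bits 31:30 are padding. */
static void
p030_unpack_luma(uint32_t word, uint16_t y[3])
{
    y[0] = bits10(word, 0);
    y[1] = bits10(word, 10);
    y[2] = bits10(word, 20);
}

/*
 * Chroma pair, [63:0] x:Cr2:Cb2:Cr1:x:Cb1:Cr0:Cb0.
 * `lo` holds bits 31:0 and `hi` holds bits 63:32; bits 31:30 of each are padding.
 */
static void
p030_unpack_chroma(uint32_t lo, uint32_t hi, uint16_t cb[3], uint16_t cr[3])
{
    cb[0] = bits10(lo, 0);
    cr[0] = bits10(lo, 10);
    cb[1] = bits10(lo, 20);
    cr[1] = bits10(hi, 0);
    cb[2] = bits10(hi, 10);
    cr[2] = bits10(hi, 20);
}

/* P010 stores each 10-bit value in bits 15:6 of a 16-bit component. */
static inline uint16_t
p010_from_10bit(uint16_t v)
{
    assert(v <= 0x3ffu);
    return (uint16_t)(v << 6);
}
```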
After the sand30_blit is done, the shadow texture is a UIF-tiled texture with an R16_UNORM format for luma and R16G16_UNORM for chroma.
To reduce the number of texture-fetch operations during the blit, we read pairs of 32-bit dwords, which together hold six 10-bit UNORM components. We then write four UNORM16 components from a uvec4, because our render targets do not support writing to UNORM16 formats.
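A pair of dwords is always enough for the following reason. Take a group of four consecutive components whose first index is a multiple of 4. That group begins in slot 0, 1 or 2 of a 32-bit word, so it ends at slot 5 at most, which is still inside the next word. Below is a hypothetical CPU-side C sketch of that extraction for a luma row. It is not the actual blit shader, and `p030_read4` is an illustrative name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: produce the four 10-bit samples x0..x0+3 of a
 * P030 luma row (x0 a multiple of 4) from a single pair of 32-bit words.
 */
static void
p030_read4(const uint32_t *row, uint32_t x0, uint16_t out[4])
{
    assert(x0 % 4 == 0);
    uint32_t first = x0 / 3; /* word holding sample x0 */
    uint32_t slot = x0 % 3;  /* 0, 1 or 2 */
    /* The two words together hold six samples, slots 0..5. */
    uint64_t pair = (uint64_t)row[first] | ((uint64_t)row[first + 1] << 32);
    for (uint32_t i = 0; i < 4; i++) {
        uint32_t s = slot + i; /* at most 5, so always inside the pair */
        uint32_t shift = (s / 3) * 32 + (s % 3) * 10;
        out[i] = (uint16_t)((pair >> shift) & 0x3ffu);
    }
}
```

Each call reads exactly two words and yields four components, which matches the "read a dword pair, write a uvec4" pattern described above.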
Since sampling is done at 16 bpp (luma) and 32 bpp (chroma), the sand30_blit writes account for the different UIF microtile layouts at 64, 32 and 16 bpp.
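The sketch below shows what the microtile difference means in practice. It assumes the blit writes 64 bpp pixels (four 16-bit components per uvec4) and that texels inside a 64-byte utile are stored in row-major order. The utile dimensions follow V3D's 64-byte utiles, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Utile footprint in pixels for `cpp` bytes per pixel; each utile is 64 bytes. */
static uint32_t
utile_width(uint32_t cpp)
{
    switch (cpp) {
    case 1: case 2: return 8;
    case 4: case 8: return 4;
    case 16: return 2;
    default: assert(0 && "unsupported cpp"); return 0;
    }
}

static uint32_t
utile_height(uint32_t cpp)
{
    switch (cpp) {
    case 1: return 8;
    case 2: case 4: return 4;
    case 8: case 16: return 2;
    default: assert(0 && "unsupported cpp"); return 0;
    }
}

/*
 * Hypothetical mapping: a 64 bpp write at utile-local pixel (x, y) covers
 * the same 8 bytes as the (8 / dst_cpp) texels of size dst_cpp that start
 * at (*tx, *ty) in the same utile, assuming row-major storage in the utile.
 */
static void
map_64bpp_to_cpp(uint32_t x, uint32_t y, uint32_t dst_cpp,
                 uint32_t *tx, uint32_t *ty)
{
    assert(x < utile_width(8) && y < utile_height(8));
    uint32_t byte = (y * utile_width(8) + x) * 8; /* offset inside the utile */
    uint32_t texel = byte / dst_cpp;              /* first texel covered */
    *tx = texel % utile_width(dst_cpp);
    *ty = texel / utile_width(dst_cpp);
}
```

For example, the 64 bpp pixel at utile position (1, 1) lands on 16 bpp luma texels 4..7 of row 2 in the same utile. A 64 bpp write therefore cannot simply reuse its own (x, y) as the texel coordinates of the 16 or 32 bpp shadow texture.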
A minor fix-up to the SAND8 code is included as the last commit.