tu: Implement VK_KHR_shader_integer_dot_product
Depends on !13944 (closed)
Only the packed 4x8 unsigned and mixed-signedness versions are accelerated. However, we should be able to do better for the signed version than the current NIR lowering, so I split `has_dot_4x8` into `has_sdot_4x8` and `has_udot_4x8`.
- gen4 - has dp4acc and dp2acc; dp4acc is used to implement the 4x8 dot product.
- gen3 - has dp2acc; the OpenCL blob uses dp2acc for the dot product on both gen3 and gen4.
- gen2 - unknown, lower everything.
- gen1 - no dp2acc, lower everything. The OpenCL blob doesn't advertise
  `cl_qcom_dot_product8` but still generates code for it.
  The assembly is more verbose and uses the yet-to-be-documented
  `mad32.u16` instruction.
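
For reference, the three packed 4x8 operations discussed above can be sketched in plain C. This is only an illustration of the non-saturating arithmetic, not the turnip code or the NIR lowering; the function names (`ref_udot_4x8`, `ref_sdot_4x8`, `ref_sudot_4x8`, `sext8`) are hypothetical, and the sketch assumes the accumulator wraps on overflow (the `AccSat` variants would clamp instead).

```c
#include <stdint.h>

/* Hypothetical helper: sign-extend the low 8 bits of v to 32 bits. */
static int32_t sext8(uint32_t v)
{
    return (int32_t)((v & 0xffu) ^ 0x80u) - 0x80;
}

/* Unsigned 4x8 dot product with accumulate: each operand packs four u8 lanes. */
static uint32_t ref_udot_4x8(uint32_t a, uint32_t b, uint32_t acc)
{
    for (int i = 0; i < 4; i++) {
        uint32_t ai = (a >> (8 * i)) & 0xffu;
        uint32_t bi = (b >> (8 * i)) & 0xffu;
        acc += ai * bi;
    }
    return acc;
}

/* Signed 4x8 dot product: both operands pack four s8 lanes. */
static int32_t ref_sdot_4x8(uint32_t a, uint32_t b, int32_t acc)
{
    uint32_t sum = (uint32_t)acc; /* accumulate unsigned so overflow wraps */
    for (int i = 0; i < 4; i++) {
        int32_t ai = sext8(a >> (8 * i));
        int32_t bi = sext8(b >> (8 * i));
        sum += (uint32_t)(ai * bi);
    }
    return (int32_t)sum;
}

/* Mixed 4x8 dot product: a packs s8 lanes, b packs u8 lanes. */
static int32_t ref_sudot_4x8(uint32_t a, uint32_t b, int32_t acc)
{
    uint32_t sum = (uint32_t)acc;
    for (int i = 0; i < 4; i++) {
        int32_t ai = sext8(a >> (8 * i));
        int32_t bi = (int32_t)((b >> (8 * i)) & 0xffu);
        sum += (uint32_t)(ai * bi);
    }
    return (int32_t)sum;
}
```

The only difference between the three variants is how each 8-bit lane is extended before the multiply, which is why the signed case can reasonably be tracked by its own capability flag.
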
Passes:
dEQP-VK.spirv_assembly.instruction.compute.opsdotkhr.*
dEQP-VK.spirv_assembly.instruction.compute.opudotkhr.*
dEQP-VK.spirv_assembly.instruction.compute.opsudotkhr.*
dEQP-VK.spirv_assembly.instruction.compute.opsdotaccsatkhr.*
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.*
dEQP-VK.spirv_assembly.instruction.compute.opsudotaccsatkhr.*