tu: Implement VK_KHR_shader_integer_dot_product

Depends on !13944 (closed)

Only the packed 4x8 unsigned and mixed-signedness versions are accelerated. However, we should be able to do better for the signed version than the current NIR lowering.

Thus I split has_dot_4x8 into has_sdot_4x8 and has_udot_4x8.

  • gen4 - has dp4acc and dp2acc; dp4acc is used to implement the 4x8 dot product.
  • gen3 - has dp2acc; the OpenCL blob uses dp2acc for the dot product on both gen3 and gen4.
  • gen2 - unknown, lower everything.
  • gen1 - no dp2acc, lower everything. The OpenCL blob doesn't advertise cl_qcom_dot_product8 but still generates code for it. The assembly is more verbose and uses the yet-to-be-documented mad32.u16 instruction.
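
For reference, the arithmetic that the packed 4x8 dot products are expected to produce can be sketched in plain C. This is not code from this MR and says nothing about how dp4acc or dp2acc behave internally; every function name below is hypothetical and only illustrates the unsigned, signed, and mixed (first operand signed, second unsigned) variants, plus the saturating accumulate forms exercised by the *accsat* tests.

```c
#include <assert.h>
#include <stdint.h>

/* Byte lane i (0..3) of a packed word, zero-extended. */
static int32_t lane_u(uint32_t v, unsigned i)
{
    return (int32_t)((v >> (8 * i)) & 0xff);
}

/* Byte lane i (0..3) of a packed word, sign-extended. */
static int32_t lane_s(uint32_t v, unsigned i)
{
    int32_t b = lane_u(v, i);
    return (b & 0x80) ? b - 256 : b;
}

/* Unsigned x unsigned (hypothetical reference helper). */
uint32_t ref_udot_4x8(uint32_t a, uint32_t b)
{
    uint32_t sum = 0;
    for (unsigned i = 0; i < 4; i++)
        sum += (uint32_t)(lane_u(a, i) * lane_u(b, i));
    return sum;
}

/* Signed x signed (hypothetical reference helper). */
int32_t ref_sdot_4x8(uint32_t a, uint32_t b)
{
    int32_t sum = 0;
    for (unsigned i = 0; i < 4; i++)
        sum += lane_s(a, i) * lane_s(b, i);
    return sum;
}

/* Mixed: a holds signed bytes, b holds unsigned bytes. */
int32_t ref_sudot_4x8(uint32_t a, uint32_t b)
{
    int32_t sum = 0;
    for (unsigned i = 0; i < 4; i++)
        sum += lane_s(a, i) * lane_u(b, i);
    return sum;
}

/* Signed dot product plus accumulator, clamped to the int32_t range. */
int32_t ref_sdot_acc_sat_4x8(uint32_t a, uint32_t b, int32_t acc)
{
    int64_t total = (int64_t)ref_sdot_4x8(a, b) + acc;
    if (total > INT32_MAX)
        return INT32_MAX;
    if (total < INT32_MIN)
        return INT32_MIN;
    return (int32_t)total;
}

/* Unsigned dot product plus accumulator, clamped to UINT32_MAX. */
uint32_t ref_udot_acc_sat_4x8(uint32_t a, uint32_t b, uint32_t acc)
{
    uint64_t total = (uint64_t)ref_udot_4x8(a, b) + acc;
    return total > UINT32_MAX ? UINT32_MAX : (uint32_t)total;
}

/* Small self-check: bytes (1,2,3,4) against (1,1,1,1). */
void ref_dot_selftest(void)
{
    assert(ref_udot_4x8(0x04030201u, 0x01010101u) == 10u);
    assert(ref_sdot_4x8(0xffffffffu, 0x01010101u) == -4);
}
```

The same bit pattern gives different results depending on signedness: 0xffffffff is four lanes of 255 when unsigned and four lanes of -1 when signed, which is why the signed and unsigned cases need separate handling.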

Passes:

 dEQP-VK.spirv_assembly.instruction.compute.opsdotkhr.*
 dEQP-VK.spirv_assembly.instruction.compute.opudotkhr.*
 dEQP-VK.spirv_assembly.instruction.compute.opsudotkhr.*
 dEQP-VK.spirv_assembly.instruction.compute.opsdotaccsatkhr.*
 dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.*
 dEQP-VK.spirv_assembly.instruction.compute.opsudotaccsatkhr.*
