ir3: New cat3 instructions

  • shrm - (src2 >> src1) & src3
  • shlm - (src2 << src1) & src3
  • shrg - (src2 >> src1) | src3
  • shlg - (src2 << src1) | src3
  • andg - (src2 & src1) | src3
  • dp2acc - part of qcom_dot8 from cl_qcom_dot_product8
		Given:
			SRC1 and SRC2 are i8vec2 or u8vec2 packed into low or high
			half of the respective register.
			SRC3 is a 32b integer
		Do:
			DST = dot(SRC1, SRC2) + SRC3

		SRC1 and SRC2 must both be packed in the same half (both low or both high).

		TODO: correctly handle (signed)/(unsigned) and (low)/(high) modifiers.
			Bit 14 controls (signed)/(unsigned)
			Bit 30 controls (low)/(high)
			There is (neg) for SRC3
			There is (sat) for DST
  • wmm
		Given:
			SRC1 = (x_1, x_2, x_3, x_4) - 4 consecutive registers
			SRC2 = (y_1, y_2, y_3, y_4) - 4 consecutive registers
			SRC3 is an immediate in the range [0, 160]

		Do:
			float y_sum = y_1 + y_2 + y_3 + y_4
			vec4 result = (x_1 * y_sum, x_2 * y_sum, x_3 * y_sum, x_4 * y_sum)

			Starting from DST reg duplicate *result* into consecutive registers
			(1 << (SRC3 / 32)) times.
  • wmm.accu - same as wmm, but the result is added to the DST registers, except that the first register of each vec4 result is overwritten instead of accumulated.
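
For reference, the semantics listed above can be sketched as a small C model. This is only an illustration of the descriptions in this MR, not code from the compiler, and every function name and parameter here is made up for the sketch. Assumptions: shift amounts are masked to 5 bits purely to keep the C well defined (shifts of 32 or more are not described here); `is_signed` and `high` are my guess at what the (signed)/(unsigned) and (low)/(high) modifiers mean; the accumulate in dp2acc wraps and ignores (neg) and (sat); registers are modelled as a flat float array for wmm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Shift/mask ops. The "& 31" is an assumption made to avoid undefined
 * behaviour in C, not a statement about the hardware. */
static uint32_t shrm(uint32_t src1, uint32_t src2, uint32_t src3)
{ return (src2 >> (src1 & 31)) & src3; }

static uint32_t shlm(uint32_t src1, uint32_t src2, uint32_t src3)
{ return (src2 << (src1 & 31)) & src3; }

static uint32_t shrg(uint32_t src1, uint32_t src2, uint32_t src3)
{ return (src2 >> (src1 & 31)) | src3; }

static uint32_t shlg(uint32_t src1, uint32_t src2, uint32_t src3)
{ return (src2 << (src1 & 31)) | src3; }

static uint32_t andg(uint32_t src1, uint32_t src2, uint32_t src3)
{ return (src2 & src1) | src3; }

/* dp2acc: DST = dot(SRC1, SRC2) + SRC3, where SRC1/SRC2 hold two 8-bit
 * components in the selected 16-bit half. is_signed/high are hypothetical
 * stand-ins for the modifiers that are still TODO. */
static uint32_t dp2acc(uint32_t src1, uint32_t src2, uint32_t src3,
                       bool is_signed, bool high)
{
    unsigned shift = high ? 16 : 0;
    uint32_t acc = src3;

    for (unsigned i = 0; i < 2; i++) {
        uint8_t a = (uint8_t)(src1 >> (shift + 8 * i));
        uint8_t b = (uint8_t)(src2 >> (shift + 8 * i));
        /* Portable sign extension of an 8-bit value. */
        int32_t av = is_signed ? (int32_t)(a ^ 0x80) - 0x80 : a;
        int32_t bv = is_signed ? (int32_t)(b ^ 0x80) - 0x80 : b;
        acc += (uint32_t)(av * bv); /* wrapping add, no (sat) */
    }
    return acc;
}

/* wmm / wmm.accu: dst must have room for 4 * (1 << (src3 / 32)) floats. */
static void wmm(float *dst, const float x[4], const float y[4],
                unsigned src3, bool accu)
{
    assert(src3 <= 160);

    float y_sum = y[0] + y[1] + y[2] + y[3];
    unsigned copies = 1u << (src3 / 32);

    for (unsigned c = 0; c < copies; c++) {
        for (unsigned i = 0; i < 4; i++) {
            float r = x[i] * y_sum;
            /* wmm.accu accumulates, except that the first register of
             * each vec4 is overwritten. */
            if (accu && i != 0)
                dst[4 * c + i] += r;
            else
                dst[4 * c + i] = r;
        }
    }
}
```

With SRC3 = 160 the model writes 32 copies of the vec4, i.e. 128 consecutive registers, so a caller of `wmm` has to size `dst` accordingly.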

Note:

  • I'm not sure I correctly expressed in ir3_valid_flags that SRC3 of wmm can only be an immediate.
  • I didn't fully define dp2acc
