ir3: New cat3 instructions
- shrm -
    (src2 >> src1) & src3
- shlm -
    (src2 << src1) & src3
- shrg -
    (src2 >> src1) | src3
- shlg -
    (src2 << src1) | src3
- andg -
    (src2 & src1) | src3
- dp2acc - part of qcom_dot8 from cl_qcom_dot_product8
    Given:
      SRC1 and SRC2 are i8vec2 or u8vec2 values, each packed into the
      low or high half of its register.
      SRC3 is a 32-bit integer.
    Do:
      DST = dot(SRC1, SRC2) + SRC3
    SRC1 and SRC2 must both be packed in the same half (both low or both high).
    TODO: correctly handle the (signed)/(unsigned) and (low)/(high) modifiers.
    Bit 14 controls (signed)/(unsigned).
    Bit 30 controls (low)/(high).
    There is a (neg) modifier for SRC3.
    There is a (sat) modifier for DST.
- wmm -
    Given:
      SRC1 = (x_1, x_2, x_3, x_4) - 4 consecutive registers
      SRC2 = (y_1, y_2, y_3, y_4) - 4 consecutive registers
      SRC3 is an immediate in the range [0, 160]
    Do:
      float y_sum = y_1 + y_2 + y_3 + y_4
      vec4 result = (x_1 * y_sum, x_2 * y_sum, x_3 * y_sum, x_4 * y_sum)
      Starting at the DST register, write *result* into consecutive
      registers, repeating it (1 << (SRC3 / 32)) times.
- wmm.accu - same as wmm, but the result is added to the DST registers;
    however, the first register of each vec4 result is overwritten
    instead of accumulated.
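The five shift/mask/merge instructions listed above can be captured as a small C reference model, for example to check constant folding or lowering in tests. This is a sketch, not ir3 code: the `ref_*` names are hypothetical, operands are assumed to be 32-bit unsigned values, `>>` is assumed to be a logical shift, and the shift amount is assumed to be masked to 5 bits (the description does not say what happens for shift amounts of 32 or more). Note the operand order: src2 is the value being shifted and src1 is the shift amount.

```c
#include <stdint.h>

/* Assumption: only the low 5 bits of the shift amount are used.
 * Masking also keeps the C shifts below free of undefined behavior. */
static inline uint32_t shift_amount(uint32_t s)
{
   return s & 31u;
}

/* shrm: (src2 >> src1) & src3, logical right shift assumed */
static uint32_t ref_shrm(uint32_t src1, uint32_t src2, uint32_t src3)
{
   return (src2 >> shift_amount(src1)) & src3;
}

/* shlm: (src2 << src1) & src3 */
static uint32_t ref_shlm(uint32_t src1, uint32_t src2, uint32_t src3)
{
   return (src2 << shift_amount(src1)) & src3;
}

/* shrg: (src2 >> src1) | src3, logical right shift assumed */
static uint32_t ref_shrg(uint32_t src1, uint32_t src2, uint32_t src3)
{
   return (src2 >> shift_amount(src1)) | src3;
}

/* shlg: (src2 << src1) | src3 */
static uint32_t ref_shlg(uint32_t src1, uint32_t src2, uint32_t src3)
{
   return (src2 << shift_amount(src1)) | src3;
}

/* andg: (src2 & src1) | src3 */
static uint32_t ref_andg(uint32_t src1, uint32_t src2, uint32_t src3)
{
   return (src2 & src1) | src3;
}
```

With these semantics, shrm folds the common bitfield-extract idiom `(x >> offset) & mask` into one instruction, and shlg folds the shift-and-merge idiom `(x << offset) | y`.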
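A possible reference model for dp2acc, following the Given/Do description above. The function and parameter names are hypothetical: `high` and `is_signed` stand in for the (low)/(high) and (signed)/(unsigned) modifiers (bits 30 and 14), and `neg_src3` for (neg). Several details are assumptions not stated in the description: element 0 of each vec2 is the lower byte of the selected 16-bit half, the signedness modifier applies to both SRC1 and SRC2, and accumulation wraps modulo 2^32. (sat) is not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: read element `idx` (0 or 1) of the 8-bit vec2 packed
 * into the low (high == false) or high (high == true) 16-bit half of `reg`.
 * Assumption: element 0 is the lower byte of the selected half. */
static int32_t dp2acc_elem(uint32_t reg, bool high, unsigned idx, bool is_signed)
{
   unsigned shift = (high ? 16u : 0u) + 8u * idx;
   uint32_t byte = (reg >> shift) & 0xffu;
   /* Sign-extend for i8vec2, zero-extend for u8vec2. */
   if (is_signed && (byte & 0x80u))
      return (int32_t)byte - 256;
   return (int32_t)byte;
}

/* Hypothetical reference model: DST = dot(SRC1, SRC2) + SRC3, with both
 * sources read from the same half. Assumed: the signedness flag applies to
 * both sources and the 32-bit add wraps around; (sat) is not modeled. */
static uint32_t ref_dp2acc(uint32_t src1, uint32_t src2, uint32_t src3,
                           bool high, bool is_signed, bool neg_src3)
{
   int32_t dot = 0;
   for (unsigned i = 0; i < 2; i++)
      dot += dp2acc_elem(src1, high, i, is_signed) *
             dp2acc_elem(src2, high, i, is_signed);

   uint32_t acc = neg_src3 ? 0u - src3 : src3; /* (neg) modifier on SRC3 */
   return (uint32_t)dot + acc;
}
```

The products cannot overflow 32 bits (at most 2 * 255 * 255 unsigned, or 2 * 128 * 128 signed), so only the final add with SRC3 can wrap, which is where (sat) would apply.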
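wmm and wmm.accu can be modeled the same way, with the register file treated as a flat array of floats starting at DST. The function names are hypothetical, `SRC3 / 32` is assumed to be integer division (so the documented immediate range [0, 160] gives between 1 and 32 copies of the vec4 result), and `dst` must have room for 4 floats per copy. The summation order for y_sum on hardware is not specified; left to right is assumed.

```c
#include <assert.h>
#include <stdbool.h>

/* Number of vec4 copies written for a given SRC3 immediate.
 * Assumption: SRC3 / 32 is integer division. */
static unsigned wmm_repeat_count(unsigned src3_imm)
{
   assert(src3_imm <= 160); /* documented immediate range */
   return 1u << (src3_imm / 32);
}

/* Hypothetical reference model for wmm (accu == false) and
 * wmm.accu (accu == true). x and y are the 4 consecutive SRC1/SRC2
 * registers; dst must hold 4 * wmm_repeat_count(src3_imm) floats. */
static void ref_wmm(float *dst, const float x[4], const float y[4],
                    unsigned src3_imm, bool accu)
{
   float y_sum = y[0] + y[1] + y[2] + y[3];
   unsigned n = wmm_repeat_count(src3_imm);

   for (unsigned r = 0; r < n; r++) {
      for (unsigned c = 0; c < 4; c++) {
         float v = x[c] * y_sum;
         float *d = &dst[4 * r + c];
         if (accu && c != 0)
            *d += v; /* wmm.accu: components 1..3 accumulate */
         else
            *d = v;  /* wmm, and component 0 of each wmm.accu vec4: overwrite */
      }
   }
}
```

Because y_sum does not depend on the output component, every copy is simply the vector x scaled by the sum of y; SRC3 only controls how many times it is replicated.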
Note:
- I'm not sure that I correctly expressed in ir3_valid_flags that SRC3
  of wmm can only be an immediate.
- I didn't fully define dp2acc.