# ir3: New cat3 instructions

• shrm - `(src2 >> src1) & src3`
• shlm - `(src2 << src1) & src3`
• shrg - `(src2 >> src1) | src3`
• shlg - `(src2 << src1) | src3`
• andg - `(src2 & src1) | src3`
• dp2acc - part of qcom_dot8 from cl_qcom_dot_product8
``````
Given:
SRC1 and SRC2 are i8vec2 or u8vec2, packed into the low or high
half of their respective registers.
SRC3 is a 32-bit integer.
Do:
DST = dot(SRC1, SRC2) + SRC3

SRC1 and SRC2 must both be packed in the same half (both low or both high).

TODO: correctly handle the (signed)/(unsigned) and (low)/(high) modifiers.
Bit 14 controls (signed)/(unsigned).
Bit 30 controls (low)/(high).
There is a (neg) modifier for SRC3.
There is a (sat) modifier for DST.
``````
• wmm
``````
Given:
SRC1 = (x_1, x_2, x_3, x_4) - 4 consecutive registers
SRC2 = (y_1, y_2, y_3, y_4) - 4 consecutive registers
SRC3 is an immediate in the range [0, 160]

Do:
float y_sum = y_1 + y_2 + y_3 + y_4
vec4 result = (x_1 * y_sum, x_2 * y_sum, x_3 * y_sum, x_4 * y_sum)

Starting from the DST register, duplicate result into consecutive registers
(1 << (SRC3 / 32)) times.
``````
• wmm.accu - same as wmm, but the result is added to the DST registers; however, the first register of each vec4 result is overwritten instead of accumulated.
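For reference, the formulas listed for the shift/mask instructions can be sketched in C as below. This is a minimal model, not the ir3 implementation: it assumes 32-bit unsigned operands and logical right shifts. Since the behaviour for shift amounts of 32 or more is not stated above, the hypothetical `checked_shift` helper rejects them rather than guessing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: hardware behaviour for shifts >= 32 is unknown here,
 * so reject them (this also avoids undefined behaviour in C). */
static uint32_t checked_shift(uint32_t src1)
{
   assert(src1 < 32);
   return src1;
}

/* shrm: (src2 >> src1) & src3, assuming a logical right shift */
uint32_t shrm(uint32_t src1, uint32_t src2, uint32_t src3)
{
   return (src2 >> checked_shift(src1)) & src3;
}

/* shlm: (src2 << src1) & src3 */
uint32_t shlm(uint32_t src1, uint32_t src2, uint32_t src3)
{
   return (src2 << checked_shift(src1)) & src3;
}

/* shrg: (src2 >> src1) | src3, assuming a logical right shift */
uint32_t shrg(uint32_t src1, uint32_t src2, uint32_t src3)
{
   return (src2 >> checked_shift(src1)) | src3;
}

/* shlg: (src2 << src1) | src3 */
uint32_t shlg(uint32_t src1, uint32_t src2, uint32_t src3)
{
   return (src2 << checked_shift(src1)) | src3;
}

/* andg: (src2 & src1) | src3 */
uint32_t andg(uint32_t src1, uint32_t src2, uint32_t src3)
{
   return (src2 & src1) | src3;
}
```

For example, `shrm(8, 0xaabbccdd, 0xff)` evaluates to `0xcc`, i.e. it extracts the second byte of src2.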
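A similar sketch of `dp2acc`, under assumptions that the description leaves open: element 0 of each i8vec2/u8vec2 sits in the lower byte of the selected 16-bit half, the sum wraps modulo 2^32, and the (neg)/(sat) modifiers are not modelled. The `is_signed` and `high` flags stand in for bits 14 and 30; the helper `dp2acc_elem` is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: read element `idx` (0 or 1) from the low or high
 * 16-bit half of `reg`. Assumes element 0 is the lower byte of the half. */
static int32_t dp2acc_elem(uint32_t reg, bool high, unsigned idx, bool is_signed)
{
   uint32_t half = high ? (reg >> 16) : (reg & 0xffffu);
   uint8_t byte = (uint8_t)(half >> (8 * idx));
   /* (signed) sign-extends the 8-bit value, (unsigned) zero-extends it */
   return is_signed ? (int32_t)(int8_t)byte : (int32_t)byte;
}

/* DST = dot(SRC1, SRC2) + SRC3, with SRC1 and SRC2 read from the same half. */
uint32_t dp2acc(uint32_t src1, uint32_t src2, uint32_t src3,
                bool is_signed, bool high)
{
   int32_t dot = 0;
   for (unsigned i = 0; i < 2; i++)
      dot += dp2acc_elem(src1, high, i, is_signed) *
             dp2acc_elem(src2, high, i, is_signed);
   /* Plain 32-bit wrap-around; (neg) and (sat) are not modelled. */
   return (uint32_t)dot + src3;
}
```

With the inputs packed in either half, `(1, 2) . (3, 4) + 10` gives 21; only the half selection changes.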
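`wmm` and `wmm.accu` can be modelled the same way, treating DST as a flat array of float registers. This is only a sketch: the function signature and the `accu` flag are hypothetical, and the left-to-right summation order for `y_sum` is an assumption that may not match the hardware's rounding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Writes (or, with accu, accumulates) 4 * (1 << (src3 / 32)) registers
 * starting at dst and returns how many registers were touched. */
size_t wmm(float *dst, const float x[4], const float y[4],
           unsigned src3, bool accu)
{
   assert(src3 <= 160); /* SRC3 is an immediate in [0, 160] */

   /* Summation order is an assumption. */
   float y_sum = y[0] + y[1] + y[2] + y[3];
   float result[4];
   for (unsigned i = 0; i < 4; i++)
      result[i] = x[i] * y_sum;

   size_t copies = (size_t)1 << (src3 / 32);
   for (size_t c = 0; c < copies; c++) {
      float *vec = &dst[4 * c];
      for (unsigned i = 0; i < 4; i++) {
         /* wmm.accu accumulates, except the first register of each vec4,
          * which is overwritten just like in plain wmm. */
         if (accu && i != 0)
            vec[i] += result[i];
         else
            vec[i] = result[i];
      }
   }
   return 4 * copies;
}
```

With SRC3 = 32 the vec4 result is written twice (8 registers); at the maximum SRC3 = 160 it is written 32 times, i.e. 128 registers.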

Note:

• I'm not sure I correctly expressed in `ir3_valid_flags` that SRC3 of wmm can only be an immediate.
• I didn't fully define `dp2acc`.