tu, ir3, nir: Initial support for subgroup shuffle/relative shuffle

For now, this always lowers them to a fallback waterfall loop. If the offset is uniform, we could do better for xor/up/down using some not-yet-documented cat6 opcodes, and in compute stages we could implement the shuffle through shared memory when it is available.

The waterfall loop is implemented in NIR, as a generic lowering option that could (in theory) be used by other drivers.
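As a rough illustration of the idea, here is a CPU-side simulation of one common form of waterfall loop. It is only a sketch and not the NIR code from this MR: `SUBGROUP_SIZE`, `shuffle_waterfall`, and the wrapping of out-of-range indices are assumptions made for the example. Each iteration takes the index requested by the first still-active invocation, does one read with that now-uniform index, and retires every invocation that wanted the same index.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subgroup size, chosen only to keep the example small. */
#define SUBGROUP_SIZE 8

/*
 * Simulates shuffle(value, index) for a whole subgroup with a waterfall loop.
 * value[i] and index[i] are the operands of invocation i; result[i] receives
 * value[index[i]]. Returns the number of loop iterations that were needed.
 *
 * Assumption: out-of-range indices are wrapped so the simulation stays in
 * bounds. Real shading languages leave that case undefined.
 */
static int shuffle_waterfall(const uint32_t value[SUBGROUP_SIZE],
                             const uint32_t index[SUBGROUP_SIZE],
                             uint32_t result[SUBGROUP_SIZE])
{
    bool active[SUBGROUP_SIZE];
    int iterations = 0;

    for (int lane = 0; lane < SUBGROUP_SIZE; lane++)
        active[lane] = true;

    for (;;) {
        /* Find the first invocation that has not been served yet. */
        int first = -1;
        for (int lane = 0; lane < SUBGROUP_SIZE; lane++) {
            if (active[lane]) {
                first = lane;
                break;
            }
        }
        if (first < 0)
            break; /* every invocation has its result */

        iterations++;

        /* The index is now the same for the whole subgroup, so a single
         * uniform-index read can fetch the value. */
        uint32_t uniform_index = index[first] % SUBGROUP_SIZE;
        uint32_t uniform_value = value[uniform_index];

        /* Serve and retire every invocation that asked for this index. */
        for (int lane = 0; lane < SUBGROUP_SIZE; lane++) {
            if (active[lane] && index[lane] % SUBGROUP_SIZE == uniform_index) {
                result[lane] = uniform_value;
                active[lane] = false;
            }
        }
    }

    return iterations;
}
```

In this model the loop runs once per distinct index in the subgroup, so a uniform index finishes in a single iteration while a fully divergent one needs `SUBGROUP_SIZE` iterations. The relative variants fit the same loop once each invocation computes its source lane first, for example `lane ^ mask` for xor, `lane - delta` for up, and `lane + delta` for down.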

Commits 2d48cd59 and 3d3f76f4 are taken from !14107 (merged) because this MR runs into the same bug.

Edited by Connor Abbott
