turnip, ir3: VK_KHR_shader_subgroup_extended_types

The main thing needed for this extension is for subgroupBroadcast() and subgroupBroadcastFirst() to work with 16-bit types. As a refresher, these are implemented by writing to a shared register while only 1 lane is active and then reading it back.
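
As a rough mental model of that mechanism, here is a plain C simulation. It is not the actual ir3 code: `shared_reg`, `broadcast_first()` and the 128-lane wave size are names and values made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_LANES 128 /* assumed wave size, for illustration only */

/* Hypothetical stand-in for a shared (wave-uniform) register. */
static uint32_t shared_reg;

/* Model of subgroupBroadcastFirst(): the first active lane writes its
 * value to the shared register, then every active lane reads it back. */
static void broadcast_first(const uint32_t in[NUM_LANES],
                            const bool active[NUM_LANES],
                            uint32_t out[NUM_LANES])
{
    /* "Only 1 lane is active": the first active lane does the write. */
    for (int lane = 0; lane < NUM_LANES; lane++) {
        if (active[lane]) {
            shared_reg = in[lane];
            break;
        }
    }

    /* All active lanes read the same shared register. */
    for (int lane = 0; lane < NUM_LANES; lane++) {
        if (active[lane])
            out[lane] = shared_reg;
    }
}
```

Inactive lanes are left untouched in this model, which mirrors the fact that only active invocations take part in the broadcast.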

One might think that we'd be able to use 16-bit types with shared registers, but that turns out to be a false hope. The blob does use shared half-registers in the shader preamble, but there seems to be a hardware bug when writing to a shared half-register while the active lane is lane 64 or higher.

So, this series does the dumber thing: it expands the value to a full register when writing to the shared register and shrinks it again when reading. It turns out that, thanks to the way ir3_context handles fixing up half destinations, this is mostly already done for us! We just have to fix up a few other things to support READ_* having half sources and then it "just works."
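
The expand-then-shrink round trip can be sketched in plain C. This only illustrates the idea, not the code in this series: `shared_full_reg` and both helpers are hypothetical, and zero-extension is an assumption (any lossless widening would do).

```c
#include <stdint.h>

/* Hypothetical stand-in for a full (32-bit) shared register. */
static uint32_t shared_full_reg;

/* Write side: widen the 16-bit value to 32 bits first, so that a shared
 * half-register is never written. */
static void write_shared_from_half(uint16_t half_value)
{
    shared_full_reg = (uint32_t)half_value;
}

/* Read side: narrow the full register back down to 16 bits. */
static uint16_t read_shared_as_half(void)
{
    return (uint16_t)shared_full_reg;
}
```

Since every 16-bit pattern fits in the low half of the full register, the round trip is lossless for both integer and float16 bit patterns.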

In case anyone cares, here's the computerator test that I used to verify that it's a HW bug, at least on A650:

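@localsize 128, 1, 1
@buf 8  ; g[0]
@invocationid(r0.x) ; r0.xyz
@branchstack 1
cmps.s.eq p0.x, r0.x, 64 ; p0.x = (invocation id == 64)
mov.u16u16 hr0.x, 0x41 ; value to write
(rpt5)nop
br !p0.x, #l2 ; every lane except the selected one skips the write
mov.u16u16 hr48.x, hr0.x ; write the shared half-register with 1 lane active
l2:
(jp)(rpt5)nop
mov.u16u32 r1.x, hr48.x ; read it back, widened to 32 bits
mov.u32u32 r0.x, 0
(rpt5)nop
stib.b.untyped.1d.u32.1.imm r1.x, r0.x, 0 ; store the result to the buffer
l1:
(jp)nop
end
nop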

This returns 0 instead of the expected 0x41, but works whenever the cmps.s.eq argument (i.e. the lane) is 63 or less.
