While GLSL and SPIR-V define out-of-bounds shifts to be undefined, NIR defines shifts according to SM5 semantics (using lower 5-bits as appropriate). Some backends cannot support SM5 shifts directly and currently must insert masking in the backend to compensate. This is not ideal, and on hardware backends is a performance issue (ballpark of 1% perf hit on Dolphin ubershaders on M1).
One solution would be splitting out SM5 and non-SM5 shift opcodes. The problem is that there is no obvious "canonical" representation, so lots of nir_opt_algebraic rules etc would need to be duplicated to apply properly for both kinds of backends. It would also probably require significant cross tree churn to implement.
The solution used here is a bit more crude: a flag that communicates a domain restriction. The definition of ishl/ishr/ushr in NIR does not change, we continue to define it with SM5 semantics. However, the domain is restricted, making it invalid to produce out-of-bounds shifts if the backend has opted-out. This is less heavy-handed, as it does not disrupt the SM5 majority. What it does require is auditing producers to ensure that out-of-bounds shifts are only produced for
!lower_sm5_shift backends or as a result of undefined behaviour in the original shader. This is a lot easier to do, as it only requires auditing common code. It's probably a good idea anyway.
Many of the shifts produced in common code are immediates. We can check these shifts with an assert as long as we use the
_imm versions of the shifts. Some patches here convert lots of shifts in common code to use
_imm versions, reducing the surface of the audit.
Shifts produced by GLSL-to-NIR and SPIRV-to-NIR do need SM5 semantics, they fall under the rule of "undefined behaviour in, undefined behaviour out".
Shifts produced by drivers that don't opt-out of SM5 are only used on those drivers so are fine.
Putting it together, there aren't too many changes required to opt-out of SM5 on the affected backends, and this lets those backends drop their masking code. This simplifies both backends, and winning back performance on both Asahi GL, Asahi VK, and (with the Zink change) Asahi GL-on-VK.
Ironically, Metal Shading Language is defined to use SM5 shifts.
Maybe need to deal with this in tgsi-to-nir?
This is still deeply terrible. Is there a better way to do this? I hate this.