nir: add divergence analysis pass
This pass expects the shader to be in LCSSA form. The algorithm is based on 'The Simple Divergence Analysis' as described in: Diogo Sampaio, Rafael De Souza, Sylvain Collange, Fernando Magno Quintão Pereira. Divergence Analysis. ACM Transactions on Programming Languages and Systems (TOPLAS).
The first two commits make nir_to_lcssa() a general NIR pass and make it skip loop-invariant variables, so that no phi is emitted for them. The divergence analysis returns a boolean array with one entry per SSA def, where 1 means 'is_divergent'. A better solution might be to integrate this as metadata.
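To illustrate the shape of the result (one boolean per SSA def), here is a minimal sketch of a fixed-point propagation over a toy IR. It only models data dependence (a def is divergent if it is a divergent seed or reads a divergent def) and ignores control flow entirely. `struct toy_def` and `toy_divergence_analysis()` are hypothetical names for illustration, not the types or entry points used by the pass.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy SSA def; this is not a NIR type. */
struct toy_def {
   int srcs[2];         /* indices of the source defs, -1 if unused */
   bool seed_divergent; /* inherently divergent, e.g. a per-invocation ID */
};

/* Returns a heap-allocated array with one entry per def, where true
 * means 'is_divergent'. The caller frees it. Data dependence only. */
static bool *
toy_divergence_analysis(const struct toy_def *defs, size_t num_defs)
{
   bool *divergent = calloc(num_defs, sizeof(bool));
   if (!divergent)
      return NULL;

   /* Iterate until no def changes from uniform to divergent. */
   bool progress = true;
   while (progress) {
      progress = false;
      for (size_t i = 0; i < num_defs; i++) {
         if (divergent[i])
            continue;

         bool d = defs[i].seed_divergent;
         for (int s = 0; s < 2 && !d; s++) {
            int src = defs[i].srcs[s];
            assert(src < (int)num_defs);
            if (src >= 0)
               d = divergent[src];
         }

         if (d) {
            divergent[i] = true;
            progress = true;
         }
      }
   }
   return divergent;
}
```

A consumer would then index the returned array by def index to decide, for example, whether a value can live in a scalar register.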
Besides the metadata question, there are two issues to be resolved before this can be upstreamed:
- are there any (potentially) non-divergent intrinsics missing?
- are tex instructions handled sufficiently?
I had to adjust the algorithm w.r.t. phis on loop entries and exits, as the original algorithm only accounts for single breaks and continues:
- on loop entries, a phi is divergent if any source value is divergent, or if the loop-carried sources differ and any continue is inside divergent control flow.
- on loop exits, a phi is divergent if any source value is divergent, or if any break condition is inside divergent control flow. (Note that loop-invariant variables should be skipped when lowering to LCSSA; otherwise they are considered divergent as soon as the loop has a divergent break.)
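The two phi rules above can be sketched as plain predicates. This is a simplified illustration, assuming the facts about the surrounding control flow have already been computed. `struct phi_info` and its fields are hypothetical and do not correspond to NIR data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified description of a phi; this is not a NIR type. */
struct phi_info {
   const bool *src_divergent;     /* divergence of each phi source */
   size_t num_srcs;
   bool loop_carried_srcs_differ; /* only meaningful for loop-entry phis */
   bool divergent_continue;       /* any continue inside divergent control flow */
   bool divergent_break;          /* any break inside divergent control flow */
};

static bool
any_src_divergent(const struct phi_info *phi)
{
   assert(phi->num_srcs == 0 || phi->src_divergent != NULL);
   for (size_t i = 0; i < phi->num_srcs; i++) {
      if (phi->src_divergent[i])
         return true;
   }
   return false;
}

/* Rule for phis on loop entries. */
static bool
loop_entry_phi_is_divergent(const struct phi_info *phi)
{
   return any_src_divergent(phi) ||
          (phi->loop_carried_srcs_differ && phi->divergent_continue);
}

/* Rule for phis on loop exits (the phis created by LCSSA). */
static bool
loop_exit_phi_is_divergent(const struct phi_info *phi)
{
   return any_src_divergent(phi) || phi->divergent_break;
}
```

The exit rule also shows why loop-invariant values are worth skipping in LCSSA: an exit phi with only uniform sources still becomes divergent once `divergent_break` is set.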
v2: added patch to lower clustered reductions with cluster_size >= subgroup_size into reductions.
v3: added intrinsics for amd_shader_ballot and shader_demote, removed WIP label