nir,amd: add nir_io_semantics2, fix convergent+inter-shader code motion, optimize convergent loads to P0 loads on AMD
The latest version:
- doesn't add nir_io_semantics2
- adds nir_intrinsic_load_per_primitive_input; it turned out not to be needed in the end, but it's cleaner
- the new IO option flag allows mixing convergent flat inputs with interpolated inputs in the same vec4
- radeonsi and RADV change how they gather which inputs are interpolated
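
To illustrate what mixing flat and interpolated inputs in one vec4 implies for the gathering step, here is a minimal sketch that records per-component masks for each input slot instead of a single per-slot flag. It is not the code from this MR: `input_load`, `input_masks`, `gather_input_masks`, `slot_is_mixed`, and `MAX_SLOTS` are hypothetical names, and the real drivers walk NIR intrinsics rather than a flat array.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SLOTS 32 /* hypothetical limit, chosen only for this sketch */

/* Hypothetical stand-in for one scalar fragment shader input load. */
struct input_load {
   unsigned slot;      /* vec4 input slot index */
   unsigned component; /* 0..3 within the vec4 */
   bool interpolated;  /* false = flat (convergent) load */
};

struct input_masks {
   uint8_t interp[MAX_SLOTS]; /* components read with interpolation */
   uint8_t flat[MAX_SLOTS];   /* components read as flat values */
};

/* Gather per-component masks so that one slot can hold both kinds. */
static void
gather_input_masks(const struct input_load *loads, size_t count,
                   struct input_masks *out)
{
   for (unsigned i = 0; i < MAX_SLOTS; i++)
      out->interp[i] = out->flat[i] = 0;

   for (size_t i = 0; i < count; i++) {
      if (loads[i].slot >= MAX_SLOTS || loads[i].component > 3)
         continue; /* out-of-range entries are ignored in this sketch */

      uint8_t bit = (uint8_t)(1u << loads[i].component);
      if (loads[i].interpolated)
         out->interp[loads[i].slot] |= bit;
      else
         out->flat[loads[i].slot] |= bit;
   }
}

/* A slot is "mixed" when it has both interpolated and flat components. */
static bool
slot_is_mixed(const struct input_masks *m, unsigned slot)
{
   return m->interp[slot] != 0 && m->flat[slot] != 0;
}
```

With per-component masks, a slot whose x and y are interpolated and whose z is flat is reported as mixed, which a single per-slot "interpolated" flag could not express.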
radeonsi passes tests.
Additional NIR change: nir_opt_vectorize_io optionally doesn't vectorize loads/stores that have different types.
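
The type restriction can be pictured as one extra check in the "can these two accesses be merged?" predicate. The sketch below is only an illustration under assumed names: `io_type`, `io_access`, `can_vectorize_pair`, and the `require_same_type` parameter are hypothetical and do not reflect the actual nir_opt_vectorize_io interface.

```c
#include <stdbool.h>

/* Hypothetical base types for an IO access; not Mesa's enums. */
enum io_type { IO_TYPE_FLOAT, IO_TYPE_INT, IO_TYPE_UINT };

/* Hypothetical description of one scalar IO load or store. */
struct io_access {
   unsigned slot;      /* vec4 slot the access targets */
   unsigned component; /* 0..3 within the slot */
   unsigned bit_size;  /* e.g. 16 or 32 */
   enum io_type type;  /* base type of the value */
};

/* Decide whether two scalar accesses may be merged into one vector access.
 * When require_same_type is set, accesses with different base types are
 * kept separate even if everything else matches.
 */
static bool
can_vectorize_pair(const struct io_access *a, const struct io_access *b,
                   bool require_same_type)
{
   if (a->slot != b->slot || a->bit_size != b->bit_size)
      return false;
   if (a->component == b->component)
      return false; /* same component: nothing to combine */
   if (require_same_type && a->type != b->type)
      return false;
   return true;
}
```

In this sketch, a float access and an int access to the same slot merge only when `require_same_type` is false, which mirrors the optional behavior described above.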