amd,radeonsi: changes in register definitions, ac/llvm, shader culling, new hw bug workarounds, etc.
This is a big MR containing many unrelated changes that I've gathered over the last couple of weeks. Some highlights:
- 2 new hw bug workarounds for Navi1x
- GS fast launch: indexed triangle strips are re-enabled, the ES and GS thread counts are fixed, and fast launch is now disabled for small draws/instances for better performance.
- The memory coherency issue between VS/TES/GS and PS is fixed (i.e. the shader now waits for stores to complete before pos exports).
- NGG shader culling now has a smaller shader code size (new prefix sum computation using v_sad_u8, etc.).
- 32-bit register fields are no longer defined.
- LLVM target features are moved from target machines to functions, which enables the removal of the Wave32 target machine.
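
For reviewers unfamiliar with the v_sad_u8 trick mentioned in the culling item, here is a rough CPU-side sketch of the idea. It is not the code in this MR. It assumes each wave's surviving-thread count is packed as one byte of a 32-bit word (at most 4 waves, counts up to 255), and the helper names `sad_u8`, `exclusive_prefix_sum`, and `total_count` are made up for illustration. v_sad_u8 adds the sum of absolute byte differences of two operands to an accumulator, so with a zero second operand it simply sums the four bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Software model of v_sad_u8: acc + sum over the 4 bytes of |a.byte - b.byte|. */
static uint32_t sad_u8(uint32_t a, uint32_t b, uint32_t acc)
{
    for (int i = 0; i < 4; i++) {
        int x = (int)((a >> (8 * i)) & 0xff);
        int y = (int)((b >> (8 * i)) & 0xff);
        acc += (uint32_t)abs(x - y);
    }
    return acc;
}

/* Hypothetical helper: number of surviving threads in all waves before
 * wave_id. Byte i of packed_counts is assumed to hold the count of wave i. */
static uint32_t exclusive_prefix_sum(uint32_t packed_counts, unsigned wave_id)
{
    assert(wave_id <= 4);

    /* Keep only the bytes that belong to waves with a lower index. */
    uint32_t mask = wave_id == 4 ? 0xffffffffu : (1u << (8 * wave_id)) - 1u;

    /* Comparing against zero turns the SAD into a plain sum of bytes. */
    return sad_u8(packed_counts & mask, 0, 0);
}

/* Hypothetical helper: total surviving threads across all 4 waves. */
static uint32_t total_count(uint32_t packed_counts)
{
    return sad_u8(packed_counts, 0, 0);
}
```

With per-wave counts 1, 2, 3, 4 packed as `0x04030201`, `exclusive_prefix_sum(0x04030201, 2)` gives 3 and `total_count(0x04030201)` gives 10. The point is that one instruction replaces a chain of shifts, masks, and adds, which is where the code size saving would come from under these assumptions.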