radeonsi: remove prim discard CS, reduce CPU overhead of si_update_shaders, si_emit_spi_map, etc. (big MR)
These are CPU overhead improvements motivated by viewperf/snx.
Changes:
- shader variant keys are added into `si_context` and they are updated in bind & set functions instead of `si_shader_selector_key`, reducing the overhead of `si_update_shaders`
- `si_update_shaders` is cleaned up and moved to become a C++ template in si_state_draw.cpp to reduce overhead
- `si_update_spi_map` is cleaned up; its loops, memcmp, and memcpy are unrolled using 33 C++ template instantiations to reduce overhead
- the primitive discard compute shader is removed
- the NGG passthrough mode is removed
- `SPI_SHADER_PGM_HI_*` registers are no longer set for every shader
- etc.
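
For readers unfamiliar with the template-unrolling technique mentioned above, here is a minimal standalone sketch. It is not the driver code: `FakeContext`, `emit_map`, `make_table`, `emit_table`, and the per-slot encoding are hypothetical names and placeholders, and it assumes the 33 instantiations correspond to element counts 0 through 32. The idea is that the count becomes a compile-time constant, so the compiler can unroll the loop and inline the fixed-size memcmp and memcpy.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>

// Hypothetical stand-in for driver state; this is NOT the real si_context.
struct FakeContext {
   uint32_t emitted[32] = {}; // last values "written" to the registers
   unsigned num_writes = 0;   // counts simulated register updates
};

// NUM is a compile-time constant, so the loop bound and the memcmp/memcpy
// sizes are known to the compiler, which can unroll and inline them.
template <std::size_t NUM>
static void emit_map(FakeContext *ctx, const uint32_t *values)
{
   uint32_t regs[NUM ? NUM : 1];

   for (std::size_t i = 0; i < NUM; i++)
      regs[i] = values[i] | 0x80000000u; // placeholder per-slot encoding

   // Skip the update when nothing changed since the last emit.
   if (NUM && memcmp(ctx->emitted, regs, NUM * sizeof(uint32_t)) != 0) {
      memcpy(ctx->emitted, regs, NUM * sizeof(uint32_t));
      ctx->num_writes++;
   }
}

using EmitFn = void (*)(FakeContext *, const uint32_t *);

// Build a table holding one instantiation per count in the index sequence.
template <std::size_t... I>
static constexpr std::array<EmitFn, sizeof...(I)>
make_table(std::index_sequence<I...>)
{
   return {{&emit_map<I>...}};
}

// 33 entries: counts 0..32 (an assumption made for this sketch).
static constexpr auto emit_table = make_table(std::make_index_sequence<33>{});
```

A caller would select the instantiation once, when the count is known (for example `emit_table[4](&ctx, values)`), so the hot path runs a variant specialized for that count rather than a generic loop.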
Tested on gfx10 (Navi10) and gfx8 (Polaris11).
Edited by Marek Olšák