radeonsi: use a C++ template to decrease draw_vbo overhead by 13%
Measured with GALLIUM_THREAD=0 to disable draw merging.
piglit/drawoverhead:
- Before: DrawElements ( 1 VBO| 0 UBO| 0 ) w/ no state change, 8736
- After: DrawElements ( 1 VBO| 0 UBO| 0 ) w/ no state change, 10059
The template generates a unique si_draw_vbo variant for each combination of these parameters: GFX version, has_tess, has_GS, NGG, prim_discard_cs_allowed.
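
A minimal sketch of the general technique, not the actual radeonsi code: each combination of parameters gets its own template instantiation, so the per-draw branches on those parameters are resolved at compile time, and the right variant is picked from a function-pointer table. All names here (DrawContext, draw_vbo_variant, draw_table, select_draw_vbo, the two-entry GfxLevel enum) are hypothetical, and prim_discard_cs_allowed is left out to keep the table small.

```cpp
#include <cstdint>

// Hypothetical stand-ins for illustration; not the real radeonsi types.
enum GfxLevel { GFX9 = 0, GFX10 = 1, NUM_GFX_LEVELS = 2 };

struct DrawContext {
    GfxLevel gfx_level;
    bool has_tess;
    bool has_gs;
    bool ngg;
    unsigned active_stages;   // written by the draw variant
    uint32_t last_variant_id; // records which variant ran
};

using DrawFn = void (*)(DrawContext &);

// Packs the parameters into a small integer identifying one variant.
constexpr uint32_t variant_id(GfxLevel gfx, bool tess, bool gs, bool ngg)
{
    return (static_cast<uint32_t>(gfx) << 3) | (static_cast<uint32_t>(tess) << 2) |
           (static_cast<uint32_t>(gs) << 1) | static_cast<uint32_t>(ngg);
}

// One instantiation per parameter combination. The conditions below are
// compile-time constants, so the compiler can drop the untaken branches.
template <GfxLevel GFX, bool HAS_TESS, bool HAS_GS, bool NGG>
void draw_vbo_variant(DrawContext &ctx)
{
    unsigned stages = 1; // vertex shader is always present
    if (HAS_TESS)
        stages += 2;     // tessellation control + evaluation
    if (HAS_GS)
        stages += 1;     // geometry shader
    ctx.active_stages = stages;
    ctx.last_variant_id = variant_id(GFX, HAS_TESS, HAS_GS, NGG);
}

// Table indexed by [gfx level][has_tess][has_gs][ngg].
static DrawFn draw_table[NUM_GFX_LEVELS][2][2][2];

template <GfxLevel GFX, bool HAS_TESS, bool HAS_GS, bool NGG>
void register_variant()
{
    draw_table[GFX][HAS_TESS][HAS_GS][NGG] =
        draw_vbo_variant<GFX, HAS_TESS, HAS_GS, NGG>;
}

// Registers all eight tess/GS/NGG combinations for one GFX level.
template <GfxLevel GFX>
void register_gfx_level()
{
    register_variant<GFX, false, false, false>();
    register_variant<GFX, false, false, true>();
    register_variant<GFX, false, true, false>();
    register_variant<GFX, false, true, true>();
    register_variant<GFX, true, false, false>();
    register_variant<GFX, true, false, true>();
    register_variant<GFX, true, true, false>();
    register_variant<GFX, true, true, true>();
}

void init_draw_table()
{
    register_gfx_level<GFX9>();
    register_gfx_level<GFX10>();
}

// Runtime lookup: done when state changes, not inside the draw itself.
DrawFn select_draw_vbo(const DrawContext &ctx)
{
    return draw_table[ctx.gfx_level][ctx.has_tess][ctx.has_gs][ctx.ngg];
}
```

In this sketch a caller would run init_draw_table() once, then call select_draw_vbo(ctx)(ctx) for each draw, or cache the returned pointer until the relevant state changes. The trade-off is code size: the number of instantiations grows with the product of the parameter ranges.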