gallium,tc,radeonsi: add a merged bind_velems+set_vbufs call in TC, optimize set_vertex_buffers in radeonsi
The first 2 commits change st/mesa to use a merged bind_vertex_elements_state + set_vertex_buffers TC call. st/mesa wants to fill vertex buffers before vertex elements, but TC should bind vertex elements first. A merged TC call solves this: its execute function receives the parameters for both, so it can choose the order of the driver calls.
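
As a rough illustration of the merged-call idea (not the actual TC code), the sketch below records the vertex buffers first and the vertex elements second, then replays them in the opposite order. Every `toy_*` name, the payload layout, and the fixed-size buffer array are assumptions made for this example only.

```c
#include <string.h>

/* Hypothetical stand-in for a driver context; not the real gallium types. */
#define TOY_MAX_CALLS 4
#define TOY_MAX_VBUFS 8

struct toy_context {
   const char *log[TOY_MAX_CALLS]; /* order in which driver hooks ran */
   unsigned num_calls;
};

static void toy_bind_vertex_elements_state(struct toy_context *ctx,
                                           const void *velems)
{
   (void)velems;
   ctx->log[ctx->num_calls++] = "bind_vertex_elements_state";
}

static void toy_set_vertex_buffers(struct toy_context *ctx, unsigned count,
                                   const int *buffers)
{
   (void)count;
   (void)buffers;
   ctx->log[ctx->num_calls++] = "set_vertex_buffers";
}

/* Payload of the merged call: it carries the parameters for both hooks. */
struct toy_merged_call {
   const void *velems;
   unsigned count;
   int buffers[TOY_MAX_VBUFS];
};

/* Frontend side: fill the vertex buffers first, then the vertex elements. */
static void toy_record_merged(struct toy_merged_call *call,
                              const int *buffers, unsigned count,
                              const void *velems)
{
   if (count > TOY_MAX_VBUFS)
      count = TOY_MAX_VBUFS;
   call->count = count;
   memcpy(call->buffers, buffers, count * sizeof(buffers[0]));
   call->velems = velems;
}

/* Execute side: free to pick the order, so vertex elements are bound first. */
static void toy_execute_merged(struct toy_context *ctx,
                               const struct toy_merged_call *call)
{
   toy_bind_vertex_elements_state(ctx, call->velems);
   toy_set_vertex_buffers(ctx, call->count, call->buffers);
}
```

Because one payload holds both sets of parameters, the recording order and the execution order are decoupled, which two separate queued calls could not offer.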
The rest are radeonsi overhead reductions using the fact that bind_vertex_elements_state is always called before set_vertex_buffers.
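
To show the kind of saving such an ordering guarantee can allow, here is a toy sketch that is not taken from radeonsi: since the vertex elements are already bound when the buffers arrive, the buffer hook can build per-element descriptors immediately instead of deferring that work to draw time. All `toy_*` types, fields, and the descriptor format are hypothetical.

```c
#include <stdint.h>

#define TOY_MAX_ELEMS 16

/* Hypothetical vertex-elements state: which buffer each element reads. */
struct toy_velems {
   unsigned count;
   uint8_t vb_index[TOY_MAX_ELEMS];
};

struct toy_vbuf {
   uint64_t address;
   uint32_t stride;
};

struct toy_driver {
   const struct toy_velems *velems; /* set by toy_bind_velems */
   uint64_t desc[TOY_MAX_ELEMS];    /* one toy "descriptor" per element */
   unsigned num_desc;
};

static void toy_bind_velems(struct toy_driver *drv,
                            const struct toy_velems *velems)
{
   drv->velems = velems;
}

/* Relies on toy_bind_velems having run first, so drv->velems is current
 * and descriptors can be written right away, only for referenced buffers. */
static void toy_set_vbufs(struct toy_driver *drv,
                          const struct toy_vbuf *vbufs, unsigned count)
{
   const struct toy_velems *ve = drv->velems;
   unsigned n = ve->count < TOY_MAX_ELEMS ? ve->count : TOY_MAX_ELEMS;

   for (unsigned i = 0; i < n; i++) {
      unsigned vb = ve->vb_index[i];
      /* Elements pointing past the bound buffers get a null descriptor. */
      drv->desc[i] = vb < count ? vbufs[vb].address : 0;
   }
   drv->num_desc = n;
}
```

Without the guarantee, the buffer hook would have to mark state dirty and redo this loop later, once it knew which vertex elements were in effect.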