panfrost/gallium: Optimize u_vbuf_get_minmax_index_mapped
This function is bottlenecking a lot of games, but can likely be optimized heavily (e.g. via NEON). That would mostly benefit panfrost, and also lima.
It might not make sense to have a stupid Arm-specific optimized version of this in core Gallium. In that case we can probably do our own thing in ~/mesa/src/panfrost/shared, and lima can still benefit.
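
For illustration, here is a rough sketch of what a NEON min/max scan over 16-bit indices could look like. It is not the existing Gallium code: the helper name `minmax_index_u16` is hypothetical, primitive restart is ignored, and only the 16-bit index case is handled. The NEON path is guarded by `__ARM_NEON`, so the same file falls back to the scalar loop on other targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper (not a Mesa function): find the smallest and largest
 * value in an array of 16-bit indices. Primitive restart is NOT handled.
 * For count == 0 the results are UINT16_MAX and 0 respectively. */
static void
minmax_index_u16(const uint16_t *indices, size_t count,
                 unsigned *out_min, unsigned *out_max)
{
   assert(out_min && out_max);

   unsigned min = UINT16_MAX;
   unsigned max = 0;
   size_t i = 0;

#if defined(__ARM_NEON)
   if (count >= 8) {
      uint16x8_t vmin = vdupq_n_u16(UINT16_MAX);
      uint16x8_t vmax = vdupq_n_u16(0);

      /* Process eight indices per iteration. */
      for (; i + 8 <= count; i += 8) {
         uint16x8_t v = vld1q_u16(indices + i);
         vmin = vminq_u16(vmin, v);
         vmax = vmaxq_u16(vmax, v);
      }

      /* Horizontal reduction through memory, so this also builds on
       * 32-bit Arm where the across-vector intrinsics are unavailable. */
      uint16_t lanes_min[8], lanes_max[8];
      vst1q_u16(lanes_min, vmin);
      vst1q_u16(lanes_max, vmax);
      for (int j = 0; j < 8; j++) {
         if (lanes_min[j] < min) min = lanes_min[j];
         if (lanes_max[j] > max) max = lanes_max[j];
      }
   }
#endif

   /* Scalar tail, and the full loop on targets without NEON. */
   for (; i < count; i++) {
      if (indices[i] < min) min = indices[i];
      if (indices[i] > max) max = indices[i];
   }

   *out_min = min;
   *out_max = max;
}
```

Whether something like this belongs in core Gallium or in panfrost's shared code is the open question above; the sketch only shows that the inner loop maps directly onto `vminq_u16`/`vmaxq_u16`.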