llvmpipe: optimise triangle setup
I took some perf traces of glretrace replaying the ParaView pv-manysphere.trace, mainly to look at vertex processing overheads.
Somewhat to my surprise, when processing a lot of triangles, most of the overhead is in triangle setup rather than in draw/vertex shader execution.
This series is just a set of small optimisations, mostly aimed at reducing per-triangle memory reads/writes. In particular, it avoids some 64-bit memory accesses that clearly showed up in the perf traces and no longer appear after these changes.
Edited by Dave Airlie