util: NEON optimization for format unpack (deqp perf fix)
Since our freedreno runner farm is a fixed size but we keep wanting to test more stuff, I did a bit of looking to see if we had any low-hanging fruit for making deqp finish faster. It turns out b8g8r8a8_unorm reads are 5-10% of the profile, and the bigger bus transactions you get from using SIMD can be a huge win (though not nearly as large as one might hope).
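For readers who want a picture of what a SIMD unpack for this format looks like, here is a minimal sketch, not the code in this change. It assumes b8g8r8a8_unorm is stored as B, G, R, A bytes in memory and that the destination wants R, G, B, A bytes. The function name `sketch_unpack_rgba8_from_b8g8r8a8` and the `SKETCH_HAVE_NEON` macro are made up for illustration. On targets without NEON only the scalar loop is compiled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define SKETCH_HAVE_NEON 1 /* hypothetical macro, not a Mesa define */
#endif

/* Hypothetical helper: convert one row of B,G,R,A bytes to R,G,B,A bytes.
 * dst and src must not overlap. */
void
sketch_unpack_rgba8_from_b8g8r8a8(uint8_t *dst, const uint8_t *src,
                                  size_t width)
{
   size_t i = 0;

   assert(width == 0 || (dst != NULL && src != NULL));

#ifdef SKETCH_HAVE_NEON
   /* 16 pixels (64 bytes) per iteration. vld4q_u8 de-interleaves the
    * pixels into four 16-byte channel vectors, so the swizzle is just a
    * swap of the B and R vectors before the interleaving store. */
   for (; i + 16 <= width; i += 16) {
      uint8x16x4_t px = vld4q_u8(src + i * 4);
      uint8x16_t b = px.val[0];
      px.val[0] = px.val[2];
      px.val[2] = b;
      vst4q_u8(dst + i * 4, px);
   }
#endif

   /* Scalar path: handles the tail, or the whole row without NEON. */
   for (; i < width; i++) {
      const uint8_t *s = src + i * 4;
      uint8_t *d = dst + i * 4;
      d[0] = s[2]; /* R */
      d[1] = s[1]; /* G */
      d[2] = s[0]; /* B */
      d[3] = s[3]; /* A */
   }
}
```

The wide load and store are where the bigger bus transactions come from: each iteration moves 64 bytes instead of one pixel at a time.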
- Should we bake the generic and optimized tables together using call_once()?
- Hook it up on armv7 too.
- SSE version? (could help BXT since it's !LLC, and we're going to have BXT in CI soon)
- Does piglit have any hot unpack functions?
- Do apps have any hot pack functions for texture upload?
- Fix softpipe texturing regression from the unpack row change.
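On the call_once() question in the list above, one possible shape is sketched below. This is only an illustration under assumptions: every name here (`sketch_format`, `generic_table`, `optimized_table`, `sketch_get_unpack_row`) is hypothetical and does not reflect how Mesa's tables are actually laid out, and `fast_bgra` is a plain C stand-in for a SIMD routine. It uses C11 `call_once()` from `<threads.h>`, which not every platform ships.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <threads.h>

/* All names below are hypothetical, for illustration only. */
typedef void (*unpack_row_fn)(uint8_t *dst, const uint8_t *src, size_t width);

enum sketch_format { SKETCH_R8G8B8A8, SKETCH_B8G8R8A8, SKETCH_FORMAT_COUNT };

static void
generic_rgba(uint8_t *dst, const uint8_t *src, size_t width)
{
   memcpy(dst, src, width * 4);
}

static void
generic_bgra(uint8_t *dst, const uint8_t *src, size_t width)
{
   for (size_t i = 0; i < width; i++) {
      dst[i * 4 + 0] = src[i * 4 + 2];
      dst[i * 4 + 1] = src[i * 4 + 1];
      dst[i * 4 + 2] = src[i * 4 + 0];
      dst[i * 4 + 3] = src[i * 4 + 3];
   }
}

/* Stand-in for a SIMD path; a real one would use NEON or SSE intrinsics. */
static void
fast_bgra(uint8_t *dst, const uint8_t *src, size_t width)
{
   generic_bgra(dst, src, width);
}

static const unpack_row_fn generic_table[SKETCH_FORMAT_COUNT] = {
   [SKETCH_R8G8B8A8] = generic_rgba,
   [SKETCH_B8G8R8A8] = generic_bgra,
};

/* NULL means "no optimized path for this format". */
static const unpack_row_fn optimized_table[SKETCH_FORMAT_COUNT] = {
   [SKETCH_B8G8R8A8] = fast_bgra,
};

static unpack_row_fn baked_table[SKETCH_FORMAT_COUNT];
static once_flag baked_once = ONCE_FLAG_INIT;

/* Runs exactly once: prefer the optimized entry, else fall back. */
static void
bake_tables(void)
{
   for (int f = 0; f < SKETCH_FORMAT_COUNT; f++)
      baked_table[f] = optimized_table[f] ? optimized_table[f]
                                          : generic_table[f];
}

unpack_row_fn
sketch_get_unpack_row(enum sketch_format format)
{
   assert((int)format >= 0 && (int)format < SKETCH_FORMAT_COUNT);
   call_once(&baked_once, bake_tables);
   return baked_table[format];
}
```

The tradeoff is that callers do a single table lookup instead of checking the optimized table and then the generic one, at the cost of a call_once() check on each lookup.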