-
Use a single swizzled tile per colorbuf (and per thread) to avoid accumulating large amounts of cached swizzled data. Now that the SSE3 code has been merged to master, the performance delta of this change is minimal, the main benefit is reduced memory usage due to no longer keeping swizzled copies of render targets. It's clear from the performance of the in-place version of this code that there is still quite a bit of time being spent swizzling & unswizzling, but it's not clear exactly how to reduce that.
2f6d47a7