cairooverlay: Optimize premultiplication/unpremultiplication loops
Pull in video frame fields into local variables. Without this the compiler must assume that they could've changed on every use and read them from memory again. This reduces the inner loop from 6 memory reads per pixels to 4, and the number of writes stays at 3.
cairooverlay-premultiply in 5 minutes and 31 seconds (queued for 3 seconds)4 jobs for