cairooverlay: Optimize premultiplication/unpremultiplication loops

Pull the video frame fields into local variables. Without this the
compiler must assume that they could have changed on every use and
read them from memory again.

This reduces the inner loop from 6 memory reads per pixel to 4; the
number of writes stays at 3.
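
For illustration only, a minimal sketch of the pattern, not the actual
cairooverlay code. The Frame struct, the function names, the 4-byte
pixel layout with alpha in byte 3, and the rounding formula are all
assumptions made for this example. Because the pixel stores go through
a uint8_t pointer, which may alias any object, the compiler generally
cannot prove that frame->data or frame->stride are unchanged after a
store, so the first variant may reload them for every pixel.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a mapped video frame (not a GStreamer type). */
typedef struct {
  uint8_t *data;
  int width;
  int height;
  int stride;                   /* bytes per row, may include padding */
} Frame;

/* Before: fields are read through the frame pointer inside the loop. */
static void
premultiply_fields (Frame * frame)
{
  for (int y = 0; y < frame->height; y++) {
    for (int x = 0; x < frame->width; x++) {
      uint8_t *p = frame->data + (size_t) y * frame->stride + (size_t) x * 4;
      uint8_t a = p[3];         /* assumed: alpha is byte 3 */

      p[0] = (uint8_t) ((p[0] * a + 127) / 255);
      p[1] = (uint8_t) ((p[1] * a + 127) / 255);
      p[2] = (uint8_t) ((p[2] * a + 127) / 255);
    }
  }
}

/* After: fields are copied into locals once, outside the loops. */
static void
premultiply_locals (Frame * frame)
{
  uint8_t *data = frame->data;
  const int width = frame->width;
  const int height = frame->height;
  const int stride = frame->stride;

  for (int y = 0; y < height; y++) {
    uint8_t *row = data + (size_t) y * stride;

    for (int x = 0; x < width; x++) {
      uint8_t *p = row + (size_t) x * 4;
      uint8_t a = p[3];         /* 4 reads: alpha + 3 colour bytes */

      p[0] = (uint8_t) ((p[0] * a + 127) / 255);        /* 3 writes */
      p[1] = (uint8_t) ((p[1] * a + 127) / 255);
      p[2] = (uint8_t) ((p[2] * a + 127) / 255);
    }
  }
}
```

Both variants compute the same result; only the number of loads the
compiler has to emit in the inner loop differs.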