lima: wire up MSAA 4x support

Utgard supports MSAA 4x, so wire it up.

The RSW bits were already reverse-engineered by Luc; the only missing
part was MSAA for the depth/stencil buffer. It turns out that MSAA 4x
isn't actually free if you need to store the depth or stencil buffer:
the buffer has to be 4x larger, and on reload each sample has to be
reloaded individually, so depth/stencil reload with MSAA 4x costs 4x
the memory bandwidth.
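
As a rough illustration of the 4x cost, here is a minimal sketch of the
size arithmetic. zs_store_size() is a hypothetical helper, not a
function in lima, and it ignores any tile or pitch alignment the
hardware may require:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of lima): bytes needed to keep every
 * sample of a depth/stencil buffer. Alignment is deliberately ignored. */
static size_t zs_store_size(uint32_t width, uint32_t height,
                            uint32_t bytes_per_sample, uint32_t samples)
{
   assert(samples >= 1);
   return (size_t)width * height * bytes_per_sample * samples;
}
```

Under these assumptions a 64x64 buffer with 4 bytes per sample takes
16384 bytes without MSAA and 65536 bytes with MSAA 4x. Reload traffic
scales the same way, since each sample is reloaded individually.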

As a side fix, it turns out that our wb_reg definition wasn't correct:
'zero' isn't always zero. It's set if we need to swap channels, and it
goes before mrt_bits. mrt_bits actually enables multiple MRTs: the blob
sets it to 0xf for depth/stencil reload with MSAA enabled, and
mrt_pitch is set to the MRT pitch (in bytes). So rename zero to flags
and change its order.

Fixes dEQP-GLES2.functional.multisample.*

Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>