Utgard supports MSAA 4x, so wire it up.
Luc had already reverse-engineered the RSW bits. What remained was storing non-resolved buffers, reloading them (including depth/stencil), and doing the MSAA resolve.
To store a non-resolved buffer we need to set the mrt_pitch and mrt_bits registers in WB. To resolve it, we need to reload it into individual samples and then write it out with mrt_bits = 0; this is now done by the lima blitter.
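The two write-back modes described above can be sketched roughly as follows. This is an illustration only, not the driver code: the struct, the helper name, the "one bit per sample" encoding of mrt_bits and the meaning of mrt_pitch as a per-sample byte stride are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down view of the WB registers: only the two
 * fields discussed above are modelled. */
struct wb_msaa_sketch {
   uint32_t mrt_bits;  /* assumed: one enable bit per stored sample */
   uint32_t mrt_pitch; /* assumed: byte stride between sample planes */
};

#define WB_SKETCH_SAMPLES 4u /* Utgard MSAA is 4x */

/* keep_samples == true:  store the non-resolved buffer.
 * keep_samples == false: write out with mrt_bits = 0 (resolve). */
static void
wb_msaa_sketch_setup(struct wb_msaa_sketch *wb, bool keep_samples,
                     uint32_t sample_stride)
{
   assert(wb != NULL);

   if (keep_samples) {
      wb->mrt_bits = (1u << WB_SKETCH_SAMPLES) - 1; /* 0xf */
      wb->mrt_pitch = sample_stride;
   } else {
      wb->mrt_bits = 0;
      wb->mrt_pitch = 0;
   }
}
```

With keep_samples set, all four sample planes are kept so they can be reloaded later; with it cleared, the same write-back produces a single resolved surface.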
We also need to do resolve on transfer_map() of multi-sampled buffers, so utilize u_transfer_helper for that.
As a side fix, it turns out that our wb_reg definition wasn't correct: 'zero' isn't always zero. It is set when we need to swap channels, and it goes before mrt_bits. mrt_bits actually enables multiple MRTs. This commit therefore renames 'zero' to 'flags' and changes its position.
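A reduced sketch of the corrected field order, for illustration only. The struct below is hypothetical and lists just the three fields mentioned here (the real wb_reg has more), and the value of the swap bit is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical excerpt of the WB register block after the fix. */
struct wb_tail_sketch {
   uint32_t flags;     /* formerly 'zero'; non-zero when swapping channels */
   uint32_t mrt_bits;  /* enables multiple MRTs */
   uint32_t mrt_pitch;
};

/* 'flags' must come before 'mrt_bits' in the register layout. */
static_assert(offsetof(struct wb_tail_sketch, flags) <
              offsetof(struct wb_tail_sketch, mrt_bits),
              "flags goes before mrt_bits");

#define WB_SKETCH_FLAG_SWAP 0x1u /* assumed bit value, for illustration */

/* Compute the 'flags' word: set only when channels need swapping. */
static uint32_t
wb_sketch_flags(bool swap_channels)
{
   return swap_channels ? WB_SKETCH_FLAG_SWAP : 0;
}
```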
If mrt_bits == 0 and MSAA is enabled, the GPU does the resolve in place. To expose this functionality we set PIPE_CAP_SURFACE_SAMPLE_COUNT.