ac/surface: optimize the DCC retile map computation
Computing the DCC retile map had a lot of CPU overhead when resizing windows.
The change is to:
- cache retile maps per process
- not compute retile maps when importing a buffer (the map is not used there)
- optimize the retile map computation in addrlib
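As a rough illustration of the per-process cache idea, here is a minimal sketch. It is not the code from this change: `struct retile_key`, `compute_retile_map()` and `get_retile_map()` are hypothetical names, the key fields are assumed, and the map contents are a placeholder for the real addrlib computation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical key: the parameters assumed to determine a retile map. */
struct retile_key {
   uint32_t width, height, bpp, swizzle_mode;
};

struct retile_entry {
   struct retile_key key;
   uint32_t *map;
   struct retile_entry *next;
};

/* Process-wide cache, protected by a lock because several threads may share it. */
static struct retile_entry *cache_head;
static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned compute_count; /* counts how often the expensive path runs */

/* Placeholder for the expensive computation (the real one lives in addrlib). */
static uint32_t *compute_retile_map(const struct retile_key *key)
{
   size_t n = (size_t)key->width * key->height;
   uint32_t *map = malloc(n * sizeof(*map));
   if (!map)
      return NULL;
   for (size_t i = 0; i < n; i++)
      map[i] = (uint32_t)i; /* dummy contents */
   compute_count++;
   return map;
}

static int key_equal(const struct retile_key *a, const struct retile_key *b)
{
   return a->width == b->width && a->height == b->height &&
          a->bpp == b->bpp && a->swizzle_mode == b->swizzle_mode;
}

/* Return the cached map for this key, computing it only on the first request. */
static const uint32_t *get_retile_map(const struct retile_key *key)
{
   assert(key && key->width && key->height);
   uint32_t *map = NULL;

   pthread_mutex_lock(&cache_lock);
   for (struct retile_entry *e = cache_head; e; e = e->next) {
      if (key_equal(&e->key, key)) {
         map = e->map;
         break;
      }
   }
   if (!map) {
      struct retile_entry *e = malloc(sizeof(*e));
      map = e ? compute_retile_map(key) : NULL;
      if (map) {
         e->key = *key;
         e->map = map;
         e->next = cache_head;
         cache_head = e;
      } else {
         free(e);
      }
   }
   pthread_mutex_unlock(&cache_lock);
   return map;
}
```

With a cache like this, resizing a window back to a size seen earlier reuses the existing map instead of recomputing it.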
Additionally, the DCC retile map format (uint16 or uint32) is now directly derived from the DCC sizes (this will simplify DMABUF modifiers).
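A minimal sketch of how the entry format could follow from the DCC sizes. This assumes the map entries are byte offsets into the two DCC buffers, so 16-bit entries suffice only when both buffers are at most 64 KiB; `retile_map_use_uint16()` and `retile_map_store()` are hypothetical helper names, not functions from this change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: entries are byte offsets, so the largest one is size - 1.
 * uint16 entries are enough when both DCC buffers are <= 65536 bytes. */
static bool retile_map_use_uint16(uint64_t dcc_size, uint64_t display_dcc_size)
{
   return dcc_size <= (uint64_t)UINT16_MAX + 1 &&
          display_dcc_size <= (uint64_t)UINT16_MAX + 1;
}

/* Store one offset using the entry width selected above. */
static void retile_map_store(void *map, bool use_uint16, uint32_t index,
                             uint32_t offset)
{
   if (use_uint16) {
      assert(offset <= UINT16_MAX);
      ((uint16_t *)map)[index] = (uint16_t)offset;
   } else {
      ((uint32_t *)map)[index] = offset;
   }
}
```

Because the format depends only on the sizes, anyone who knows the DCC sizes can work out the entry width without it being passed along separately.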