Clean up and document compact 64K page table usage and uapi implications.
Some hardware requires 64K PTEs with compact page-tables for VRAM, but 64K PTEs generally cannot be used for system memory, since we are typically handed it in 4K granularity pages. The code appears to get this wrong: a BO with possible placements in both VRAM and system memory seems to always be bound using 64K PTEs, regardless of whether it is currently backed by system memory or by VRAM, which will break if we are handed 4K system pages. Since the compact page-table layout has to be determined on a per-2M basis, systems with this requirement need users to align each BO virtual address on a 2M boundary.
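The intended rule can be sketched in a few lines of C. This is only an illustration under stated assumptions, not driver code: `enum placement`, `pte_size_for()` and `can_use_compact_pt()` are hypothetical names, and the sketch assumes the decision is made from the BO's current backing store at bind time rather than from its list of possible placements.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_4K  0x1000ull
#define SZ_64K 0x10000ull
#define SZ_2M  0x200000ull

/* Hypothetical: where the BO is actually backed at bind time. */
enum placement { PLACEMENT_SYSTEM, PLACEMENT_VRAM };

/* True if value is a multiple of align; align must be a power of two. */
static bool is_aligned(uint64_t value, uint64_t align)
{
	assert(align && (align & (align - 1)) == 0);
	return (value & (align - 1)) == 0;
}

/*
 * Pick the PTE size from the current backing store. System memory is
 * handed out in 4K pages, so 64K PTEs are only safe for VRAM.
 */
static uint64_t pte_size_for(enum placement current)
{
	return current == PLACEMENT_VRAM ? SZ_64K : SZ_4K;
}

/*
 * The compact layout is chosen per 2M range, so a BO can only use it if
 * it is backed by VRAM and its virtual address starts on a 2M boundary.
 */
static bool can_use_compact_pt(enum placement current, uint64_t va)
{
	return current == PLACEMENT_VRAM && is_aligned(va, SZ_2M);
}
```

With this rule, a BO that may live in either VRAM or system memory gets 4K PTEs while it sits in system memory, and only gets 64K PTEs once it is actually backed by VRAM.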
However, it seems that at least DG2 (see the recent patch on the mailing list by Matthew Auld) can bind VRAM 64K pages in ordinary page-tables, using a special bit in the PTE pointing to the first 4K chunk of the 64K page. In that case we only need to require virtual addresses to be aligned on 64K boundaries instead of 2M. We should implement this for simplicity, since it means we don't need to flip between compact and normal page-table layouts.
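To make the uapi consequence concrete, the following C sketch compares the virtual address alignment a user would have to honour under the two schemes. The function names and the two boolean capability flags are hypothetical, introduced only for illustration; the real driver may expose this differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_4K  0x1000ull
#define SZ_64K 0x10000ull
#define SZ_2M  0x200000ull

/*
 * Hypothetical helper: minimum BO virtual address alignment.
 * needs_64k_vram:   hardware requires 64K PTEs for VRAM.
 * hint_in_normal_pt: 64K VRAM pages can be bound in ordinary
 *                    page-tables via a special PTE bit (as reported
 *                    for DG2), so no compact layout is needed.
 */
static uint64_t required_va_alignment(bool needs_64k_vram,
				      bool hint_in_normal_pt)
{
	if (!needs_64k_vram)
		return SZ_4K;
	return hint_in_normal_pt ? SZ_64K : SZ_2M;
}

/* Round va up to the next multiple of align (a power of two). */
static uint64_t align_up(uint64_t va, uint64_t align)
{
	assert(align && (align & (align - 1)) == 0);
	return (va + align - 1) & ~(align - 1);
}
```

For example, a BO that a user wants to place at 0x410000 can stay there under the 64K scheme, but would have to move up to 0x600000 if the compact layout forced 2M alignment.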
Finally we need to check whether this also holds for other upcoming hardware.