d3d12: Improve perf of sw winsys path
The end goal of this MR is to improve D3D12 performance in WSL by roughly 10x in the PixMark Piano benchmark on a discrete GPU (30fps -> 300fps). The primary problem is addressed by the last patch: display target textures shouldn't be allocated as mappable, because that places them in system RAM instead of VRAM, which destroys performance.
There's a secondary problem addressed here as well: the wrong type of memory was being used for reading back textures. D3D's UPLOAD memory type uses write-combined CPU pages, which makes CPU reads from them slow, so it shouldn't be used for readback. To fix that, this MR starts taking the pipe/PB usages into account to allocate the correct type of memory, including nonmappable buffers, which will improve performance of things like vertex buffers.
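
For reviewers who want the idea in one place, here is a rough sketch of the usage-to-memory-type selection described above. It is not the code in this MR: `heap_kind`, `usage_flags`, `pick_heap` and `heap_is_mappable` are hypothetical stand-ins for the D3D12 heap types and the pipe/PB usage flags, and letting CPU-read win when both flags are set is an assumption made only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the three D3D12 heap types. */
enum heap_kind {
    HEAP_DEFAULT,   /* GPU-local (VRAM on a discrete GPU), not CPU-mappable */
    HEAP_UPLOAD,    /* write-combined CPU pages: fast CPU writes, slow reads */
    HEAP_READBACK   /* CPU-cached pages: suitable for CPU reads */
};

/* Hypothetical usage bits, standing in for the pipe/PB usage flags. */
enum usage_flags {
    USAGE_CPU_READ  = 1 << 0,
    USAGE_CPU_WRITE = 1 << 1
};

/* Pick a heap from the CPU access the caller asked for.
 * Assumption: CPU reads take priority, since reading from
 * write-combined memory is the slow case this MR avoids. */
static enum heap_kind pick_heap(unsigned usage)
{
    /* Reject bits this sketch does not know about. */
    assert((usage & ~(unsigned)(USAGE_CPU_READ | USAGE_CPU_WRITE)) == 0);

    if (usage & USAGE_CPU_READ)
        return HEAP_READBACK;
    if (usage & USAGE_CPU_WRITE)
        return HEAP_UPLOAD;
    /* No CPU access requested: keep the resource GPU-local. */
    return HEAP_DEFAULT;
}

/* Only the upload and readback heaps can be mapped by the CPU. */
static bool heap_is_mappable(enum heap_kind kind)
{
    return kind != HEAP_DEFAULT;
}
```

With this shape, a display target or vertex buffer that requests no CPU access falls through to the GPU-local heap, while a readback staging resource gets CPU-cached memory instead of write-combined memory.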