WIP: d3d12: Use lower_ubo_vec4 instead of rolling our own.
build_load_ubo_dxil() didn't handle straddling loads, which
nir_lower_ubo_vec4() can do for us based on the
offset fields that GL fills in. We can also avoid addressing
math on indirect loads by letting the GL frontend generate
Super untested. This is just my hack at "this is where I think d3d12
should be going with UBO loads, based on my understanding of the driver."
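
For reference, a rough standalone sketch of what "straddling" means here,
assuming UBO data is addressed in 16-byte vec4 slots. The helper names
(vec4_span_for_load, load_straddles) are hypothetical and are not Mesa
functions; this is not what nir_lower_ubo_vec4() does internally, only an
illustration of the case the old code missed.

```c
#include <stdbool.h>

/* Assumption: one vec4 slot is 4 x 32-bit components = 16 bytes. */
#define VEC4_BYTES 16u

/* Hypothetical helper type: the range of vec4 slots a load touches. */
struct vec4_span {
   unsigned first_slot;
   unsigned last_slot;
};

/* Compute which vec4 slots a load of byte_size bytes (must be > 0)
 * starting at byte_offset touches.
 */
static struct vec4_span
vec4_span_for_load(unsigned byte_offset, unsigned byte_size)
{
   struct vec4_span span;
   span.first_slot = byte_offset / VEC4_BYTES;
   span.last_slot = (byte_offset + byte_size - 1) / VEC4_BYTES;
   return span;
}

/* A load straddles when it crosses a vec4 boundary, so a single vec4
 * fetch cannot satisfy it and two fetches have to be combined.
 */
static bool
load_straddles(unsigned byte_offset, unsigned byte_size)
{
   struct vec4_span span = vec4_span_for_load(byte_offset, byte_size);
   return span.first_slot != span.last_slot;
}
```

With these definitions, a 16-byte load at byte offset 8 touches slots 0 and
1 and so straddles, while the same load at offset 16 stays inside slot 1.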