radeonsi: use LLVMBuildLoad2 for inter-stage output loads

The PS case was covered by the previous commit, so we can use f32
everywhere.

Reviewed-by: Mihai Preda <mhpreda@gmail.com>