nak: implement VOTE and SHFL on SM50
Tested SHFL against dEQP-VK.subgroups.shuffle.compute.*. I had a harder time getting the CTS ballot/vote tests to run, so I tested those with some handwritten compute shaders in vkrunner instead. These tests were definitely less exhaustive than the CTS.
While testing this, I found a bug that affects both SM50 and SM75 (mesa/mesa!26202).