nvk,nak: Implement VK_KHR_shader_subgroup_rotate
Closes #10685
Using shfl directly doesn't seem possible: according to the CUDA docs, it does not support wrap-around. Rotate is therefore implemented using the lower-to-shuffle pass instead.
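For reviewers, here is a rough sketch of the index math such a lowering relies on. It is not the code in this MR: the function names are hypothetical, and the formula is the one the SPIR-V spec gives for OpGroupNonUniformRotateKHR, assuming a power-of-two cluster size (pass the subgroup size for the non-clustered case). The wrap-around is handled in the lane index itself, so a plain indexed shuffle is enough.

```rust
/// Hypothetical helper: the lane that `lane` reads from for a rotate by
/// `delta`. `cluster_size` is assumed to be a power of two; use the
/// subgroup size for a non-clustered rotate.
fn rotate_src_lane(lane: u32, delta: u32, cluster_size: u32) -> u32 {
    debug_assert!(cluster_size.is_power_of_two());
    let mask = cluster_size - 1;
    // Wrap inside the cluster, then add back the cluster's base lane.
    (lane.wrapping_add(delta) & mask) | (lane & !mask)
}

/// CPU-side simulation: rotate expressed as an indexed shuffle where every
/// lane fetches the value held by its computed source lane.
fn subgroup_rotate(values: &[u32], delta: u32, cluster_size: u32) -> Vec<u32> {
    (0..values.len() as u32)
        .map(|lane| values[rotate_src_lane(lane, delta, cluster_size) as usize])
        .collect()
}
```

With eight lanes holding 0..8, a rotate by 3 over the full subgroup yields [3, 4, 5, 6, 7, 0, 1, 2], and a clustered rotate by 1 with a cluster size of 4 yields [1, 2, 3, 0, 5, 6, 7, 4].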
CTS results:
./deqp-vk -n dEQP-VK.subgroups.shuffle.*.subgrouprotate_*
Test run totals:
  Passed:        528/1100 (48.0%)
  Failed:        0/1100 (0.0%)
  Not supported: 572/1100 (52.0%)
  Warnings:      0/1100 (0.0%)
  Waived:        0/1100 (0.0%)
./deqp-vk -n dEQP-VK.subgroups.shuffle.*.subgroupclusteredrotate_*
Test run totals:
  Passed:        528/1100 (48.0%)
  Failed:        0/1100 (0.0%)
  Not supported: 572/1100 (52.0%)
  Warnings:      0/1100 (0.0%)
  Waived:        0/1100 (0.0%)