ac,radeonsi: use Wave64 for HS/GS/VS, gpu_info fix
As stated in the commit, Wave64 is probably better than Wave32 for these stages because:
- greater chance of L0 cache hits, because more threads are assigned to the same CU
- scalar instructions are executed only once per 64 threads instead of twice (once per Wave32 wave)
- VGPR allocation granularity (counted in VGPRs) is half that of Wave32, so one Wave64 wave can sometimes use fewer VGPRs than two Wave32 waves
- TessMark X64 with NGG culling is faster with Wave64