aco: Emit simpler Wave64 reductions for workgroup size <= 32
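When the workgroup size is <= 32, a Wave64 wave holds at most 32 invocations, so the reduction does not need to combine values across all 64 lanes and a simpler instruction sequence can be emitted.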
This change has two advantages:
- Slight performance improvement for small workgroups in Wave64 mode
- Easier comparison between Wave32 and Wave64 mode shader binaries for these shaders
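
The decision described above can be sketched roughly as follows. This is only an illustration of the idea, not the code in this change: `effective_reduction_width` and its parameters are hypothetical names and do not correspond to ACO's actual API.

```cpp
#include <cstdint>

// Hypothetical helper (not part of ACO): returns how many lanes a
// full-subgroup reduction has to cover for the given configuration.
constexpr uint32_t effective_reduction_width(uint32_t wave_size,
                                             uint32_t workgroup_size)
{
    // Assumption: with at most 32 invocations in the workgroup, a Wave64
    // wave has no more than 32 lanes carrying data, so the reduction can
    // be treated like a 32-lane one.
    if (wave_size == 64 && workgroup_size <= 32)
        return 32;

    // Otherwise the reduction has to cover the whole wave.
    return wave_size;
}
```

Under these assumptions, a Wave64 shader with a workgroup size of 32 would get a width of 32, the same as its Wave32 counterpart, which is what makes the two binaries easier to compare.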