aco: Emit simpler Wave64 reductions for workgroup size <= 32

Timur Kristóf requested to merge Venemo/mesa:aco_reduction_wg32 into main

When the workgroup size is <= 32, the upper 32 lanes of a Wave64 wave are never active, so we can emit simpler reductions that skip combining the two 32-lane halves.

This change has two advantages:

  • Slight performance improvement for small workgroups in Wave64 mode
  • Easier comparison between Wave32 and Wave64 mode shader binaries for these shaders
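
To illustrate the idea, here is a rough standalone sketch; it is not the ACO implementation. It models a wave as 64 lane values and assumes that inactive lanes hold the identity of the reduction (0 for addition). The names `Wave`, `reduce_add` and `effective_cluster_size` are hypothetical and exist only for this example. It shows that when only the lower 32 lanes carry data, a 32-lane reduction gives the same result as the full 64-lane one with one fewer combine step.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model, not ACO code: a wave is an array of 64 lane values.
// Inactive lanes are assumed to hold the identity of the reduction (0 for add).
constexpr unsigned kWaveSize = 64;
using Wave = std::array<uint32_t, kWaveSize>;

// Butterfly add-reduction over clusters of `cluster_size` lanes
// (cluster_size must be a power of two, at most kWaveSize).
// Afterwards every lane holds the sum of its cluster.
// Returns the number of combine steps that were needed.
unsigned reduce_add(Wave &lanes, unsigned cluster_size)
{
   unsigned steps = 0;
   for (unsigned offset = 1; offset < cluster_size; offset <<= 1, ++steps) {
      const Wave prev = lanes;
      for (unsigned i = 0; i < kWaveSize; ++i)
         lanes[i] = prev[i] + prev[i ^ offset];
   }
   return steps;
}

// Hypothetical helper: in Wave64 mode with a workgroup of at most 32
// invocations, reducing over the lower 32 lanes is assumed to be enough.
unsigned effective_cluster_size(unsigned wave_size, unsigned workgroup_size)
{
   return (wave_size == 64 && workgroup_size <= 32) ? 32 : wave_size;
}

// Self-check: both variants agree on lane 0 when lanes 32..63 are inactive.
void check_example()
{
   Wave full{}, small{};
   for (unsigned i = 0; i < 32; ++i)
      full[i] = small[i] = i + 1; // lanes 32..63 stay at the identity 0

   const unsigned full_steps = reduce_add(full, 64);
   const unsigned small_steps = reduce_add(small, effective_cluster_size(64, 32));

   assert(full[0] == small[0]);
   assert(small_steps + 1 == full_steps);
}
```

In this model the saving is the final step that combines the two 32-lane halves; how much that saves in a real shader depends on the instructions ACO emits for that step.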

