ac: improve LS->HS LDS allocation, tune clear/copy_buffer performance, replace "TC" with "L2", minor tweaks
- WAVES_PER_SH programming is fixed for gfx12
- LDS for LS->HS varyings is no longer allocated for varyings that are passed via VGPRs. Previously, LDS allocation was skipped only if all varyings were passed via VGPRs.
- Add a DCC image store support failure condition into `ac_prepare_compute_blit`.
- `ac_prepare_cs_clear_copy_buffer` is tuned to have optimal performance for the top GPUs of all generations. The difference in performance compared to what we had before is tremendous.
- Change most of "TC" to "L2" in comments.
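
The LS->HS LDS change can be illustrated with a minimal sketch. This is not the code from this series: the function names, the bitmask representation of varyings, and the 16-byte slot size are all assumptions made for illustration. It only contrasts the old all-or-nothing rule with the new per-varying rule.

```c
#include <stdint.h>

/* Assumption: each varying slot is a vec4 of 32-bit components. */
#define BYTES_PER_SLOT 16u

/* Count the set bits in a varying mask. */
static unsigned count_bits(uint64_t mask)
{
   unsigned n = 0;
   for (; mask; mask &= mask - 1)
      n++;
   return n;
}

/* Hypothetical old rule: LDS is skipped only when every varying is
 * passed via VGPRs; otherwise all varyings get LDS space. */
static unsigned lds_bytes_per_vertex_old(uint64_t varyings, uint64_t in_vgprs)
{
   if ((varyings & ~in_vgprs) == 0)
      return 0;
   return count_bits(varyings) * BYTES_PER_SLOT;
}

/* Hypothetical new rule: only varyings that are not passed via VGPRs
 * get LDS space. */
static unsigned lds_bytes_per_vertex_new(uint64_t varyings, uint64_t in_vgprs)
{
   return count_bits(varyings & ~in_vgprs) * BYTES_PER_SLOT;
}
```

With four varyings (`0xF`) of which two are passed via VGPRs (`0x3`), the old rule in this sketch reserves space for all four slots, while the new rule reserves space for only the two that actually go through LDS.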