radeonsi+aco: vectorize SMEM loads up to 512 bits (!29399)

Marek Olšák requested to merge mareko/mesa:vectorize-smem into main May 25, 2024

Depends on/includes !29398.

First, buffer loads that overfetch are scalarized. There are 2 types of overfetching: 1) Inner components of a loaded vector are unused. 2) A load is later lowered to a power-of-two-sized load, which loads extra components. The second type is not directly expressed in NIR, but it can be deduced, and the load is scalarized to prevent it. The result is clean NIR where no overfetching happens.
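The two kinds of overfetching can be illustrated with a minimal C sketch. It is not the code from this MR: the helper names are hypothetical, sizes are counted in dwords, and "inner" is assumed to mean components lying between the first and last used component of a vector.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round a dword count up to the next power of two. */
static unsigned next_pow2_dwords(unsigned x)
{
   assert(x > 0);
   unsigned p = 1;
   while (p < x)
      p <<= 1;
   return p;
}

/* Type 1 (assumed meaning): count unused components that sit between the
 * first and last used component. Bit i of used_mask set = component i used. */
static unsigned unused_inner_components(uint32_t used_mask)
{
   if (!used_mask)
      return 0;

   unsigned first = 0, last = 31;
   while (!(used_mask & (1u << first)))
      first++;
   while (!(used_mask & (1u << last)))
      last--;

   unsigned holes = 0;
   for (unsigned i = first; i <= last; i++) {
      if (!(used_mask & (1u << i)))
         holes++;
   }
   return holes;
}

/* Type 2: extra dwords fetched if the load is later rounded up to a
 * power-of-two size. */
static unsigned pow2_overfetch_dwords(unsigned num_dwords)
{
   return next_pow2_dwords(num_dwords) - num_dwords;
}
```

For example, a vec4 load where only components 0 and 3 are used has two unused inner components, and a 3-dword load rounded up to 4 dwords overfetches by one dword. In either case a pass like the one described would split the load so that only the needed components remain.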

Then, loads are (re-)vectorized, allowing either 4 bytes of overfetching due to alignment to reach the next power of two, or a 4-byte hole between 2 loads.
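The merge rule can be sketched as a predicate over two adjacent loads. This is one possible reading, not the actual vectorizer callback: the function and macro names are hypothetical, offsets and sizes are in bytes, and the "either/or" is interpreted as a total budget of 4 wasted bytes (hole plus power-of-two padding) per merged load. The 64-byte cap comes from the 512-bit limit in the title.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_SMEM_LOAD_BYTES 64 /* 512 bits, per the MR title */
#define MAX_WASTED_BYTES 4     /* assumed budget: hole + padding */

/* Hypothetical helper: round a byte count up to the next power of two. */
static unsigned next_pow2_bytes(unsigned x)
{
   assert(x > 0);
   unsigned p = 1;
   while (p < x)
      p <<= 1;
   return p;
}

/* Decide whether load A followed by load B (A ends at or before B starts)
 * may be combined into one power-of-two-sized load. */
static bool can_merge_loads(unsigned off_a, unsigned size_a,
                            unsigned off_b, unsigned size_b)
{
   assert(size_a > 0 && size_b > 0);
   assert(off_a + size_a <= off_b);

   unsigned hole = off_b - (off_a + size_a);  /* unused bytes between loads */
   unsigned span = off_b + size_b - off_a;    /* bytes from start of A to end of B */
   unsigned padded = next_pow2_bytes(span);   /* size actually fetched */
   unsigned wasted = hole + (padded - span);

   return padded <= MAX_SMEM_LOAD_BYTES && wasted <= MAX_WASTED_BYTES;
}
```

Under these assumptions, an 8-byte load followed directly by a 4-byte load merges into a 16-byte load with 4 bytes of padding, and an 8-byte load followed by a 4-byte load after a 4-byte hole merges into exactly 16 bytes. A pair with an 8-byte hole is rejected.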

Initially, LLVM was supposed to get this too, but it can't reorder instructions to minimize SGPR usage and spilling.

This makes SMEM loads great again.
