Draft: aco: Add pass to split vectors
Adds a pass that splits vectors into individual dword components. Whenever an instruction needs vectors, they're created ad-hoc and immediately split again afterwards.
This ensures every p_create_vector/p_split_vector kills its operands, improving regalloc (esp. important for BVH traversal in RT).
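To make the transform concrete, here is a minimal sketch on a toy IR. It is not ACO's real data structures or the code in this MR: `Instr`, `Program`, `split_vectors` and the `components` map are hypothetical names invented for illustration, and the sketch assumes `next_temp` is larger than every temp id already in use.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Toy IR (hypothetical, not ACO's): a temp is just an id.
using Temp = uint32_t;

struct Instr {
    std::string opcode;
    std::vector<Temp> defs;
    std::vector<Temp> ops;
};

struct Program {
    std::vector<Instr> instrs;
    Temp next_temp = 0; // assumed to be above every id already in use
    Temp alloc() { return next_temp++; }
};

// `components` maps a vector temp to the dword temps that currently hold it.
// Every instruction that uses such a vector is rewritten as
//   v'            = p_create_vector c0, c1, ...  (kills c0, c1, ...)
//   <original instruction, now using v'>
//   c0', c1', ... = p_split_vector v'            (kills v')
std::vector<Instr> split_vectors(Program& prog,
                                 std::map<Temp, std::vector<Temp>>& components)
{
    std::vector<Instr> out;
    for (Instr instr : prog.instrs) {
        // Original vector temp -> ad-hoc vector built for this instruction,
        // so a vector used twice by one instruction is only created once.
        std::map<Temp, Temp> created;
        for (Temp& op : instr.ops) {
            auto it = components.find(op);
            if (it == components.end())
                continue; // not a tracked vector, leave the operand alone
            assert(!it->second.empty());
            auto c = created.find(op);
            if (c == created.end()) {
                Temp vec = prog.alloc();
                out.push_back({"p_create_vector", {vec}, it->second});
                c = created.insert({op, vec}).first;
            }
            op = c->second;
        }
        out.push_back(instr);
        // Split each ad-hoc vector again right after its only user.
        for (const auto& entry : created) {
            Instr split{"p_split_vector", {}, {entry.second}};
            std::vector<Temp>& comps = components.at(entry.first);
            for (size_t i = 0; i < comps.size(); ++i)
                split.defs.push_back(prog.alloc());
            comps = split.defs; // later uses see the fresh component temps
            out.push_back(split);
        }
    }
    return out;
}
```

In this sketch the component temps are renamed after every use, so two consecutive users of the same vector read two different sets of dword temps.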
I'm really not happy with this approach. Doing this as a pass as implemented here is an ok idea on paper, but there are three scenarios that quickly make it fall apart:
- Loops and spilling (RT has a lot of this. bad.). Vectors that are used in some loop but are live-through aren't considered live-through anymore, because each use produces a new set of defs from the p_create_vector/p_split_vector pairs. I tried fixing this by changing the spiller (see the first commit), but I don't like this workaround either.
- Scheduling and clauses. When surrounding memory instructions with p_create_vector/p_split_vector pairs, reordering them in the scheduler breaks, so we lose a ton of scheduling options. I don't have a solution for this - I just made the pass do nothing on temps used in MIMG/M*BUF instrs.
- Literally anything involving sub-dword stuff. I ended up just guarding every vector split to ensure it never applies to temps that are used for subdword stuff. It's not sensible to split sub-dword components into dwords, or anything else for that matter.
Most of the time I've spent on this went into whack-a-mole-ing a plethora of bugs caused by these fundamental issues. I'm not sure it's a great idea to put even more workarounds on top to band-aid the various regressions it causes, so I figured I'd put this up in case anyone has other good ideas on fixing these issues.