This part is a bit difficult, as I think the compiler is doing something
slightly different here than I expected. Before I begin, the term
"immediate slot" here refers to a single 64-bit immediate. I've taken to
calling it a slot because the compiler may use it to store either two
32-bit immediates or a single 64-bit immediate, while additionally
attempting to reuse immediate slots whenever possible.

Mainly: it looks like when the compiler starts assigning immediates
to immediate slots, if an instruction's stages only end up using half of
a single immediate slot, it keeps the slot as "pending" and doesn't
actually attempt to assign it a constant index until the entire clause
has finished. This lets it reuse the slot if there's another instruction
later in the clause that uses one of the immediates in the pending slot,
along with an additional immediate. I had assumed this would be done the
opposite way: we would assign an immediate slot an index immediately
(regardless of whether or not it has more space for immediates), then
potentially add another immediate into the slot for a later instruction
in order to reuse it.

Basically: the compiler assigns indices to immediate slots in the
reverse of the order we expected.
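
As a rough illustration of the two orderings, here is a small Python
model. It is only a sketch of the behaviour described above, not the
compiler's actual algorithm: the function names (assign_eager,
assign_deferred, place) are hypothetical, every immediate is assumed to
be 32 bits wide, each instruction is assumed to use one or two distinct
immediates, and a slot's constant index is simply its position in the
returned list.

```python
def place(slots, imms):
    """Try to satisfy imms from existing slots; return True on success."""
    # Case 1: some slot already holds every immediate we need.
    for slot in slots:
        if all(i in slot for i in imms):
            return True
    # Case 2: a half-full slot holds one of our two immediates,
    # so the other one can be packed into its free half.
    if len(imms) == 2:
        for slot in slots:
            if len(slot) == 1 and slot[0] in imms:
                slot.append(imms[0] if imms[1] == slot[0] else imms[1])
                return True
    return False


def assign_eager(clause):
    """What I had assumed: every slot gets its index when it is created."""
    slots = []  # position in this list is the constant index
    for imms in clause:
        if not place(slots, imms):
            slots.append(list(imms))
    return slots


def assign_deferred(clause):
    """What the compiler appears to do (simplified model)."""
    indexed = []  # slots created full: index fixed immediately
    pending = []  # slots created half-full: indexed at clause end
    for imms in clause:
        if place(indexed, imms) or place(pending, imms):
            continue
        (indexed if len(imms) == 2 else pending).append(list(imms))
    # Pending slots only receive their indices once the clause is done.
    return indexed + pending


# Hypothetical clause: the immediates used by three instructions.
clause = [(0x1,), (0x2, 0x3), (0x1, 0x4)]
print(assign_eager(clause))     # [[1, 4], [2, 3]]
print(assign_deferred(clause))  # [[2, 3], [1, 4]]
```

Both strategies end up with the same two slots in this example, but the
slot that started out half-full lands at index 0 in the eager model and
at index 1 in the deferred one.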