Commit 82b3e143 authored by Connor Abbott

bifrost: Document lack of data register alignment

Turns out the alignment is not necessary, it's just a quirk of the compiler.
parent a67bc178
......@@ -343,7 +343,7 @@ The "Data Register Write Barrier" is set when the next clause writes to the data
=== Register field
A lot of variable-latency instructions have to interact with the register file in ways that would be awkward to express in the usual manner, i.e. with the per-instruction register field. For example, the STORE instruction has to read up to 4 32-bit registers, which the usual pathways for reading a register can't handle -- they're designed for reading up to three 32-bit or 64-bit registers each cycle, and it also needs to load a 64-bit address from registers. The LOAD instruction can't write to the register until the operation has finished, possibly well after the instruction executes. For cases like these, there's a "register" field in the clause header that lets the variable-latency instruction read/write one, or a sequence of, registers, using a completely different mechanism. Since there can only be one variable-latency instruction per clause, this field isn't ambiguous about which instruction it applies to. If more than one register is being read from or written to, and the register field must be aligned to the greatest power of two less than or equal to the number of registers. For example, a two-register source could be R0-R1 (if the register field is 0), R2-R3 (register field is 2), R4-R5, etc. A three-register source could be R0-R2, R2-R4, etc. Or a four-register source could be R0-R3, R4-R7, etc.
A lot of variable-latency instructions have to interact with the register file in ways that would be awkward to express in the usual manner, i.e. with the per-instruction register field. For example, the STORE instruction has to read up to 4 32-bit registers, which the usual pathways for reading a register can't handle -- they're designed for reading up to three 32-bit or 64-bit registers each cycle, and it also needs to load a 64-bit address from registers. The LOAD instruction can't write to the register until the operation has finished, possibly well after the instruction executes. For cases like these, there's a "register" field in the clause header that lets the variable-latency instruction read/write one, or a sequence of, registers, using a completely different mechanism. Since there can only be one variable-latency instruction per clause, this field isn't ambiguous about which instruction it applies to. When the variable-latency instruction is supposed to read or write more than one register (e.g. `LOAD.v4i32`), they are read or written in sequence starting with the register specified. There are no restrictions on which register to specify, except that the reads/writes cannot go out of bounds of the register file (so a `LOAD.v4i32` with a data register of `R63` would result in a fault, since it would try to write `R63` through `R66`). The blob compiler will only use aligned register pairs and quads, but this doesn't seem to be necessary.
=== Dependency tracking
......
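The revised rule can be sketched as a small model. This is only an illustration, not taken from the hardware or any real tool: the function names are hypothetical, and the register file size of 64 (`R0` through `R63`) is an assumption inferred from the `R63` example in the text. `blob_alignment` models the old, now-dropped rule (align to the greatest power of two less than or equal to the register count), which the text says the blob compiler still follows for pairs and quads.

```python
# Assumption: 64 registers, R0..R63, inferred from the R63 example in the doc.
NUM_REGISTERS = 64


def data_registers(base, count):
    """Return the registers touched when `count` registers are accessed
    in sequence starting at the clause's data register `base`.

    Raises ValueError where the text says the hardware would fault,
    i.e. when the sequence runs past the end of the register file.
    """
    if base < 0 or count < 1:
        raise ValueError("base must be >= 0 and count must be >= 1")
    last = base + count - 1
    if last >= NUM_REGISTERS:
        raise ValueError(f"R{base}..R{last} is out of bounds")
    return list(range(base, base + count))


def blob_alignment(count):
    """Greatest power of two <= count (the alignment the old text required)."""
    return 1 << (count.bit_length() - 1)


def blob_would_use(base, count):
    """True if `base` satisfies the old alignment rule; per the revised
    text this is a compiler habit, not a hardware requirement."""
    return base % blob_alignment(count) == 0
```

Under these assumptions, `data_registers(5, 4)` is accepted even though `blob_would_use(5, 4)` is false, while `data_registers(63, 4)` is rejected because it would reach `R66`.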