Commit 5e5cf96c authored by Connor Abbott

Merge branch 'wip/lyude-fixes' into 'master'

Cleanup + assembler discoveries

See merge request !2
parents 7cbbdc96 59bcd74c
The clause header mainly contains information about "variable-latency" instructions, such as SSBO loads/stores/atomics and texture sampling, that use separate functional units. There can be at most one variable-latency instruction per clause. It also indicates when execution should stop, and has some information about branching. The format of the header is as follows:
[options="header"]
|====================================
| Field | Bits
| Unknown | 11
| Back to back | 1
| Not end of shader | 1
| Unknown | 2
| Elide writes | 1
| Branch conditional | 1
| Data Register Write Barrier | 1
| Data Register | 6
| Scoreboard dependencies | 8
| Scoreboard entry | 3
| Instruction type | 4
| Unknown | 1
| Next clause instruction type | 4
| Unknown | 1
|====================================
The "Back to Back" field is set to true if the execution mask of the next clause is the same as the mask of the current clause and there is no branch instruction in this clause.
The "Elide Writes" field is set to true for fragment shaders and is intended to implement section 7.1.5 of the GLSL ES spec: "Stores to image and buffer variables performed by helper invocations have no effect on the underlying image or buffer memory." Helper invocations are threads (invocations) corresponding to pixels in a 2x2 quad that aren't actually part of the triangle, but are run anyway so that derivatives work correctly. They normally stay enabled, but they need to be masked off for GLSL-level stores.
The "Branch Conditional" bit is always set if the "Back to Back" field is set to one. Otherwise, it is set to one to indicate that this clause is a conditional branch or a fallthrough branch, or to zero to indicate that this clause is unconditionally executed.
The "Data Register Write Barrier" is set when the next clause writes to the data register of some previous clause.
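The following is a minimal C sketch of decoding the clause header. It assumes the fields are packed starting at the least significant bit, in the order listed in the table. That bit order is not stated above. The struct and function names are hypothetical, and the unknown bits are skipped. The sketch also encodes the Branch Conditional rule as a small helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decoded clause header (hypothetical names); unknown bits are not kept. */
struct clause_header {
    bool back_to_back;
    bool not_end_of_shader;
    bool elide_writes;
    bool branch_conditional;
    bool data_reg_write_barrier;
    unsigned data_reg;         /* 6 bits */
    unsigned scoreboard_deps;  /* 8 bits */
    unsigned scoreboard_entry; /* 3 bits */
    unsigned instr_type;       /* 4 bits */
    unsigned next_instr_type;  /* 4 bits */
};

/* Extract `width` bits of `v` starting at bit `lo`. */
static unsigned bits(uint64_t v, unsigned lo, unsigned width)
{
    return (unsigned)((v >> lo) & ((UINT64_C(1) << width) - 1));
}

/* ASSUMPTION: fields are packed LSB-first, in the order of the table. */
static struct clause_header decode_clause_header(uint64_t raw)
{
    assert((raw >> 45) == 0); /* the fields add up to 45 bits */

    struct clause_header h;
    /* bits 0-10: unknown */
    h.back_to_back           = bits(raw, 11, 1);
    h.not_end_of_shader      = bits(raw, 12, 1);
    /* bits 13-14: unknown */
    h.elide_writes           = bits(raw, 15, 1);
    h.branch_conditional     = bits(raw, 16, 1);
    h.data_reg_write_barrier = bits(raw, 17, 1);
    h.data_reg               = bits(raw, 18, 6);
    h.scoreboard_deps        = bits(raw, 24, 8);
    h.scoreboard_entry       = bits(raw, 32, 3);
    h.instr_type             = bits(raw, 35, 4);
    /* bit 39: unknown */
    h.next_instr_type        = bits(raw, 40, 4);
    /* bit 44: unknown */
    return h;
}

/* Branch Conditional as described above: always set with Back to Back;
 * otherwise set for a conditional or fallthrough branch. */
static bool branch_conditional_bit(bool back_to_back, bool cond_or_fallthrough)
{
    return back_to_back || cond_or_fallthrough;
}
```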
=== Register field
The format of the register part of the instruction word is as follows:
[options="header"]
|===============================
| Field | Bits
| Uniform/const | 8
| Port 2 (write) | 6
| Port 3 (read/write FMA) | 6
| Port 0 (read) | 5
| Port 1 (read) | 6
| Control | 4
|===============================
Control is what ARM calls the "register access descriptor." To save bits, if the Control field is 0, then Port 1 is disabled, and the field for Port 1 instead contains the "real" Control field in the upper 4 bits. Bit 1 is set to 1 if Port 0 is disabled, and bit 0 is reused as the high bit of Port 0, allowing you to still access all 64 registers. If the Control field isn't 0, then both Port 0 and Port 1 are always enabled. In this way, the Control field only needs to describe how Port 2 and Port 3 are configured, except for the magic 0 value, reducing the number of bits required.
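The Control == 0 special case can be sketched in C as follows. This sketch assumes the register field is packed from the least significant bit in table order, with Uniform/const in bits 0-7 through Control in bits 31-34. That packing is not stated above. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Raw fields of the 35-bit register field (hypothetical names). */
struct reg_field_raw {
    unsigned uniform_const; /* 8 bits */
    unsigned port2;         /* 6 bits */
    unsigned port3;         /* 6 bits */
    unsigned port0;         /* 5 bits */
    unsigned port1;         /* 6 bits */
    unsigned control;       /* 4 bits */
};

/* ASSUMPTION: fields are packed LSB-first, in the order of the table. */
static struct reg_field_raw unpack_reg_field(uint64_t raw)
{
    assert((raw >> 35) == 0); /* 8 + 6 + 6 + 5 + 6 + 4 = 35 bits */
    struct reg_field_raw r;
    r.uniform_const = (unsigned)(raw & 0xff);
    r.port2   = (unsigned)((raw >> 8) & 0x3f);
    r.port3   = (unsigned)((raw >> 14) & 0x3f);
    r.port0   = (unsigned)((raw >> 20) & 0x1f);
    r.port1   = (unsigned)((raw >> 25) & 0x3f);
    r.control = (unsigned)((raw >> 31) & 0xf);
    return r;
}

struct reg_ports {
    unsigned control;   /* effective ("real") Control value */
    bool port0_enabled;
    bool port1_enabled;
    /* Full 6-bit register when Control was 0; otherwise the raw Port 0
     * encoding, which still needs the Port 0 / Port 1 comparison. */
    unsigned port0;
};

static struct reg_ports resolve_control(struct reg_field_raw r)
{
    struct reg_ports p;
    if (r.control == 0) {
        /* Port 1 is disabled; its upper 4 bits hold the real Control. */
        p.control = r.port1 >> 2;
        p.port0_enabled = !(r.port1 & 0x2);         /* bit 1: Port 0 disabled */
        p.port1_enabled = false;
        p.port0 = r.port0 | ((r.port1 & 0x1) << 5); /* bit 0: Port 0 high bit */
    } else {
        p.control = r.control;
        p.port0_enabled = true;
        p.port1_enabled = true;
        p.port0 = r.port0;
    }
    return p;
}
```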
ARM has one additional trick to save a bit. Port 0 only has 5 bits, so it would seem that when both Port 0 and Port 1 are in use, they can't both load a register greater than 31 at the same time. But it turns out that this isn't the case. The hardware compares the two encoded register numbers, and if Port 0 is greater than Port 1, it subtracts both numbers from 63 to get the real registers. This lets software encode every possible combination of registers loaded in Port 0 and Port 1, possibly requiring it to swap Port 0 and Port 1.
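Reading the rule as "both encoded numbers are subtracted from 63", encoding and decoding can be sketched in C as below. The function names are hypothetical, and the sketch only covers the case where Control isn't 0. Under this reading, the one pair the encoder cannot express is the same register greater than 31 on both ports, so the sketch reports failure for that case.

```c
#include <assert.h>
#include <stdbool.h>

/* Decode Port 0 (5-bit field) and Port 1 (6-bit field) when Control != 0. */
static void decode_port01(unsigned p0, unsigned p1, unsigned *r0, unsigned *r1)
{
    assert(p0 < 32 && p1 < 64);
    if (p0 > p1) {
        /* Both encoded numbers are subtracted from 63. */
        *r0 = 63 - p0;
        *r1 = 63 - p1;
    } else {
        *r0 = p0;
        *r1 = p1;
    }
}

/* Encode registers a and b (0-63). Port 0 always receives the lower register;
 * *swapped is set when that puts a in Port 1 and b in Port 0. Returns false
 * for the same register > 31 on both ports, which this scheme can't express. */
static bool encode_port01(unsigned a, unsigned b,
                          unsigned *p0, unsigned *p1, bool *swapped)
{
    assert(a < 64 && b < 64);
    unsigned lo = a < b ? a : b;
    unsigned hi = a < b ? b : a;
    *swapped = a > b;
    if (lo < 32) {
        /* Direct form: p0 = lo fits in 5 bits and p0 <= p1. */
        *p0 = lo;
        *p1 = hi;
        return true;
    }
    if (lo == hi)
        return false; /* would need p0 == p1, which decodes directly */
    /* Complemented form: p0 = 63 - lo <= 31 and p0 > p1 = 63 - hi. */
    *p0 = 63 - lo;
    *p1 = 63 - hi;
    return true;
}
```

For example, registers 40 and 50 encode as Port 0 = 23 and Port 1 = 13. Because 23 > 13, the decoder recovers 63 - 23 = 40 and 63 - 13 = 50.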
Additionally, if the register control field writes to Port 2 but doesn't read from or write to Port 3, the compiler appears to copy the value in Port 2 over to Port 3. The reason for this is unknown.
Before we get to the actual format of the Control field, though, we need to describe one more subtlety. Each instruction's register field contains the writes for the previous instruction, but what about the writes of the last instruction in the clause? Clauses should be entirely self-contained, so we can't look at the first instruction in the next clause. The answer turns out to be that the first instruction in the clause contains the writes for the last instruction. There are a few extra values for the control field, marked "first instruction," which are only used for the first instruction of a clause. The reads are processed normally, but the writes are delayed until the very end of the clause, after the last instruction. The list of values for the control field is below:
[options="header"]
|============================
| Value | Meaning
| 1 | Write FMA with Port 2
| 3 | Write FMA with Port 2, read with Port 3
| 4 | Read with Port 3
| 5 | Write ADD with Port 2
| 6 | Write ADD with Port 2, read with Port 3
| 8 | Nothing, first instruction
| 9 | Write FMA with Port 2, first instruction
| 11 | Nothing
| 12 | Read with Port 3, first instruction
| 13 | Write ADD with Port 2, first instruction
| 15 | Write ADD with Port 2, write FMA with Port 3
|============================
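A decoder or assembler could hold the table above as a lookup indexed by the effective Control value, meaning the value after the Control == 0 substitution. It also needs the rule that each instruction's register field carries the previous instruction's writes, while the first instruction's field carries the last instruction's writes. The following is a hypothetical C sketch; the type and function names are not from any existing tool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum write_src { WRITE_NONE, WRITE_FMA, WRITE_ADD };

/* One row of the Control table (hypothetical representation). */
struct reg_ctrl_info {
    enum write_src port2_write; /* which unit's result Port 2 writes */
    enum write_src port3_write; /* which unit's result Port 3 writes */
    bool port3_read;            /* Port 3 used as a read port */
    bool first_instruction;     /* only used on a clause's first instruction */
    bool known;                 /* value appears in the table */
};

static struct reg_ctrl_info reg_ctrl_lookup(unsigned control)
{
    assert(control < 16);
    /* Values missing from the table are zero-initialized, so known = false. */
    static const struct reg_ctrl_info table[16] = {
        [1]  = { WRITE_FMA,  WRITE_NONE, false, false, true },
        [3]  = { WRITE_FMA,  WRITE_NONE, true,  false, true },
        [4]  = { WRITE_NONE, WRITE_NONE, true,  false, true },
        [5]  = { WRITE_ADD,  WRITE_NONE, false, false, true },
        [6]  = { WRITE_ADD,  WRITE_NONE, true,  false, true },
        [8]  = { WRITE_NONE, WRITE_NONE, false, true,  true },
        [9]  = { WRITE_FMA,  WRITE_NONE, false, true,  true },
        [11] = { WRITE_NONE, WRITE_NONE, false, false, true },
        [12] = { WRITE_NONE, WRITE_NONE, true,  true,  true },
        [13] = { WRITE_ADD,  WRITE_NONE, false, true,  true },
        [15] = { WRITE_ADD,  WRITE_FMA,  false, false, true },
    };
    return table[control];
}

/* Index of the instruction whose register field carries the writes of
 * instruction k in a clause of n instructions: the next instruction, except
 * that the last instruction's writes go in the first instruction's field. */
static size_t writes_carried_by(size_t k, size_t n)
{
    assert(n > 0 && k < n);
    return (k + 1) % n;
}
```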
Unlike the other ports, the uniform/const port always loads 64 bits at a time. If an FMA or ADD instruction only needs 32 bits of data, the high 32 bits or low 32 bits are selected later in the source field, described below.
[options="header"]
|============================
| Field value | Special constant
| 00 | Always zero
| 05 | Alpha-test data (used with ATEST)
| 06 | gl_FragCoord sample position pointer
| 08-0f | Blend descriptors 0-7 (used with BLEND to indicate which output to blend with)
|============================
The gl_FragCoord sample position pointer points to an array of 16-bit vec2s, indexed by gl_SampleID, which is held in R61 (see the varying interpolation section). Loading an entry with a normal LOAD instruction gives the sample (xy) position used for calculating gl_FragCoord.
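The following C sketch illustrates the indexing, assuming the entries are tightly packed. Each vec2 of two 16-bit components occupies 4 bytes, so the entry for a given sample sits at `base + 4 * gl_SampleID`. The names are hypothetical. The components are kept as raw 16-bit values because their encoding isn't specified here.

```c
#include <assert.h>
#include <stdint.h>

/* One entry of the sample-position array: a vec2 of 16-bit components,
 * kept as raw bits since the component encoding isn't specified. */
struct sample_pos {
    uint16_t x;
    uint16_t y;
};
static_assert(sizeof(struct sample_pos) == 4, "two 16-bit components");

/* Byte address of the entry for `sample_id` (gl_SampleID, found in R61),
 * ASSUMING a tightly packed array starting at `base`. */
static uint64_t sample_pos_addr(uint64_t base, unsigned sample_id)
{
    return base + (uint64_t)sample_id * sizeof(struct sample_pos);
}
```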