broadcom/compiler: implement pipelining for general TMU operations
This creates the basic infrastructure to implement TMU pipelining and applies it to general TMU. Follow-up patches will expand this to texture and image load/store operations.

TMU pipelining means that we don't immediately end TMU sequences, and instead, we postpone the thread switch and LDTMU (for loads) or TMUWT (for stores) until we really need to do them.

For loads, we may need to flush them if another instruction reads the result of a load operation. We can detect this because in that case ntq_get_src() will not find the definition for that ssa/reg (since we have not emitted the LDTMU instructions for it yet), so when that happens, we flush all pending TMU operations and then try again to find the definition for the source.

We also need to flush pending TMU operations when we reach the end of a control flow block, to prevent the case where we emit a TMU operation in a block, but then we read the result in another block possibly under control flow.

It is also required to flush across barriers and discards to honor their semantics.

Since this change doesn't implement pipelining for texture and image load/store, we also need to flush outstanding TMU operations if we ever have to emit one of these. This will be corrected with follow-up patches.

Finally, the TMU has 3 fifos where it can queue TMU operations. These fifos have limited capacity, depending on the number of threads used to compile the shader, so we also need to ensure that we don't have too many outstanding TMU requests and flush pending TMU operations if a new TMU operation would overflow any of these fifos. While overflowing the Input and Config fifos only leads to stalls (which we want to avoid anyway), overflowing the Output fifo is incorrect and would end up with a broken shader. This means that we need to know how many TMU register writes are required to emit a TMU operation and use that information to decide if we need to flush pending TMU operations before we emit any register writes for the new TMU operation.

v2: fix TMU flushing for NIR register reads (jasuarez)

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <mesa/mesa!8825>
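
The fifo check described in the message can be illustrated with a small standalone model. This is only a sketch and not the code in nir_to_vir.c: every name below (tmu_pipeline, tmu_op_cost, tmu_queue, tmu_flush) is hypothetical, the fifo limits are caller-supplied placeholders rather than real hardware values, and the flush only resets counters where a real compiler would emit the thread switch and the LDTMU/TMUWT instructions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical fifo capacities. Real values depend on the hardware and
 * on the number of threads the shader is compiled for. */
struct tmu_fifo_limits {
    uint32_t input;
    uint32_t config;
    uint32_t output;
};

/* Cost of one TMU operation, computed before emitting any register write. */
struct tmu_op_cost {
    uint32_t input_writes;   /* data/address register writes */
    uint32_t config_writes;  /* configuration register writes */
    uint32_t output_words;   /* words returned to the shader, 0 for stores */
};

/* Bookkeeping for TMU operations that were emitted but not yet flushed. */
struct tmu_pipeline {
    struct tmu_fifo_limits limits;
    uint32_t input_used;
    uint32_t config_used;
    uint32_t output_used;
    uint32_t pending_ops;
    uint32_t flush_count;    /* kept only to make the behaviour observable */
};

/* True if queueing 'op' on top of the pending work would exceed any fifo. */
static bool
tmu_would_overflow(const struct tmu_pipeline *p, const struct tmu_op_cost *op)
{
    return p->input_used + op->input_writes > p->limits.input ||
           p->config_used + op->config_writes > p->limits.config ||
           p->output_used + op->output_words > p->limits.output;
}

/* Ends all pending sequences. A real compiler would emit the thread
 * switch followed by LDTMU (loads) or TMUWT (stores) at this point. */
static void
tmu_flush(struct tmu_pipeline *p)
{
    if (p->pending_ops == 0)
        return;

    p->input_used = 0;
    p->config_used = 0;
    p->output_used = 0;
    p->pending_ops = 0;
    p->flush_count++;
}

/* Queues 'op', flushing first if it would not fit next to pending work. */
static void
tmu_queue(struct tmu_pipeline *p, const struct tmu_op_cost *op)
{
    if (tmu_would_overflow(p, op))
        tmu_flush(p);

    /* A single operation is assumed to fit in empty fifos. */
    assert(!tmu_would_overflow(p, op));

    p->input_used += op->input_writes;
    p->config_used += op->config_writes;
    p->output_used += op->output_words;
    p->pending_ops++;
}
```

In this model the decision is made before any register write for the new operation is accounted for, which mirrors the ordering requirement stated above. The other flush points (a source whose definition is not found yet, the end of a control flow block, barriers and discards) would simply call the flush function directly.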
- src/broadcom/compiler/nir_to_vir.c: 363 additions, 129 deletions
- src/broadcom/compiler/v3d33_tex.c: 3 additions, 0 deletions
- src/broadcom/compiler/v3d40_tex.c: 6 additions, 0 deletions
- src/broadcom/compiler/v3d_compiler.h: 19 additions, 0 deletions
- src/broadcom/compiler/vir.c: 2 additions, 0 deletions