- 25 Jan, 2020 2 commits
-
-
Erico Nunes authored
The src mask can't be calculated from the dest write_mask. Instead, it must be calculated from the components selected by the src swizzle. Otherwise, liveness calculation may report incorrect live components for non-ssa registers. Signed-off-by:
Erico Nunes <nunes.erico@gmail.com> Reviewed-by:
Vasily Khoruzhick <anarsoul@gmail.com> Tested-by: Marge Bot <mesa/mesa!3502> Part-of: <mesa/mesa!3502>
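A minimal sketch of the idea (Python, illustrative only — the actual ppir code is C and this helper name is hypothetical): the live components of a source are the ones its swizzle actually selects, not the ones named by the dest write mask.

```python
def src_components_used(swizzle, write_mask):
    """Components of the source that are actually read.

    swizzle[i] names the source component feeding dest channel i;
    only channels enabled in the dest write_mask contribute.
    """
    used = set()
    for i in range(4):
        if write_mask & (1 << i):
            used.add(swizzle[i])
    return used

# dest.xy = src.zz..: write mask 0b0011, swizzle (z, z, x, w).
# Only src.z is live, while deriving the mask from the dest
# write mask alone would wrongly report src.x and src.y.
print(src_components_used((2, 2, 0, 3), 0b0011))  # {2}
```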
-
Erico Nunes authored
Those were renamed/merged some time ago, but it turns out that ppir_op_undef can't be shared. It was being used both for undefined ssa operations and for read-before-write operations that may happen to e.g. uninitialized registers (non-ssa) inside a loop. We really don't want to reserve a register for the undef ssa case, but we must reserve and allocate a register for the uninitialized register case, because when it happens inside a loop the register may need to hold its value across iterations. This dummy node might be eliminated with a code refactor in ppir, in case we become able to emit the write and allocate the ppir_reg before we emit the read. But until such a major refactor, we need to keep this code to avoid apparent regressions with the new liveness analysis implementation. Signed-off-by:
Erico Nunes <nunes.erico@gmail.com> Reviewed-by:
Vasily Khoruzhick <anarsoul@gmail.com> Part-of: <mesa/mesa!3502>
-
- 15 Jan, 2020 2 commits
-
-
Erico Nunes authored
The existing liveness analysis in ppir still ultimately relies on a single continuous live_in and live_out range per register, and was observed to be the bottleneck for register allocation on complicated examples with several control flow blocks. The use of live_in and live_out ranges was fine before ppir got control flow, but now it ends up creating unnecessary interferences, as live_in and live_out ranges may span entire blocks after blocks get placed sequentially. This new liveness analysis implementation generates a set of live variables at each program point: before and after each instruction, and at the beginning and end of each block. This is a global analysis and propagates the sets of live registers across blocks independently of their sequence. The resulting sets optimally represent all variables that cannot share a register at each program point, so they can be directly translated as interferences to the register allocator. Special care has to be taken with non-ssa registers. In order to properly define their live range, their live components also need to be tracked, so ppir can't use simple bitsets to keep track of live registers. The algorithm uses an auxiliary set data structure to keep track of the live registers. The initial implementation used only trivial arrays, but regalloc execution time was then prohibitive (>1 minute on Cortex-A53) on extreme benchmarks with hundreds of instructions, hundreds of registers and several spilling iterations, mostly due to the n^2 complexity of generating the interferences from the live sets. Since the live register sets are only a very sparse subset of all registers at each instruction, iterating only over this subset allows it to run very fast again (a couple of seconds for the same benchmark). Signed-off-by:
Erico Nunes <nunes.erico@gmail.com> Reviewed-by:
Vasily Khoruzhick <anarsoul@gmail.com> Tested-by: Marge Bot <mesa/mesa!3358> Part-of: <mesa/mesa!3358>
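The global analysis described above can be sketched as a textbook backward dataflow fixed point (Python, illustrative only; ppir's real implementation is in C and uses sparse sets — here per-component liveness is modeled with plain string keys like "r0.x"):

```python
def compute_liveness(blocks, succs):
    """Iterate backward dataflow to a fixed point.

    blocks: {name: [(defs, uses), ...]} instructions in order, where
    defs/uses are sets of per-component register names (e.g. "r0.x"),
    so non-ssa registers are tracked component-wise.
    succs: {name: [successor block names]} — propagation follows the
    CFG, independently of the blocks' layout order.
    """
    live_in = {b: set() for b in blocks}
    live_out = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b, insns in blocks.items():
            out = set()
            for s in succs[b]:
                out |= live_in[s]
            live = set(out)
            # walk instructions backward: a def kills, a use gens
            for defs, uses in reversed(insns):
                live = (live - defs) | uses
            if live != live_in[b] or out != live_out[b]:
                live_in[b], live_out[b] = live, out
                changed = True
    return live_in, live_out

# A register defined before a loop and read inside it stays live
# across iterations (the loop block is its own successor).
blocks = {
    "entry": [({"r0.x"}, set())],
    "loop": [(set(), {"r0.x"})],
}
succs = {"entry": ["loop"], "loop": ["loop"]}
live_in, live_out = compute_liveness(blocks, succs)
print(live_out["entry"])  # {'r0.x'}
```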
-
Erico Nunes authored
There are some cases in shaders using control flow where the varying load is cloned to every block, and the original node is left orphan. This is not harmful for program execution, but it complicates analysis for register allocation, as there is now a case of writing to a register that is never read. While ppir doesn't have a dead code elimination pass for its own optimizations, it is not hard to detect when we clone the last load, so let's remove the orphan node early. Signed-off-by:
Erico Nunes <nunes.erico@gmail.com> Reviewed-by:
Vasily Khoruzhick <anarsoul@gmail.com> Part-of: <mesa/mesa!3358>
-
- 20 Dec, 2019 1 commit
-
-
Erico Nunes authored
ppir has some code that operates on all ppir_src variables, and for that uses ppir_node_get_src. lod bias support introduced a separate ppir_src that is inaccessible by that function, causing it to be missed by the compiler in some routines. Ultimately this caused, in some cases, a bug in const lowering: .../pp/lower.c:42: ppir_lower_const: Assertion `src != NULL' failed. This fix moves the ppir_srcs in ppir_load_texture_node together so they don't get missed. Fixes: 721d82cf lima/ppir: add lod-bias support Signed-off-by:
Erico Nunes <nunes.erico@gmail.com> Reviewed-by:
Vasily Khoruzhick <anarsoul@gmail.com> Tested-by: Marge Bot <mesa/mesa!3185> Part-of: <mesa/mesa!3185>
-
- 20 Nov, 2019 1 commit
-
-
Signed-off-by:
Arno Messiaen <arnomessiaen@gmail.com> Reviewed-by:
Erico Nunes <nunes.erico@gmail.com>
-
- 31 Oct, 2019 1 commit
-
-
lima: introduce ppir_op_load_coords_reg to differentiate between loading texture coordinates straight from a varying vs loading them from a register Signed-off-by:
Arno Messiaen <arnomessiaen@gmail.com> Reviewed-by:
Vasily Khoruzhick <anarsoul@gmail.com> Reviewed-by:
Erico Nunes <nunes.erico@gmail.com>
-
- 28 Oct, 2019 1 commit
-
-
Timothy Arceri authored
This makes it clear that it's a boolean test and not an action (eg. "empty the list"). Reviewed-by:
Eric Engestrom <eric@engestrom.ch>
-
- 25 Sep, 2019 1 commit
-
-
Vasily Khoruzhick authored
Currently we add dependencies in 3 cases: 1) one node consumes a value produced by another node, 2) sequence dependencies, 3) write-after-read dependencies. 2) and 3) only affect scheduler decisions, since we can still use a pipeline register if we have only one dependency of type 1). Add 3 dependency types and mark dependencies as we add them. Reviewed-by:
Qiang Yu <yuq825@gmail.com> Signed-off-by:
Vasily Khoruzhick <anarsoul@gmail.com>
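A hedged sketch of the three dependency kinds and the pipeline-register rule described above (Python; the enum and helper names are hypothetical — the real ppir code is C):

```python
from enum import Enum, auto

class DepType(Enum):
    VALUE = auto()             # 1) consumer reads a value the predecessor produces
    SEQUENCE = auto()          # 2) ordering-only constraint
    WRITE_AFTER_READ = auto()  # 3) write must not overtake an earlier read

def can_use_pipeline_reg(deps):
    """Only a single value dependency allows forwarding through a
    pipeline register; sequence and write-after-read dependencies
    constrain scheduling but carry no data."""
    return len(deps) == 1 and deps[0] is DepType.VALUE

print(can_use_pipeline_reg([DepType.VALUE]))                    # True
print(can_use_pipeline_reg([DepType.VALUE, DepType.SEQUENCE]))  # False
```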
-
- 13 Sep, 2019 2 commits
-
-
Add a ppir dummy node for nir_ssa_undef_instr, create a reg for it and mark it as undefined, so that regalloc can set it non-interfering to avoid register pressure. Signed-off-by:
Andreas Baierl <ichgeh@imkreisrum.de> Reviewed-by:
Vasily Khoruzhick <anarsoul@gmail.com> Reviewed-by:
Erico Nunes <nunes.erico@gmail.com>
-
Signed-off-by:
Andreas Baierl <ichgeh@imkreisrum.de> Reviewed-by:
Vasily Khoruzhick <anarsoul@gmail.com> Reviewed-by:
Erico Nunes <nunes.erico@gmail.com>
-
- 04 Sep, 2019 2 commits
-
-
Vasily Khoruzhick authored
It can load a value from a varying directly as well. Also, load_regs is the only op that has a source, so add a src_num field to the load node and set it accordingly. Reviewed-by:
Erico Nunes <nunes.erico@gmail.com> Signed-off-by:
Vasily Khoruzhick <anarsoul@gmail.com>
-
Vasily Khoruzhick authored
Introduce common helper for creating movs to avoid code duplication Reviewed-by:
Erico Nunes <nunes.erico@gmail.com> Signed-off-by:
Vasily Khoruzhick <anarsoul@gmail.com>
-
- 24 Aug, 2019 7 commits
-
-
Vasily Khoruzhick authored
This commit adds support for nir_jump_instr, if and loop nir_cf_nodes. Tested-by:
Andreas Baierl <ichgeh@imkreisrum.de> Reviewed-by:
Qiang Yu <yuq825@gmail.com> Reviewed-by:
Erico Nunes <nunes.erico@gmail.com> Signed-off-by:
Vasily Khoruzhick <anarsoul@gmail.com>
-
Vasily Khoruzhick authored
Add better liveness analysis, modelled after the one in vc4. It uses live ranges and is aware of multiple blocks, which is a prerequisite for adding CF support. Tested-by:
Andreas Baierl <ichgeh@imkreisrum.de> Reviewed-by:
Qiang Yu <yuq825@gmail.com> Reviewed-by:
Erico Nunes <nunes.erico@gmail.com> Signed-off-by:
Vasily Khoruzhick <anarsoul@gmail.com>
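The live-range model this analysis produces reduces interference building to a simple overlap test (illustrative Python sketch, not lima's actual C code):

```python
def ranges_interfere(a, b):
    """Two registers interfere iff their half-open [start, end)
    live ranges overlap, i.e. both are live at some instruction."""
    return a[0] < b[1] and b[0] < a[1]

print(ranges_interfere((0, 5), (3, 8)))  # True: both live at instructions 3..4
print(ranges_interfere((0, 3), (3, 8)))  # False: one dies where the other starts
```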
-
Vasily Khoruzhick authored
Create ppir block for each corresponding NIR block and populate its successors. It will be used later in liveness analysis and in CF support Tested-by:
Andreas Baierl <ichgeh@imkreisrum.de> Reviewed-by:
Qiang Yu <yuq825@gmail.com> Reviewed-by:
Erico Nunes <nunes.erico@gmail.com> Signed-off-by:
Vasily Khoruzhick <anarsoul@gmail.com>
-
Vasily Khoruzhick authored
We can get the following from NIR: (1) r1 = r2 (2) r2 = ssa1 Note that r2 is read before it's assigned, so there's no node for it in comp->var_nodes. We need to create a dummy node in this case, whose sole purpose is to hold a ppir_dest with the reg in it. Tested-by:
Andreas Baierl <ichgeh@imkreisrum.de> Reviewed-by:
Qiang Yu <yuq825@gmail.com> Reviewed-by:
Erico Nunes <nunes.erico@gmail.com> Signed-off-by:
Vasily Khoruzhick <anarsoul@gmail.com>
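The lookup described above can be sketched as follows (Python; the dict-based node and field names are hypothetical stand-ins for the C structs): a read of a register with no node yet creates a dummy node that simply owns the reg.

```python
def get_or_create_reg_node(var_nodes, reg):
    """On a read-before-write (e.g. 'r1 = r2' before 'r2 = ssa1'),
    the reg has no entry in var_nodes yet; create a dummy node whose
    sole purpose is to hold the dest with the reg in it, so later
    reads and the eventual write resolve to the same register."""
    node = var_nodes.get(reg)
    if node is None:
        node = {"op": "dummy", "reg": reg}
        var_nodes[reg] = node
    return node

var_nodes = {}
first = get_or_create_reg_node(var_nodes, "r2")  # dummy created on read
print(first["op"])  # dummy
```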
-
Vasily Khoruzhick authored
We need a 'negate' modifier for the branch condition to minimize branching. The idea is to generate the following: current_block: { ...; if (!statement) branch else_block; } then_block: { ...; branch after_block; } else_block: { ... } after_block: { ... } Tested-by:
Andreas Baierl <ichgeh@imkreisrum.de> Reviewed-by:
Qiang Yu <yuq825@gmail.com> Reviewed-by:
Erico Nunes <nunes.erico@gmail.com> Signed-off-by:
Vasily Khoruzhick <anarsoul@gmail.com>
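The block layout in the message above can be sketched as a small emitter (Python producing pseudo-assembly, purely illustrative): a single negated conditional branch skips the then-block, so the common fall-through path takes no branch at all.

```python
def emit_if(cond, then_body, else_body):
    """Lay out an if/else with one negated conditional branch:
    current: if (!cond) branch else_block
    then:    ...; branch after_block
    else:    ...
    after:   ...
    """
    out = [f"if (!{cond}) branch else_block"]
    out += then_body                          # falls through when cond holds
    out += ["branch after_block", "else_block:"]
    out += else_body
    out.append("after_block:")
    return out

prog = emit_if("c", ["then_insn"], ["else_insn"])
for line in prog:
    print(line)
```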
-
Vasily Khoruzhick authored
ppir_lower_load() and ppir_lower_load_texture() assume that a node is in the same block as its successors; fix it by cloning each ld_uni and ld_tex to every block. This also reduces register pressure, since values never cross block boundaries and thus never appear in live_in or live_out of any block, so do it for varyings as well. Tested-by:
Andreas Baierl <ichgeh@imkreisrum.de> Reviewed-by:
Qiang Yu <yuq825@gmail.com> Reviewed-by:
Erico Nunes <nunes.erico@gmail.com> Signed-off-by:
Vasily Khoruzhick <anarsoul@gmail.com>
-
Vasily Khoruzhick authored
Const nodes are now cloned for each user, i.e. const is guaranteed to have exactly one successor, so we can use ppir_do_one_node_to_instr() and drop insert_to_each_succ_instr() Tested-by:
Andreas Baierl <ichgeh@imkreisrum.de> Reviewed-by:
Qiang Yu <yuq825@gmail.com> Reviewed-by:
Erico Nunes <nunes.erico@gmail.com> Signed-off-by:
Vasily Khoruzhick <anarsoul@gmail.com>
-
- 14 Aug, 2019 3 commits
-
-
Vasily Khoruzhick authored
Get rid of most switch/case by using src accessors Reviewed-by:
Qiang Yu <yuq825@gmail.com> Signed-off-by:
Vasily Khoruzhick <anarsoul@gmail.com>
-
Vasily Khoruzhick authored
We'll need it if we want to walk through node sources Reviewed-by:
Qiang Yu <yuq825@gmail.com> Signed-off-by:
Vasily Khoruzhick <anarsoul@gmail.com>
-
Vasily Khoruzhick authored
Sometimes we need to walk through ppir_node sources; a common accessor for all node types will simplify the code a lot. Reviewed-by:
Qiang Yu <yuq825@gmail.com> Signed-off-by:
Vasily Khoruzhick <anarsoul@gmail.com>
-
- 06 Aug, 2019 1 commit
-
-
Erico Nunes authored
Very basic summary; loops and gpir spills:fills are not updated yet and are only there to comply with the shader-db report.py regex. For now it can be used to analyze the impact of changes on instruction count in both gpir and ppir. The LIMA_DEBUG=shaderdb setting can be useful to output stats on applications other than shader-db. Signed-off-by:
Erico Nunes <nunes.erico@gmail.com> Reviewed-by:
Qiang Yu <yuq825@gmail.com>
-
- 04 Aug, 2019 1 commit
-
-
Erico Nunes authored
The select operation relies on the select condition coming from the result of the alu scalar mult slot, in the same instruction. The current implementation creates a mov node to be the predecessor of select, and then relies on an exception during scheduling to ensure that both ops are inserted in the same instruction. Now that the ppir scheduler supports pipeline register dependencies, this can be simplified by making the mov explicitly output to the fmul pipeline register, so the scheduler can place it without an exception. Since the select condition can only be placed in the scalar mult slot, unlike a regular mov, define a separate op for it. Signed-off-by:
Erico Nunes <nunes.erico@gmail.com> Reviewed-by:
Vasily Khoruzhick <anarsoul@gmail.com> Reviewed-by:
Qiang Yu <yuq825@gmail.com>
-
- 03 Aug, 2019 1 commit
-
-
Signed-off-by:
Andreas Baierl <ichgeh@imkreisrum.de> Reviewed-by:
Qiang Yu <yuq825@gmail.com>
-
- 31 Jul, 2019 2 commits
-
-
Erico Nunes authored
Now that we have fsum in nir, we can move fdot lowering there. This helps reduce ppir complexity and enables the lowered ops to be part of other nir optimizations in the optimization loop. Signed-off-by:
Erico Nunes <nunes.erico@gmail.com> Reviewed-by:
Qiang Yu <yuq825@gmail.com>
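The lowering mentioned above, sketched in Python (illustrative; the real lowering rewrites nir ALU ops): fdot(a, b) becomes a component-wise fmul followed by fsum, a horizontal add of all components.

```python
def lower_fdot(a, b):
    """fdot(a, b) -> fsum(fmul(a, b)), as in the nir lowering."""
    prod = [x * y for x, y in zip(a, b)]  # fmul, per component
    return sum(prod)                      # fsum: horizontal add

# fdot3((1,2,3), (4,5,6)) = 4 + 10 + 18
print(lower_fdot((1.0, 2.0, 3.0), (4.0, 5.0, 6.0)))  # 32.0
```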
-
Erico Nunes authored
The 'varying fetch' pp instruction deals only with coordinates, and 'texture fetch' deals only with the sampler index. Previously it was not possible to clearly map ppir_op_load_coords and ppir_op_load_texture to pp instructions as the source coordinates were kept in the ppir_op_load_texture node, making this harder to maintain. The refactor is made with the attempt to clearly map ppir_op_load_coords to the 'varying fetch' and ppir_op_load_texture to the 'texture fetch'. The coordinates are still temporarily kept in the ppir_op_load_texture node as nir has both sampler and coordinates in a single instruction and it is only possible to output one ppir node during emit. But now after lowering, the sources are transferred to the (always) created ppir_op_load_coords node, and it should be possible to directly map them to their pp instructions from there onwards. Signed-off-by:
Erico Nunes <nunes.erico@gmail.com> Reviewed-by:
Qiang Yu <yuq825@gmail.com>
-
- 18 Jul, 2019 1 commit
-
-
Treat gl_PointCoord as a system value and add the necessary bits for correct codegen. Signed-off-by:
Andreas Baierl <ichgeh@imkreisrum.de> Reviewed-by:
Qiang Yu <yuq825@gmail.com> Reviewed-by:
Eric Anholt <eric@anholt.net>
-
- 15 Jul, 2019 1 commit
-
-
Vasily Khoruzhick authored
"unknown_2" field is actually a size of instruction that branch points to. If it's set to a smaller size than actual instruction branch behavior is not defined (and it usually wedges the GPU). Fix it by setting this field correctly. Fixes: af0de6b9 ("lima/ppir: implement discard and discard_if") Reviewed-by:
Qiang Yu <yuq825@gmail.com> Signed-off-by:
Vasily Khoruzhick <anarsoul@gmail.com>
-
- 24 Jun, 2019 3 commits
-
-
Andreas Baierl authored
Signed-off-by:
Andreas Baierl <ichgeh@imkreisrum.de> Reviewed-by:
Qiang Yu <yuq825@gmail.com>
-
Andreas Baierl authored
Signed-off-by:
Andreas Baierl <ichgeh@imkreisrum.de> Reviewed-by:
Qiang Yu <yuq825@gmail.com>
-
Andreas Baierl authored
Signed-off-by:
Andreas Baierl <ichgeh@imkreisrum.de> Reviewed-by:
Qiang Yu <yuq825@gmail.com>
-
- 13 Jun, 2019 1 commit
-
-
Offset doesn't need to be 64-bit. This fixes compilation error with 64-bit off_t. Fixes: af0de6b9 lima/ppir: implement discard and discard_if Suggested-by:
Qiang Yu <yuq825@gmail.com> Signed-off-by:
Mateusz Krzak <kszaquitto@gmail.com> Reviewed-by:
Qiang Yu <yuq825@gmail.com> Tested-by:
Andreas Baierl <ichgeh@imkreisrum.de>
-
- 27 May, 2019 1 commit
-
-
Vasily Khoruzhick authored
This commit also adds codegen for branch since we need it for discard_if. Reviewed-by:
Qiang Yu <yuq825@gmail.com> Signed-off-by:
Vasily Khoruzhick <anarsoul@gmail.com>
-
- 02 May, 2019 1 commit
-
-
Erico Nunes authored
Support nir_op_ftrunc by turning it into a mov with a round to integer output modifier. Signed-off-by:
Erico Nunes <nunes.erico@gmail.com> Reviewed-by:
Qiang Yu <yuq825@gmail.com>
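The semantics of the round-to-integer output modifier used above, sketched in Python (illustrative; the hardware applies this as a modifier on the mov, not as a separate op): ftrunc rounds toward zero.

```python
import math

def lower_ftrunc(x):
    # mov with a round-to-integer (toward zero) output modifier
    return float(math.trunc(x))

print(lower_ftrunc(2.7))   # 2.0
print(lower_ftrunc(-2.7))  # -2.0
```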
-
- 29 Apr, 2019 1 commit
-
-
Treat gl_FragCoord variable as a system value and lower the w component with a nir pass. Add the necessary bits for correct codegen. Signed-off-by:
Andreas Baierl <ichgeh@imkreisrum.de> Reviewed-by:
Qiang Yu <yuq825@gmail.com>
-
- 11 Apr, 2019 1 commit
-
-
Qiang Yu authored
v2:
- use renamed util_dynarray_grow_cap
- use DEBUG_GET_ONCE_FLAGS_OPTION for debug flags
- remove DRM_FORMAT_MOD_ARM_AGTB_MODE0 usage
- compute min/max index in driver
v3:
- fix plbu framebuffer state calculation
- fix color_16pc assemble
- use nir_lower_all_source_mods for lowering neg/abs/sat
- use float array for static GPU data
- add disassemble comment for static shader code
- use drm_find_modifier
v4:
- use lima_nir_lower_uniform_to_scalar
v5:
- remove nir_opt_global_to_local when rebase
Cc: Rob Clark <robdclark@gmail.com> Cc: Alyssa Rosenzweig <alyssa@rosenzweig.io> Acked-by:
Eric Anholt <eric@anholt.net> Signed-off-by:
Andreas Baierl <ichgeh@imkreisrum.de> Signed-off-by:
Arno Messiaen <arnomessiaen@gmail.com> Signed-off-by:
Connor Abbott <cwabbott0@gmail.com> Signed-off-by:
Erico Nunes <nunes.erico@gmail.com> Signed-off-by:
Heiko Stuebner <heiko@sntech.de> Signed-off-by:
Koen Kooi <koen@dominion.thruhere.net> Signed-off-by:
Marek Vasut <marex@denx.de> Signed-off-by:
marmeladema <xademax@gmail.com> Signed-off-by:
Paweł Chmiel <pawel.mikolaj.chmiel@gmail.com> Signed-off-by:
Rob Herring <robh@kernel.org> Signed-off-by:
Rohan Garg <rohan@garg.io> Signed-off-by:
Vasily Khoruzhick <anarsoul@gmail.com> Signed-off-by:
Qiang Yu <yuq825@gmail.com>
-