  1. 25 Jan, 2020 2 commits
    • lima/ppir: fix src read mask swizzling · ae0b8ba5
      Erico Nunes authored

      The src mask can't be calculated from the dest write_mask.
      Instead, it must be calculated from the swizzled operators of the src.
      Otherwise, liveness calculation may report incorrect live components for
      non-ssa registers.
      
      Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
      Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
      Tested-by: Marge Bot <mesa/mesa!3502>
      Part-of: <mesa/mesa!3502>
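The difference between the two masks can be illustrated with a small sketch. This is not the ppir code: the function name, the list-based swizzle and the decision to also consult the dest write mask are assumptions made for illustration only.

```python
# Illustrative sketch only, not the ppir implementation.
# swizzle[i] is the src component that feeds dest component i.
# Masks use bit 0 for x, bit 1 for y, bit 2 for z, bit 3 for w.

def src_read_mask(swizzle, dest_write_mask):
    """Return the mask of src components that are actually read."""
    mask = 0
    for i, comp in enumerate(swizzle):
        # Only dest components that are written pull in a src component.
        if dest_write_mask & (1 << i):
            mask |= 1 << comp
    return mask

# dest.x = src.w: the dest write mask is x (0b0001), but the src
# component that must be considered live is w (0b1000).
mask_w = src_read_mask([3, 0, 0, 0], 0b0001)
```

Reusing the dest write mask (0b0001) as the read mask here would mark src.x as live instead of src.w, which is the kind of incorrect live component the commit message refers to.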
    • lima/ppir: split ppir_op_undef into undef and dummy again · ab36523a
      Erico Nunes authored

      Those were renamed/merged some time ago but it turns out that
      ppir_op_undef can't be shared.
      It was being used for undefined ssa operations and for read-before-write
      operations that may happen to e.g. uninitialized registers (non-ssa)
      inside a loop.
      We really don't want to reserve a register for the undef ssa case, but
      we must reserve and allocate a register for the uninitialized register case
      because when it happens inside a loop it may need to hold its value
      across iterations.
      
      This dummy node might be eliminated with a code refactor in ppir in case
      we are able to emit the write and allocate the ppir_reg before we emit
      the read. But until such a major refactor happens, we need to keep this
      code to avoid apparent regressions with the new liveness analysis
      implementation.
      
      Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
      Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
      Part-of: <mesa/mesa!3502>
  2. 15 Jan, 2020 2 commits
    • lima/ppir: implement full liveness analysis for regalloc · 9bf210ba
      Erico Nunes authored

      The existing liveness analysis in ppir still ultimately relies on a
      single continuous live_in and live_out range per register and was
      observed to be the bottleneck for register allocation on complicated
      examples with several control flow blocks.
      The use of live_in and live_out ranges was fine before ppir got control
      flow, but now it ends up creating unnecessary interferences as live_in
      and live_out ranges may span across entire blocks after blocks get
      placed sequentially.
      
      This new liveness analysis implementation generates a set of live
      variables at each program point: before and after each instruction, and
      at the beginning and end of each block.
      This is a global analysis and propagates the sets of live registers
      across blocks independently of their sequence.
      The resulting sets optimally represent all variables that cannot share a
      register at each program point, so they can be directly translated into
      interferences for the register allocator.
      
      Special care has to be taken with non-ssa registers. In order to
      properly define their live range, their live components also need to be
      tracked. Therefore ppir can't use simple bitsets to keep track of live
      registers.
      
      The algorithm uses an auxiliary set data structure to keep track of the
      live registers. The initial implementation used only trivial arrays,
      however regalloc execution time was then prohibitive (>1 minute on
      Cortex-A53) on extreme benchmarks with hundreds of instructions,
      hundreds of registers and several spilling iterations, mostly due to the
      n^2 complexity to generate the interferences from the live sets. Since
      the live register sets are only a very sparse subset of all registers at
      each instruction, iterating only over this subset allows it to run very
      fast again (a couple of seconds for the same benchmark).
      
      Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
      Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
      Tested-by: Marge Bot <mesa/mesa!3358>
      Part-of: <mesa/mesa!3358>
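The analysis described in the commit above can be sketched as a backward dataflow pass. This is a toy model, not the ppir implementation: the dict-of-component-masks live set, the block and instruction layout, and every name below are assumptions chosen for illustration.

```python
from itertools import combinations

# Illustrative sketch only. A live set maps register name -> mask of live
# components, so only live registers are stored and iterated.

def union(a, b):
    out = dict(a)
    for reg, mask in b.items():
        out[reg] = out.get(reg, 0) | mask
    return out

def transfer(live_out, instr):
    """live_in = (live_out minus written components) plus read components."""
    live = dict(live_out)
    for reg, mask in instr["defs"].items():
        remaining = live.get(reg, 0) & ~mask
        if remaining:
            live[reg] = remaining      # partially written: rest stays live
        else:
            live.pop(reg, None)        # fully written: no longer live above
    return union(live, instr["uses"])

def liveness(blocks, succs):
    """Iterate to a fixed point; return the live set at each program point."""
    live_in = {b: {} for b in blocks}
    points = {}
    changed = True
    while changed:
        changed = False
        for b in reversed(list(blocks)):
            live = {}
            for s in succs.get(b, []):
                live = union(live, live_in[s])
            for idx in range(len(blocks[b]) - 1, -1, -1):
                points[(b, idx, "out")] = live
                live = transfer(live, blocks[b][idx])
                points[(b, idx, "in")] = live
            if live != live_in[b]:
                live_in[b] = live
                changed = True
    return points

def interferences(points):
    """Registers live at the same program point cannot share a register."""
    edges = set()
    for live in points.values():
        edges.update(combinations(sorted(live), 2))
    return edges

# r0 is written in "entry", then read and rewritten inside a loop;
# r1 is written in "entry" and only read in "exit".
blocks = {
    "entry": [{"defs": {"r0": 0b0001}, "uses": {}},
              {"defs": {"r1": 0b0001}, "uses": {}}],
    "loop":  [{"defs": {"t1": 0b0001}, "uses": {"r0": 0b0001}},
              {"defs": {"r0": 0b0001}, "uses": {"t1": 0b0001}}],
    "exit":  [{"defs": {}, "uses": {"r0": 0b0001, "r1": 0b0001}}],
}
succs = {"entry": ["loop"], "loop": ["loop", "exit"], "exit": []}
edges = interferences(liveness(blocks, succs))
```

In this toy program r1 stays live across the whole loop and so interferes with both r0 and t1, while r0 and t1 are never live at the same point and could share a register. Because each live set holds only the registers that are live, generating pairs iterates over that subset rather than over all registers, which is the sparsity argument made in the last paragraph of the commit message.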
    • lima/ppir: remove orphan load node after cloning · 7e2765fd
      Erico Nunes authored

      There are some cases in shaders using control flow where the varying load
      is cloned to every block, and then the original node is left orphaned.
      This is not harmful for program execution, but it complicates analysis
      for register allocation as there is now a case of writing to a register
      that is never read.
      Since ppir doesn't have a dead code elimination pass for its own
      optimizations and it is not hard to detect when we cloned the last load,
      let's remove it early.
      
      Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
      Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
      Part-of: <mesa/mesa!3358>
  3. 20 Dec, 2019 1 commit
    • lima/ppir: fix lod bias src · d56710ab
      Erico Nunes authored

      ppir has some code that operates on all ppir_src variables, and for that
      uses ppir_node_get_src.
      lod bias support introduced a separate ppir_src that is inaccessible by
      that function, causing it to be missed by the compiler in some routines.
      Ultimately this caused, in some cases, a bug in const lowering:
      
        .../pp/lower.c:42: ppir_lower_const: Assertion `src != NULL' failed.
      
      This fix moves the ppir_srcs in ppir_load_texture_node together so they
      don't get missed.
      
      Fixes: 721d82cf ("lima/ppir: add lod-bias support")
      
      Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
      Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
      Tested-by: Marge Bot <mesa/mesa!3185>
      Part-of: <mesa/mesa!3185>
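The class of bug fixed above can be shown with a toy model. None of this mirrors the real ppir structs: both node classes, the string-based sources and the `collect_const_srcs` pass are hypothetical, and only the idea of a generic pass that walks sources through a single accessor comes from the commit message.

```python
# Illustrative sketch only; all names here are hypothetical.

class SeparateBiasNode:
    """The lod bias source lives in its own field, outside the generic list."""
    def __init__(self, coords, lod_bias):
        self.src_coords = coords
        self.lod_bias = lod_bias       # not reachable through get_srcs()

    def get_srcs(self):
        return [self.src_coords]

class GroupedSrcNode:
    """All sources are kept together, so generic passes see every one."""
    def __init__(self, coords, lod_bias):
        self.srcs = [coords, lod_bias]

    def get_srcs(self):
        return list(self.srcs)

def collect_const_srcs(node):
    # Stand-in for a generic pass that must visit every source of a node.
    return [s for s in node.get_srcs() if s.startswith("const")]

missed = collect_const_srcs(SeparateBiasNode("reg0", "const1"))
found = collect_const_srcs(GroupedSrcNode("reg0", "const1"))
```

With the separate field the constant source is never visited, so a later stage that expects it to have been processed can fail. Keeping the sources together means no pass needs special knowledge of the texture node.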
  4. 20 Nov, 2019 1 commit
  5. 31 Oct, 2019 1 commit
  6. 28 Oct, 2019 1 commit
  7. 25 Sep, 2019 1 commit
  8. 13 Sep, 2019 2 commits
  9. 04 Sep, 2019 2 commits
  10. 24 Aug, 2019 7 commits
  11. 14 Aug, 2019 3 commits
  12. 06 Aug, 2019 1 commit
    • lima: add summary report for shader-db · e0aeee94
      Erico Nunes authored

      Very basic summary; loops and gpir spills:fills are not updated yet and
      are only there to match the strings expected by the shader-db report.py
      regex.
      
      For now it can be used to analyze the impact of changes in instruction
      count in both gpir and ppir.
      
      The LIMA_DEBUG=shaderdb setting can be useful to output stats on
      applications other than shader-db.
      
      Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
      Reviewed-by: Qiang Yu <yuq825@gmail.com>
  13. 04 Aug, 2019 1 commit
    • lima/ppir: simplify select op lowering and scheduling · fd29c4d6
      Erico Nunes authored

      The select operation relies on the select condition coming from the
      result of the alu scalar mult slot, in the same instruction.
      The current implementation creates a mov node to be the predecessor of
      select, and then relies on an exception during scheduling to ensure that
      both ops are inserted in the same instruction.
      
      Now that the ppir scheduler supports pipeline register dependencies,
      this can be simplified by making the mov explicitly output to the fmul
      pipeline register, and the scheduler can place it without an exception.
      
      Since the select condition can only be placed in the scalar mult slot,
      unlike a regular mov, define a separate op for it.
      
      Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
      Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
      Reviewed-by: Qiang Yu <yuq825@gmail.com>
  14. 03 Aug, 2019 1 commit
  15. 31 Jul, 2019 2 commits
    • lima/ppir: lower fdot in nir_opt_algebraic · 99c956fb
      Erico Nunes authored

      Now that we have fsum in nir, we can move fdot lowering there.
      This helps reduce ppir complexity and enables the lowered ops to be part
      of other nir optimizations in the optimization loop.
      
      Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
      Reviewed-by: Qiang Yu <yuq825@gmail.com>
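The lowering mentioned above rests on a simple identity: a dot product is a component-wise multiply followed by a horizontal sum. The sketch below only shows that arithmetic on Python lists. The helper names echo the NIR op names but are plain functions written for illustration, not the actual nir_opt_algebraic rule.

```python
# Illustrative sketch only: models vectors as lists of floats.

def fmul(a, b):
    """Component-wise multiply of two equally sized vectors."""
    return [x * y for x, y in zip(a, b)]

def fsum(v):
    """Horizontal add of all components of one vector."""
    total = 0.0
    for x in v:
        total += x
    return total

def fdot_lowered(a, b):
    # fdot(a, b) expressed as fsum(fmul(a, b))
    return fsum(fmul(a, b))

result = fdot_lowered([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

Once the dot product is split this way, the multiply is an ordinary op that other optimizations in the loop can see and combine, which is the benefit the commit message points out.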
    • lima/ppir: refactor texture code to simplify scheduler · 7f8ff686
      Erico Nunes authored

      The 'varying fetch' pp instruction deals only with coordinates, and
      'texture fetch' deals only with the sampler index.
      Previously it was not possible to clearly map ppir_op_load_coords and
      ppir_op_load_texture to pp instructions as the source coordinates were
      kept in the ppir_op_load_texture node, making this harder to maintain.
      This refactor attempts to clearly map ppir_op_load_coords
      to the 'varying fetch' and ppir_op_load_texture to the 'texture fetch'.
      The coordinates are still temporarily kept in the ppir_op_load_texture
      node as nir has both sampler and coordinates in a single instruction and
      it is only possible to output one ppir node during emit. But now after
      lowering, the sources are transferred to the (always) created
      ppir_op_load_coords node, and it should be possible to directly map them
      to their pp instructions from there onwards.
      
      Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
      Reviewed-by: Qiang Yu <yuq825@gmail.com>
  16. 18 Jul, 2019 1 commit
  17. 15 Jul, 2019 1 commit
  18. 24 Jun, 2019 3 commits
  19. 13 Jun, 2019 1 commit
  20. 27 May, 2019 1 commit
  21. 02 May, 2019 1 commit
  22. 29 Apr, 2019 1 commit
  23. 11 Apr, 2019 1 commit