1. 30 May, 2016 1 commit
  2. 09 Apr, 2016 2 commits
  3. 16 Mar, 2016 1 commit
    • vc4: Move discard handling to the condition flag. · 2b9f0dff
      Eric Anholt authored
      Now that the field exists in the instruction, we can make discards less
      special.  As a bonus, that means that we should be able to merge some more
      .sf instructions together when we get around to that.
      
      This causes some scheduling changes, as it allows tlb_color_reads to be
      delayed past the discard condition setup.  Since the tlb_color_read ends
      up later, this may mean performance improvements, but I haven't tested.
      
      total instructions in shared programs: 78114 -> 78035 (-0.10%)
      instructions in affected programs:     1922 -> 1843 (-4.11%)
      total estimated cycles in shared programs: 234318 -> 234329 (0.00%)
      estimated cycles in affected programs:     8200 -> 8211 (0.13%)
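A minimal sketch of the idea in this commit message, not the driver's actual code: rather than treating discard as a special case, the framebuffer write becomes an ordinary instruction predicated on a condition flag. The names `struct pixel`, `set_flags`, `tlb_color_write`, `shade`, and the convention that a zero discard value means "keep the pixel" are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-pixel state: one Z flag and one framebuffer color. */
struct pixel {
    bool z_flag;
    uint32_t fb_color;
};

/* Hypothetical condition codes: run always, or only when Z is set. */
enum cond { COND_ALWAYS, COND_ZS };

/* Models an instruction that sets flags: Z is set when the value is zero. */
static void set_flags(struct pixel *p, uint32_t value)
{
    p->z_flag = (value == 0);
}

/* Models a color write that carries a condition like any other instruction. */
static void tlb_color_write(struct pixel *p, enum cond c, uint32_t color)
{
    if (c == COND_ALWAYS || (c == COND_ZS && p->z_flag))
        p->fb_color = color;
}

/* Assumed convention: discard == 0 keeps the pixel, nonzero drops it. */
static void shade(struct pixel *p, uint32_t discard, uint32_t color)
{
    set_flags(p, discard);              /* discard condition setup */
    tlb_color_write(p, COND_ZS, color); /* write only if not discarded */
}
```

In this model, anything that does not depend on the flag can be placed on either side of `set_flags`, which is the kind of freedom the scheduling remark in the message refers to.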
  4. 16 Feb, 2016 1 commit
  5. 06 Jan, 2016 1 commit
    • vc4: Replace the SSA-style SEL operators with conditional MOVs. · 71db7d3d
      Eric Anholt authored
      I'm moving away from QIR being SSA (since NIR is doing lots of SSA
      optimization for us now) and instead having QIR just be QPU operations
      with virtual registers.  By making our SELs be composed of two MOVs, we
      could potentially coalesce the registers for the MOV's src and dst and
      eliminate the MOV.
      
      total instructions in shared programs: 88448 -> 88028 (-0.47%)
      instructions in affected programs:     39845 -> 39425 (-1.05%)
      total estimated cycles in shared programs: 246306 -> 245762 (-0.22%)
      estimated cycles in affected programs:     162887 -> 162343 (-0.33%)
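The SEL-to-MOV lowering described in this message can be sketched as follows. This is a toy model with virtual registers, not QIR itself: `struct machine`, `mov`, `sel_as_movs`, and the condition names are hypothetical, and the choice of one unconditional MOV followed by one conditional MOV is an assumption about how the two MOVs are arranged.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical condition codes: always, Z set, Z clear. */
enum cond { COND_ALWAYS, COND_ZS, COND_ZC };

/* Toy machine: eight virtual registers and a single Z flag. */
struct machine {
    int reg[8];
    bool z;
};

/* A MOV that only takes effect when its condition holds. */
static void mov(struct machine *m, enum cond c, int dst, int src)
{
    bool run = c == COND_ALWAYS ||
               (c == COND_ZS && m->z) ||
               (c == COND_ZC && !m->z);
    if (run)
        m->reg[dst] = m->reg[src];
}

/*
 * dst = cond ? a : b, written as two MOVs into the same destination.
 * Assumes dst != a, since the first MOV would otherwise clobber a.
 */
static void sel_as_movs(struct machine *m, enum cond c, int dst, int a, int b)
{
    mov(m, COND_ALWAYS, dst, b); /* default value */
    mov(m, c, dst, a);           /* overwritten when the condition holds */
}
```

If `dst` and `b` end up in the same register, the first MOV copies a register to itself and can be dropped, which is the coalescing opportunity the message mentions.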
  6. 19 Dec, 2015 1 commit
    • vc4: Do instruction scheduling on the QIR to hide texture fetch latency. · f1fb85e5
      Eric Anholt authored
      This is a rewrite of vc4_opt_qpu_schedule.c to operate on QIR.  Texture
      fetch can probably take as much as the rest of the cycles of the program,
      so it's important to hide our other cycles during it (which is hard to do
      after register allocation).  Also, we can queue up multiple texture
      requests before collecting the resulting samples, so that we keep the
      texture unit busy more of the time.
      
      High-settings openarena performance +2.35849% +/- 0.221154% (n=7).  Also
      about 2-3% on the multiarb demo.  8 piglit tests
      (ext_framebuffer_multisample accuracy depthstencil) go from failing in
      rendering to failing in register allocation, but hopefully I can fix that
      up with some better register pressure handling here.
      
      total instructions in shared programs: 87723 -> 88448 (0.83%)
      instructions in affected programs:     78411 -> 79136 (0.92%)
      total estimated cycles in shared programs: 276583 -> 246306 (-10.95%)
      estimated cycles in affected programs:     265691 -> 235414 (-11.40%)
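To illustrate why reordering can cut estimated cycles even when the instruction count goes up, here is a minimal cycle-estimate sketch. It is a simplified model, not the estimator used to produce the numbers above: the opcode names, the fixed texture latency, the in-order result queue, and the one-cycle cost per instruction are all assumptions.

```c
#include <assert.h>

/* Hypothetical opcodes: plain ALU work, texture request, texture result. */
enum op { OP_ALU, OP_TEX_REQUEST, OP_TEX_RESULT };

/* Upper bound on texture requests per program in this toy model. */
#define MAX_PENDING 16

/*
 * Estimate cycles for a straight-line program. Every instruction issues in
 * one cycle; collecting a texture result stalls until that request's
 * latency has elapsed. Results are collected in request order.
 */
static int estimate_cycles(const enum op *prog, int count, int tex_latency)
{
    int ready[MAX_PENDING]; /* cycle at which each request's data arrives */
    int head = 0, tail = 0;
    int cycle = 0;

    for (int i = 0; i < count; i++) {
        switch (prog[i]) {
        case OP_TEX_REQUEST:
            assert(tail < MAX_PENDING);
            ready[tail++] = cycle + tex_latency;
            cycle++;
            break;
        case OP_TEX_RESULT:
            assert(head < tail); /* a result needs an earlier request */
            if (cycle < ready[head])
                cycle = ready[head]; /* stall until the sample is back */
            head++;
            cycle++;
            break;
        case OP_ALU:
            cycle++;
            break;
        }
    }
    return cycle;
}
```

With an assumed latency of 10, a request followed immediately by its result and then four ALU instructions is estimated at 15 cycles, while placing the four ALU instructions between the request and the result gives 11. Queuing two requests before collecting either result gives 12 cycles, against 22 when each request is collected before the next is issued.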