1. 01 Jul, 2019 1 commit
  2. 24 Jun, 2019 1 commit
  3. 18 Jun, 2019 1 commit
    • v3d: implement simultaneous peripheral access exceptions for V3D 4.1+ · 79a30543
      Iago Toral authored
      Shader-db results:
      
      total instructions in shared programs: 9117550 -> 9102719 (-0.16%)
      instructions in affected programs: 1752873 -> 1738042 (-0.85%)
      helped: 7076
      HURT: 478
      helped stats (abs) min: 1 max: 22 x̄: 2.19 x̃: 2
      helped stats (rel) min: 0.07% max: 13.89% x̄: 1.70% x̃: 1.07%
      HURT stats (abs)   min: 1 max: 7 x̄: 1.41 x̃: 1
      HURT stats (rel)   min: 0.09% max: 10.17% x̄: 0.86% x̃: 0.54%
      95% mean confidence interval for instructions value: -2.00 -1.92
      95% mean confidence interval for instructions %-change: -1.58% -1.50%
      Instructions are helped.
      
      total max-temps in shared programs: 1327774 -> 1327728 (<.01%)
      max-temps in affected programs: 1025 -> 979 (-4.49%)
      helped: 47
      HURT: 2
      helped stats (abs) min: 1 max: 2 x̄: 1.02 x̃: 1
      helped stats (rel) min: 2.63% max: 20.00% x̄: 7.67% x̃: 5.26%
      HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
      HURT stats (rel)   min: 4.17% max: 4.17% x̄: 4.17% x̃: 4.17%
      95% mean confidence interval for max-temps value: -1.06 -0.82
      95% mean confidence interval for max-temps %-change: -8.89% -5.49%
      Max-temps are helped.
      Reviewed-by: Eric Anholt <eric@anholt.net>
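The percentages in the shader-db summary for this commit can be reproduced from the raw instruction counts. This is a minimal sketch; `pct_change` is a hypothetical helper written for illustration and is not part of shader-db.

```python
def pct_change(before: int, after: int) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Counts copied from the shader-db output of commit 79a30543.
total = pct_change(9117550, 9102719)      # all shared programs
affected = pct_change(1752873, 1738042)   # only programs whose code changed

print(f"total: {total:.2f}%  affected: {affected:.2f}%")
```

Both lines drop by the same absolute number of instructions, since programs outside the affected set contribute no change; only the denominator differs.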
  4. 14 Jun, 2019 1 commit
    • v3d: do not setup execute flags for else block in uniform control flow · 360b832c
      Iago Toral authored
      Either all channels execute the 'then' block, in which case they all
      jump directly to the 'endif' block at the end of the 'then' block, or
      all channels execute the 'else' block (so no execution masking is
      necessary).
      
      Shader-db results:
      
      total instructions in shared programs: 9119238 -> 9117550 (-0.02%)
      instructions in affected programs: 401252 -> 399564 (-0.42%)
      helped: 855
      HURT: 77
      
      total uniforms in shared programs: 3022622 -> 3022605 (<.01%)
      uniforms in affected programs: 3566 -> 3549 (-0.48%)
      helped: 17
      HURT: 0
      
      total max-temps in shared programs: 1327762 -> 1327774 (<.01%)
      max-temps in affected programs: 619 -> 631 (1.94%)
      helped: 2
      HURT: 15
      Reviewed-by: Eric Anholt <eric@anholt.net>
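The reasoning in this commit can be illustrated with a toy model of per-channel execution. This is only a sketch under simplifying assumptions: `run_if_masked` and `run_if_uniform` are hypothetical names, and the real compiler works on QPU instructions and execute flags rather than Python lists.

```python
def run_if_masked(cond, then_fn, else_fn, values):
    """Non-uniform if/else: both blocks run, a per-channel mask gates writes."""
    out = list(values)
    for i, c in enumerate(cond):          # 'then' block, masked by cond
        if c:
            out[i] = then_fn(values[i])
    for i, c in enumerate(cond):          # 'else' block, masked by not cond
        if not c:
            out[i] = else_fn(values[i])
    return out


def run_if_uniform(cond, then_fn, else_fn, values):
    """Uniform if/else: every channel agrees, so one block runs unmasked."""
    assert all(c == cond[0] for c in cond), "condition must be uniform"
    block = then_fn if cond[0] else else_fn
    return [block(v) for v in values]


values = [1, 2, 3, 4]
double = lambda v: v * 2
negate = lambda v: -v
all_true = [True] * 4
all_false = [False] * 4
```

When the condition is uniform, the unmasked version produces the same result as the masked one, which is why setting up execute flags for the 'else' block is unnecessary in that case.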
  5. 13 Jun, 2019 1 commit
  6. 07 Jun, 2019 2 commits
  7. 06 Jun, 2019 1 commit
    • v3d: fix scheduling dependency tracking for ALU with small immediates · 09d230c6
      Iago Toral authored
      We were not accounting for small immediates in the B mux, so the
      scheduler was interpreting these as regular register file accesses,
      which could lead to additional (incorrect) write-read dependencies.
      
      Shader-db changes:
      
      total instructions in shared programs: 9163664 -> 9137263 (-0.29%)
      instructions in affected programs: 3931035 -> 3904634 (-0.67%)
      helped: 12457
      HURT: 2563
      
      total max-temps in shared programs: 1325787 -> 1325597 (-0.01%)
      max-temps in affected programs: 5746 -> 5556 (-3.31%)
      helped: 186
      HURT: 16
      helped stats (abs) min: 1 max: 4 x̄: 1.12 x̃: 1
      helped stats (rel) min: 1.45% max: 22.22% x̄: 4.42% x̃: 3.28%
      HURT stats (abs)   min: 1 max: 3 x̄: 1.12 x̃: 1
      HURT stats (rel)   min: 2.86% max: 10.00% x̄: 5.76% x̃: 5.88%
      95% mean confidence interval for max-temps value: -1.04 -0.84
      95% mean confidence interval for max-temps %-change: -4.16% -3.07%
      Max-temps are helped.
      Reviewed-by: Eric Anholt <eric@anholt.net>
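The false dependency described in this commit can be sketched with a simplified instruction model. The `AluInstr` fields and both helper functions are assumptions made for illustration; they do not mirror the actual scheduler data structures in the driver.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class AluInstr:
    waddr: Optional[int] = None    # register-file index written, if any
    raddr_a: Optional[int] = None  # register-file index read through mux A
    raddr_b: Optional[int] = None  # register-file index, or an immediate encoding
    small_imm: bool = False        # True: raddr_b holds a small immediate


def regfile_reads(instr: AluInstr, account_for_small_imm: bool) -> Set[int]:
    """Register-file indices that the instruction really reads."""
    reads = set()
    if instr.raddr_a is not None:
        reads.add(instr.raddr_a)
    is_immediate = account_for_small_imm and instr.small_imm
    if instr.raddr_b is not None and not is_immediate:
        reads.add(instr.raddr_b)
    return reads


def has_write_read_dep(writer: AluInstr, reader: AluInstr,
                       account_for_small_imm: bool = True) -> bool:
    """True if `reader` must be scheduled after `writer`."""
    return (writer.waddr is not None
            and writer.waddr in regfile_reads(reader, account_for_small_imm))


writer = AluInstr(waddr=5)
reader = AluInstr(waddr=7, raddr_a=1, raddr_b=5, small_imm=True)
```

In this model, ignoring the `small_imm` flag makes the immediate encoding `5` look like a read of register 5, so the scheduler would keep `reader` behind `writer` for no reason.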
  8. 05 Jun, 2019 1 commit
  9. 24 May, 2019 1 commit
  10. 10 May, 2019 1 commit
  11. 09 May, 2019 1 commit
  12. 07 May, 2019 2 commits
    • nir: Use the flrp lowering pass instead of nir_opt_algebraic · d41cdef2
      Ian Romanick authored
      I tried to be very careful while updating all the various drivers, but I
      don't have any of that hardware for testing. :(
      
      i965 is the only platform that sets always_precise = true, and it is
      only set true for fragment shaders.  Gen4 and Gen5 both set lower_flrp32
      only for vertex shaders.  For fragment shaders, nir_op_flrp is lowered
      during code generation as a(1-c)+bc.  On all other platforms 64-bit
      nir_op_flrp and on Gen11 32-bit nir_op_flrp are lowered using the old
      nir_opt_algebraic method.
      
      No changes on any other Intel platforms.
      
      v2: Add panfrost changes.
      
      Iron Lake and GM45 had similar results. (Iron Lake shown)
      total cycles in shared programs: 188647754 -> 188647748 (<.01%)
      cycles in affected programs: 5096 -> 5090 (-0.12%)
      helped: 3
      HURT: 0
      helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
      helped stats (rel) min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12%
      Reviewed-by: Matt Turner <mattst88@gmail.com>
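The a(1-c)+bc form mentioned in this commit can be compared against another common way of writing a linear interpolation. The sketch below is illustrative only: `flrp_precise` follows the formula from the commit message, while `flrp_fast` is an assumed alternative (a + c(b-a)) added here for comparison and is not taken from the commit.

```python
def flrp_precise(a: float, b: float, c: float) -> float:
    """Linear interpolation written as a(1-c) + bc, as in the commit message."""
    return a * (1.0 - c) + b * c


def flrp_fast(a: float, b: float, c: float) -> float:
    """Assumed alternative form a + c(b-a); fewer operations, different rounding."""
    return a + c * (b - a)


# The two forms agree on easy inputs...
easy = (flrp_precise(2.0, 4.0, 0.5), flrp_fast(2.0, 4.0, 0.5))

# ...but can differ when a and b have very different magnitudes.
hard = (flrp_precise(1e16, 1.0, 1.0), flrp_fast(1e16, 1.0, 1.0))
print(easy, hard)
```

With c = 1 the first form returns b exactly, whereas the second loses b to rounding in the subtraction b - a. Differences of this kind are one reason a driver may want control over which lowering is used.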
    • nir: nir_shader_compiler_options: drop native_integers · 4e110eca
      Christian Gmeiner authored
      Drivers that do not support native integers should use a lowering
      pass to go from integers to floats.
      Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
      Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
  13. 26 Apr, 2019 5 commits
  14. 18 Apr, 2019 2 commits
  15. 16 Apr, 2019 2 commits
    • v3d: Always set up the qregs for CSD payload. · 697e2e1f
      Eric Anholt authored
      We were failing to set up payload[1] for use by LocalInvocationIndex/ID
      and shared variable accesses if gl_WorkGroupID/gl_GlobalInvocationID
      wasn't used (possibly because you only have one workgroup).  You're always
      going to use payload[1], and payload[0] is common enough and we have DCE
      in the backend to clean it up if it happens to not be used.
    • v3d: Only look up the 3rd texture gather offset for non-arrays. · 1bc71e8b
      Eric Anholt authored
      Fixes assertion failures in the CTS since Karol's cleanup when NIR started
      noticing that we were reading an invalid component.
      
      Fixes: 5450f1c9 ("v3d: prefer using nir_src_comp_as_int over nir_src_as_const_value")
  16. 14 Apr, 2019 1 commit
  17. 12 Apr, 2019 8 commits
  18. 11 Apr, 2019 1 commit
    • v3d: Add an optimization pass for redundant flags updates. · 8f065596
      Eric Anholt authored
      Our exec masking introduces lots of redundant flags updates, and even
      without that there will be cases where NIR comparisons on the same sources
      for different reasons may generate the same comparison instruction before
      the selection.
      
      total instructions in shared programs: 6492930 -> 6460934 (-0.49%)
      total uniforms in shared programs: 2117460 -> 2115106 (-0.11%)
      total spills in shared programs: 4983 -> 4987 (0.08%)
      total fills in shared programs: 6408 -> 6416 (0.12%)
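The idea behind this pass can be sketched on a straight-line block of instructions. The `FlagInstr` type and `remove_redundant_flags` function are hypothetical and much simpler than the real backend pass: they assume one set of flags, one basic block, and that only instructions marked `sets_flags` touch the flags.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class FlagInstr:
    op: str
    dst: Optional[str]          # None when only the flags result is used
    srcs: Tuple[str, ...]
    sets_flags: bool = False


def remove_redundant_flags(block: List[FlagInstr]) -> List[FlagInstr]:
    """Drop flag-setting instructions that recompute the flags already held."""
    out = []
    last = None  # most recent flag setter whose sources are still unmodified
    for ins in block:
        if ins.sets_flags:
            same = last is not None and (ins.op, ins.srcs) == (last.op, last.srcs)
            if same and ins.dst is None:
                continue  # flags already hold this comparison result
            last = ins
        out.append(ins)
        # A write to one of the compared sources invalidates the cached flags.
        if last is not None and ins.dst is not None and ins.dst in last.srcs:
            last = None
    return out


cmp_ab = FlagInstr("fcmp", None, ("a", "b"), sets_flags=True)
block = [
    cmp_ab,
    FlagInstr("fadd", "t", ("a", "b")),
    cmp_ab,                               # redundant: same op, same sources
    FlagInstr("mov", "a", ("c",)),        # overwrites a source of the compare
    cmp_ab,                               # needed again after the write to a
]
optimized = remove_redundant_flags(block)
```

In this example the second comparison is removed, while the third is kept because `a` was rewritten in between.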
  19. 09 Apr, 2019 1 commit
  20. 07 Apr, 2019 1 commit
  21. 21 Mar, 2019 4 commits
    • v3d: Remove some dead members of struct v3d_compile. · bfed0a70
      Eric Anholt authored
      These are more vc4 leftovers.
    • v3d: Upload all of UBO[0] if any indirect load occurs. · 16f2770e
      Eric Anholt authored
      The idea was that we could skip uploading the constant-indexed uniform
      data and just upload the uniforms that are variably-indexed.  However,
      since the VS bin and render shaders may have a different set of uniforms
      used, this meant that we had to upload the UBO for each of them.  The
      first case is generally a fairly small impact (usually the uniform array
      is the most space, other than a couple of FSes in shader-db), while the
      second is a larger impact: 3DMMES2 was uploading 38k/frame of uniforms
      instead of 18k.
      
      Given that the optimization is of dubious value, has a big downside, and
      is quite a bit of code, just drop it.  No change in shader-db.  No change
      on 3DMMES2 (n=15).
    • v3d: Move constant offsets to UBO addresses into the main uniform stream. · 320e96ba
      Eric Anholt authored
      We'd end up with the constant offset in the uniform stream anyway, since
      they're bigger than small immediates.  Avoids the extra uniforms and adds
      in the shader in favor of just adding once on the CPU.
      
      shader-db:
      total instructions in shared programs: 6496865 -> 6494851 (-0.03%)
      total uniforms in shared programs: 2119511 -> 2117243 (-0.11%)
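The trade-off in this commit, one add on the CPU instead of an extra uniform and an add in the shader, can be sketched as follows. Both functions and the flat list used as a "uniform stream" are illustrative assumptions, not the driver's actual encoding.

```python
def shader_side_add(ubo_base: int, const_offsets):
    """Old scheme (as modeled here): base and offset are separate uniforms."""
    stream = []
    for off in const_offsets:
        stream += [ubo_base, off]                 # two uniforms per access
    # The shader performs one add per access to form the address.
    addrs = [stream[i] + stream[i + 1] for i in range(0, len(stream), 2)]
    return stream, addrs


def cpu_side_add(ubo_base: int, const_offsets):
    """New scheme (as modeled here): the CPU folds the offset into the address."""
    stream = [ubo_base + off for off in const_offsets]  # one uniform per access
    return stream, list(stream)                          # no shader add needed


old_stream, old_addrs = shader_side_add(0x1000, [256, 512, 1024])
new_stream, new_addrs = cpu_side_add(0x1000, [256, 512, 1024])
```

Both schemes yield the same addresses, but the second uses half as many uniform slots in this model and removes the per-access add from the shader.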
    • v3d: Rename v3d_tmu_config_data to v3d_unit_data. · c36d2793
      Eric Anholt authored
      I want to reuse this for encoding small constant UBO/SSBO offsets into the
      uniform stream to reduce the extra uniform loads and adds for the small
      constant offsets.
  22. 12 Mar, 2019 1 commit