1. 07 May, 2019 37 commits
    • intel/tools: Add unit tests for assembler · 4e828bb4
      Sagar Ghuge authored
      v1: Pass executable object from meson to test (Dylan Baker)
      v2: Ignore generated output files from git status (Matt Turner)
      Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
      Reviewed-by: Matt Turner <mattst88@gmail.com>
      Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
    • intel/tools: Initialize offset correctly for i965_asm · 1fb5ce0a
      Mika Kuoppala authored
      If we leave offset uninitialized, accesses to store will be random,
      depending on whatever value is on the stack, and can segfault.
      Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
      Reviewed-by: Matt Turner <mattst88@gmail.com>
    • intel/tools: New i965 instruction assembler tool · 70308a5a
      Sagar Ghuge authored
      The tool is inspired by igt's assembler tool. Thanks to Matt Turner, who
      mentored me throughout this project.
      
      v2: Fix memory leaks and naming convention (Caio)
      v3: Fix meson changes (Dylan Baker)
      v4: Fix usage options (Matt Turner)
      Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
      Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
      Reviewed-by: Matt Turner <mattst88@gmail.com>
      Closes: mesa/mesa!141
    • iris: support dmabuf imports with offsets · ddd716e7
      Mike Blumenkrantz authored
      This adds support for imports where the image data begins at an offset
      from the start of the buffer, as used in h/x264.
      
      Fixes kwg/mesa#47
      Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
    • gallivm: fix broken 8-wide s3tc decoding · 748f6033
      Roland Scheidegger authored
      Brian noticed there was an uninitialized var for the 8-wide case and 128-bit
      blocks, which made it always crash. Likewise, the 64-bit block case
      had another crash bug due to a type mismatch.
      Color decode (used for all s3tc formats) also had a bogus shuffle for
      this case, leading to decode artifacts.
      Fix these all up, which makes the code actually work 8-wide. Note that
      it's still not used: I've verified it works, and the generated assembly
      does look quite a bit simpler (20-30% fewer instructions for the
      s3tc decode part with avx2), but in practice it still seems to be
      slightly slower for some unknown reason (tested with openarena) on my
      haswell box, so for now continue to split things into 4-wide vectors
      before decoding.
      Reviewed-by: Brian Paul <brianp@vmware.com>
      Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
    • Juan Suárez Romero
      92dba1c6
    • Juan Suárez Romero
      14a7959c
    • lima: enable sin and cos lowering for GP · 6b46399e
      Vasily Khoruzhick authored
      GP doesn't support sin/cos natively, so we have to lower them.
      Reviewed-by: Qiang Yu <yuq825@gmail.com>
      Tested-by: Qiang Yu <yuq825@gmail.com>
      Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
    • nir: implement lowering for fsin and fcos · e67e4e90
      Vasily Khoruzhick authored
      Lower sin and cos using Nick's fast sin/cos approximation from
      https://web.archive.org/web/20180105155939/http://forum.devmaster.net/t/fast-and-accurate-sine-cosine/9648
      
      It's suitable for GLES2, but it throws warnings in dEQP GLES3 precision tests.
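A minimal Python sketch of the parabola-plus-correction approximation from the linked post follows. It assumes the coefficients commonly quoted from that post (B = 4/π, C = -4/π², P = 0.225) and a simple wrap into [-π, π) for range reduction. It only illustrates the math. It is not the NIR lowering itself, whose instruction sequence and range reduction may differ, and the function names are hypothetical.

```python
import math

# Parabola through (-pi, 0), (0, 0), (pi, 0) with peak 1 at pi/2 (odd-symmetric).
B = 4.0 / math.pi
C = -4.0 / (math.pi * math.pi)
# Weight of the extra-precision correction step (assumed value from the post).
P = 0.225


def fast_sin(x: float) -> float:
    """Approximate sin(x); assumes x is already reduced to [-pi, pi]."""
    y = B * x + C * x * abs(x)
    # Blend the parabola toward y*|y| to flatten its error curve.
    return P * (y * abs(y) - y) + y


def wrap_to_pi(x: float) -> float:
    """Reduce an arbitrary angle into [-pi, pi)."""
    two_pi = 2.0 * math.pi
    return x - two_pi * math.floor((x + math.pi) / two_pi)


def approx_sin(x: float) -> float:
    return fast_sin(wrap_to_pi(x))


def approx_cos(x: float) -> float:
    # cos(x) = sin(x + pi/2)
    return fast_sin(wrap_to_pi(x + math.pi / 2.0))
```

With these coefficients, the absolute error is on the order of 10⁻³ across the whole period.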
      Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
      Reviewed-by: Qiang Yu <yuq825@gmail.com>
      Tested-by: Qiang Yu <yuq825@gmail.com>
      Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
      Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
    • freedreno/ir3: move const_state to ir3_shader · b15c46e6
      Rob Clark authored
      For a6xx, we construct/emit a single VS const state used for both the
      binning pass and the draw pass.  So far we were mostly getting lucky that
      there were no (obvious) mismatches in the const_state (like different
      lowered immediates) between the binning-pass and draw-pass VS
      ir3_shader_variants.
      
      And I guess this situation will come up more as GS and tess are added
      into the equation.
      
      Since essentially nothing about the const state is specific to the
      variant, move it.  The main exception is lowered immediates, but these
      are the last to appear in the layout, and it doesn't hurt for each new
      shader variant to just append any immediates it lowers to the end of the
      immediate state.
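The Python model below illustrates the layout argument above. Its names (`ConstState`, `fixed_regions`, `lower_immediate`) are hypothetical and do not correspond to the ir3 structures. The fixed regions are laid out once per shader, and each variant only appends immediates it has not seen yet. As a result, offsets already handed out to an earlier variant never move.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ConstState:
    """Hypothetical per-shader constant layout shared by all variants."""

    # Region name -> size, laid out in insertion order.
    fixed_regions: Dict[str, int]
    # Lowered immediates, always placed after the fixed regions.
    immediates: List[float] = field(default_factory=list)

    def offsets(self) -> Dict[str, int]:
        """Start offset of each region, with immediates last."""
        result: Dict[str, int] = {}
        off = 0
        for name, size in self.fixed_regions.items():
            result[name] = off
            off += size
        result["immediates"] = off
        return result

    def lower_immediate(self, value: float) -> int:
        """Return the immediate slot for value, appending it if it is new.

        Existing slots never move, so earlier variants stay valid.
        """
        if value in self.immediates:
            return self.immediates.index(value)
        self.immediates.append(value)
        return len(self.immediates) - 1
```

Because immediates sit at the end of the layout, a later variant (say, the draw-pass VS after the binning-pass VS) can grow that region without invalidating any slot an earlier variant already uses.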
      Signed-off-by: Rob Clark <robdclark@chromium.org>
    • freedreno/ir3: split out const_state setup · 5690f83b
      Rob Clark authored
      The next patch moves const_state to ir3_shader, before the compile context
      is created, so move the code around in preparation for calling it earlier.
      Signed-off-by: Rob Clark <robdclark@chromium.org>
    • freedreno/ir3: move immediates to const_state · 9403184d
      Rob Clark authored
      They are really part of the constant state, and combining them will make
      it easier to move things from ir3_shader_variant to ir3_shader.
      Signed-off-by: Rob Clark <robdclark@chromium.org>
    • freedreno/ir3: consolidate const state · 23e7a344
      Rob Clark authored
      Combine the offsets of the different parts of the constant space with (what
      was formerly known as) ir3_driver_const_layout.  A bunch of churn, but no
      functional change.
      Signed-off-by: Rob Clark <robdclark@chromium.org>
    • freedreno/ir3: move ir3_pointer_size() · ef3eecd6
      Rob Clark authored
      Move to ir3_compiler so it doesn't depend on the compile context.  Prep
      work for moving constant state from variant (where we have compile
      context) to shader (where we do not).
      Signed-off-by: Rob Clark <robdclark@chromium.org>
    • vulkan/overlay-layer: fix cast errors · 2d292793
      Lionel Landwerlin authored
      Not quite sure which version of GCC/Clang produces these errors (8.3.0
      locally was fine).
      
      v2: also fix an integer literal issue (Karol)
      Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
      Reviewed-by: Tapani Pälli <tapani.palli@intel.com> (v1)
      Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
    • anv: fix alphaToCoverage when there is no color attachment · bc66cebc
      Samuel Iglesias Gonsálvez authored
      There are tests in CTS for alpha to coverage without a color attachment
      that are failing. This happens because we remove the shader color
      outputs when we don't have a valid color attachment for them, but when
      alpha to coverage is enabled we still want to preserve the output
      at location 0 since we need the alpha component. In that case we will
      also need to create a null render target for RT 0.
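The rule in the paragraph above can be sketched in simplified Python. The function name, its arguments, and the list-of-locations model are hypothetical, and the sketch ignores array outputs. The real anv code works on NIR variables and render-target bindings.

```python
from typing import List, Tuple


def plan_render_targets(
    output_locations: List[int],
    attachment_valid: List[bool],
    alpha_to_coverage: bool,
) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: decide which fragment color outputs survive and
    which of the surviving locations must be backed by a null render target.

    Returns (kept_outputs, null_rt_locations).
    """

    def has_attachment(loc: int) -> bool:
        return loc < len(attachment_valid) and attachment_valid[loc]

    kept = []
    for loc in output_locations:
        # Alpha to coverage reads the alpha of the output at location 0,
        # so keep that output even when no color attachment backs it.
        if has_attachment(loc) or (alpha_to_coverage and loc == 0):
            kept.append(loc)

    # Every kept output without a real attachment gets a null render target.
    null_rts = [loc for loc in kept if not has_attachment(loc)]
    return kept, null_rts
```

Suppose there are no color attachments and alpha to coverage is enabled. Output 0 is then kept and paired with a null render target. With alpha to coverage disabled, the same output is removed.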
      
      v2:
        - We already create a null rt when we don't have any, so reuse that
          for this case (Jason)
        - Simplify the code a bit (Iago)
      
      v3:
        - Take alpha to coverage from the key and don't tie this to depth-only
          rendering; we want the same behavior if we have multiple render
          targets but the one at location 0 is not used. (Jason)
        - Rewrite commit message (Iago)
      
      v4:
        - Make sure we take into account the array length of the shader outputs,
          which we were not handling correctly either, and make sure we also
          create null render targets for any invalid array entries.
      
      v5:
        - Simplify removal of unused outputs by using rt_used[] so we don't have
          to special case alpha to coverage there too.
      
      Fixes the following CTS tests:
      dEQP-VK.pipeline.multisample.alpha_to_coverage_no_color_attachment.*
      Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
      Signed-off-by: Iago Toral Quiroga <itoral@igalia.com>
      Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
    • intel/compiler: Don't always require precise lowering of flrp · c8665005
      Ian Romanick authored
      No changes on any other Intel platforms.
      
      Iron Lake and GM45 had similar results. (Iron Lake shown)
      total instructions in shared programs: 8164367 -> 8135551 (-0.35%)
      instructions in affected programs: 3271235 -> 3242419 (-0.88%)
      helped: 13636
      HURT: 90
      helped stats (abs) min: 1 max: 30 x̄: 2.13 x̃: 1
      helped stats (rel) min: 0.04% max: 10.77% x̄: 1.16% x̃: 0.97%
      HURT stats (abs)   min: 1 max: 4 x̄: 1.80 x̃: 2
      HURT stats (rel)   min: 0.26% max: 11.11% x̄: 1.76% x̃: 0.78%
      95% mean confidence interval for instructions value: -2.13 -2.07
      95% mean confidence interval for instructions %-change: -1.16% -1.13%
      Instructions are helped.
      
      total cycles in shared programs: 188719974 -> 188586222 (-0.07%)
      cycles in affected programs: 70415766 -> 70282014 (-0.19%)
      helped: 12563
      HURT: 515
      helped stats (abs) min: 2 max: 600 x̄: 10.90 x̃: 6
      helped stats (rel) min: <.01% max: 5.48% x̄: 0.48% x̃: 0.27%
      HURT stats (abs)   min: 2 max: 54 x̄: 6.07 x̃: 4
      HURT stats (rel)   min: 0.01% max: 4.48% x̄: 0.24% x̃: 0.08%
      95% mean confidence interval for cycles value: -10.56 -9.90
      95% mean confidence interval for cycles %-change: -0.47% -0.45%
      Cycles are helped.
      
      LOST:   0
      GAINED: 13
      Reviewed-by: Matt Turner <mattst88@gmail.com>
    • nir/algebraic: Reassociate open-coded flrp(1, b, c) · ab869261
      Ian Romanick authored
      In a previous version of this patch, Jason commented,
      
         "Re-associating based on whether or not something has a constant
         value of 1.0 seems a bit sneaky.  I think it's well within the rules
         but it seems like something that could bite you."
      
      That is possibly true.  The reassociation will generate different
      results if fabs(b) >= 2**24 and fabs(c) < 0.5.  The delta increases as
      fabs(c) approaches 0.
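The small Python experiment below compares the a+c(b-a) form with the a(1-c)+bc form discussed in this message, taking a = 1. It emulates binary32 arithmetic by rounding after every operation via `struct`. The inputs b = 16777218 (≥ 2**24) and c = 0.375 (< 0.5) were picked to fall in the range where the two forms disagree. They are illustrative and not taken from any shader, and the function names are hypothetical.

```python
import struct


def f32(x: float) -> float:
    """Round a binary64 value to the nearest binary32 value
    (ties to even on IEEE-754 platforms)."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Inputs are assumed to be exactly representable in binary32 already.
def flrp_sub(a: float, b: float, c: float) -> float:
    """a + c*(b - a), rounding every intermediate result to binary32."""
    return f32(a + f32(c * f32(b - a)))


def flrp_mix(a: float, b: float, c: float) -> float:
    """a*(1 - c) + b*c, rounding every intermediate result to binary32."""
    return f32(f32(a * f32(1.0 - c)) + f32(b * c))
```

Here b-1 rounds to 16777216, so `flrp_sub(1.0, 16777218.0, 0.375)` returns 6291457.0 while `flrp_mix` returns 6291457.5. The exact real result is 6291457.375. For small, exactly representable inputs such as b = 3, c = 0.5, both forms return 2.0.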
      
      However, i965 has done this same reassociation indirectly for years.
      We would previously allow nir_op_flrp on all pre-Gen11 hardware even
      though Gen4 and Gen5 do not have a LRP instruction.  Optimizations in
      nir_opt_algebraic would convert expressions like a+c(b-a) into flrp(a,
      b, c).  On Gen7+, the hardware performs the same arithmetic as
      a(1-c)+bc.  Gen6 seems to implement LRP as a+c(b-a).  On Gen4 and
      Gen5, we would lower LRP to a sequence of instructions that implement
      a(1-c)+bc.  The lowering happens after all constant folding, so we
      would literally generate a 1+(-1) instruction sequence in this
      scenario: one instruction to load either 1 or -1 in a register, and
      another instruction to add either -1 or 1 to it.
      
      This patch just cuts out the middle man.  Do the reassociation that
      we've always done, but do it explicitly at a time when we can benefit
      from other optimizations.
      
      A few cases that were hurt by "nir: Lower flrp(±1, b, c) and flrp(a,
      ±1, c) differently" are restored by this patch.  This includes a few
      shaders in ET:QW.
      
      I tried a similar thing for open-coded flrp(-1, b, c), and it hurt
      instructions on 35 shaders for ILK without helping any.  The helped /
      hurt cycle counts were about even.
      
      No changes on any other Intel platforms.
      
      Iron Lake and GM45 had similar results. (Iron Lake shown)
      total instructions in shared programs: 8172020 -> 8164367 (-0.09%)
      instructions in affected programs: 1089851 -> 1082198 (-0.70%)
      helped: 3285
      HURT: 64
      helped stats (abs) min: 1 max: 6 x̄: 2.35 x̃: 2
      helped stats (rel) min: 0.13% max: 12.00% x̄: 1.15% x̃: 0.83%
      HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
      HURT stats (rel)   min: 0.24% max: 0.64% x̄: 0.39% x̃: 0.38%
      95% mean confidence interval for instructions value: -2.32 -2.25
      95% mean confidence interval for instructions %-change: -1.16% -1.09%
      Instructions are helped.
      
      total cycles in shared programs: 188758338 -> 188719974 (-0.02%)
      cycles in affected programs: 20004922 -> 19966558 (-0.19%)
      helped: 3012
      HURT: 477
      helped stats (abs) min: 2 max: 142 x̄: 13.41 x̃: 12
      helped stats (rel) min: 0.01% max: 6.37% x̄: 0.52% x̃: 0.24%
      HURT stats (abs)   min: 2 max: 328 x̄: 4.27 x̃: 4
      HURT stats (rel)   min: <.01% max: 1.55% x̄: 0.14% x̃: 0.11%
      95% mean confidence interval for cycles value: -11.38 -10.62
      95% mean confidence interval for cycles %-change: -0.46% -0.41%
      Cycles are helped.
      Reviewed-by: Matt Turner <mattst88@gmail.com>
    • nir/flrp: Lower flrp(a, b, #c) differently · c995d1ca
      Ian Romanick authored
      This doesn't help on Intel GPUs now because we always take the
      "always_precise" path first.  It may help on other GPUs, and it does
      prevent a bunch of regressions in "intel/compiler: Don't always require
      precise lowering of flrp".
      Reviewed-by: Matt Turner <mattst88@gmail.com>
    • nir/flrp: Lower flrp(a, b, c) differently if another flrp(_, b, c) exists · ae02622d
      Ian Romanick authored
      There is little effect on Intel GPUs now because we almost always take
      the "always_precise" path first.  It may help on other GPUs, and it does
      prevent a bunch of regressions in "intel/compiler: Don't always require
      precise lowering of flrp".
      
      No changes on any other Intel platforms.
      
      GM45 and Iron Lake had similar results. (Iron Lake shown)
      total cycles in shared programs: 188852500 -> 188852484 (<.01%)
      cycles in affected programs: 14612 -> 14596 (-0.11%)
      helped: 4
      HURT: 0
      helped stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4
      helped stats (rel) min: 0.09% max: 0.13% x̄: 0.11% x̃: 0.11%
      95% mean confidence interval for cycles value: -4.00 -4.00
      95% mean confidence interval for cycles %-change: -0.13% -0.09%
      Cycles are helped.
      Reviewed-by: Matt Turner <mattst88@gmail.com>
    • nir/flrp: Lower flrp(a, b, c) differently if another flrp(a, _, c) exists · 6698d861
      Ian Romanick authored
      This doesn't help on Intel GPUs now because we always take the
      "always_precise" path first.  It may help on other GPUs, and it does
      prevent a bunch of regressions in "intel/compiler: Don't always require
      precise lowering of flrp".
      
      No changes on any Intel platform.  Before a number of large rebases this
      helped cycles in a couple of shaders on Iron Lake and GM45.
      Reviewed-by: Matt Turner <mattst88@gmail.com>
    • nir/flrp: Lower flrp(±1, b, c) and flrp(a, ±1, c) differently · 5b908db6
      Ian Romanick authored
      No changes on any other Intel platforms.
      
      v2: Rebase on 424372e5 ("nir: Use the flrp lowering pass instead of
      nir_opt_algebraic")
      
      Iron Lake and GM45 had similar results. (Iron Lake shown)
      total instructions in shared programs: 8189888 -> 8153912 (-0.44%)
      instructions in affected programs: 1199037 -> 1163061 (-3.00%)
      helped: 4124
      HURT: 10
      helped stats (abs) min: 1 max: 40 x̄: 8.73 x̃: 9
      helped stats (rel) min: 0.20% max: 86.96% x̄: 4.96% x̃: 3.02%
      HURT stats (abs)   min: 1 max: 2 x̄: 1.20 x̃: 1
      HURT stats (rel)   min: 1.06% max: 3.92% x̄: 1.62% x̃: 1.06%
      95% mean confidence interval for instructions value: -8.84 -8.56
      95% mean confidence interval for instructions %-change: -5.12% -4.77%
      Instructions are helped.
      
      total cycles in shared programs: 188606710 -> 188426964 (-0.10%)
      cycles in affected programs: 27505596 -> 27325850 (-0.65%)
      helped: 4026
      HURT: 77
      helped stats (abs) min: 2 max: 646 x̄: 44.99 x̃: 46
      helped stats (rel) min: <.01% max: 94.58% x̄: 2.35% x̃: 0.85%
      HURT stats (abs)   min: 2 max: 376 x̄: 17.79 x̃: 6
      HURT stats (rel)   min: <.01% max: 2.60% x̄: 0.22% x̃: 0.04%
      95% mean confidence interval for cycles value: -44.75 -42.87
      95% mean confidence interval for cycles %-change: -2.44% -2.17%
      Cycles are helped.
      
      LOST:   3
      GAINED: 35
      Reviewed-by: Matt Turner <mattst88@gmail.com>
    • nir/flrp: Lower flrp(#a, #b, c) differently · 23c5501b
      Ian Romanick authored
      If the magnitudes of #a and #b are such that (b-a) won't lose too much
      precision, lower as a+c(b-a).
      
      No changes on any other Intel platforms.
      
      v2: Rebase on 424372e5 ("nir: Use the flrp lowering pass instead of
      nir_opt_algebraic")
      
      Iron Lake and GM45 had similar results. (Iron Lake shown)
      total instructions in shared programs: 8192503 -> 8192383 (<.01%)
      instructions in affected programs: 18417 -> 18297 (-0.65%)
      helped: 68
      HURT: 0
      helped stats (abs) min: 1 max: 18 x̄: 1.76 x̃: 1
      helped stats (rel) min: 0.19% max: 7.89% x̄: 1.10% x̃: 0.43%
      95% mean confidence interval for instructions value: -2.48 -1.05
      95% mean confidence interval for instructions %-change: -1.56% -0.63%
      Instructions are helped.
      
      total cycles in shared programs: 188662536 -> 188661956 (<.01%)
      cycles in affected programs: 744476 -> 743896 (-0.08%)
      helped: 62
      HURT: 0
      helped stats (abs) min: 4 max: 60 x̄: 9.35 x̃: 6
      helped stats (rel) min: 0.02% max: 4.84% x̄: 0.27% x̃: 0.06%
      95% mean confidence interval for cycles value: -12.37 -6.34
      95% mean confidence interval for cycles %-change: -0.48% -0.06%
      Cycles are helped.
      Reviewed-by: Matt Turner <mattst88@gmail.com>
    • intel/compiler: Use the flrp lowering pass for all stages on Gen4 and Gen5 · dd7135d5
      Ian Romanick authored
      Previously lower_flrp32 was only set for vertex shaders.  Fragment
      shaders performed a(1-c)+bc lowering during code generation.
      
      The shaders with loops hurt are SIMD8 and SIMD16 shaders for a
      text-identical fragment shader.
      
      v2: Rebase on 26391cce ("intel/compiler: Lower ffma on Gen4 and
      Gen5").
      
      v3: Rebase on a004e95d ("radeonsi/nir: create si_nir_opts() helper")
      
      Iron Lake
      total instructions in shared programs: 8211385 -> 8185974 (-0.31%)
      instructions in affected programs: 2503898 -> 2478487 (-1.01%)
      helped: 9936
      HURT: 921
      helped stats (abs) min: 1 max: 155 x̄: 2.86 x̃: 2
      helped stats (rel) min: 0.10% max: 35.48% x̄: 1.67% x̃: 1.11%
      HURT stats (abs)   min: 1 max: 12 x̄: 3.24 x̃: 2
      HURT stats (rel)   min: 0.21% max: 13.64% x̄: 1.86% x̃: 0.89%
      95% mean confidence interval for instructions value: -2.43 -2.25
      95% mean confidence interval for instructions %-change: -1.41% -1.33%
      Instructions are helped.
      
      total cycles in shared programs: 188523186 -> 188401198 (-0.06%)
      cycles in affected programs: 71541604 -> 71419616 (-0.17%)
      helped: 11649
      HURT: 1871
      helped stats (abs) min: 2 max: 930 x̄: 12.62 x̃: 6
      helped stats (rel) min: <.01% max: 44.61% x̄: 0.68% x̃: 0.25%
      HURT stats (abs)   min: 2 max: 138 x̄: 13.38 x̃: 8
      HURT stats (rel)   min: <.01% max: 10.99% x̄: 0.49% x̃: 0.17%
      95% mean confidence interval for cycles value: -9.42 -8.63
      95% mean confidence interval for cycles %-change: -0.54% -0.50%
      Cycles are helped.
      
      total loops in shared programs: 852 -> 856 (0.47%)
      loops in affected programs: 0 -> 4
      helped: 0
      HURT: 4
      HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
      HURT stats (rel)   min: 0.00% max: 0.00% x̄: 0.00% x̃: 0.00%
      95% mean confidence interval for loops value: 1.00 1.00
      95% mean confidence interval for loops %-change: 0.00% 0.00%
      Loops are HURT.
      
      LOST:   3
      GAINED: 12
      
      GM45
      total instructions in shared programs: 5046407 -> 5033694 (-0.25%)
      instructions in affected programs: 1303584 -> 1290871 (-0.98%)
      helped: 5010
      HURT: 464
      helped stats (abs) min: 1 max: 155 x̄: 2.85 x̃: 2
      helped stats (rel) min: 0.10% max: 34.38% x̄: 1.63% x̃: 1.08%
      HURT stats (abs)   min: 1 max: 75 x̄: 3.39 x̃: 2
      HURT stats (rel)   min: 0.20% max: 13.04% x̄: 1.84% x̃: 0.87%
      95% mean confidence interval for instructions value: -2.45 -2.20
      95% mean confidence interval for instructions %-change: -1.40% -1.28%
      Instructions are helped.
      
      total cycles in shared programs: 128889476 -> 128812366 (-0.06%)
      cycles in affected programs: 44845402 -> 44768292 (-0.17%)
      helped: 6079
      HURT: 940
      helped stats (abs) min: 2 max: 930 x̄: 15.16 x̃: 8
      helped stats (rel) min: <.01% max: 41.03% x̄: 0.71% x̃: 0.25%
      HURT stats (abs)   min: 2 max: 138 x̄: 16.01 x̃: 8
      HURT stats (rel)   min: <.01% max: 10.99% x̄: 0.50% x̃: 0.17%
      95% mean confidence interval for cycles value: -11.63 -10.34
      95% mean confidence interval for cycles %-change: -0.58% -0.52%
      Cycles are helped.
      
      total loops in shared programs: 633 -> 635 (0.32%)
      loops in affected programs: 0 -> 2
      helped: 0
      HURT: 2
      
      total spills in shared programs: 60 -> 69 (15.00%)
      spills in affected programs: 54 -> 63 (16.67%)
      helped: 0
      HURT: 1
      
      total fills in shared programs: 92 -> 105 (14.13%)
      fills in affected programs: 80 -> 93 (16.25%)
      helped: 0
      HURT: 1
      
      LOST:   15
      GAINED: 15
      
      Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> [v2]
      Reviewed-by: Matt Turner <mattst88@gmail.com> [v2]
    • nir: Use the flrp lowering pass instead of nir_opt_algebraic · d41cdef2
      Ian Romanick authored
      I tried to be very careful while updating all the various drivers, but I
      don't have any of that hardware for testing. :(
      
      i965 is the only platform that sets always_precise = true, and it is
      only set true for fragment shaders.  Gen4 and Gen5 both set lower_flrp32
      only for vertex shaders.  For fragment shaders, nir_op_flrp is lowered
      during code generation as a(1-c)+bc.  On all other platforms 64-bit
      nir_op_flrp and on Gen11 32-bit nir_op_flrp are lowered using the old
      nir_opt_algebraic method.
      
      No changes on any other Intel platforms.
      
      v2: Add panfrost changes.
      
      Iron Lake and GM45 had similar results. (Iron Lake shown)
      total cycles in shared programs: 188647754 -> 188647748 (<.01%)
      cycles in affected programs: 5096 -> 5090 (-0.12%)
      helped: 3
      HURT: 0
      helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
      helped stats (rel) min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12%
      Reviewed-by: Matt Turner <mattst88@gmail.com>
    • nir/flrp: Add new lowering pass for flrp instructions · 158370ed
      Ian Romanick authored
      This pass will soon grow to include some optimizations that are
      difficult or impossible to implement correctly within nir_opt_algebraic.
      It also includes the ability to generate strictly correct code, which the
      current nir_opt_algebraic lowering lacks (though that could be changed).
      
      v2: Document the parameters to nir_lower_flrp.  Rebase on top of
      37663349 ("compiler/nir: add lowering for 16-bit flrp")
      Reviewed-by: Matt Turner <mattst88@gmail.com>
    • nir/algebraic: Pull common multiplication out of flrp arguments · dc566a03
      Ian Romanick authored
      All Intel platforms had similar results. (Skylake shown)
      total instructions in shared programs: 15342485 -> 15337495 (-0.03%)
      instructions in affected programs: 217456 -> 212466 (-2.29%)
      helped: 1539
      HURT: 1
      helped stats (abs) min: 1 max: 17 x̄: 3.24 x̃: 3
      helped stats (rel) min: 0.22% max: 18.75% x̄: 3.10% x̃: 1.91%
      HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
      HURT stats (rel)   min: 0.56% max: 0.56% x̄: 0.56% x̃: 0.56%
      95% mean confidence interval for instructions value: -3.39 -3.09
      95% mean confidence interval for instructions %-change: -3.24% -2.96%
      Instructions are helped.
      
      total cycles in shared programs: 355734320 -> 355728237 (<.01%)
      cycles in affected programs: 1851555 -> 1845472 (-0.33%)
      helped: 835
      HURT: 575
      helped stats (abs) min: 1 max: 658 x̄: 40.62 x̃: 14
      helped stats (rel) min: <.01% max: 35.69% x̄: 3.78% x̃: 1.81%
      HURT stats (abs)   min: 1 max: 322 x̄: 48.40 x̃: 14
      HURT stats (rel)   min: 0.04% max: 71.02% x̄: 8.06% x̃: 2.43%
      95% mean confidence interval for cycles value: -8.50 -0.13
      95% mean confidence interval for cycles %-change: 0.48% 1.62%
      Inconclusive result (value mean confidence interval and %-change mean confidence interval disagree).
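The identity behind pulling a shared factor x out of both interpolation endpoints is written out below. It holds in exact real arithmetic, assuming the usual definition flrp(a, b, c) = a(1 - c) + bc. The two multiplications feeding the flrp can then become a single multiplication of its result. In floating point the two sides may round differently.

```latex
\begin{aligned}
\operatorname{flrp}(a x,\, b x,\, c) &= a x\,(1 - c) + b x\, c \\
  &= x\,\bigl(a\,(1 - c) + b\, c\bigr) \\
  &= x \cdot \operatorname{flrp}(a, b, c)
\end{aligned}
```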
      Reviewed-by: Matt Turner <mattst88@gmail.com>
    • nir/algebraic: Pull common addition out of flrp arguments · a83a6e96
      Ian Romanick authored
      v2: Augment the late optimization patterns with a couple pre-ffma pass
      patterns.
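The identity behind pulling a shared addend x out of both interpolation endpoints is written out below. It holds in exact real arithmetic, assuming the usual definition flrp(a, b, c) = a(1 - c) + bc. The two additions feeding the flrp can then become a single addition to its result. In floating point the two sides may round differently.

```latex
\begin{aligned}
\operatorname{flrp}(a + x,\, b + x,\, c) &= (a + x)(1 - c) + (b + x)\, c \\
  &= a\,(1 - c) + b\, c + x\,\bigl((1 - c) + c\bigr) \\
  &= x + \operatorname{flrp}(a, b, c)
\end{aligned}
```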
      
      All Gen7+ platforms had similar results. (Skylake shown)
      total instructions in shared programs: 15342982 -> 15342485 (<.01%)
      instructions in affected programs: 56304 -> 55807 (-0.88%)
      helped: 235
      HURT: 0
      helped stats (abs) min: 1 max: 8 x̄: 2.11 x̃: 1
      helped stats (rel) min: 0.11% max: 8.82% x̄: 1.27% x̃: 0.74%
      95% mean confidence interval for instructions value: -2.31 -1.92
      95% mean confidence interval for instructions %-change: -1.46% -1.09%
      Instructions are helped.
      
      total cycles in shared programs: 355734740 -> 355734320 (<.01%)
      cycles in affected programs: 1028807 -> 1028387 (-0.04%)
      helped: 134
      HURT: 104
      helped stats (abs) min: 1 max: 212 x̄: 25.69 x̃: 8
      helped stats (rel) min: <.01% max: 9.36% x̄: 1.33% x̃: 0.61%
      HURT stats (abs)   min: 1 max: 203 x̄: 29.06 x̃: 8
      HURT stats (rel)   min: 0.02% max: 15.76% x̄: 1.76% x̃: 0.46%
      95% mean confidence interval for cycles value: -8.51 4.98
      95% mean confidence interval for cycles %-change: -0.35% 0.39%
      Inconclusive result (value mean confidence interval includes 0).
      
      Sandy Bridge
      total instructions in shared programs: 10886815 -> 10886390 (<.01%)
      instructions in affected programs: 36883 -> 36458 (-1.15%)
      helped: 147
      HURT: 0
      helped stats (abs) min: 1 max: 7 x̄: 2.89 x̃: 3
      helped stats (rel) min: 0.35% max: 8.00% x̄: 1.60% x̃: 1.23%
      95% mean confidence interval for instructions value: -3.12 -2.67
      95% mean confidence interval for instructions %-change: -1.83% -1.38%
      Instructions are helped.
      
      total cycles in shared programs: 154188360 -> 154186902 (<.01%)
      cycles in affected programs: 388094 -> 386636 (-0.38%)
      helped: 90
      HURT: 58
      helped stats (abs) min: 1 max: 243 x̄: 36.80 x̃: 15
      helped stats (rel) min: 0.04% max: 9.23% x̄: 1.26% x̃: 0.83%
      HURT stats (abs)   min: 1 max: 684 x̄: 31.97 x̃: 10
      HURT stats (rel)   min: 0.03% max: 13.50% x̄: 1.15% x̃: 0.51%
      95% mean confidence interval for cycles value: -22.62 2.92
      95% mean confidence interval for cycles %-change: -0.68% 0.05%
      Inconclusive result (value mean confidence interval includes 0).
      
      Iron Lake and GM45 had similar results. (Iron Lake shown)
      total instructions in shared programs: 8221239 -> 8220357 (-0.01%)
      instructions in affected programs: 54560 -> 53678 (-1.62%)
      helped: 186
      HURT: 0
      helped stats (abs) min: 1 max: 14 x̄: 4.74 x̃: 3
      helped stats (rel) min: 0.34% max: 10.77% x̄: 1.97% x̃: 1.17%
      95% mean confidence interval for instructions value: -5.21 -4.28
      95% mean confidence interval for instructions %-change: -2.23% -1.72%
      Instructions are helped.
      
      total cycles in shared programs: 188654442 -> 188650364 (<.01%)
      cycles in affected programs: 1454384 -> 1450306 (-0.28%)
      helped: 204
      HURT: 0
      helped stats (abs) min: 2 max: 84 x̄: 19.99 x̃: 18
      helped stats (rel) min: 0.02% max: 4.69% x̄: 0.56% x̃: 0.22%
      95% mean confidence interval for cycles value: -22.38 -17.60
      95% mean confidence interval for cycles %-change: -0.67% -0.46%
      Cycles are helped.
      Reviewed-by: Matt Turner <mattst88@gmail.com>
    • glsl_to_nir: drop supports_ints · e00fa99b
      Christian Gmeiner authored
      At the initial NIR level, all drivers support ints.
      Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
      Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
    • nir: nir_shader_compiler_options: drop native_integers · 4e110eca
      Christian Gmeiner authored
      Drivers that do not support native integers should use a lowering
      pass to go from integers to floats.
      Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
      Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
    • panfrost: Refactor blend descriptors · 050b934a
      Alyssa Rosenzweig authored
      This commit does a fairly large cleanup of blend descriptors, although
      there should not be any functional changes. In particular, we split
      apart the Midgard and Bifrost blend descriptors, since they are
      radically different. From there, we can identify that the Midgard
      descriptor as previously written was really two render targets'
      descriptors stuck together. From this observation, we split the Midgard
      descriptor into what a single RT actually needs. This enables us to
      correctly dump blending configuration for MRT samples on Midgard. It
      also allows the Midgard and Bifrost blend code to peacefully coexist,
      with runtime selection rather than a #ifdef. So, as a bonus, this will
      help the future Bifrost effort, eliminating one major source of
      compile-time architectural divergence.
      Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
    • Vasily Khoruzhick
      d4a249aa
    • lima/gpir: implement nir_op_fmov · f4659bea
      Vasily Khoruzhick authored
      Reviewed-by: Qiang Yu <yuq825@gmail.com>
      Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
    • lima: use int_to_float lowering pass · cf1ab4b9
      Vasily Khoruzhick authored
      Neither GP nor PP in Mali4x0 supports integers, so use the new pass
      and set native_integers to true for now, until this flag is dropped.
      Reviewed-by: Qiang Yu <yuq825@gmail.com>
      Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
    • nir: add int_to_float lowering pass · 443c5a3c
      Vasily Khoruzhick authored
      This new pass lowers ints and bools to floats. It allows hardware
      that doesn't have native integers (e.g. Mali4x0) to use the same
      code paths as modern hardware.
      
      It uses the newly introduced pass to gather SSA types and should be
      used as late as possible.
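The toy Python sketch below shows opcode substitution, the core idea of such a pass: integer ALU opcodes are rewritten to their float counterparts. This stays exact only while the integer values fit in a binary32 significand (magnitudes up to 2**24). The instruction tuple format and the abbreviated opcode table are hypothetical. They borrow NIR-style opcode names but are not the real pass's table, and bool handling is omitted.

```python
from typing import Dict, List, Tuple

# (opcode, destination, sources): a toy stand-in for an ALU instruction.
Instr = Tuple[str, str, Tuple[str, ...]]

# Hypothetical, abbreviated table. The names mimic NIR opcodes, but this is
# not the table used by the real pass.
INT_TO_FLOAT: Dict[str, str] = {
    "iadd": "fadd",
    "imul": "fmul",
    "ineg": "fneg",
    "iabs": "fabs",
    "imin": "fmin",
    "imax": "fmax",
}


def lower_int_to_float(instrs: List[Instr]) -> List[Instr]:
    """Rewrite integer ALU ops to float ops; float ops pass through."""
    lowered: List[Instr] = []
    for op, dest, srcs in instrs:
        if op in INT_TO_FLOAT:
            lowered.append((INT_TO_FLOAT[op], dest, srcs))
        elif op.startswith("f"):
            # Already a float operation: keep it unchanged.
            lowered.append((op, dest, srcs))
        else:
            # Anything else (shifts, bitwise ops, ...) has no direct float
            # equivalent in this sketch.
            raise ValueError(f"no float equivalent for {op!r}")
    return lowered
```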
      Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
      Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
      Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
  2. 06 May, 2019 3 commits