1. 07 May, 2019 9 commits
    • intel/tools: Add unit tests for assembler · 4e828bb4
      Sagar Ghuge authored
      v1: Pass executable object from meson to test (Dylan Baker)
      v2: Ignore generated output files from git status (Matt Turner)
      Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
      Reviewed-by: Matt Turner <mattst88@gmail.com>
      Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
    • intel/tools: Initialize offset correctly for i965_asm · 1fb5ce0a
      Mika Kuoppala authored
      If we leave offset uninitialized, access to store
      will be random depending on stack value and can
      segfault.
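The failure mode can be illustrated with a minimal sketch. This is not the actual i965_asm code: `store_buf`, `store_append` and `emit_demo` are hypothetical names, and only the idea of an `offset` into a store comes from the commit message. An automatic variable in C has an indeterminate value until it is assigned, so the sketch sets the cursor to zero before the first write.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the assembler's output store. */
struct store_buf {
   unsigned char data[64];
};

/* Copy len bytes to s->data at *offset and advance the cursor.
 * Returns 0 on success, -1 if the write would not fit. */
int store_append(struct store_buf *s, size_t *offset,
                 const void *src, size_t len)
{
   if (*offset > sizeof(s->data) || len > sizeof(s->data) - *offset)
      return -1;
   memcpy(s->data + *offset, src, len);
   *offset += len;
   return 0;
}

/* Emit two 4-byte words and return the final offset. */
size_t emit_demo(struct store_buf *s)
{
   static const unsigned char word[4] = { 0xde, 0xad, 0xbe, 0xef };
   size_t offset = 0; /* without this, the first write could land anywhere */

   store_append(s, &offset, word, sizeof(word));
   store_append(s, &offset, word, sizeof(word));
   return offset;
}
```

The bounds check in `store_append` is an extra safeguard added for the sketch; the commit itself only describes initializing the offset.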
      Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
      Reviewed-by: Matt Turner <mattst88@gmail.com>
    • intel/tools: New i965 instruction assembler tool · 70308a5a
      Sagar Ghuge authored
      The tool is inspired by igt's assembler tool. Thanks to Matt Turner, who
      mentored me throughout this project.
      
      v2: Fix memory leaks and naming convention (Caio)
      v3: Fix meson changes (Dylan Baker)
      v4: Fix usage options (Matt Turner)
      Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
      Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
      Reviewed-by: Matt Turner <mattst88@gmail.com>
      Closes: mesa/mesa!141
    • anv: fix alphaToCoverage when there is no color attachment · bc66cebc
      Samuel Iglesias Gonsálvez authored
      There are tests in CTS for alpha to coverage without a color attachment
      that are failing. This happens because we remove the shader color
      outputs when we don't have a valid color attachment for them, but when
      alpha to coverage is enabled we still want to preserve the output
      at location 0 since we need the alpha component. In that case we will
      also need to create a null render target for RT 0.
      
      v2:
        - We already create a null rt when we don't have any, so reuse that
          for this case (Jason)
        - Simplify the code a bit (Iago)
      
      v3:
        - Take alpha to coverage from the key and don't tie this to depth-only
          rendering only, we want the same behavior if we have multiple render
          targets but the one at location 0 is not used. (Jason).
        - Rewrite commit message (Iago)
      
      v4:
        - Make sure we take into account the array length of the shader outputs,
          which we were not handling correctly either, and make sure we also
          create null render targets for any invalid array entries.
      
      v5:
        - Simplify removal of unused outputs by using rt_used[] so we don't have
          to special case alpha to coverage there too.
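The rule described in this message can be sketched roughly as follows. This is an illustration under stated assumptions, not the anv implementation: `rt_used[]` is the name used in the v5 note, while `MAX_RTS`, `compute_rt_used`, `attachment_valid` and `needs_null_rt` are hypothetical, and array-typed outputs (the v4 note) are left out.

```c
#include <stdbool.h>

#define MAX_RTS 8 /* assumed limit for this sketch */

/* Decide which render-target slots the fragment shader outputs must keep.
 * attachment_valid[i] says whether slot i has a real color attachment. */
void compute_rt_used(const bool attachment_valid[MAX_RTS],
                     bool alpha_to_coverage,
                     bool rt_used[MAX_RTS])
{
   for (unsigned i = 0; i < MAX_RTS; i++)
      rt_used[i] = attachment_valid[i];

   /* Alpha to coverage needs the alpha component of location 0, so keep
    * that output even when no attachment backs it. */
   if (alpha_to_coverage)
      rt_used[0] = true;
}

/* A slot needs a null render target when it is used without an attachment. */
bool needs_null_rt(const bool attachment_valid[MAX_RTS],
                   const bool rt_used[MAX_RTS], unsigned i)
{
   return rt_used[i] && !attachment_valid[i];
}
```

With a single table like this, removal of unused outputs only has to consult `rt_used[]` and does not need its own alpha-to-coverage check, which is the simplification the v5 note describes.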
      
      Fixes the following CTS tests:
      dEQP-VK.pipeline.multisample.alpha_to_coverage_no_color_attachment.*
      Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
      Signed-off-by: Iago Toral Quiroga <itoral@igalia.com>
      Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
    • intel/compiler: Don't always require precise lowering of flrp · c8665005
      Ian Romanick authored
      No changes on any other Intel platforms.
      
      Iron Lake and GM45 had similar results. (Iron Lake shown)
      total instructions in shared programs: 8164367 -> 8135551 (-0.35%)
      instructions in affected programs: 3271235 -> 3242419 (-0.88%)
      helped: 13636
      HURT: 90
      helped stats (abs) min: 1 max: 30 x̄: 2.13 x̃: 1
      helped stats (rel) min: 0.04% max: 10.77% x̄: 1.16% x̃: 0.97%
      HURT stats (abs)   min: 1 max: 4 x̄: 1.80 x̃: 2
      HURT stats (rel)   min: 0.26% max: 11.11% x̄: 1.76% x̃: 0.78%
      95% mean confidence interval for instructions value: -2.13 -2.07
      95% mean confidence interval for instructions %-change: -1.16% -1.13%
      Instructions are helped.
      
      total cycles in shared programs: 188719974 -> 188586222 (-0.07%)
      cycles in affected programs: 70415766 -> 70282014 (-0.19%)
      helped: 12563
      HURT: 515
      helped stats (abs) min: 2 max: 600 x̄: 10.90 x̃: 6
      helped stats (rel) min: <.01% max: 5.48% x̄: 0.48% x̃: 0.27%
      HURT stats (abs)   min: 2 max: 54 x̄: 6.07 x̃: 4
      HURT stats (rel)   min: 0.01% max: 4.48% x̄: 0.24% x̃: 0.08%
      95% mean confidence interval for cycles value: -10.56 -9.90
      95% mean confidence interval for cycles %-change: -0.47% -0.45%
      Cycles are helped.
      
      LOST:   0
      GAINED: 13
      Reviewed-by: Matt Turner <mattst88@gmail.com>
    • intel/compiler: Use the flrp lowering pass for all stages on Gen4 and Gen5 · dd7135d5
      Ian Romanick authored
      Previously lower_flrp32 was only set for vertex shaders.  Fragment
      shaders performed a(1-c)+bc lowering during code generation.
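For reference, the a(1-c)+bc expansion mentioned above can be written out in plain C. The sketch below is illustrative only and is not the NIR pass or the i965 code generator; `flrp_precise` and `flrp_fast` are hypothetical names, and the second form, a + c(b - a), is a common cheaper rearrangement assumed here for contrast rather than something this commit message states.

```c
/* flrp(a, b, c): linear interpolation from a (c = 0) to b (c = 1). */

/* The form named in the commit message: a(1-c) + bc.
 * For finite inputs it returns exactly a at c = 0 and exactly b at c = 1. */
float flrp_precise(float a, float b, float c)
{
   return a * (1.0f - c) + b * c;
}

/* Cheaper rearrangement (an assumption of this sketch): a + c(b - a).
 * It saves a multiply, but b - a can round, so c = 1 may not yield b. */
float flrp_fast(float a, float b, float c)
{
   return a + c * (b - a);
}
```

Which form a backend can use depends on whether the exact endpoint behavior matters for the shader in question.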
      
      The shaders with loops hurt are SIMD8 and SIMD16 shaders for a
      text-identical fragment shader.
      
      v2: Rebase on 26391cce ("intel/compiler: Lower ffma on Gen4 and
      Gen5").
      
      v3: Rebase on a004e95d ("radeonsi/nir: create si_nir_opts() helper")
      
      Iron Lake
      total instructions in shared programs: 8211385 -> 8185974 (-0.31%)
      instructions in affected programs: 2503898 -> 2478487 (-1.01%)
      helped: 9936
      HURT: 921
      helped stats (abs) min: 1 max: 155 x̄: 2.86 x̃: 2
      helped stats (rel) min: 0.10% max: 35.48% x̄: 1.67% x̃: 1.11%
      HURT stats (abs)   min: 1 max: 12 x̄: 3.24 x̃: 2
      HURT stats (rel)   min: 0.21% max: 13.64% x̄: 1.86% x̃: 0.89%
      95% mean confidence interval for instructions value: -2.43 -2.25
      95% mean confidence interval for instructions %-change: -1.41% -1.33%
      Instructions are helped.
      
      total cycles in shared programs: 188523186 -> 188401198 (-0.06%)
      cycles in affected programs: 71541604 -> 71419616 (-0.17%)
      helped: 11649
      HURT: 1871
      helped stats (abs) min: 2 max: 930 x̄: 12.62 x̃: 6
      helped stats (rel) min: <.01% max: 44.61% x̄: 0.68% x̃: 0.25%
      HURT stats (abs)   min: 2 max: 138 x̄: 13.38 x̃: 8
      HURT stats (rel)   min: <.01% max: 10.99% x̄: 0.49% x̃: 0.17%
      95% mean confidence interval for cycles value: -9.42 -8.63
      95% mean confidence interval for cycles %-change: -0.54% -0.50%
      Cycles are helped.
      
      total loops in shared programs: 852 -> 856 (0.47%)
      loops in affected programs: 0 -> 4
      helped: 0
      HURT: 4
      HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
      HURT stats (rel)   min: 0.00% max: 0.00% x̄: 0.00% x̃: 0.00%
      95% mean confidence interval for loops value: 1.00 1.00
      95% mean confidence interval for loops %-change: 0.00% 0.00%
      Loops are HURT.
      
      LOST:   3
      GAINED: 12
      
      GM45
      total instructions in shared programs: 5046407 -> 5033694 (-0.25%)
      instructions in affected programs: 1303584 -> 1290871 (-0.98%)
      helped: 5010
      HURT: 464
      helped stats (abs) min: 1 max: 155 x̄: 2.85 x̃: 2
      helped stats (rel) min: 0.10% max: 34.38% x̄: 1.63% x̃: 1.08%
      HURT stats (abs)   min: 1 max: 75 x̄: 3.39 x̃: 2
      HURT stats (rel)   min: 0.20% max: 13.04% x̄: 1.84% x̃: 0.87%
      95% mean confidence interval for instructions value: -2.45 -2.20
      95% mean confidence interval for instructions %-change: -1.40% -1.28%
      Instructions are helped.
      
      total cycles in shared programs: 128889476 -> 128812366 (-0.06%)
      cycles in affected programs: 44845402 -> 44768292 (-0.17%)
      helped: 6079
      HURT: 940
      helped stats (abs) min: 2 max: 930 x̄: 15.16 x̃: 8
      helped stats (rel) min: <.01% max: 41.03% x̄: 0.71% x̃: 0.25%
      HURT stats (abs)   min: 2 max: 138 x̄: 16.01 x̃: 8
      HURT stats (rel)   min: <.01% max: 10.99% x̄: 0.50% x̃: 0.17%
      95% mean confidence interval for cycles value: -11.63 -10.34
      95% mean confidence interval for cycles %-change: -0.58% -0.52%
      Cycles are helped.
      
      total loops in shared programs: 633 -> 635 (0.32%)
      loops in affected programs: 0 -> 2
      helped: 0
      HURT: 2
      
      total spills in shared programs: 60 -> 69 (15.00%)
      spills in affected programs: 54 -> 63 (16.67%)
      helped: 0
      HURT: 1
      
      total fills in shared programs: 92 -> 105 (14.13%)
      fills in affected programs: 80 -> 93 (16.25%)
      helped: 0
      HURT: 1
      
      LOST:   15
      GAINED: 15
      
      Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> [v2]
      Reviewed-by: Matt Turner <mattst88@gmail.com> [v2]
    • nir: Use the flrp lowering pass instead of nir_opt_algebraic · d41cdef2
      Ian Romanick authored
      I tried to be very careful while updating all the various drivers, but I
      don't have any of that hardware for testing. :(
      
      i965 is the only platform that sets always_precise = true, and it is
      only set true for fragment shaders.  Gen4 and Gen5 both set lower_flrp32
      only for vertex shaders.  For fragment shaders, nir_op_flrp is lowered
      during code generation as a(1-c)+bc.  On all other platforms 64-bit
      nir_op_flrp and on Gen11 32-bit nir_op_flrp are lowered using the old
      nir_opt_algebraic method.
      
      No changes on any other Intel platforms.
      
      v2: Add panfrost changes.
      
      Iron Lake and GM45 had similar results. (Iron Lake shown)
      total cycles in shared programs: 188647754 -> 188647748 (<.01%)
      cycles in affected programs: 5096 -> 5090 (-0.12%)
      helped: 3
      HURT: 0
      helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
      helped stats (rel) min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12%
      Reviewed-by: Matt Turner <mattst88@gmail.com>
    • nir: nir_shader_compiler_options: drop native_integers · 4e110eca
      Christian Gmeiner authored
      Drivers that do not support native integers should use a lowering
      pass to go from integers to floats.
      Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
      Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
  2. 03 May, 2019 3 commits
  3. 02 May, 2019 1 commit
  4. 30 Apr, 2019 2 commits
  5. 29 Apr, 2019 8 commits
  6. 26 Apr, 2019 2 commits
    • anv/descriptor_set: Don't fully destroy sets in pool destroy/reset · 934f1783
      Jason Ekstrand authored
      In 105002bd, we fixed a memory leak bug where we weren't properly
      destroying descriptor sets when destroying/resetting a descriptor pool.
      However, the only real leak that happened was that we take a
      reference to the descriptor set layout in the descriptor set and we
      weren't dropping our reference.  Everything else in the descriptor set
      is tied to the pool itself and doesn't need to be freed on a per-set
      basis.  This commit changes the destroy/reset functions to only bother
      walking the list of sets to unref the layouts and otherwise we just
      assume that the whole-pool destroy/reset takes care of the rest.
      
      Now that we're doing more non-trivial things with descriptor sets such
      as allocating things with util_vma_heap, per-set destruction is starting
      to show up on perf traces.  This takes reset back to where it's supposed
      to be as a cheap whole-pool operation.
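The reset strategy described above can be sketched in a few lines. All type and function names here (`set_layout`, `desc_set`, `desc_pool`, `set_layout_unref`, `desc_pool_reset`) are hypothetical and much simpler than the anv structures; the sketch only shows the shape of the idea: walk the sets to drop layout references, then reset the pool-wide state in one step.

```c
#include <stddef.h>

struct set_layout {
   int ref_cnt;
};

struct desc_set {
   struct set_layout *layout; /* each set holds one reference */
   struct desc_set *next;
};

struct desc_pool {
   struct desc_set *sets; /* list of live sets */
   size_t next_offset;    /* bump pointer into pool-owned storage */
};

void set_layout_unref(struct set_layout *layout)
{
   layout->ref_cnt--; /* a real driver would free the layout at zero */
}

void desc_pool_reset(struct desc_pool *pool)
{
   /* The only per-set work: drop the layout reference. */
   for (struct desc_set *set = pool->sets; set != NULL; set = set->next)
      set_layout_unref(set->layout);

   /* Everything else belongs to the pool, so reset it wholesale. */
   pool->sets = NULL;
   pool->next_offset = 0;
}
```

No per-set free is issued, so the cost of a reset is a list walk plus a constant amount of pool bookkeeping.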
      Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
    • anv: Better handle 32-byte alignment of descriptor set buffers · baf4802e
      Jason Ekstrand authored
      In c520f4de, we chose to align the sizes of descriptor set buffers to
      32 bytes.  We have to align the descriptor set buffer to 32B so that
      it's valid for using with push constants.  We align the size as well so
      we don't leave lots of holes with util_vma_heap_alloc.  Unfortunately,
      we were only aligning it for alloc and not for free so we were still
      creating piles of holes when we delete descriptor sets.  This causes
      terrible perf for the allocator once we've deleted piles of descriptor
      sets.
      
      This commit reworks the code so that we align the descriptor set buffer
      size to 32B for both alloc and free.  The result is that it takes the
      new crucible vkResetDescriptorPool from 104.567719 to 2.898354 seconds.
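The alloc/free symmetry can be shown with a toy model. This is a sketch under assumptions, not the anv code and not the util_vma_heap API: `toy_heap`, `toy_heap_alloc`, `toy_heap_free`, `align_u32` as written here, and `DESC_BUFFER_ALIGN` are all hypothetical, and the heap only counts reserved bytes.

```c
#include <stdint.h>

#define DESC_BUFFER_ALIGN 32u

/* Round v up to a multiple of a; a must be a power of two. */
uint32_t align_u32(uint32_t v, uint32_t a)
{
   return (v + a - 1) & ~(a - 1);
}

/* Toy heap that only tracks how many bytes are reserved. */
struct toy_heap {
   uint32_t bytes_in_use;
};

/* Reserve the aligned size and report how much was taken. */
uint32_t toy_heap_alloc(struct toy_heap *heap, uint32_t size)
{
   uint32_t aligned = align_u32(size, DESC_BUFFER_ALIGN);
   heap->bytes_in_use += aligned;
   return aligned;
}

/* Release the same aligned size that alloc reserved. */
void toy_heap_free(struct toy_heap *heap, uint32_t size)
{
   heap->bytes_in_use -= align_u32(size, DESC_BUFFER_ALIGN);
}
```

In this model a 40-byte request reserves 64 bytes. If free released only the unaligned 40, 24 bytes would stay reserved after every delete, which is the kind of hole the commit message describes.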
      
      Fixes: c520f4de "anv: Add a concept of a descriptor buffer"
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110497
      Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
  7. 25 Apr, 2019 4 commits
  8. 24 Apr, 2019 9 commits
  9. 23 Apr, 2019 2 commits