  1. Apr 10, 2015
  2. Apr 09, 2015
  3. Apr 08, 2015
  4. Apr 07, 2015
  5. Apr 06, 2015
    • Iago Toral
      i965: Do not render primitives in non-zero streams when TF is disabled · 2042a2f9
      Iago Toral authored and Kenneth Graunke committed
      Haswell hardware seems to ignore the Render Stream Select bits from
      the 3DSTATE_STREAMOUT packet when the SOL stage is disabled, even though
      the PRM says otherwise. Because of this, all primitives are sent
      down the pipeline for rasterization, which is wrong. If SOL is
      enabled, Render Stream Select is honored and primitives bound to
      non-zero streams are discarded after stream output.
      
      Since the only purpose of primitives sent to non-zero streams is to
      be recorded by transform feedback, we can simply discard all geometry
      bound to non-zero streams when transform feedback is disabled,
      preventing it from ever reaching the rasterization stage.
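      A minimal C sketch of the decision described above. The per-vertex
      hook `gs_should_emit_vertex()` is hypothetical (not Mesa's code, which
      applies the check inside the geometry shader compiler), and the sketch
      assumes four vertex streams, the ARB_gpu_shader5 minimum.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the workaround; not Mesa code.
 * Returns true if a vertex emitted to `stream` should be emitted at all. */
static bool gs_should_emit_vertex(unsigned stream, bool xfb_enabled)
{
    assert(stream < 4); /* sketch assumes four vertex streams */

    /* Stream 0 always feeds the rasterizer. */
    if (stream == 0)
        return true;

    /* Non-zero streams exist only to be captured by transform feedback.
     * When it is off, drop the vertex in the shader instead of relying on
     * Render Stream Select, which the hardware ignores with SOL disabled. */
    return xfb_enabled;
}
```

      When transform feedback is enabled, the vertex still goes through SOL,
      where Render Stream Select is honored and non-zero streams are
      discarded after stream output.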
      
      Notice that this patch introduces a small change in the behavior we
      get when a geometry shader emits more vertices than the maximum declared:
      before, a vertex that was emitted to a non-zero stream when TF was
      disabled would still count for the purposes of checking that we don't
      exceed the maximum number of output vertices declared by the shader. With
      this change, these vertices are completely ignored and won't increase
      the output vertex count, making more room for other (hopefully more
      useful) vertices.
      
      Fixes piglit test arb_gpu_shader5-emitstreamvertex_nodraw on Haswell
      and Broadwell.
      
      v2 (Ken): Drop is_haswell check in favor of doing this unconditionally.
      Broadwell needs the workaround as well, and it doesn't hurt to do it in
      general.  Also tweak comments - the Haswell PRM does actually mention
      this ("Command Reference: Instructions" page 797).
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83962

      Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
      Cc: mesa-stable@lists.freedesktop.org
    • Kenneth Graunke
      i965: Add forgotten multi-stream code to Gen8 SOL state. · f368d0fa
      Kenneth Graunke authored
      
      Fixes Piglit's arb_gpu_shader5-xfb-streams-without-invocations.
      
      Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
      Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
      Cc: mesa-stable@lists.freedesktop.org
    • Kenneth Graunke
      i965: Fix instanced geometry shaders on Gen8+. · f9e5dc0a
      Kenneth Graunke authored
      
      Jordan added this in commit 741782b5 for
      Gen7 platforms.  I missed this when adding the Broadwell code.
      
      Fixes Piglit's spec/arb_gpu_shader5/invocation-id-{basic,in-separate-gs}
      with MESA_EXTENSION_OVERRIDE=GL_ARB_gpu_shader5 set.
      
      Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
      Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
      Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
      Cc: mesa-stable@lists.freedesktop.org
    • Kenneth Graunke
      i965: Free dead GLSL IR one last time. · a09c5b85
      Kenneth Graunke authored
      
      While working on NIR's memory allocation model, I realized the GLSL IR
      memory model was broken.
      
      During glCompileShader, we allocate everything out of the
      _mesa_glsl_parse_state context, and reparent it to gl_shader at the end.
      
      During glLinkProgram, we allocate everything out of a temporary context,
      then reparent it to the exec_list containing the linked IR.
      
      But during brw_link_shader - the driver's final opportunity to do
      lowering and optimization - we just allocated everything out of the
      permanent context given to us by the linker.  That memory stayed
      forever.
      
      Notably, passes like brw_fs_channel_expressions cause us to churn the
      majority of the code, so we really want to free dead IR here.
      
      Saves 125MB of memory when replaying a Dota 2 trace on Broadwell.
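      The pattern the fix relies on (allocate from a scratch context, move
      whatever is still live to a long-lived owner, then free the scratch
      context along with everything left in it) can be shown with a toy
      hierarchical allocator. The names `hnew`, `hsteal` and `hfree` are
      hypothetical; this is not Mesa's ralloc API, only a minimal model of
      the same ownership idea.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy hierarchical allocator node (illustration only, not ralloc). */
struct hnode {
    struct hnode *parent;
    struct hnode *child;  /* first child */
    struct hnode *next;   /* next sibling */
    struct hnode *prev;   /* previous sibling */
};

static int live_allocs;   /* number of nodes currently allocated */

/* Detach `n` from its parent's child list. */
static void unlink_node(struct hnode *n)
{
    if (n->prev != NULL)
        n->prev->next = n->next;
    else if (n->parent != NULL)
        n->parent->child = n->next;
    if (n->next != NULL)
        n->next->prev = n->prev;
    n->parent = n->prev = n->next = NULL;
}

/* Make `n` the first child of `parent` (or a root if parent is NULL). */
static void link_node(struct hnode *parent, struct hnode *n)
{
    n->parent = parent;
    n->prev = NULL;
    n->next = NULL;
    if (parent != NULL) {
        n->next = parent->child;
        if (parent->child != NULL)
            parent->child->prev = n;
        parent->child = n;
    }
}

/* Allocate a node owned by `parent` (NULL creates a root context). */
static struct hnode *hnew(struct hnode *parent)
{
    struct hnode *n = calloc(1, sizeof *n);
    assert(n != NULL);
    live_allocs++;
    link_node(parent, n);
    return n;
}

/* Move `n`, and everything it owns, under `new_parent`. */
static void hsteal(struct hnode *new_parent, struct hnode *n)
{
    unlink_node(n);
    link_node(new_parent, n);
}

/* Free `n` and, recursively, everything it owns. */
static void hfree(struct hnode *n)
{
    while (n->child != NULL)
        hfree(n->child);
    unlink_node(n);
    free(n);
    live_allocs--;
}
```

      Usage: allocate lowered IR from a scratch root, `hsteal()` the nodes
      that are still reachable into the long-lived owner, then `hfree()` the
      scratch root. Dead nodes go with it instead of staying attached to the
      permanent context forever.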
      
      Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
      Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
    • Kenneth Graunke
      i965: Implement SIMD16 texturing on Gen4. · 797d6061
      Kenneth Graunke authored
      
      This allows SIMD16 mode to work for a lot more programs.  Texturing is
      also more efficient in SIMD16 mode than SIMD8.  Several messages don't
      actually exist in SIMD8 mode, so we sent SIMD16 messages and threw away
      half of the results.  Now we compute real data in both halves.
      
      Also, the SIMD16 "sample" message doesn't require all three coordinate
      components to exist (unlike the SIMD8 one, which does), so we can shorten
      the message lengths, cutting register usage a bit.
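      A rough sketch of the register arithmetic, under stated assumptions:
      32-byte GRFs, one 32-bit value per channel, and only coordinate
      parameters counted (message headers and other parameters ignored).
      `param_regs()` is a hypothetical helper, not a driver function.

```c
#include <assert.h>

/* Assumptions for this sketch: 32-byte GRFs, 4-byte values per channel. */
#define GRF_SIZE_BYTES    32
#define BYTES_PER_CHANNEL 4

/* Registers needed to carry `num_params` per-channel values. */
static int param_regs(int simd_width, int num_params)
{
    assert(simd_width == 8 || simd_width == 16);
    return num_params * simd_width * BYTES_PER_CHANNEL / GRF_SIZE_BYTES;
}
```

      For a 2D lookup, the SIMD8 message still carries three coordinate
      registers (u, v, r). A SIMD16 message padded to three components would
      need `param_regs(16, 3)` = 6 registers, while sending only u and v
      needs `param_regs(16, 2)` = 4.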
      
      I chose to implement the visitor functionality in a separate function,
      since mixing true SIMD16 with SIMD8 code that uses SIMD16 fallbacks
      seemed like a mess.  The new code bails on a few cases where we'd
      have to do two SIMD8 messages - we just fall back to SIMD8 for now.
      
      Improves performance in "Shadowrun: Dragonfall - Director's Cut" by
      about 20% on GM45 (measured with LIBGL_SHOW_FPS=1 while standing around
      in the first mission).
      
      v2: Add ir_txf to the has_lod case (caught by Jordan Justen).
      
      Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
      Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
    • Kenneth Graunke
      i965: Use SIMD16 instead of SIMD8 on Gen4 when possible. · 8aee87fe
      Kenneth Graunke authored
      
      Gen5+ systems allow you to specify multiple shader programs - both SIMD8
      and SIMD16 - and the hardware will automatically dispatch to the most
      appropriate one, given the number of subspans to be processed.
      
      However, that is not the case on Gen4.  Instead, you program a single
      shader.  If you enable multiple dispatch modes (SIMD8 and SIMD16), the
      shader is supposed to contain a series of jump instructions at the
      beginning.  The hardware will launch the shader at a small offset,
      hitting one of the jumps.
      
      We've always thought that sounds like a pain, and weren't clear how it
      affected performance - is it worth having multiple shader types?  So,
      we never bothered with SIMD16 until now.
      
      This patch takes a simpler approach: try to compile a SIMD16 shader.
      If that succeeds, set the no_8 flag, telling the hardware to just use
      the SIMD16 variant all the time.
      
      Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
      Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
    • Kenneth Graunke
      i965: Respect the no_8 flag on Gen4-5. · 108b92b1
      Kenneth Graunke authored
      
      This flag means to ignore the SIMD8 program and only use the SIMD16 one.
      It was originally meant for repdata clear shaders, but I plan to use it
      for other things on Gen4 as well.
      
      Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
      Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>