  1. May 19, 2015
  2. May 18, 2015
    • i965: Use NIR by default for vertex shaders on GEN8+ · 42298b05
      Faith Ekstrand authored
      
      GLSL IR vs. NIR shader-db results for SIMD8 vertex shaders on Broadwell:
      
         total instructions in shared programs: 2742062 -> 2681339 (-2.21%)
         instructions in affected programs:     1514770 -> 1454047 (-4.01%)
         helped:                                5813
         HURT:                                  1120
      
      The gained programs are ARB vertex programs that were previously going
      through the vec4 backend.  Now that we have prog_to_nir, ARB vertex
      programs can go through the scalar backend so they show up as "gained" in
      the shader-db results.
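
The percentages in the shader-db summary above follow directly from the before/after instruction counts. A minimal sketch of that arithmetic, where `percent_change` is a hypothetical helper name and not part of shader-db itself:

```c
#include <stdint.h>

/* Relative change between two instruction counts, in percent.
 * Negative values mean the "after" build emits fewer instructions.
 * Hypothetical helper for illustration; not taken from shader-db. */
static double percent_change(uint64_t before, uint64_t after)
{
    return ((double)after - (double)before) / (double)before * 100.0;
}
```

With the totals quoted above, `percent_change(2742062, 2681339)` is about -2.21 and `percent_change(1514770, 1454047)` is about -4.01, matching the reported figures.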
      
      Acked-by: Kenneth Graunke <kenneth@whitecape.org>
      Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
      Acked-by: Matt Turner <mattst88@gmail.com>
      42298b05
    • freedreno: fence fix · e6f912f0
      Rob Clark authored
      
      A fence can outlive the ctx, so we shouldn't deref the ctx to get at the
      screen.  We need some updates in libdrm_freedreno API to completely
      handle fences properly, but this is at least an improvement.
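
One common way to avoid dereferencing a shorter-lived object is to copy the longer-lived pointer at creation time. The sketch below illustrates that idea only; every struct and function name in it is hypothetical and it is not the freedreno code from this commit:

```c
#include <stdlib.h>

/* Hypothetical stand-ins for a screen, a context, and a fence. */
struct example_screen  { int fd; };
struct example_context { struct example_screen *screen; };
struct example_fence {
    struct example_screen *screen; /* copied at creation time */
    unsigned timestamp;
};

/* Capture the screen pointer while the context is still alive, so that
 * later fence operations never need to touch the context. */
static struct example_fence *
example_fence_create(struct example_context *ctx, unsigned timestamp)
{
    struct example_fence *fence = calloc(1, sizeof(*fence));
    if (!fence)
        return NULL;
    fence->screen = ctx->screen;
    fence->timestamp = timestamp;
    return fence;
}

/* Safe even after the creating context has been destroyed. */
static struct example_screen *
example_fence_screen(const struct example_fence *fence)
{
    return fence->screen;
}
```

This assumes the screen outlives every fence created from it, which is the lifetime relationship the commit message implies.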
      
      Signed-off-by: Rob Clark <robclark@freedesktop.org>
      e6f912f0
    • i965: Add gen8 blend state · 8427ad91
      Ben Widawsky authored
      
      OLD:
      0x00007340:      0x00800000:    BLEND:
      0x00007344:      0x84202100:    BLEND:
      
      NEW:
      0x00007340:      0x00800000:    BLEND: Alpha blend/test
      0x00007344:      0x0000000b84202100: BLEND_ENTRY00:
                              Color Buffer Blend factor ONE,ONE,ONE,ONE (src,dst,src alpha, dst alpha)
                              function ADD,ADD (color, alpha), Disables: ----
      0x0000734c:      0x0000000b84202100: BLEND_ENTRY01:
                              Color Buffer Blend factor ONE,ONE,ONE,ONE (src,dst,src alpha, dst alpha)
                              function ADD,ADD (color, alpha), Disables: ----
      0x00007354:      0x0000000b84202100: BLEND_ENTRY02:
                              Color Buffer Blend factor ONE,ONE,ONE,ONE (src,dst,src alpha, dst alpha)
                              function ADD,ADD (color, alpha), Disables: ----
      0x0000735c:      0x0000000b84202100: BLEND_ENTRY03:
                              Color Buffer Blend factor ONE,ONE,ONE,ONE (src,dst,src alpha, dst alpha)
                              function ADD,ADD (color, alpha), Disables: ----
      0x00007364:      0x0000000b84202100: BLEND_ENTRY04:
                              Color Buffer Blend factor ONE,ONE,ONE,ONE (src,dst,src alpha, dst alpha)
                              function ADD,ADD (color, alpha), Disables: ----
      0x0000736c:      0x0000000b84202100: BLEND_ENTRY05:
                              Color Buffer Blend factor ONE,ONE,ONE,ONE (src,dst,src alpha, dst alpha)
                              function ADD,ADD (color, alpha), Disables: ----
      0x00007374:      0x0000000b84202100: BLEND_ENTRY06:
                              Color Buffer Blend factor ONE,ONE,ONE,ONE (src,dst,src alpha, dst alpha)
                              function ADD,ADD (color, alpha), Disables: ----
      0x0000737c:      0x0000000b84202100: BLEND_ENTRY07:
                              Color Buffer Blend factor ONE,ONE,ONE,ONE (src,dst,src alpha, dst alpha)
                              function ADD,ADD (color, alpha), Disables: ----
      
      v2: Line length fixes, and const usage (Topi)
      Safer initialization of name string (Topi)
      
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
      8427ad91
    • i965: Add renderbuffer surface indexes to debug · fa284d6f
      Ben Widawsky authored
      
      This patch is optional in the series. It does make the output much cleaner, but
      there is some risk.
      
      Sample output (v3):
      0x00007e80:      0x231d7000:  SURF000: 2D R8G8B8A8_UNORM  VALIGN4 HALIGN4 Y-tiled
      0x00007e84:      0x05000000:  SURF000: MOCS: 0x5 Base MIP: 0.0 (0 mips) Surface QPitch: 0
      0x00007e88:      0x009f009f:  SURF000: 160x160 [AUX_NONE]
      0x00007e8c:      0x0000027f:  SURF000: 1 slices (depth), pitch: 640
      0x00007e90:      0x00000000:  SURF000: min array element: 0, array extent 1, MULTISAMPLE_1
      0x00007e94:      0x00000000:  SURF000: x,y offset: 0,0, min LOD: 0
      0x00007e98:      0x00000000:  SURF000: AUX pitch: 0 qpitch: 0
      0x00007e9c:      0x09770000:  SURF000: Clear color: R(0)G(0)B(0)A(0)
      0x00007ea0:      0x00001000:  SURF000: 0x00001000
      0x00007ea4:      0x00000000:  SURF000: 0x00000000
      0x00007ea8:      0x00000000:  SURF000: 0x00000000
      0x00007eac:      0x00000000:  SURF000: 0x00000000
      0x00007e40:      0x234df000:  SURF001: 2D R11G11B10_FLOAT  VALIGN4 HALIGN16 Y-tiled
      0x00007e44:      0x09000000:  SURF001: MOCS: 0x9 Base MIP: 0.0 (0 mips) Surface QPitch: 0
      0x00007e48:      0x009f009f:  SURF001: 160x160 [AUX_CCS_D (Uncompressed, MULTISAMPLE_COUNT=1)]
      0x00007e4c:      0x0000027f:  SURF001: 1 slices (depth), pitch: 640
      0x00007e50:      0x00000000:  SURF001: min array element: 0, array extent 1, MULTISAMPLE_1
      0x00007e54:      0x00000000:  SURF001: x,y offset: 0,0, min LOD: 0
      0x00007e58:      0x00000001:  SURF001: AUX pitch: 0 qpitch: 0
      0x00007e5c:      0x09770000:  SURF001: Clear color: R(0)G(0)B(0)A(0)
      0x00007e60:      0x0002b000:  SURF001: 0x0002b000
      0x00007e64:      0x00000000:  SURF001: 0x00000000
      0x00007e68:      0x0002a000:  SURF001: 0x0002a000
      0x00007e6c:      0x00000000:  SURF001: 0x00000000
      
      v2: Rebased on Topi's recent series which changed around some of the gen8
      surface setup code.
      
      v3: Use ralloc_asprintf instead of asprintf to be more friendly to non-GNU
      platforms.
      
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      fa284d6f
    • i965: Add Gen9 surface state decoding · c14bb072
      Ben Widawsky authored
      
      Gen9 surface state is very similar to the previous generation. The important
      changes here are aux mode, and the way clear colors work.
      
      NOTE: There are some things intentionally left out of this decoding.
      
      v2: Redo the string for the aux buffer type to address compressed variants.
      
      v3: Use the shift for compression enable (instead of compression mode) (Topi)
      
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
      c14bb072
    • i965: Add gen8 surface state debug info · 313abbb8
      Ben Widawsky authored
      
      AFAICT, none of the old data was wrong (the gen7 decoder), but it was missing a
      bunch of stuff.
      
      Adds a tick (') to denote the beginning of the surface state for easier reading.
      This will be replaced later with some better, but more risky code.
      
      OLD:
      0x00007980:      0x23016000:     SURF: 2D BRW_SURFACEFORMAT_B8G8R8A8_UNORM
      0x00007984:      0x18000000:     SURF: offset
      0x00007988:      0x00ff00ff:     SURF: 256x256 size, 0 mips, 1 slices
      0x0000798c:      0x000003ff:     SURF: pitch 1024, tiled
      0x00007990:      0x00000000:     SURF: min array element 0, array extent 1
      0x00007994:      0x00000000:     SURF: mip base 0
      0x00007998:      0x00000000:     SURF: x,y offset: 0,0
      0x0000799c:      0x09770000:     SURF:
      0x00007940:      0x231d7000:     SURF: 2D BRW_SURFACEFORMAT_R8G8B8A8_UNORM
      0x00007944:      0x78000000:     SURF: offset
      0x00007948:      0x001f001f:     SURF: 32x32 size, 0 mips, 1 slices
      0x0000794c:      0x0000007f:     SURF: pitch 128, tiled
      0x00007950:      0x00000000:     SURF: min array element 0, array extent 1
      0x00007954:      0x00000000:     SURF: mip base 0
      0x00007958:      0x00000000:     SURF: x,y offset: 0,0
      0x0000795c:      0x09770000:     SURF:
      
      NEW (v1):
      0x00007980:      0x23016000:    SURF': 2D B8G8R8A8_UNORM  VALIGN4 HALIGN4 X-tiled
      0x00007984:      0x18000000:     SURF: MOCS: 0x18 Base MIP: 0.0 (0 mips) Surface QPitch: 0
      0x00007988:      0x00ff00ff:     SURF: 256x256 [AUX_NONE]
      0x0000798c:      0x000003ff:     SURF: 1 slices (depth), pitch: 1024
      0x00007990:      0x00000000:     SURF: min array element: 0, array extent 1, MULTISAMPLE_1
      0x00007994:      0x00000000:     SURF: x,y offset: 0,0, min LOD: 0
      0x00007998:      0x00000000:     SURF: AUX pitch: 0 qpitch: 0
      0x0000799c:      0x09770000:     SURF: Clear color: ----
      0x00007940:      0x231d7000:    SURF': 2D R8G8B8A8_UNORM  VALIGN4 HALIGN4 Y-tiled
      0x00007944:      0x78000000:     SURF: MOCS: 0x78 Base MIP: 0 (0 mips) Surface QPitch: ff0000
      0x00007948:      0x001f001f:     SURF: 32x32 [AUX_NONE]
      0x0000794c:      0x0000007f:     SURF: 1 slices (depth), pitch: 128
      0x00007950:      0x00000000:     SURF: min array element: 0, array extent 1, MULTISAMPLE_1
      0x00007954:      0x00000000:     SURF: x,y offset: 0,0, min LOD: 0
      0x00007958:      0x00000000:     SURF: AUX pitch: 0 qpitch: 0
      0x0000795c:      0x09770000:     SURF: Clear color: ----
      0x00007920:      0x00007980:    BIND0: surface state address
      0x00007924:      0x00007940:    BIND1: surface state address
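
The sample output above suggests that width, height, and pitch are stored minus one (0x00ff00ff decodes to 256x256, 0x000003ff to a pitch of 1024). A minimal sketch of that decoding follows. The field positions (width in bits 13:0, height in bits 29:16, pitch in bits 17:0) are assumptions made for illustration, and `decode_surface_dims` is a hypothetical name, not the function in the driver:

```c
#include <stdint.h>

struct surf_dims {
    uint32_t width;
    uint32_t height;
    uint32_t pitch;
};

/* Decode the size dword and the pitch dword of a surface state entry.
 * Each field is assumed to hold (value - 1); bit positions are assumed. */
static struct surf_dims decode_surface_dims(uint32_t size_dw, uint32_t pitch_dw)
{
    struct surf_dims d;
    d.width  = (size_dw & 0x3fff) + 1;          /* assumed bits 13:0  */
    d.height = ((size_dw >> 16) & 0x3fff) + 1;  /* assumed bits 29:16 */
    d.pitch  = (pitch_dw & 0x3ffff) + 1;        /* assumed bits 17:0  */
    return d;
}
```

Under these assumptions, the second surface in the dump (0x001f001f, 0x0000007f) decodes to 32x32 with a pitch of 128, which agrees with the printed values.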
      
      v2: Style cleanups (Matt)
      Fix aux mode dword 7->6 (Topi)
      Use exp2 instead of pow (Matt)
      Add dwords 8-12 to the dump
      
      v3: Needed to update the surface format name getter for the change in the first
      patch in the series
      
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Cc: Matt Turner <mattst88@gmail.com>
      Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
      313abbb8
    • i965: Add gen7+ sampler state to batch debug · 7f0c7a5f
      Ben Widawsky authored
      
      OLD:
      0x00007e00:      0x10000000: WM SAMP0: filtering
      0x00007e04:      0x000d0000: WM SAMP0: wrapping, lod
      0x00007e08:      0x00000000: WM SAMP0: default color pointer
      0x00007e0c:      0x00000090: WM SAMP0: chroma key, aniso
      
      NEW:
      0x00007e00:      0x10000000: SAMPLER_STATE 0: Disabled = no, Base Mip: 0.0, Mip/Mag/Min Filter: NONE/NEAREST/NEAREST, LOD Bias: 0.0
      0x00007e04:      0x000d0000: SAMPLER_STATE 0: Min LOD: 0.0, Max LOD: 13.0
      0x00007e08:      0x00000000: SAMPLER_STATE 0: Border Color
      0x00007e0c:      0x00000090: SAMPLER_STATE 0: Max aniso: RATIO 2:1, TC[XYZ] Address Control: CLAMP|CLAMP|WRAP
      
      v2: Move GET_BITS macro to here (with paren protection) Ben/Topi
      Add const to the sampler pointer (Topi)
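
"Paren protection" matters because a macro argument can be an expression. The sketch below shows a bit-field extraction macro with every argument parenthesized. The names `EXAMPLE_MASK` and `EXAMPLE_GET_BITS` and the (data, high, low) argument order are assumptions for illustration, not necessarily the driver's definition:

```c
#include <stdint.h>

/* Mask covering bits high..low inclusive. The 64-bit intermediate keeps
 * the shift well defined when the field spans all 32 bits. */
#define EXAMPLE_MASK(high, low) \
    ((uint32_t)((((uint64_t)1 << ((high) - (low) + 1)) - 1) << (low)))

/* Extract bits high..low of data. Every argument is parenthesized so
 * that an expression such as `a | b` is masked as a whole. */
#define EXAMPLE_GET_BITS(data, high, low) \
    (((data) & EXAMPLE_MASK((high), (low))) >> (low))
```

Without the parentheses around `data`, `EXAMPLE_GET_BITS(0x100 | 0x0f, 3, 0)` would expand to `0x100 | (0x0f & mask)` because `&` binds tighter than `|`, and would yield 0x10f instead of 0xf.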
      
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
      7f0c7a5f
    • i965: Add viewport extents (gen8) to batch decode · 1fa0789a
      Ben Widawsky authored
      
      0x00007da0:      0xc1da740e: SF_CLIP VP: guardband xmin = -27.306667
      0x00007da4:      0x41da740e: SF_CLIP VP: guardband xmax = 27.306667
      0x00007da4:      0x41da740e: SF_CLIP VP: guardband ymin = -23.405714
      0x00007da8:      0xc1bb3ee7: SF_CLIP VP: guardband ymax = 23.405714
      0x00007db0:      0x00000000: SF_CLIP VP: Min extents: 0.00x0.00
      0x00007db8:      0x00000000: SF_CLIP VP: Max extents: 299.00x349.00
      
      While here, fix the wrong offsets for the guardband (I didn't check if it used
      to be valid on GEN4).
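
The guardband values in the dump above are the raw dwords reinterpreted as IEEE-754 single-precision floats. A minimal sketch of that reinterpretation, assuming a 32-bit `float` and using the hypothetical helper name `dword_as_float`:

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit batch dword as a float. memcpy avoids the
 * strict-aliasing problems of casting the pointer directly. */
static float dword_as_float(uint32_t dw)
{
    float f;
    memcpy(&f, &dw, sizeof(f));
    return f;
}
```

For example, 0x41da740e decodes to roughly 27.306667 and 0xc1bb3ee7 to roughly -23.405714, the magnitudes shown in the guardband lines.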
      
      v2: Remove leftover GET_BITS which belongs later in the series. (Topi)
      
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
      Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
      1fa0789a
    • i965: Add all surface types to the batch decode · e45a2925
      Ben Widawsky authored
      
      It's true that not all surfaces apply for every gen, but for the most part this
      is what we want. (The unfortunate case is when we use a valid surface, but not
      for the specific GEN).
      
      This was automated with a vim macro.
      
      v2: Shortened common forms such as R8G8B8A8->RGBA8. Note that this makes some of
      the sample output in subsequent commits slightly incorrect.
      
      v3: Use the name from the table (Ken). This requires declaring the surface
      format array as extern, and declaring the struct in the .h file.
      
      v4: Move the struct back and create a helper function to obtain the name (Ken)
      Get rid of the now useless helper in the state_dump.c
      
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v3)
      Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
      e45a2925
    • i965/fs: Implement integer multiply without mul/mach. · f7df169b
      Matt Turner authored
      
      Ivybridge and Baytrail can't use mach with 2Q quarter control, so just
      do it without the accumulator. Stupid accumulator.
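
A full 32-bit product can be built without an accumulator from two narrower multiplies. The sketch below models this on the CPU, assuming (as another commit in this log notes for pre-Gen8 hardware) a MUL that ignores the high 16 bits of one source. The function names are hypothetical and this is an illustration of the arithmetic, not the instruction sequence the backend emits:

```c
#include <stdint.h>

/* Model of a multiply that reads all 32 bits of a but only the low
 * 16 bits of b. The result wraps modulo 2^32. */
static uint32_t mul_32x16(uint32_t a, uint32_t b)
{
    return a * (b & 0xffffu);
}

/* Low 32 bits of a * b using only mul_32x16, a shift, and an add:
 * a * b = a * b[15:0] + ((a * b[31:16]) << 16)   (mod 2^32) */
static uint32_t lowered_mul32(uint32_t a, uint32_t b)
{
    uint32_t low  = mul_32x16(a, b);        /* a * b[15:0]  */
    uint32_t high = mul_32x16(a, b >> 16);  /* a * b[31:16] */
    return low + (high << 16);
}
```

Only the low 16 bits of `high` survive the shift, which is why no wider intermediate is needed.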
      
      Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
      f7df169b
    • i965/fs: Rework compression control selection. · 0a9e3a01
      Matt Turner authored
      
      The next commit uses an add(16) with a UW destination with a stride of
      2, which needs compression control since it's writing two registers. The
      old code would have failed to set compression control correctly.
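
The "writing two registers" claim can be checked with simple arithmetic: 16 channels of a 2-byte UW type at a stride of 2 span 64 bytes, and a GRF on these parts holds 32 bytes. A rough sketch, where `example_regs_written` is a hypothetical helper and the span is approximated as channels times stride times type size:

```c
/* Size of one general register file (GRF) entry in bytes. */
#define EXAMPLE_GRF_BYTES 32u

/* Approximate number of registers a destination region spans.
 * Hypothetical helper; the driver's own computation may differ. */
static unsigned example_regs_written(unsigned exec_size,
                                     unsigned type_bytes,
                                     unsigned stride)
{
    unsigned bytes = exec_size * type_bytes * stride;
    return (bytes + EXAMPLE_GRF_BYTES - 1) / EXAMPLE_GRF_BYTES;
}
```

Here `example_regs_written(16, 2, 2)` gives 2, while the same add(16) with a packed UW destination (stride 1) gives 1, which is why a rule based on type size alone would miss the strided case.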
      
      Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
      0a9e3a01
    • i965/fs: Support integer multiplication in SIMD16 on Haswell. · 4ec09c77
      Matt Turner authored
      
      Ivybridge (and presumably Baytrail) have a bug that prevents this from
      working.
      
      Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
      4ec09c77
    • i965/fs: Add set_sechalf() method. · 0592ee45
      Matt Turner authored
      
      Used in the next commit.
      
      Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
      0592ee45
    • i965/fs: Unrestrict constant propagation into integer multiply. · 81deefc4
      Matt Turner authored
      
      Gen8+'s MUL instruction doesn't ignore the high 16-bits of one source
      like on earlier platforms, so we can constant propagate into it without
      worry. Integer multiplies (not into the accumulator, which is done for
      imul_high) are lowered in lower_integer_multiplication(), so it's safe
      there as well.
      
      On Broadwell, fragment shaders only:
      total instructions in shared programs: 4377769 -> 4377451 (-0.01%)
      instructions in affected programs:     48064 -> 47746 (-0.66%)
      helped:                                156
      
      On Broadwell, vertex shaders only:
      total instructions in shared programs: 2858885 -> 2856313 (-0.09%)
      instructions in affected programs:     26380 -> 23808 (-9.75%)
      helped:                                134
      
      On Broadwell, vertex shaders only (with INTEL_USE_NIR=1):
      total instructions in shared programs: 2911688 -> 2865984 (-1.57%)
      instructions in affected programs:     1421715 -> 1376011 (-3.21%)
      helped:                                6186
      
      Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
      81deefc4
    • i965/fs: Lower integer multiplication after optimizations. · 1e4e17fb
      Matt Turner authored
      
      32-bit x 32-bit integer multiplication requires multiple instructions
      until Broadwell. This patch just lets us treat the MUL instruction in
      the FS backend like it operates on Broadwell, and after optimizations
      we lower it into a sequence of instructions on older platforms.
      
      Doing this will allow us to do some extra optimization on integer
      multiplies.
      
      Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
      1e4e17fb
    • gk110/ir: switch to gk104-style sched codes rather than all-in-one · ae405d42
      Ilia Mirkin authored
      
      Matches change to envydis/envyas tools.
      
      Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
      ae405d42
    • glsl: add stage references for UBO uniforms · 9f4eaba3
      Tapani Pälli authored
      
      Patch marks uniforms inside UBO properly referenced by stages.
      
      Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
      Reviewed-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90397
      9f4eaba3
    • i965: Fix textureSize for Lod > 0 with non-mipmap filters · 845ad266
      Iago Toral authored
      
      Currently, when the MinFilter is GL_LINEAR or GL_NEAREST we hide the
      actual miplevel count from the hardware (and we avoid re-creating
      the miptree structure with all the levels), since we don't expect
      levels other than the base level to be needed. Unfortunately,
      GLSL's textureSize() function is an exception to this rule. This
      function takes a lod parameter that we need to use to return the
      size of the appropriate miplevel (if it exists). The spec only
      requires that the miplevel exists, so even if the sampler is
      configured with a linear or nearest MinFilter, as long as the user
      has uploaded miplevels for the texture, textureSize() should return
      the appropriate sizes.
      
      This patch fixes this by exposing the actual miplevel count for all
      sampling engine textures while keeping the original implementation
      for render targets (for render target textures we do not provide
      the miplevel count but the actual LOD we are writing to, so we
      want to make sure that we make this the base level).
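
For reference, the size that textureSize() is expected to report for a given lod follows the usual mip chain rule: each level halves the previous dimension, rounding down, with a floor of 1. A minimal sketch of that rule for a single dimension, using the hypothetical helper name `example_mip_level_size` (this is not the driver code changed by the patch):

```c
#include <stdint.h>

/* Size of one dimension at mip level lod: max(1, base_size >> lod).
 * Shifts of 32 or more are undefined in C for a 32-bit value, so they
 * are treated here as producing zero before the clamp to 1. */
static uint32_t example_mip_level_size(uint32_t base_size, uint32_t lod)
{
    uint32_t size = (lod < 32) ? (base_size >> lod) : 0;
    return size > 0 ? size : 1;
}
```

A 256-texel base level therefore reports 32 at lod 3, and a 5-texel base level reports 1 at lod 3, provided those levels were actually uploaded.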
      
      Fixes 28 dEQP tests in the following category:
      dEQP-GLES3.functional.shaders.texture_functions.texturesize.*
      
      Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
      845ad266
  3. May 16, 2015