1. 18 Jul, 2015 9 commits
    • nvc0/ir: fix txq on indirect samplers · 20e484af
      Ilia Mirkin authored

      Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
      Cc: mesa-stable@lists.freedesktop.org
    • i965: Disable resource streamer in BLORP · 670914ea
      Abdiel Janulgue authored

      Switch off hardware-generated binding tables and gather push
      constants in the blorp. Blorp requires only a minimal set of
      simple constants. There is no need for the extra complexity
      to program a gather table entry into the pipeline.
      
      Cc: kenneth@whitecape.org
      Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
      Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
    • i965: Upload binding tables in hw-generated binding table format. · fc65b6eb
      Abdiel Janulgue authored

      When hardware-generated binding tables are enabled, use the hw-generated
      binding table format when uploading binding table state.
      
      Normally, the CS will just consume the binding table pointer commands
      as pipelined state. When the RS is enabled, however, the RS flushes any
      edited surface state entries of our on-chip binding table to the binding
      table pool before passing the command on to the CS.

      Note that the binding table pointer offset is relative to the binding table
      pool base address when the resource streamer is enabled, instead of the
      surface state base address.
      
      v2: Fix possible buffer overflow when allocating a chunk out of the
          hw-binding table pool (Ken).
      v3: Remove extra newline and add missing brace around if-statement (Matt).
      v4: Fix broken INTEL_DEBUG=shader_time for hw-generated binding tables.
          Document PRM WaStateBindingTableOverfetch workaround.
      
      Cc: kenneth@whitecape.org
      Cc: mattst88@gmail.com
      Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
      Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
    • i965: Implement interface to edit binding table entries · 2133980b
      Abdiel Janulgue authored

      Unlike normal software binding tables, where the driver has to manually
      generate and fill a binding table array which is then uploaded to the
      hardware, the resource streamer instead presents the driver with an option
      to fill out slots for individual binding table indices. The hardware
      accumulates the state for these combined edits, which it then automatically
      flushes to a binding table pool when the binding table pointer state
      command is invoked.

      v2: Clarify binding table edit bit alignment (Topi).
      v3: Make comments and function names clearer (Ken).
      
      Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
      Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
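
The difference between the two models described in 2133980b can be pictured with a toy model. This is only an illustrative sketch, not driver code: the class and method names (`SoftwareBindingTable`, `HwBindingTable`, `edit`, `flush`) are hypothetical, and both the on-chip state and the binding table pool are modelled as plain Python lists.

```python
class SoftwareBindingTable:
    """Software path: the driver builds the whole array, then uploads it."""

    def __init__(self, size):
        self.size = size

    def upload(self, surfaces):
        # surfaces: dict mapping binding table index -> surface state offset
        table = [0] * self.size
        for index, offset in surfaces.items():
            table[index] = offset
        return table


class HwBindingTable:
    """Resource streamer path: the driver edits slots one at a time."""

    def __init__(self, size):
        self.entries = [0] * size  # stand-in for the accumulated on-chip state
        self.pool = []             # stand-in for the binding table pool

    def edit(self, index, offset):
        # Fill out a single binding table slot.
        self.entries[index] = offset

    def flush(self):
        # Models the flush triggered by the binding table pointer command:
        # the accumulated edits are written out to the pool as one table.
        self.pool.append(list(self.entries))
        return len(self.pool) - 1  # position of the flushed table in the pool
```

In this model both paths end up with the same table contents; what changes is who assembles the array and when it reaches memory.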
    • i965: Enable hardware-generated binding tables on render path. · 19075648
      Abdiel Janulgue authored

      This patch implements the binding table enable command, which is also
      used to allocate a binding table pool into which hardware-generated
      binding table entries are flushed. Each binding table offset in
      the binding table pool is unique to each shader stage that is
      enabled within a batch.

      Also insert the required brw_tracked_state objects to enable
      hw-generated binding tables in the normal render path.
      
      v2: - Use MOCS in binding table pool alloc for GEN8
          - Fix spurious offset when allocating binding table pool entry
            and start from zero instead.
      v3: - Include GEN8 fix for spurious offset above.
      v4: - Fixup wrong packet length in enable/disable hw-binding table
            for GEN8 (Ville).
          - Don't invoke HW-binding table disable command when we don't
            have resource streamer (Chris).
      v5: - Reorder the state cache invalidate flush so it happens in-between
            enabling hw-generated binding tables and the previous sw-binding
            table GPU state (Chris).
      v6: - Do the same fix in v5 for gen7_disable_hw_binding_tables().
          - Adhere to coding guidelines and make comments more informative.
      
      Cc: kenneth@whitecape.org
      Cc: syrjala@sci.fi
      Cc: chris@chris-wilson.co.uk
      Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
      Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
    • i965: Enable resource streamer for the batchbuffer · 090529af
      Abdiel Janulgue authored

      Check first if the hardware and kernel support the resource streamer. If
      so, tell the kernel to set the resource streamer enable bit on
      MI_BATCHBUFFER_START by specifying the I915_EXEC_RESOURCE_STREAMER
      execbuffer flag.
      
      v2: - Use new I915_PARAM_HAS_RESOURCE_STREAMER ioctl to check if kernel
            supports RS (Ken).
          - Add brw_device_info::has_resource_streamer and toggle it for
            Haswell, Broadwell, Cherryview, Skylake, and Broxton (Ken).
      v3: - Update I915_PARAM_HAS_RESOURCE_STREAMER to match updated kernel.
      v4: - Always inspect the getparam.value (Chris Wilson).
      v5: - Fold redundant devinfo->has_resource_streamer check in context create
            into init screen.
      
      Cc: kenneth@whitecape.org
      Cc: chris@chris-wilson.co.uk
      Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
      Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
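
The decision flow described in 090529af (hardware capability, then kernel capability, then the execbuffer flag) might be sketched as follows. This is a hedged illustration rather than the driver's code: the function names are hypothetical, the ioctl is replaced by plain arguments, and the numeric value given to `I915_EXEC_RESOURCE_STREAMER` is an assumption for the example; real code should take it from the kernel's i915 UAPI header.

```python
# Assumed bit value, for illustration only; consult i915_drm.h for the real one.
I915_EXEC_RESOURCE_STREAMER = 1 << 15


def kernel_has_resource_streamer(ioctl_ret, getparam_value):
    # Per the v4 note, the returned value is always inspected:
    # a successful ioctl alone is not treated as proof of support.
    return ioctl_ret == 0 and getparam_value > 0


def execbuffer_flags(base_flags, hw_has_rs, ioctl_ret, getparam_value):
    """Return the execbuffer flags, adding the RS flag only when both the
    hardware and the kernel report resource streamer support."""
    if hw_has_rs and kernel_has_resource_streamer(ioctl_ret, getparam_value):
        return base_flags | I915_EXEC_RESOURCE_STREAMER
    return base_flags
```

Keeping both checks in one place mirrors the v5 note about folding the device-info check into screen initialisation instead of repeating it at context creation.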
    • i965: Define HW-binding table and resource streamer control opcodes · ccf9598a
      Abdiel Janulgue authored

      v2: Use macros for HW binding table edits (Topi)
      v3: Add Broadwell support.
      v4: Make hardware binding table bit definitions even clearer (Ken)

      Cc: kenneth@whitecape.org
      Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
      Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
      Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
    • vc4: Switch to using a separate ioctl for making shaders. · ff7896a3
      Emma Anholt authored

      This gives the kernel a chance to validate and lock down the data,
      without having to deal with mmap zapping.
      
      With this, GLBenchmark stops on a texture relocation, because we'd
      recycled a shader BO as another shader and failed to revalidate, since we
      weren't clearing the cached validation state on mmap faults.
    • mesa: fix up some texture error checks · e42cfe5d
      Roland Scheidegger authored

      In particular, we were incorrectly accepting s3tc (and lots of others)
      for CompressedTexSubImage3D (but not CompressedTexImage3D) calls with 3d
      targets. At this time, the only allowed formats for these calls are the
      bptc ones, since none of the specific extensions allow it (astc hdr would).
      Also, fix up a bug in _mesa_target_can_be_compressed - 3d target needs to
      be allowed for bptc formats.
      
      Reviewed-by: Brian Paul <brianp@vmware.com>
  2. 17 Jul, 2015 15 commits
  3. 16 Jul, 2015 16 commits
    • Revert "i965: Push miptree tiling request into flags" · ef42352f
      Ben Widawsky authored

      This reverts commit 51e8d549.
    • i965: Push miptree tiling request into flags · 51e8d549
      Ben Widawsky authored

      With the last few patches a way was provided to influence lower layer miptree
      layout and allocation decisions via flags (replacing bools). For simplicity, I
      chose not to touch the tiling requests because the change was slightly less
      mechanical than replacing the bools.
      
      The goal is to organize the code so we can continue to add new parameters and
      tiling types while minimizing risk to the existing code, and not having to
      constantly add new function parameters.
      
      v2: Rebased on Anuj's recent Yf/Ys changes
      Fix non-msrt MCS allocation (was only happening in gen8 case before)
      
      v3: small fix in assertion requested by Chad
      
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> (v2)
      Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> (v2)
      Reviewed-by: Chad Versace <chad.versace@intel.com> (v2)
    • i965/fs: Factor out universally broken calculation of the register component size. · 4bddd82b
      Francisco Jerez authored

      This calculation, simple in principle, was being open-coded in a number
      of places (in a series I haven't yet sent for review there will be a
      couple more), all of which were subtly broken in one way or another:
      none of them handled the HW_REG case correctly, as pointed out by
      Connor, and fs_inst::regs_read() handled the stride=0 case rather
      naively.  This patch solves both problems and factors out the
      calculation as a new fs_reg method.
      
      Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
    • i965: Implement nir_op_uadd_carry and _usub_borrow without accumulator. · b00cd6e4
      Francisco Jerez authored

      This gets rid of two no16() fall-backs and should allow better
      scheduling of the generated IR.  There are no uses of usubBorrow() or
      uaddCarry() in shader-db so no changes are expected.  However the
      "arb_gpu_shader5/execution/built-in-functions/fs-usubBorrow" and
      "arb_gpu_shader5/execution/built-in-functions/fs-uaddCarry" piglit
      tests go from 40 to 28 instructions.  The reason is that the plain ADD
      instruction can easily be CSE'ed with the original addition, and the
      b2i negation can easily be propagated into the source modifier of
      another instruction, so effectively both operations are performed with
      just one instruction.
      
      v2: Rely on carry_to_arith() and borrow_to_arith() to lower these
          (Ilia Mirkin).
      
      Reviewed-by: Matt Turner <mattst88@gmail.com>
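
For readers unfamiliar with the arithmetic form that b00cd6e4 relies on, here is a minimal sketch of how a carry and a borrow can be computed from a plain add and an unsigned comparison, with no accumulator involved. It models 32-bit unsigned values in Python and is an assumption about the general technique, not a transcription of carry_to_arith() or borrow_to_arith().

```python
MASK32 = 0xFFFFFFFF


def uadd_carry(a, b):
    """Carry-out of a 32-bit unsigned add: the wrapped sum is smaller than
    an operand exactly when the addition overflowed."""
    total = (a + b) & MASK32  # the plain ADD, wrapped to 32 bits
    return 1 if total < a else 0


def usub_borrow(a, b):
    """Borrow of a 32-bit unsigned subtract: needed exactly when a < b."""
    return 1 if a < b else 0
```

Because the sum inside `uadd_carry` is the same plain addition a shader would typically compute anyway, it is easy to see why the commit message expects it to be shared with the original ADD through CSE.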
    • i965: Implement b2f and b2i using negation. · 3ee2daf2
      Francisco Jerez authored

      Booleans are represented as 0/-1 on modern hardware which means we can
      just negate them to convert them into a numeric type.  Negation has
      the benefit that it can be implemented using a source modifier which
      can easily be propagated into some other instruction.  shader-db
      results on HSW:
      
      total instructions in shared programs: 6349082 -> 6346693 (-0.04%)
      instructions in affected programs:     40948 -> 38559 (-5.83%)
      helped:                                123
      HURT:                                  1
      GAINED:                                1
      LOST:                                  0
      
      Reviewed-by: Matt Turner <mattst88@gmail.com>
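
The idea in 3ee2daf2 can be checked with a few lines of arithmetic. The sketch below assumes the 0/-1 boolean encoding stated in the commit message, stored as 32-bit patterns; the helper names are hypothetical and the code only illustrates the math, not the generated instructions.

```python
TRUE_BITS = 0xFFFFFFFF   # boolean true: all ones, i.e. -1 as a signed integer
FALSE_BITS = 0x00000000  # boolean false


def as_signed32(bits):
    # Reinterpret a 32-bit pattern as a two's-complement signed integer.
    return bits - (1 << 32) if bits & 0x80000000 else bits


def b2i(bits):
    # Negating 0/-1 yields 0/1, the integer value of the boolean.
    return -as_signed32(bits)


def b2f(bits):
    # The same negation followed by an int-to-float conversion.
    return float(-as_signed32(bits))
```

Since the whole conversion is a single negation, it maps onto a source modifier, which is the property the commit message says makes it easy to propagate into another instruction.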
    • Marek Olšák's avatar
      8fba933c
    • gallium: add interface for writable shader images · 05a12c53
      Marek Olšák authored

      PIPE_CAPs will be added some other time.

      Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
    • Marek Olšák's avatar
    • gallium: add BIND flags for R/W buffers and images · f9f79d29
      Marek Olšák authored

      PIPE_CAPs and TGSI support will be added later. The TGSI support should be
      straightforward. We only need to split TGSI_FILE_RESOURCE into TGSI_FILE_IMAGE
      and TGSI_FILE_BUFFER, though duplicating all opcodes shouldn't be necessary.
      
      The idea is:
      * ARB_shader_image_load_store should use set_shader_images.
      * ARB_shader_storage_buffer_object should use set_shader_buffers(slots 0..M-1)
        if M shader storage buffers are supported.
      * ARB_shader_atomic_counters should use set_shader_buffers(slots M..N)
        if N-M+1 atomic counter buffers are supported.
      
      PIPE_CAPs can describe various constraints for early DX11 hardware.
      
      Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
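
The slot layout proposed in f9f79d29 can be written out as a small numbering exercise. The helper below is hypothetical and only models the index ranges: shader storage buffers take slots 0..M-1 and atomic counter buffers take slots M..N, where N-M+1 is the number of atomic counter buffers. It does not call any gallium interface.

```python
def partition_buffer_slots(num_storage_buffers, num_atomic_buffers):
    """Return (storage_slots, atomic_slots) as ranges of slot indices."""
    m = num_storage_buffers
    n = m + num_atomic_buffers - 1  # chosen so that N - M + 1 == num_atomic_buffers
    storage_slots = range(0, m)     # slots 0..M-1
    atomic_slots = range(m, n + 1)  # slots M..N
    return storage_slots, atomic_slots
```

For example, with 8 shader storage buffers and 4 atomic counter buffers, the storage buffers occupy slots 0 through 7 and the atomic counters occupy slots 8 through 11.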
    • Marek Olšák's avatar
      26222932
    • i965/gen9: Use custom MOCS entries set up by the kernel. · af768922
      Francisco Jerez authored

      Instead of relying on hardware defaults the i915 kernel driver is
      going to program custom MOCS tables system-wide on Gen9 hardware.  The
      "WT" entry previously used for renderbuffers had a number of problems:
      It disabled caching on eLLC, it used a reserved L3 cacheability
      setting, and it used to override the PTE controls making renderbuffers
      always WT on LLC regardless of the kernel's setting.  Instead use an
      entry from the new MOCS tables with parameters: TC=LLC/eLLC, LeCC=PTE,
      L3CC=WB.
      
      The "WB" entry previously used for anything other than renderbuffers
      has moved to a different index in the new MOCS tables but it should
      have the same caching semantics as the old entry.
      
      Even though the corresponding kernel change ("drm/i915: Added
      Programming of the MOCS") is in a way an ABI break it doesn't seem
      necessary to check that the kernel is recent enough because the change
      should only affect Gen9 which is still unreleased hardware.
      
      v2: Update MOCS values for the new Android-incompatible tables
          introduced in v7 of the kernel patch.
      
      Cc: 10.6 <mesa-stable@lists.freedesktop.org>
      Reference: http://lists.freedesktop.org/archives/intel-gfx/2015-July/071080.html
      
      
      Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
    • clover: little OpenCL status code logging clean · 7e0180d5
      Serge Martin authored and Francisco Jerez committed

      s/build_error/compile_error in order to match the stored OpenCL status code.
      Make program::build catch and log every OpenCL error.
      Make tgsi error triggering uniform with the llvm one.
      
      Reviewed-by: Francisco Jerez <currojerez@riseup.net>
    • glsl: avoid compiler's segfault when processing operators with void arguments · 7b9ebf87
      Renaud Gaubert authored and Samuel Iglesias Gonsálvez committed

      This is done by returning an rvalue of type void in the
      ast_function_expression::hir function instead of a void expression.
      
      This produces (in the case of the ternary) an hir with a call
      to the void returning function and an assignment of a void variable
      which will be optimized out (the assignment) during the optimization
      pass.
      
      This fix results in having a valid subexpression in the many
      different cases where the subexpressions are functions whose
      return values are void.
      
      This prevents a NULL dereference in the following cases:
        * binary operators
        * unary operators
        * ternary operator
        * comparison operators (except the equal and nequal operators)
      
      Equal and nequal had to be handled as a special case because
      instead of segfaulting on a forbidden syntax it was now accepting
      expressions with a void return value on either (or both) side of
      the expression.
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85252
      
      
      
      Signed-off-by: Renaud Gaubert <renaud@lse.epita.fr>
      Reviewed-by: Gabriel Laskar <gabriel@lse.epita.fr>
      Reviewed-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
    • r200: fix some potential big endian issues · 779cabfc
      Roland Scheidegger authored

      The formats chosen (both by the texture format chooser and by fbo storage
      allocation) are different for big endian, not just for rgba8 but also for
      lower bit width formats (why, I don't actually know). Even the function to
      test for renderable formats used different formats; however, the actual
      colorbuffer setup did not. And the blitter did not take that into account
      either.
      Untested (what could possibly go wrong...).
      Same as for r100.
      
      Acked-by: Marek Olšák <marek.olsak@amd.com>
    • radeon: fix some potential big endian issues · d21320f6
      Roland Scheidegger authored

      The formats chosen (both by the texture format chooser and by fbo storage
      allocation) are different for big endian, not just for rgba8 but also for
      lower bit width formats (why, I don't actually know). Even the function to
      test for renderable formats used different formats; however, the actual
      colorbuffer setup did not. And the blitter did not take that into account
      either.
      Untested (what could possibly go wrong...).
      
      Acked-by: Marek Olšák <marek.olsak@amd.com>
    • radeon/r200: mark state atoms as dirty after blits · 882476fe
      Roland Scheidegger authored

      Blit submits lots of packets which are usually handled by state atoms, so
      these must be dirtied.
      Not sure if this fixes anything, but it was a concern raised by bug 51658
      (with this all issues there seen as actual bugs should be fixed, with the
      exception of the patch to upload non-used texenv state atoms which I just
      don't understand).
      
      Acked-by: Marek Olšák <marek.olsak@amd.com>