1. 21 Jul, 2015 6 commits
    • r600/sb: Fix an &/&& mistake · 5b4a7ec8
      Adam Jackson authored
      gcc says:
          sb/sb_sched.cpp: In member function 'bool r600_sb::alu_group_tracker::try_reserve(r600_sb::alu_node*)':
          sb/sb_sched.cpp:492:7: warning: suggest parentheses around operand of '!' or change '&' to '&&' or '!' to '~' [-Wparentheses]
            if (!trans & fbs)
      It happens to be harmless; if fbs is ever non-zero, it will be VEC_210,
      which is 5, so (!trans & 5) == !trans and the branch works as expected.  But
      logical AND is clearly what was meant.
      Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Adam Jackson <ajax@redhat.com>
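The reasoning in this commit message can be checked with a small sketch. The names `VEC_210_VALUE`, `bitwise_form`, and `logical_form` are hypothetical and exist only for illustration; the only fact taken from the message is that VEC_210 has the value 5.

```c
#include <stdbool.h>

/* Value of VEC_210 as stated in the commit message. The macro name is
 * hypothetical and is not taken from the r600 sources. */
#define VEC_210_VALUE 5

/* The original expression. '!' binds tighter than '&', so it parses as
 * (!trans) & fbs; the parentheses here only silence -Wparentheses. */
static bool bitwise_form(int trans, int fbs)
{
    return ((!trans) & fbs) != 0;
}

/* The intended expression: a logical AND of the two conditions. */
static bool logical_form(int trans, int fbs)
{
    return !trans && fbs;
}
```

Both forms agree when fbs is 0 or 5, because bit 0 of 5 is set. They would diverge for a non-zero value with bit 0 clear, such as 2, which is why the logical form is the safer one.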
    • Revert "i965/gen9: Plugin the code for selecting YF/YS tiling on skl+" · 545dec5b
      Anuj Phogat authored
      Commit c9dbdc08 introduced some dead code which is supposed to be used
      once we have Yf/Ys tiling working and performing better. Ken reported
      that a static analysis tool now shows warnings because of the dead
      code. To fix these warnings, this patch reverts the changes made in
      commit c9dbdc08.
      It'll be better to add the Yf/Ys tiling selection code later, when we
      are ready to use it.
      Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
      Acked-by: Kenneth Graunke <kenneth@whitecape.org>
    • i965: Fix stride field for the result of emit_uniformize(). · fadf3477
      Francisco Jerez authored
      This is essentially the same problem fixed in an earlier patch for
      immediates.  Setting the stride to zero will be particularly useful
      for my future SIMD lowering pass, because we will be able to just
      check whether the stride of a source register is zero and skip
      emitting the copies required to unzip it in that case.
      Instead of setting stride to zero in every caller of emit_uniformize()
      I've changed the function to return the result as its return value
      (previously it was being written into a caller-provided destination
      register), because this way we can enforce that the result is used with
      the correct regioning from the function itself.
      The changes to the prototype of its VEC4 counterpart are mainly for
      the sake of symmetry; VEC4 registers don't have a stride.
      Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
    • i965/fs: Fix stride field for uniforms. · 9383664a
      Francisco Jerez authored
      This fixes essentially the same problem as for immediates.  Registers
      of the UNIFORM file are typically accessed according to the formula:
       read_uniform(r, channel_index, array_index) =
          read_element(r, channel_index * 0 + array_index * 1)
      Which matches the general direct addressing formula for stride=0:
       read_direct(r, channel_index, array_index) =
          read_element(r, channel_index * stride +
                          array_index * max{1, stride * width})
      In either case if reladdr is present the access will be according to
      the composition of two register regions, the first one determining the
      per-channel array_index used for the second, like:
       read_indirect(r, channel_index, array_index) =
          read_direct(r, channel_index,
                      read(r.reladdr, channel_index, array_index))
       read(r, channel_index, array_index) = if r.reladdr == NULL
          then read_direct(r, channel_index, array_index)
          else read_indirect(r, channel_index, array_index)
      In conclusion we can handle uniforms consistently with the other
      register files if we set stride to zero.  After lowering to a GRF
      using VARYING_PULL_CONSTANT_LOAD in demote_pull_constant_loads() the
      stride of the source is set to one again because the result of
      VARYING_PULL_CONSTANT_LOAD is generally non-uniform.
      Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
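The claim that the UNIFORM formula matches the direct addressing formula for stride=0 can be checked numerically. The following is a minimal sketch of the two offset computations quoted in the message; the function names `direct_offset` and `uniform_offset` are hypothetical and do not come from the i965 sources.

```c
#include <stddef.h>

/* Element offset for the direct addressing formula quoted above:
 * channel_index * stride + array_index * max{1, stride * width}. */
static size_t direct_offset(size_t channel_index, size_t array_index,
                            size_t stride, size_t width)
{
    size_t row = stride * width;
    if (row < 1)
        row = 1;
    return channel_index * stride + array_index * row;
}

/* Element offset for the UNIFORM formula quoted above:
 * channel_index * 0 + array_index * 1. */
static size_t uniform_offset(size_t channel_index, size_t array_index)
{
    (void)channel_index; /* every channel reads the same element */
    return array_index;
}
```

With stride set to zero, `direct_offset` ignores both the channel index and the width, so it returns the same offset as `uniform_offset` for any inputs.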
    • i965/fs: Fix stride for immediate registers. · 5f8d9ae5
      Francisco Jerez authored
      When the width field was removed from fs_reg, the BROADCAST handling
      code in opt_algebraic() started to miss a number of trivial
      optimization cases. As a result, the ugly indirect-addressing sequence
      was emitted unnecessarily for some variable-indexed texturing and
      UBO loads, even when one of the sources of BROADCAST was an
      immediate.  Apparently the reason was that we were setting the stride
      field to one for immediates even though they are typically uniform.
      Width used to be set to one too, which is why this optimization
      worked until the "reg.width == 1" check was removed.
      The stride field of vector immediates is intentionally left equal to
      one, because they are strictly speaking not uniform.  The assertion in
      fs_generator makes sure that immediates have the expected stride as
      a consistency check.
      Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
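The effect of the stride on this optimization can be illustrated with a simplified sketch. All types and functions below (`toy_reg`, `toy_is_uniform`, `toy_simplify_broadcast`) are hypothetical stand-ins, not the actual fs_reg or opt_algebraic() code; the sketch assumes only what the message states, namely that a source is recognised as uniform through its stride.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the compiler's register and
 * opcode types; the real fs_reg has many more fields. */
enum toy_file { TOY_GRF, TOY_IMM, TOY_UNIFORM };
enum toy_opcode { TOY_BROADCAST, TOY_MOV };

struct toy_reg {
    enum toy_file file;
    unsigned stride; /* 0 means every channel reads the same element */
};

/* In this sketch a register counts as uniform when its stride is zero. */
static bool toy_is_uniform(const struct toy_reg *reg)
{
    return reg->stride == 0;
}

/* Returns the opcode to emit for a BROADCAST whose value source is
 * 'value'. Broadcasting an already uniform value is just a copy. */
static enum toy_opcode toy_simplify_broadcast(const struct toy_reg *value)
{
    return toy_is_uniform(value) ? TOY_MOV : TOY_BROADCAST;
}
```

Under these assumptions, an immediate with stride one is not recognised as uniform and keeps the BROADCAST, while the same immediate with stride zero is reduced to a plain copy.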
    • i965/vec4: Fix liveness analysis with BRW_OPCODE_SEL · b298311d
      Iago Toral authored
      We only consider a vgrf defined by a given block if the block writes to it
      unconditionally. So far we have been checking this by testing that the
      instruction is not predicated. However, in the case of BRW_OPCODE_SEL,
      the predication is used to select the value to write, not to decide if
      the write is actually done. The consequence of this was increased life
      spans for affected vgrfs, which could lead to additional register pressure.
      Since NIR generates selects for conditional writes, this was causing massive
      register pressure in a handful of piglit and dEQP tests that had a large
      number of select operations with the NIR-vec4 backend.
      Fixes the following piglit tests with the NIR-vec4 backend:
      Fixes 80 dEQP tests with the NIR-vec4 backend in the following category:
      Reviewed-by: Francisco Jerez <currojerez@riseup.net>
      Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
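The rule described in this message can be sketched as a single predicate. The struct `live_inst`, the enum `live_opcode`, and the function `live_writes_unconditionally` are hypothetical names for illustration and are not the vec4 backend's actual data structures.

```c
#include <stdbool.h>

/* Hypothetical, simplified instruction representation. */
enum live_opcode { LIVE_OPCODE_MOV, LIVE_OPCODE_ADD, LIVE_OPCODE_SEL };

struct live_inst {
    enum live_opcode opcode;
    bool predicated;
};

/* An instruction defines its destination unconditionally if it is not
 * predicated, or if it is a SEL: for SEL the predicate only chooses
 * which source is written, and the write itself always happens. */
static bool live_writes_unconditionally(const struct live_inst *inst)
{
    return !inst->predicated || inst->opcode == LIVE_OPCODE_SEL;
}
```

A predicated MOV still counts as a conditional write in this sketch, so only the SEL case changes relative to the old "not predicated" test.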
  2. 20 Jul, 2015 10 commits
  3. 18 Jul, 2015 11 commits
    • gm107/ir: fix indirect txq emission · 8c8a71f0
      Ilia Mirkin authored
      Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
      Cc: mesa-stable@lists.freedesktop.org
    • nvc0/ir: don't worry about sampler in txq handling · 346ce0b9
      Ilia Mirkin authored
      There's no need to deal with samplers for texture size queries. That
      code also was accidentally setting an invalid sIndirectSrc position, but
      it can now just be removed.
      Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
      Cc: mesa-stable@lists.freedesktop.org
    • nvc0/ir: fix txq on indirect samplers · 20e484af
      Ilia Mirkin authored
      Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
      Cc: mesa-stable@lists.freedesktop.org
    • i965: Disable resource streamer in BLORP · 670914ea
      Abdiel Janulgue authored
      Switch off hardware-generated binding tables and gather push
      constants in BLORP. BLORP requires only a minimal set of
      simple constants. There is no need for the extra complexity
      of programming a gather table entry into the pipeline.
      Cc: kenneth@whitecape.org
      Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
      Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
    • i965: Upload binding tables in hw-generated binding table format. · fc65b6eb
      Abdiel Janulgue authored
      When hardware-generated binding tables are enabled, use the hw-generated
      binding table format when uploading binding table state.
      Normally, the CS will just consume the binding table pointer commands
      as pipelined state. When the RS is enabled, however, the RS flushes any
      edited surface state entries of our on-chip binding table to the binding
      table pool before passing the command on to the CS.
      Note that the binding table pointer offset is relative to the binding table
      pool base address when the resource streamer is enabled, instead of the surface state base address.
      v2: Fix possible buffer overflow when allocating a chunk out of the
          hw-binding table pool (Ken).
      v3: Remove extra newline and add missing brace around if-statement (Matt).
      v4: Fix broken INTEL_DEBUG=shader_time for hw-generated binding tables.
          Document PRM WaStateBindingTableOverfetch workaround.
      Cc: kenneth@whitecape.org
      Cc: mattst88@gmail.com
      Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
      Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
    • i965: Implement interface to edit binding table entries · 2133980b
      Abdiel Janulgue authored
      Unlike normal software binding tables, where the driver has to manually
      generate and fill a binding table array which is then uploaded to the
      hardware, the resource streamer instead presents the driver with an option
      to fill out slots for individual binding table indices. The hardware
      accumulates the state for these combined edits, which it then automatically
      flushes to a binding table pool when the binding table pointer state
      command is invoked.
      v2: Clarify binding table edit bit alignment (Topi).
      v3: Make comments and function names clearer (Ken).
      Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
      Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
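The edit-then-flush flow described in this message can be pictured with a toy software model. This is only a conceptual sketch: the slot count, the struct `toy_bt_state`, and the functions `toy_bt_edit` and `toy_bt_flush` are all hypothetical, and the sketch does not reflect the actual command encoding or the hardware's internal behaviour.

```c
#include <stdint.h>
#include <string.h>

#define TOY_BT_SLOTS 8 /* arbitrary size chosen for the sketch */

/* Toy model: 'on_chip' stands for the state accumulated by the edits,
 * 'pool' stands for the binding table pool in memory. */
struct toy_bt_state {
    uint32_t on_chip[TOY_BT_SLOTS];
    uint32_t pool[TOY_BT_SLOTS];
};

/* Edit one binding table index; nothing reaches the pool yet. */
static void toy_bt_edit(struct toy_bt_state *s, unsigned index,
                        uint32_t surface_offset)
{
    if (index < TOY_BT_SLOTS)
        s->on_chip[index] = surface_offset;
}

/* Models the flush that happens when the binding table pointer state
 * command is invoked: accumulated edits are written to the pool. */
static void toy_bt_flush(struct toy_bt_state *s)
{
    memcpy(s->pool, s->on_chip, sizeof(s->pool));
}
```

In this model the driver only issues per-index edits, and the pool contents change only at the flush, which mirrors the division of work the message describes.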
    • i965: Enable hardware-generated binding tables on render path. · 19075648
      Abdiel Janulgue authored
      This patch implements the binding table enable command, which is also
      used to allocate a binding table pool into which hardware-generated
      binding table entries are flushed. Each binding table offset in
      the binding table pool is unique to each shader stage that is
      enabled within a batch.
      Also insert the required brw_tracked_state objects to enable
      hw-generated binding tables in the normal render path.
      v2: - Use MOCS in binding table pool alloc for GEN8
          - Fix spurious offset when allocating binding table pool entry
            and start from zero instead.
      v3: - Include GEN8 fix for spurious offset above.
      v4: - Fixup wrong packet length in enable/disable hw-binding table
            for GEN8 (Ville).
          - Don't invoke HW-binding table disable command when we don't
            have resource streamer (Chris).
      v5: - Reorder the state cache invalidate flush so it happens in-between
            enabling hw-generated binding tables and the previous sw-binding
            table GPU state (Chris).
      v6: - Do the same fix in v5 for gen7_disable_hw_binding_tables().
          - Adhere to coding guidelines and make comments more informative.
      Cc: kenneth@whitecape.org
      Cc: syrjala@sci.fi
      Cc: chris@chris-wilson.co.uk
      Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
      Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
    • i965: Enable resource streamer for the batchbuffer · 090529af
      Abdiel Janulgue authored
      Check first if the hardware and kernel support the resource streamer. If
      they do, tell the kernel to set the resource streamer enable bit in the
      execbuffer flags.
      v2: - Use new I915_PARAM_HAS_RESOURCE_STREAMER ioctl to check if kernel
            supports RS (Ken).
          - Add brw_device_info::has_resource_streamer and toggle it for
            Haswell, Broadwell, Cherryview, Skylake, and Broxton (Ken).
      v3: - Update I915_PARAM_HAS_RESOURCE_STREAMER to match updated kernel.
      v4: - Always inspect the getparam.value (Chris Wilson).
      v5: - Fold redundant devinfo->has_resource_streamer check in context create
            into init screen.
      Cc: kenneth@whitecape.org
      Cc: chris@chris-wilson.co.uk
      Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
      Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
    • i965: Define HW-binding table and resource streamer control opcodes · ccf9598a
      Abdiel Janulgue authored
      v2: Use macros for HW binding table edits (Topi)
      v3: Add Broadwell support.
      v4: Make hardware binding table bit definitions even clearer (Ken)
      Cc: kenneth@whitecape.org
      Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
      Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
      Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
    • vc4: Switch to using a separate ioctl for making shaders. · ff7896a3
      Emma Anholt authored
      This gives the kernel a chance to validate and lock down the data,
      without having to deal with mmap zapping.
      With this, GLBenchmark stops on a texture relocation, because we'd
      recycled a shader BO as another shader and failed to revalidate, since we
      weren't clearing the cached validation state on mmap faults.
    • mesa: fix up some texture error checks · e42cfe5d
      Roland Scheidegger authored
      In particular, we were incorrectly accepting s3tc (and lots of others)
      for CompressedTexSubImage3D (but not CompressedTexImage3D) calls with 3d
      targets. At this time, the only allowed formats for these calls are the
      bptc ones, since none of the specific extensions allow it (astc hdr would).
      Also, fix up a bug in _mesa_target_can_be_compressed - 3d target needs to
      be allowed for bptc formats.
      Reviewed-by: Brian Paul <brianp@vmware.com>
  4. 17 Jul, 2015 13 commits