- Oct 09, 2015
-
Lina Versace authored
Some assertions in gen8_surface_state.c checked for gen < 8. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
-
Lina Versace authored
The (gen < 9) check in brw_clear() was too broad. It disabled all types of fast color clears:
  a. singlesample rep clears
  b. singlesample MCS fast clears
  c. multisample MCS fast clears
The MCS clears are still buggy, but the rep clear works well. So let's enable it.
Reviewed-by: Neil Roberts <neil@linux.intel.com>
-
Lina Versace authored
Fast color clears are disabled for gen9 (see the checks in brw_meta_fast_clear), so there is no reason to allocate the MCS and track its clear/resolve state. Reviewed-by: Neil Roberts <neil@linux.intel.com>
-
Roland Scheidegger authored
-
Marek Olšák authored
They didn't do anything useful. Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Brian Paul <brianp@vmware.com>
-
Marek Olšák authored
Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Brian Paul <brianp@vmware.com>
-
Marek Olšák authored
Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Brian Paul <brianp@vmware.com>
-
Marek Olšák authored
Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Brian Paul <brianp@vmware.com>
-
Marek Olšák authored
Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Brian Paul <brianp@vmware.com> Tested-by: Brian Paul <brianp@vmware.com>
-
Marek Olšák authored
Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Brian Paul <brianp@vmware.com> Tested-by: Brian Paul <brianp@vmware.com>
-
Marek Olšák authored
Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Brian Paul <brianp@vmware.com> Tested-by: Brian Paul <brianp@vmware.com>
-
Marek Olšák authored
Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Brian Paul <brianp@vmware.com> Tested-by: Brian Paul <brianp@vmware.com>
-
Marek Olšák authored
The translate function is split into two:
  - translation to TGSI
  - creating the variant (TGSI transformations only)
Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Brian Paul <brianp@vmware.com> Tested-by: Brian Paul <brianp@vmware.com>
-
Marek Olšák authored
The samplers for DrawPixels data and the pixel map are assigned to slots which don't overlap with the existing sampler slots. The texture coordinates for the user texture are uploaded as a constant. Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Brian Paul <brianp@vmware.com> Tested-by: Brian Paul <brianp@vmware.com>
-
Marek Olšák authored
Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Brian Paul <brianp@vmware.com> Tested-by: Brian Paul <brianp@vmware.com>
-
Marek Olšák authored
  - there is no connection to user fragment shaders, so having these as shader variants makes no sense
  - don't use Mesa IR, use TGSI
  - don't create gl_fragment_program, just create the shader CSO
v2: generate exactly the same shader as before to fix llvmpipe
Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Brian Paul <brianp@vmware.com> Tested-by: Brian Paul <brianp@vmware.com>
-
Marek Olšák authored
Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Brian Paul <brianp@vmware.com> Tested-by: Brian Paul <brianp@vmware.com>
-
Marek Olšák authored
Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Brian Paul <brianp@vmware.com> Tested-by: Brian Paul <brianp@vmware.com>
-
Marek Olšák authored
Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Brian Paul <brianp@vmware.com> Tested-by: Brian Paul <brianp@vmware.com>
-
Marek Olšák authored
Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Brian Paul <brianp@vmware.com> Tested-by: Brian Paul <brianp@vmware.com>
-
Marek Olšák authored
The other variables can't be moved. Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Brian Paul <brianp@vmware.com> Tested-by: Brian Paul <brianp@vmware.com>
-
Marek Olšák authored
No other shader stage has a "prepare" function. This will allow removing some variables from st_vertex_program. Also, prepare_fragment_program was a dead prototype. Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Brian Paul <brianp@vmware.com> Tested-by: Brian Paul <brianp@vmware.com>
-
Marek Olšák authored
v2: get it from declarations, not instructions
-
Marek Olšák authored
st/mesa will use this, but drivers can use it too. Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Brian Paul <brianp@vmware.com> Tested-by: Brian Paul <brianp@vmware.com>
-
Marek Olšák authored
Drivers weren't notified about this at all. This allows disabling on-demand compilation in drivers. Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Brian Paul <brianp@vmware.com> Tested-by: Brian Paul <brianp@vmware.com>
-
Rob Clark authored
First step towards inverting the dependency between glsl and nir (so nir can be used without glsl). Also solves this issue with 'make distclean':
    Making distclean in mesa
    make[2]: Entering directory '/mnt/sdb1/Src64/Mesa-git/mesa/src/mesa'
    Makefile:2486: ../glsl/.deps/shader_enums.Plo: No such file or directory
    make[2]: *** No rule to make target '../glsl/.deps/shader_enums.Plo'. Stop.
    make[2]: Leaving directory '/mnt/sdb1/Src64/Mesa-git/mesa/src/mesa'
    Makefile:684: recipe for target 'distclean-recursive' failed
    make[1]: *** [distclean-recursive] Error 1
    make[1]: Leaving directory '/mnt/sdb1/Src64/Mesa-git/mesa/src'
    Makefile:615: recipe for target 'distclean-recursive' failed
    make: *** [distclean-recursive] Error 1
Reported-by: Andy Furniss <adf.lists@gmail.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Signed-off-by: Rob Clark <robclark@freedesktop.org>
-
Francisco Jerez authored
The point is to avoid having to re-validate all image units when _NEW_TEXTURE is flagged, which can be expensive if the driver exposes a large number of image units. This has been reported to fix a 36% performance regression in the Synmark2 Multithread benchmark on the i965 driver, which exposes 192 image units.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91788
Reported-by: Wendy Wang <wendy.wang@intel.com> Tested-by: Ye Tian <yex.tian@intel.com> CC: "11.0" <mesa-stable@lists.freedesktop.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
-
Francisco Jerez authored
gl_image_unit::_Valid will be removed in a future commit. Tested-by: Ye Tian <yex.tian@intel.com> CC: "11.0" <mesa-stable@lists.freedesktop.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
-
Francisco Jerez authored
The call to _mesa_test_texobj_completeness() is unnecessary if the texture is already known to be complete. If the texture object is dirtied in the meantime, _BaseComplete and _MipmapComplete will be reset to false. _mesa_is_image_unit_valid() will start to be called more frequently in a future commit, so it seems desirable to avoid the unnecessary work. Tested-by: Ye Tian <yex.tian@intel.com> CC: "11.0" <mesa-stable@lists.freedesktop.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
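The caching idea in this message can be illustrated with a minimal Python sketch. It is not the Mesa code: the `TextureObject` class, its `levels_ok` field, and the `checks` counter are hypothetical stand-ins, and the flag names only loosely mirror _BaseComplete and _MipmapComplete.

```python
class TextureObject:
    """Hypothetical texture object with cached completeness flags."""

    def __init__(self, levels_ok=True):
        self.levels_ok = levels_ok      # assumption: stands in for the real level checks
        self.base_complete = False
        self.mipmap_complete = False
        self.checks = 0                 # counts how often the expensive test ran

    def dirty(self):
        # Any change to the texture invalidates the cached result.
        self.base_complete = False
        self.mipmap_complete = False

    def test_completeness(self):
        # Stand-in for the expensive completeness test.
        self.checks += 1
        self.base_complete = self.levels_ok
        self.mipmap_complete = self.levels_ok


def is_image_unit_valid(tex):
    """Re-run the expensive test only when no cached result is available."""
    if not tex.base_complete and not tex.mipmap_complete:
        tex.test_completeness()
    return tex.base_complete
```

Calling `is_image_unit_valid()` repeatedly on an unchanged, complete texture runs the expensive test once; calling `dirty()` forces one more run on the next query. An incomplete texture is retested on every query in this sketch.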
-
Francisco Jerez authored
A future commit will remove all texture object-dependent derived state from the image unit struct to make validation unnecessary on texture state changes. Instead of checking gl_image_unit::_Valid, drivers will be required to call this function when needed to find out whether an image unit is in a valid state and whether access from the shader is allowed. Tested-by: Ye Tian <yex.tian@intel.com> CC: "11.0" <mesa-stable@lists.freedesktop.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
-
Francisco Jerez authored
The hardware documentation relating to the UAV HW-assisted coherency mechanism and UAV access enable bits is scarce and sometimes contradictory, and there's quite some guesswork behind this commit, so let me summarize the background first:
HSW and later hardware have infrastructure to support a stricter form of data coherency between shader invocations from separate primitives. The mechanism is controlled by the "Accesses UAV" bits on 3DSTATE_VS, _HS, _DS, _GS and _PS (or _PS_EXTRA on BDW+), and the "UAV Coherency Required" bit on the 3DPRIMITIVE command.
Regardless of whether "UAV Coherency Required" is set, the hardware fixed-function units will increment a per-stage semaphore for each request received if "Accesses UAV" is set for the same or any lower stage. An implicit DC flush is emitted by the lowermost stage with "Accesses UAV" set once it's done processing the request; this also happens regardless of the value of "UAV Coherency Required". The completion of the DC flush will cause the same stage and all previous ones to decrement the semaphore, marking the UAV accesses for the primitive as coherent with L3.
The "UAV Coherency Required" 3DPRIMITIVE bit will cause a pipeline stall before any threads are dispatched for the first FF stage with "Accesses UAV" set until the semaphore is cleared for the same stage. Effectively this guarantees that UAV memory accesses performed by previous primitives from any stage will be strictly ordered (and, thanks to the implicit DC flush, visible in memory) with UAV accesses from the following primitives.
None of this is required by the usual image, atomic counter and SSBO GL APIs, which have very relaxed cross-primitive coherency and ordering requirements, so we don't actually ever set the "UAV Coherency Required" bit. Ordering with respect to shader invocations from previous stages on the same primitive where there is a data dependency is of course already guaranteed as the spec requires, regardless of this mechanism being enabled.
We do set the "Accesses UAV" bits though since my commit ac7664e4 (which this patch partially reverts), mainly because of comments like the following from the BDW PRM:
> 3DSTATE_GS
> [...]
> 12 Accesses UAV
> Format: Enable
> This field must be set when GS has a UAV access.
There are similar comments in the documentation for the other 3DSTATE_*S commands. The "must" part is misleading and unjustified AFAIK. Most of the "Accesses UAV" bits don't seem to have any side effects other than the implicit DC flushes and the related book-keeping in anticipation of a subsequent primitive with "UAV Coherency Required" set, so in most cases they are unnecessary and may incur a performance penalty.
There is an exception though. On Gen8+ the PS_EXTRA UAV access bit influences the calculation of the PS UAV-only and ThreadDispatchEnable signals, which on previous generations were set explicitly by the driver, so we cannot always avoid enabling it on the PS stage.
The primary motivation for this change is that in fact the hardware coherency mechanism is buggy and will cause a rather non-deterministic hang on Gen8 when VS is the only stage with "Accesses UAV" set and the processing of a request terminates immediately after the implicit DC flush is sent for a previous primitive with no additional vertices being emitted for the second primitive, which will cause the hardware to skip sending a second DC flush and cause the VS to stall indefinitely waiting for a response from the DC (BDWGFX HSD 1912017).
This hardware bug can be reproduced on current master with the spec@arb_shader_image_load_store@host-mem-barrier@Indirect/RaW piglit subtest (if you have the patience to run it a few dozen times). The proposed workaround is to insert CS STALLs speculatively between 3DPRIMITIVE commands when "Accesses UAV" is enabled for the VS stage only. Because this would affect one of the hottest paths in the driver and likely decrease performance even further due to the unnecessary serialization, and because we don't actually need the implicit DC flushes, it seems better to just disable them.
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
-
Connor Abbott authored
This was originally added to nir_instrs_equal() instead of nir_instr_can_cse() incorrectly, but this was fixed when moving to the instruction set API (as it had to be, otherwise hashing wouldn't work). Now, this is dead code since instr_can_rewrite() will only return true for texture instructions that use an index, so we can turn the check into an assert. This also means that now nir_instrs_equal(instr, instr) will always return true unless it assert-fails. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
-
Connor Abbott authored
This was previously tied to CSE, since it would only work for instructions where nir_can_cse() (now instr_can_rewrite()) returned true. Now that CSE uses the instruction set abstraction which only uses this internally, we can make it local to nir_instr_set.c. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
-
Connor Abbott authored
This replaces an O(n^2) algorithm with an O(n) one, while allowing us to import most of the infrastructure required for GVN. The idea is to walk the dominance tree depth-first, similar to when converting to SSA, and remove the instructions from the set when we're done visiting the sub-tree of the dominance tree, so that the only instructions in the set are the instructions that dominate the current block.
No piglit regressions. No shader-db changes.
Compilation time for full shader-db:
    Difference at 95.0% confidence
        -35.826 +/- 2.16018
        -6.2852% +/- 0.378975%
        (Student's t, pooled s = 3.37504)
v2:
  - rebase on start_block removal
  - remove useless state struct
  - change commit message
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
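The dominance-tree walk described in this message can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the NIR pass: `Block`, `cse_block`, and the `(dest, op, srcs)` tuple encoding are hypothetical, and phis and side effects are ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """Hypothetical basic block: (dest, op, srcs) tuples plus dominance-tree children."""
    instrs: list
    dom_children: list = field(default_factory=list)


def cse_block(block, instr_set, rewrites):
    """Depth-first walk of the dominance tree.

    instr_set maps (op, srcs) -> dest for instructions that dominate the
    current point; rewrites maps a removed dest to its replacement.
    """
    added = []   # keys inserted by this block, removed again on exit
    kept = []
    for dest, op, srcs in block.instrs:
        # Apply earlier rewrites to the sources so chained duplicates match.
        srcs = tuple(rewrites.get(s, s) for s in srcs)
        key = (op, srcs)
        if key in instr_set:
            # Duplicate of a dominating instruction: reuse its value.
            rewrites[dest] = instr_set[key]
        else:
            instr_set[key] = dest
            added.append(key)
            kept.append((dest, op, srcs))
    block.instrs = kept

    for child in block.dom_children:
        cse_block(child, instr_set, rewrites)

    # Done with this sub-tree: its instructions no longer dominate anything
    # that will be visited next, so take them out of the set.
    for key in added:
        del instr_set[key]
```

Because each block removes its own entries on the way out, two sibling blocks never see each other's instructions; only instructions from dominating blocks can serve as replacements.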
-
Connor Abbott authored
This will replace direct usage of nir_instrs_equal() in the CSE pass, replacing an O(n^2) algorithm with an effectively O(n) one. It'll also be useful for implementing GVN on top of GCM.
v2:
  - Add texture support.
  - Add more comments.
  - Rename instr_can_hash() to instr_can_rewrite() since it's really more about whether its uses can be rewritten, and it's implicitly used by nir_instrs_equal() as well.
  - Rename nir_instr_set_add() to nir_instr_set_add_or_rewrite() (Jason).
  - Make the HASH() macro less magical (Topi).
  - Rewrite the commit message.
v3:
  - For sorting phi sources, use a VLA, store pointers to the sources, and compare the predecessor pointer directly (Jason).
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
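As a rough illustration of the add-or-rewrite idea, the following Python sketch keeps a hash set of instructions and redirects duplicates to the first match. The function names echo those in the message, but the dict-based instruction model, the `REWRITABLE_OPS` table, and all signatures are assumptions made for this example.

```python
REWRITABLE_OPS = {"add", "mul", "phi"}   # assumption: illustrative subset only


def instr_can_rewrite(instr):
    """Only instructions whose uses can be redirected may enter the set."""
    return instr["op"] in REWRITABLE_OPS


def instr_key(instr):
    """Hashable key; equal keys mean the instructions compute the same value."""
    srcs = instr["srcs"]
    if instr["op"] == "phi":
        # Phi sources are (predecessor, value) pairs in arbitrary order,
        # so sort by predecessor before hashing and comparing.
        srcs = sorted(srcs)
    return (instr["op"], tuple(srcs))


def add_or_rewrite(instr_set, instr, rewrites):
    """Insert instr, or record a rewrite of its dest to an existing match.

    Returns True when a match was found and instr is now redundant.
    """
    if not instr_can_rewrite(instr):
        return False
    key = instr_key(instr)
    match = instr_set.get(key)
    if match is not None:
        rewrites[instr["dest"]] = match["dest"]
        return True
    instr_set[key] = instr
    return False
```

Each lookup is a single expected-constant-time hash probe, which is where the "effectively O(n)" behaviour comes from, compared with testing every pair of instructions for equality.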
-
Connor Abbott authored
v2: rebase, don't constify nir_srcs_equal() as it's pass-by-value anyways Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
-
Connor Abbott authored
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
-
Connor Abbott authored
Right now nir_instrs_equal() is tied pretty tightly to CSE, but we're going to introduce the idea of an instruction set and tie it to that instead. In anticipation of that, move this into its own file where we'll add the rest of the instruction set implementation later. v2: Rebase on texture support. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
-
Neil Roberts authored
If a non-const sample number is given to interpolateAtSample it will now generate an indirect send message with the sample ID similar to how non-const sampler array indexing works. Previously non-const values were ignored and instead it ended up using a constant 0 value.
The generator will try to determine if the sample ID is dynamically uniform via nir_src_is_dynamically_uniform. If not it will query the pixel interpolator in a loop, once for each different live sample number. The next live sample number is found using emit_uniformize. If multiple live channels have the same sample number then they will be handled in a single iteration of the loop. The loop is necessary because the indirect send message doesn't seem to have a way to specify a different value for each fragment.
This fixes the following two Piglit tests:
    arb_gpu_shader5-interpolateAtSample-nonconst
    arb_gpu_shader5-interpolateAtSample-dynamically-nonuniform
v2: Handle dynamically non-uniform sample ids.
v3: Remove the BREAK instruction and predicate the WHILE directly. Make the tokens arrays const. (Matt Turner)
v4: Iterate over the live channels instead of each possible sample number.
v5: Don't special case immediate values in brw_pixel_interpolator_query. Make a better wrapper for the function to set up the PI send instruction. Ensure that the SHL instructions are scalar. (Francisco Jerez).
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
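The loop over live sample numbers can be modelled on the CPU with a short Python sketch. It is a simplification and not the generated i965 code: `interpolate_at_sample` and the `query` callback are hypothetical, and the real pixel interpolator returns per-channel data rather than one value per sample ID.

```python
def interpolate_at_sample(sample_ids, live, query):
    """Issue one `query(sample_id)` call per distinct live sample ID.

    sample_ids: per-channel sample number
    live:       per-channel bool, True if the channel is active
    query:      stand-in for the pixel interpolator message, which only
                accepts a single (uniform) sample ID per call
    """
    results = [None] * len(sample_ids)
    pending = list(live)
    while any(pending):
        # "Uniformize": take the sample ID of the first channel still pending.
        first = pending.index(True)
        current = sample_ids[first]
        value = query(current)
        for ch, sid in enumerate(sample_ids):
            if pending[ch] and sid == current:
                # Every pending channel with this ID is served by the same
                # iteration, then drops out of the loop.
                results[ch] = value
                pending[ch] = False
    return results
```

The number of iterations equals the number of distinct sample IDs among the live channels, so a dynamically uniform ID costs a single query, and dead channels never trigger one.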
-
Neil Roberts authored
It is possible to directly predicate the WHILE instruction. In this case there will be a second successor block because the execution can resume from the instruction after the loop. This will be used in a subsequent patch. Reviewed-by: Matt Turner <mattst88@gmail.com>
-