- Jul 13, 2018
-
-
Dylan Baker authored
-
Dylan Baker authored
-
At 232ed898 "i965/fs: Register allocator shoudn't use grf127 for sends dest" we didn't take into account the case of SEND instructions that are not send_from_grf. But since Gen7+ although the backend still uses MRFs internally for sends they are finally assigned to a GRFs. In the case of unspills the backend assigns directly as source its destination because it is suppose to be available. So we always have a source-destination overlap. If the reg_allocator assigns registers that include the grf127 we fail the validation rule that affects Gen8+ "r127 must not be used for return address when there is a src and dest overlap in send instruction." So this patch activates the grf127_send_hack_node for Gen8+ and if we have any register spilled we add interferences to the destination of the unspill operations. We also need to avoid that opt_bank_conflicts() optimization, that runs after the register allocation, doesn't move things around, causing the grf127 to be used in the condition we were avoiding. Fixes piglit test tests/spec/arb_compute_shader/linker/bug-93840.shader_test and some shader-db crashed because of the grf127 validation rule.. v2: make sure that opt_bank_conflicts() optimization doesn't change the use of grf127. (Caio) Found by Caio Marcelo de Oliveira Filho Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107193 Fixes: 232ed898 "i965/fs: Register allocator shoudn't use grf127 for sends dest" Cc: 18.1 <mesa-stable@lists.freedesktop.org> Cc: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Cc: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> (cherry picked from commit 62f37ee5)
-
- Jul 10, 2018
-
-
Drawable and readable need to either both be None or both be non-None. Cc: <mesa-stable@lists.freedesktop.org> Signed-off-by: Adam Jackson <ajax@redhat.com> Reviewed-by: Eric Anholt <eric@anholt.net> (cherry picked from commit d257ec01)
-
Implement at brw_eu_validate the restriction from Intel Broadwell PRM, vol 07, section "Instruction Set Reference", subsection "EUISA Instructions", Send Message (page 990): "r127 must not be used for return address when there is a src and dest overlap in send instruction." v2: Style fixes (Matt Turner) Reviewed-by: Matt Turner <mattst88@gmail.com> Cc: 18.1 <mesa-stable@lists.freedesktop.org> (cherry picked from commit 0e47ecb2)
-
Since Gen8+ Intel PRM states that "r127 must not be used for return address when there is a src and dest overlap in send instruction." This patch implements this restriction creating new grf127_send_hack_node at the register allocator. This node has a fixed assignation to grf127. For vgrf that are used as destination of send messages we create node interfereces with the grf127_send_hack_node. So the register allocator will never assign to these vgrf a register that involves grf127. If dispatch_width > 8 we don't create these interferences to the because all instructions have node interferences between sources and destination. That is enough to avoid the r127 restriction. This fixes CTS tests that raised this issue as they were executed as SIMD8: dEQP-VK.spirv_assembly.instruction.graphics.8bit_storage.8struct_to_32struct.storage_buffer_*int_geom Shader-db results on Skylake: total instructions in shared programs: 7686798 -> 7686797 (<.01%) instructions in affected programs: 301 -> 300 (-0.33%) helped: 1 HURT: 0 total cycles in shared programs: 337092322 -> 337091919 (<.01%) cycles in affected programs: 22420415 -> 22420012 (<.01%) helped: 712 HURT: 588 Shader-db results on Broadwell: total instructions in shared programs: 7658574 -> 7658625 (<.01%) instructions in affected programs: 19610 -> 19661 (0.26%) helped: 3 HURT: 4 total cycles in shared programs: 340694553 -> 340676378 (<.01%) cycles in affected programs: 24724915 -> 24706740 (-0.07%) helped: 998 HURT: 916 total spills in shared programs: 4300 -> 4311 (0.26%) spills in affected programs: 333 -> 344 (3.30%) helped: 1 HURT: 3 total fills in shared programs: 5370 -> 5378 (0.15%) fills in affected programs: 274 -> 282 (2.92%) helped: 1 HURT: 3 v2: Avoid duplicating register classes without grf127. Let's use a node with a fixed assignation to grf127 and create interferences to send message vgrf destinations. (Eric Anholt) v3: Update reference to CTS VK_KHR_8bit_storage failing tests. (Jose Maria Casanova) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Cc: 18.1 <mesa-stable@lists.freedesktop.org> (cherry picked from commit 232ed898)
-
Zhaowei Yuan authored
"sampler2DRect" and "sampler2DRectShadow" are specified as reserved from GLSL 1.1 and GLSL ES 1.0 Signed-off-by: zhaowei yuan <zhaowei.yuan@samsung.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106906 Reviewed-by: Eric Anholt <eric@anholt.net> Fixes: 34f7e761 ("glsl/parser: Track built-in types using the glsl_type directly") (cherry picked from commit 73ec4376)
-
- Jul 09, 2018
-
-
Faith Ekstrand authored
When we don't have PLN (gen4 and gen11+), we implement LINTERP as either LINE+MAC or a pair of MADs. In both cases, the accumulator is written by the first of the two instructions and read by the second. Even though the accumulator value isn't actually ever used from a logical instruction perspective, it is trashed so we need to make the scheduler aware. Otherwise, the scheduler could end up re-ordering instructions and putting a LINTERP between another an instruction which writes the accumulator and another which tries to use that result. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Matt Turner <mattst88@gmail.com> (cherry picked from commit 566e6abd) Rebased version provided by Jason
-
Fixes: 7987d041 ("i965/surface_state: Emit the clear color address instead of value.") Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (cherry picked from commit 420bf14e)
-
Marek Olšák authored
Ported from i965 including the comment. This fixes: dEQP-EGL.functional.reusable_sync.valid.wait_server Cc: 18.1 <mesa-stable@lists.freedesktop.org> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> (cherry picked from commit 0eaf0696)
-
Fixes new piglit tests: - glsl-1.10/execution/fs-sign-neg-abs.shader_test - glsl-1.10/execution/fs-sign-sat-neg-abs.shader_test - glsl-1.10/execution/vs-sign-neg-abs.shader_test - glsl-1.10/execution/vs-sign-sat-neg-abs.shader_test Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> (cherry picked from commit 88bd37c0)
-
This is achived by copying the sign(abs(x)) optimization from the FS backend. On Gen7 an earlier platforms, this fixes new piglit tests: - glsl-1.10/execution/vs-sign-neg-abs.shader_test - glsl-1.10/execution/vs-sign-sat-neg-abs.shader_test Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> (cherry picked from commit 9626ea49)
-
At the time of commit 7bc6e455 (i965: Add support for saturating immediates.) we thought mixed type saturates would be impossible. We were only thinking about type converting moves from D to F, for example. However, type converting moves w/saturate from F to DF are definitely possible. This change minimally relaxes the restriction to allow cases that I have been able trigger via piglit tests. Fixes new piglit tests: - arb_gpu_shader_fp64/execution/built-in-functions/fs-sign-sat-neg-abs.shader_test - arb_gpu_shader_fp64/execution/built-in-functions/vs-sign-sat-neg-abs.shader_test Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> (cherry picked from commit f8e54d02)
-
fold_assoc() called from fold_alu_op3() can lower the number of src to 2, which then leads to an invalid access to n.src[2]->gvalue(). This didn't seem to have caused much harm in the past, but on Fedora 28 it will crash (presumably because -D_GLIBCXX_ASSERTIONS is used, although with libstdc++ 4.8.5 this didn't do anything, -D_GLIBCXX_DEBUG was needed to show the issue). An alternative fix would be to instead call fold_alu_op2() from within fold_assoc() when the number of src is reduced and return always TRUE from fold_assoc() in this case, with the only actual difference being the return value from fold_alu_op3() then. I'm not sure what the return value actually should be in this case (or whether it even can make a difference). https://bugs.freedesktop.org/show_bug.cgi?id=106928 Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Dave Airlie <airlied@redhat.com> (cherry picked from commit 817efd89)
-
- Jul 06, 2018
-
-
For merged shaders, VS as HS for example. Cc: <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> (cherry picked from commit 85865dbe)
-
- Jul 05, 2018
-
-
In 6f5abf31 this code was fixed to calculate the maximum size of an attribute in a seperate pass and then allocate the registers to that size. However this wasn’t taking into account ranges that overlap but don’t have the same starting location. For example: layout(location = 0, component = 0) out float a[4]; layout(location = 2, component = 1) out float b[4]; Previously, if ‘a’ was processed first then it would allocate a register of size 4 for location 0 and it wouldn’t allocate another register for location 2 because it would already be covered by the range of 0. Then if something tries to write to b[2] it would try to write past the end of the register allocated for ‘a’ and it would hit an assert. This patch changes it to scan for any overlapping ranges that start within each range to calculate the maximum extent and allocate that instead. Fixed Piglit’s arb_enhanced_layouts/execution/component-layout/ vs-fs-array-interleave-range.shader_test Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Fixes: 6f5abf31 "i965: Fix output register sizes when multiple variables share a slot." (cherry picked from commit 2d5ddbe9)
-
The current code causes: /usr/include/c++/8/debug/safe_iterator.h:207: Error: attempt to copy from a singular iterator. This is due to the iterators getting invalidated, fix the reverse iterator to use the return value from erase, and cast it properly. (used Mathias suggestion) Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Mathias Fröhlich <mathias.froehlich@web.de> (cherry picked from commit 8c51caab)
-
- Jul 03, 2018
-
-
We need to add loop terminators to the list in the order we come across them otherwise if two or more have the same exit condition we will select that last one rather than the first one even though its unreachable. This fix is for simple unrolls where we only have a single exit point. When unrolling these type of loops the unreachable terminators and their unreachable branch are removed prior to unrolling. Because of the logic change we also switch some list access in the complex unrolling logic to avoid breakage. Fixes: 6772a17a ("nir: Add a loop analysis pass") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (cherry picked from commit 463f8490)
-
Otherwise we can incorrectly cmod propagate in situations like add(8) g10<1>.xD g2<0>.xD -16D ... cmp.ge.f0(8) null<1>D g2<0>.xD 16D ... (+f0) sel(8) g21<1>.xyUD g14<4>.xyyyUD g18<4>.xyyyUD Sadly, this change hurts quite a few shaders. v2: Refactor writemask compatibility check into a separate function. Suggested by Caio. Ivy Bridge and Haswell had similar results. (Haswell shown) total instructions in shared programs: 12968489 -> 12968738 (<.01%) instructions in affected programs: 60679 -> 60928 (0.41%) helped: 0 HURT: 249 HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.22% max: 0.81% x̄: 0.46% x̃: 0.44% 95% mean confidence interval for instructions value: 1.00 1.00 95% mean confidence interval for instructions %-change: 0.44% 0.48% Instructions are HURT. total cycles in shared programs: 409171965 -> 409172317 (<.01%) cycles in affected programs: 260056 -> 260408 (0.14%) helped: 0 HURT: 176 HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: 0.04% max: 0.34% x̄: 0.17% x̃: 0.17% 95% mean confidence interval for cycles value: 2.00 2.00 95% mean confidence interval for cycles %-change: 0.16% 0.18% Cycles are HURT. Sandy Bridge total instructions in shared programs: 10423577 -> 10423753 (<.01%) instructions in affected programs: 40667 -> 40843 (0.43%) helped: 0 HURT: 176 HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.29% max: 0.79% x̄: 0.48% x̃: 0.42% 95% mean confidence interval for instructions value: 1.00 1.00 95% mean confidence interval for instructions %-change: 0.46% 0.51% Instructions are HURT. total cycles in shared programs: 146097503 -> 146097855 (<.01%) cycles in affected programs: 503990 -> 504342 (0.07%) helped: 0 HURT: 176 HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: 0.02% max: 0.36% x̄: 0.12% x̃: 0.11% 95% mean confidence interval for cycles value: 2.00 2.00 95% mean confidence interval for cycles %-change: 0.11% 0.13% Cycles are HURT. No changes on any other platforms. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Fixes: cd635d14 i965/vec4: Propagate conditional modifiers from compares to adds Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> (cherry picked from commit 995d9937)
-
Marek Olšák authored
Shaders that need special code for external samplers were broken if they were loaded from the cache. Cc: 18.1 <mesa-stable@lists.freedesktop.org> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> (cherry picked from commit 99c6cae2)
-
If we have to re-emit push constant data, we need to re-emit all of it. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> CC: <mesa-stable@lists.freedesktop.org> (cherry picked from commit 198a7222)
-
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> CC: <mesa-stable@lists.freedesktop.org> (cherry picked from commit 6a1d8350)
-
Every time we emit a new state base address we will need to re-emit our binding tables, since they might have been emitted with a different base state adress. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> CC: <mesa-stable@lists.freedesktop.org> (cherry picked from commit 1b548246)
-
Faith Ekstrand authored
Previously, we just hashed the entire descriptor set layout verbatim. This meant that a bunch of extra stuff such as pointers and reference counts made its way into the cache. It also meant that we weren't properly hashing in the Y'CbCr conversion information information from bound immutable samplers. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> (cherry picked from commit d1c778b3)
-
Marek Olšák authored
Cc: 18.1 <mesa-stable@lists.freedesktop.org> (cherry picked from commit 41f80373) Conflicts fixed by Dylan Conflicts: src/gallium/drivers/radeonsi/si_blit.c
-
The spec allows adding scalars with a vector or matrix. In this case the opt was losing swizzle and size information. This fixes a bug with Doom (2016) shaders. Fixes: 34ec1a24 ("glsl: Optimize (x + y cmp 0) into (x cmp -y).") Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (cherry picked from commit 2a5121bf)
-
Previously, TargetNVC0::insnCanLoadOffset() returned whether the offset could be set to a specific value. The IndirectPropagation pass expected it to return whether the offset could be increased by a specific value, which is what TargetNV50::insnCanLoadOffset() does. Fixes: 37b67db6 ("nvc0/ir: be careful about propagating very large offsets into const load") Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> Signed-off-by: Karol Herbst <kherbst@redhat.com> (cherry picked from commit 6bb0f87c)
-
Faith Ekstrand authored
Commit 0d905597 fixed an issue with the placement of the zip and unzip instructions. However, as a side-effect, it reversed the order in which we were emitting the split instructions so that they went from high group to low instead of low to high. This is fine for most things like texture instructions and the like but certain render target writes really want to be emitted low to high. This commit just switches the order back around to be low to high. Reviewed-by: Matt Turner <mattst88@gmail.com> Fixes: 0d905597 "intel/fs: Be more explicit about our placement of [un]zip" (cherry picked from commit d5b617a2)
-
There is a parallel make build issue in src/egl/drivers/dri2/ for wayland builds. Can be reproduced with: $ rm src/egl/drivers/dri2/*.h src/egl/drivers/dri2/platform_wayland.lo $ make -C src/egl/ drivers/dri2/platform_wayland.lo ../../../mesa-18.1.2/src/egl/drivers/dri2/platform_wayland.c:50:10: fatal error: linux-dmabuf-unstable-v1-client-protocol.h: No such file or directory This patch adds the missing dependency. Fixes: 02cc3593 "egl/wayland: Use linux-dmabuf interface for buffers" Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> [Eric: fixed up the commit title] Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> (cherry picked from commit d7c4ce1d)
-
- Jun 29, 2018
-
-
Dylan Baker authored
-
Dylan Baker authored
-
Dylan Baker authored
-
- Jun 27, 2018
-
-
Dylan Baker authored
-
The Vulkan spec says: "pipelineBindPoint is a VkPipelineBindPoint indicating whether the descriptors will be used by graphics pipelines or compute pipelines. There is a separate set of bind points for each of graphics and compute, so binding one does not disturb the other." CC: <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> (cherry picked from commit 7a57c827)
-
- Jun 26, 2018
-
-
Dylan Baker authored
-
Faith Ekstrand authored
Otherwise, if what gets passed into the function call is a deref chain longer than just a variable deref, we would use the type of the entire variable rather than the type of the thing being dereferenced. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106980 Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (Unique to 18.1)
-
Faith Ekstrand authored
Even though they don't have regular sources, they do have derefs and those may have implied sources that should be handled. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106980 Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (Unique to 18.1)
-
While XFB has been enabled for cache, we did not serialize enough data for the whole API to work (such as glGetProgramiv). Fixes: 6d830940 "Allow shader cache usage with transform feedback" Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106907 Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> (cherry picked from commit ab2643e4)
-
Andrii Simiklit authored
We can not use the VUE Dereference flags combination for EOT message under ILK and SNB because the threads are not initialized there with initial VUE handle unlike Pre-IL. So to avoid GPU hangs on SNB and ILK we need to avoid usage of the VUE Dereference flags combination. (Was tested only on SNB but according to the specification SNB Volume 2 Part 1: 1.6.5.3, 1.6.5.6 the ILK must behave itself in the similar way) v2: Approach to fix this issue was changed. Instead of different EOT flags in the program end we will create VUE every time even if GS produces no output. v3: Clean up the patch. Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105399 CC: <mesa-stable@lists.freedesktop.org> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Tested-by: Mark Janes <mark.a.janes@intel.com> (cherry picked from commit 232c5d75)
-
From the Vulkan spec: "If this is a primary command buffer, then this value is ignored." CC: <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> (cherry picked from commit ba5e25ed)
-