- 27 Jun, 2019 25 commits
-
-
Dylan Baker authored
Clang (like LLVM), very annoyingly refuses to provide pkg-config, and only provides cmake (unlike LLVM which at least provides llvm-config, even if llvm-config is terrible). Meson has gained the ability to use cmake to find dependencies, and can successfully find Clang. This change attempts to use cmake to find clang instead of a bunch of library searches, when paired with -Dcmake_prefix_path we can much more reliably use cmake to control which clang we're getting. This is only enabled for meson >= 0.51, which adds the required options. Reviewed-by:
Eric Engestrom <eric.engestrom@intel.com>
-
Dylan Baker authored
Meson has support for using cmake as a finder for some dependencies, including LLVM. Using cmake has a lot of advantages: it needs less meson maintenance to keep working (even for llvm updates); it works more sanely for cross compiles (as llvm-config is a compiled binary not a shell script). Meson 0.51.0 also has a new generic variable getter that can be used to get information from either cmake, pkg-config, or config-tools dependencies, which is needed for cmake. We continue to support using llvm-config if you don't have cmake installed, or if cmake cannot find a suitable version. Fixes: 0d594594 ("meson: Force the use of config-tool for llvm") Reviewed-by:
Eric Engestrom <eric.engestrom@intel.com>
-
Kenneth Graunke authored
We need to pitch these on context destroy.
-
Kenneth Graunke authored
Need to pitch these on context destroy.
-
Kenneth Graunke authored
We use persistent maps so this does nothing.
-
Lionel Landwerlin authored
This rewrites the ddy in EXECUTE_4 mode with a loop to make it more obvious what is going on and also sets the group each of the 4 threads in the groups are supposed to execute. Fixes the following CTS tests : dEQP-VK.glsl.derivate.dfdyfine.dynamic_* Signed-off-by:
Lionel Landwerlin <lionel.g.landwerlin@intel.com> Co-Authored-by:
Jason Ekstrand <jason@jlekstrand.net> Reviewed-by:
Matt Turner <mattst88@gmail.com> Fixes: 2134ea38 ("intel/compiler/fs: Implement ddy without using align16 for Gen11+")
-
Eric Engestrom authored
Signed-off-by:
Eric Engestrom <eric.engestrom@intel.com> Reviewed-by:
Dylan Baker <dylan@pnwbakers.com>
-
Eric Engestrom authored
Signed-off-by:
Eric Engestrom <eric.engestrom@intel.com> Reviewed-by:
Dylan Baker <dylan@pnwbakers.com>
-
Eric Engestrom authored
s/otions/options/, and while here let's give the full path to xmlpool.h since `../` won't be true in the generated file. Signed-off-by:
Eric Engestrom <eric.engestrom@intel.com> Reviewed-by:
Dylan Baker <dylan@pnwbakers.com>
-
Kenneth Graunke authored
We were at least cleaning up this reference, but we were failing to pin it in iris_restore_compute_saved_bos.
-
Kenneth Graunke authored
Today, we stream the compute shader thread IDs simply because they're (annoyingly) relative to dynamic state base address. We could upload them once at compile time, but we'd need a separate non-streaming uploader for IRIS_MEMZONE_DYNAMIC, and I'm not sure it's worth it. stream_state pins the buffer for use in the current batch, but also returns a reference to the pipe_resource. We dropped this reference on the floor, leaking a reference basically every time we dispatched a compute shader after switching to a new one. The reason it returns a reference is so that we can hold on to it and re-pin it in iris_restore_compute_saved_bos, which we were also failing to do. So if we actually filled up a batch with repeated dispatches to the same compute shader, and flushed, then continued dispatching, we would fail to pin it and likely GPU hang.
-
Kenneth Graunke authored
We were unconditionally uploading the new data, but then conditionally using it with MEDIA_CURBE_LOAD. If we're not going to emit the command, there's no point in uploading the data.
-
Kenneth Graunke authored
We only use push the compute shader thread IDs, not any actual constant buffer data. So we should track the compute shader variant changing, not constbuf changes.
-
Kenneth Graunke authored
Compute doesn't use UBO ranges (annoyingly), so this is dead code.
-
Kenneth Graunke authored
MEDIA_INTERFACE_DESCRIPTOR's Interface Descriptor Data Start Address field's docs say: "This bit specifies the 64-byte aligned address..." And we were doing 32. Superfluous thread ID uploading was apparently saving us from GPU hangs in most cases.
-
Andrii Simiklit authored
v2: standard 'bool' can be used ( Eric Engestrom <eric.engestrom@intel.com> ) Reviewed-by:
Eric Engestrom <eric.engestrom@intel.com> Signed-off-by:
Andrii Simiklit <andrii.simiklit@globallogic.com>
-
Tomeu Vizoso authored
When the fault_pointer field in the header is set, we can get some idea of which descriptor the HW isn't happy with if we know their addresses. Signed-off-by:
Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by:
Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
-
Tomeu Vizoso authored
Then we can get some information back about any exception that might have happened. Signed-off-by:
Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by:
Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
-
Tomeu Vizoso authored
Arm's kernel driver mentions how to decode this field, which makes a bit clearer what had happened. Signed-off-by:
Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by:
Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
-
Tomeu Vizoso authored
Signed-off-by:
Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by:
Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
-
Samuel Pitoiset authored
The only exception is the GS copy shader which emits them unconditionally. Totals from affected shaders: SGPRS: 71320 -> 71008 (-0.44 %) VGPRS: 54372 -> 54240 (-0.24 %) Code Size: 2952628 -> 2941368 (-0.38 %) bytes Max Waves: 9689 -> 9723 (0.35 %) This helps Dota2, Doom, GTAV and Hitman 2. Signed-off-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by:
Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
-
Samuel Pitoiset authored
This doesn't fix anything known, but it's likely going to break if layerCount is ~0U. Signed-off-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by:
Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
-
Kenneth Graunke authored
Leave it to NIR instead, like i965 does. Thanks to Tim Arceri for noticing that I'd left this enabled by accident. shader-db results on Skylake: total instructions in shared programs: 15522628 -> 15521642 (<.01%) instructions in affected programs: 94008 -> 93022 (-1.05%) helped: 34 HURT: 33 helped stats (abs) min: 12 max: 48 x̄: 33.82 x̃: 42 helped stats (rel) min: 0.06% max: 22.14% x̄: 9.86% x̃: 10.89% HURT stats (abs) min: 1 max: 16 x̄: 4.97 x̃: 3t HURT stats (rel) min: 0.82% max: 3.77% x̄: 1.73% x̃: 1.53% 95% mean confidence interval for instructions value: -20.08 -9.35 95% mean confidence interval for instructions %-change: -5.95% -2.36% Instructions are helped. total cycles in shared programs: 367105221 -> 367074230 (<.01%) cycles in affected programs: 10017660 -> 9986669 (-0.31%) helped: 266 HURT: 184 helped stats (abs) min: 1 max: 9556 x̄: 151.35 x̃: 12 helped stats (rel) min: 0.08% max: 59.91% x̄: 4.66% x̃: 1.67% HURT stats (abs) min: 1 max: 1716 x̄: 50.37 x̃: 6 HURT stats (rel) min: <.01% max: 24.40% x̄: 2.42% x̃: 0.85% 95% mean confidence interval for cycles value: -133.90 -3.84 95% mean confidence interval for cycles %-change: -2.44% -1.10% Cycles are helped. Reviewed-by:
Eric Anholt <eric@anholt.net> Reviewed-by:
Timothy Arceri <tarceri@itsqueeze.com>
-
Kenneth Graunke authored
This patch changes the code which sets EmitNoIndirectSampler to check the core profile GLSL version, rather than the ARB_gpu_shader5 extension enable. st/mesa exposes ARB_gpu_shader5 if GLSLVersion (in core profiles) or GLSLVersionCompat (in compat profiles) >= 400. The Intel drivers do not currently expose ARB_gpu_shader5 in compat profiles. But the backend can absolutely handle indirect samplers. Looking at the core profile version number should be a good indication of what the driver supports. Reviewed-by:
Eric Anholt <eric@anholt.net> Reviewed-by:
Timothy Arceri <tarceri@itsqueeze.com>
-
Kenneth Graunke authored
Nothing uses this, it must be a remnant from an earlier approach.
-
- 26 Jun, 2019 15 commits
-
-
Caio Marcelo de Oliveira Filho authored
The helpers are needed so we can use the syntax `instr(cond)` in the algebraic rules. Add simple rule for dropping a pair of mul-div of the same value when wrapping is guaranteed to not happen. Reviewed-by:
Jason Ekstrand <jason@jlekstrand.net>
-
Caio Marcelo de Oliveira Filho authored
When handling the specified ALU operations, check for the decorations and set nir_alu_instr no_signed_wrap and no_unsigned_wrap flags accordingly. v2: Add a glsl_base_type_is_unsigned_integer() helper. (Karol) v3: Rename helper to glsl_base_type_is_uint(). v4: Use two flags, so we don't need the helper anymore. (Connor) v5: Pass alu directly to handle function. (Jason) Reviewed-by: Karol Herbst <kherbst@redhat.com> [v3] Reviewed-by:
Jason Ekstrand <jason@jlekstrand.net>
-
Caio Marcelo de Oliveira Filho authored
They indicate the operation does not cause overflow or underflow. This is motivated by SPIR-V decorations NoSignedWrap and NoUnsignedWrap. Change the storage of `exact` to be a single bit, so they pack together. v2: Handle no_wrap in nir_instr_set. (Karol) v3: Use two separate flags, since the NIR SSA values and certain instructions are typeless, so just no_wrap would be insufficient to know which one was referred to. (Connor) v4: Don't use nir_instr_set to propagate the flags, unlike `exact`, consider the instructions different if the flags have different values. Fix hashing/comparing. (Jason) Reviewed-by: Karol Herbst <kherbst@redhat.com> [v1] Reviewed-by:
Jason Ekstrand <jason@jlekstrand.net>
-
Dylan Baker authored
This is an emergency release due to a critical bug.
-
Dylan Baker authored
-
Dylan Baker authored
-
Jonathan Marek authored
There doesn't seem to be any reason to keep these opcodes around: * fnot/fxor are not used at all. * fand/for are only used in lower_alu_to_scalar, but easily replaced Signed-off-by:
Jonathan Marek <jonathan@marek.ca> Reviewed-by:
Eric Anholt <eric@anholt.net> Reviewed-by:
Ian Romanick <ian.d.romanick@intel.com>
-
Jonathan Marek authored
We can vectorize instructions with different constant sources by creating a new load_const and using that. Signed-off-by:
Jonathan Marek <jonathan@marek.ca> Reviewed-by:
Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
-
Alyssa Rosenzweig authored
In Midgard, a bundle consists of a few ALU instructions. Within the bundle, there is room for an optional 128-bit constant; this constant is shared across all instructions in the bundle. Unfortunately, many instructions want a 128-bit constant all to themselves (how selfish!). If we run out of space for constants in a bundle, the bundle has to be broken up, incurring a performance and space penalty. As an optimization, the scheduler now analyzes the constants coming in per-instruction and attempts to merge shared components, adjusting the swizzle accessing the bundle's constants appropriately. Concretely, given the GLSL: (a * vec4(1.5, 0.5, 0.5, 1.0)) + vec4(1.0, 2.3, 2.3, 0.5) instead of compiling to the naive two bundles: vmul.fmul [temp], [a], r26 fconstants 1.5, 0.5, 0.5, 1.0 vadd.fadd [out], [temp], r26 fconstants 1.0, 2.3, 2.3, 0.5 The scheduler can now fuse into a single (pipelined!) bundle: vmul.fmul [temp], [a], r26.xyyz vadd.fadd [out], [temp], r26.zwwy fconstants 1.5, 0.5, 1.0, 2.3 Signed-off-by:
Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
-
Alyssa Rosenzweig authored
Signed-off-by:
Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
-
Alyssa Rosenzweig authored
Signed-off-by:
Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
-
Alyssa Rosenzweig authored
Fixes: 3e6c6bb0 ("panfrost: Merge checksum buffer with main BO") Signed-off-by:
Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
-
Kenneth Graunke authored
In the past, each query object had their own BO. Checking if the batch referenced that BO was an easy way to check if commands were still queued to compute the query value. If so, we needed to flush. More recently (c24a574e), we started using an u_upload_mgr for query objects, placing multiple queries in the same BO. One side-effect is that iris_batch_references is a no longer a reasonable way to check if commands are still queued for our query. Ours might be done, but a later query that happens to be in the same BO might be queued. We don't want to flush in that case. Instead, check if the current batch's signalling syncpt is the one we referenced when ending the query. We know the syncpt can't have been reused because our query is holding a reference, so a simple pointer comparison should suffice. Removes all batch flushing caused by query objects in Shadow of Mordor. Reviewed-by:
Chris Wilson <chris@chris-wilson.co.uk>
-
Kenneth Graunke authored
This returns a pointer to the signalling syncpt, without incrementing the reference count. This can be useful for comparisons. Reviewed-by:
Chris Wilson <chris@chris-wilson.co.uk>
-
Boris Brezillon authored
The ss local var is guaranteed to be != NULL. Get rid of this useless check. Signed-off-by:
Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by:
Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
-