- Aug 14, 2019
-
-
Rohan Garg authored
We mmap the import BO as required now, so we can drop the TODO as well as mmap'ing it right after the import. Signed-off-by: Rohan Garg <rohan.garg@collabora.com>
-
Alyssa Rosenzweig authored
Multiple spill moves share a single spill slot. Issue found in Krita. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
-
Alyssa Rosenzweig authored
This allows us to have multiple spill moves, whereas otherwise for N spill moves, the first N-1 would be clobbered. Issue found in Krita. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
-
Alyssa Rosenzweig authored
It doesn't... make a ton of sense to need to assert and this routine is hotter than you might expect. Doesn't matter for release builds, of course. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
-
Marek Olšák authored
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
-
Eric Engestrom authored
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>
-
Eric Engestrom authored
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
-
Christian Gmeiner authored
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Jonathan Marek <jonathan@marek.ca>
-
Christian Gmeiner authored
Fixes: 797a2e4f ("etnaviv: update logic to determine uniform limits") Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Jonathan Marek <jonathan@marek.ca>
-
Ian Romanick authored
v2: After some review discussion with Alyssa, the replacements now correct account for cases where (b+c) >= bitsize. v3: Use a temporary to simplify the Python code quite a bit. Suggested by Jason. Haswell and all Gen8+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 16251155 -> 16249576 (<.01%) instructions in affected programs: 232627 -> 231048 (-0.68%) helped: 547 HURT: 1 helped stats (abs) min: 1 max: 15 x̄: 2.89 x̃: 3 helped stats (rel) min: 0.04% max: 7.84% x̄: 1.14% x̃: 1.06% HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12% 95% mean confidence interval for instructions value: -3.12 -2.65 95% mean confidence interval for instructions %-change: -1.20% -1.06% Instructions are helped. total cycles in shared programs: 365924392 -> 365372103 (-0.15%) cycles in affected programs: 59207053 -> 58654764 (-0.93%) helped: 497 HURT: 34 helped stats (abs) min: 1 max: 29300 x̄: 1118.16 x̃: 16 helped stats (rel) min: <.01% max: 10.59% x̄: 1.82% x̃: 1.82% HURT stats (abs) min: 2 max: 424 x̄: 101.03 x̃: 63 HURT stats (rel) min: 0.07% max: 46.17% x̄: 4.72% x̃: 2.06% 95% mean confidence interval for cycles value: -1426.41 -653.77 95% mean confidence interval for cycles %-change: -1.66% -1.15% Cycles are helped. total spills in shared programs: 8870 -> 8871 (0.01%) spills in affected programs: 104 -> 105 (0.96%) helped: 0 HURT: 1 Ivy Bridge and all pre-Gen7 platforms had similar results. (Ivy Bridge shown) total instructions in shared programs: 11956236 -> 11955635 (<.01%) instructions in affected programs: 94110 -> 93509 (-0.64%) helped: 106 HURT: 0 helped stats (abs) min: 1 max: 14 x̄: 5.67 x̃: 4 helped stats (rel) min: 0.12% max: 4.71% x̄: 1.96% x̃: 0.76% 95% mean confidence interval for instructions value: -6.62 -4.72 95% mean confidence interval for instructions %-change: -2.27% -1.64% Instructions are helped. total cycles in shared programs: 179296340 -> 178788044 (-0.28%) cycles in affected programs: 51009603 -> 50501307 (-1.00%) helped: 82 HURT: 7 helped stats (abs) min: 5 max: 27820 x̄: 6199.00 x̃: 16 helped stats (rel) min: 0.30% max: 8.16% x̄: 2.58% x̃: 3.11% HURT stats (abs) min: 2 max: 8 x̄: 3.14 x̃: 2 HURT stats (rel) min: 0.02% max: 1.40% x̄: 0.34% x̃: 0.10% 95% mean confidence interval for cycles value: -7649.38 -3773.00 95% mean confidence interval for cycles %-change: -2.71% -1.99% Cycles are helped. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> [v2] Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
-
Ian Romanick authored
A common thing in many shaders: uniform vs { vec4 bones[...]; }; ... x = some_calculation(bones[i + 0]); y = some_calculation(bones[i + 1]); z = some_calculation(bones[i + 2]); This turns into stuff like vec1 32 ssa_12 = iadd ssa_11, ssa_0 vec1 32 ssa_13 = ishl ssa_12, ssa_3 vec1 32 ssa_14 = intrinsic load_ssbo (ssa_7, ssa_13) (16, 4, 0) vec1 32 ssa_15 = iadd ssa_11, ssa_1 vec1 32 ssa_16 = ishl ssa_15, ssa_3 vec1 32 ssa_17 = intrinsic load_ssbo (ssa_7, ssa_16) (16, 4, 0) vec1 32 ssa_18 = iadd ssa_11, ssa_2 vec1 32 ssa_19 = ishl ssa_18, ssa_3 vec1 32 ssa_20 = intrinsic load_ssbo (ssa_7, ssa_19) (16, 4, 0) By reassociating the shift and the add, we can reduce this to vec1 32 ssa_12 = ishl ssa_11, ssa_3 vec1 32 ssa_13 = iadd ssa_12, ssa_0 vec1 32 ssa_14 = intrinsic load_ssbo (ssa_7, ssa_13) (16, 4, 0) vec1 32 ssa_16 = iadd ssa_12, ssa_1 vec1 32 ssa_17 = intrinsic load_ssbo (ssa_7, ssa_16) (16, 4, 0) vec1 32 ssa_19 = iadd ssa_12, ssa_2 vec1 32 ssa_20 = intrinsic load_ssbo (ssa_7, ssa_19) (16, 4, 0) v2: Add some commentary from Rhys Perry's nearly identical patch. All Intel platforms had similar results. (Ice Lake shown) total instructions in shared programs: 16277758 -> 16250704 (-0.17%) instructions in affected programs: 1440284 -> 1413230 (-1.88%) helped: 4920 HURT: 6 helped stats (abs) min: 1 max: 69 x̄: 5.50 x̃: 4 helped stats (rel) min: 0.10% max: 18.33% x̄: 2.21% x̃: 1.79% HURT stats (abs) min: 1 max: 12 x̄: 4.50 x̃: 3 HURT stats (rel) min: 0.18% max: 3.23% x̄: 1.91% x̃: 2.55% 95% mean confidence interval for instructions value: -5.67 -5.31 95% mean confidence interval for instructions %-change: -2.26% -2.16% Instructions are helped. total cycles in shared programs: 367118526 -> 365895358 (-0.33%) cycles in affected programs: 93504145 -> 92280977 (-1.31%) helped: 2754 HURT: 1269 helped stats (abs) min: 1 max: 47039 x̄: 460.66 x̃: 16 helped stats (rel) min: <.01% max: 34.93% x̄: 3.77% x̃: 1.12% HURT stats (abs) min: 1 max: 1500 x̄: 35.85 x̃: 9 HURT stats (rel) min: 0.01% max: 17.35% x̄: 2.18% x̃: 0.75% 95% mean confidence interval for cycles value: -387.31 -220.78 95% mean confidence interval for cycles %-change: -2.11% -1.68% Cycles are helped. LOST: 1 GAINED: 1 Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
-
copy_deref for wildcard dereferences requires the same arrays lengths otherwise it leads to a crash in optimizations like 'nir_opt_copy_prop_vars' because these optimizations expect 'copy_deref' just for arrays with the same lengths. v2: check was moved to 'try_match_deref' to fix aoa cases (Jason Ekstrand <jason@jlekstrand.net>) v3: -fixed comment -the condition merged with other one (Jason Ekstrand <jason@jlekstrand.net>) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111286 Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
-
Alyssa Rosenzweig authored
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
-
Alyssa Rosenzweig authored
We wire through some shader-db-style stats on the current shader in the disassemble so we can get a quick estimate of shader complexity from a trace. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Suggested-by: Rob Clark <robdclark@chromium.org>
-
Alyssa Rosenzweig authored
We'll want to dump some stats after the shader, and I refuse to use one teensy little goto. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
-
Ian Romanick authored
Tested-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Eric Anholt <eric@anholt.net>
-
Christian Gmeiner authored
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Jonathan Marek <jonathan@marek.ca>
-
Christian Gmeiner authored
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Jonathan Marek <jonathan@marek.ca>
-
Christian Gmeiner authored
Also this adds the missing impl for etna_dump_shader_nir(..). Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Jonathan Marek <jonathan@marek.ca>
-
Christian Gmeiner authored
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Jonathan Marek <jonathan@marek.ca>
-
Christian Gmeiner authored
Have a correct answer to GL_MAX_FRAGMENT_UNIFORM_VECTORS and GL_MAX_VERTEX_UNIFORM_VECTORS. Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
-
Christian Gmeiner authored
Taken 1:1 from the header file. Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
-
Christian Gmeiner authored
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
-
The flush callback may be called on the same pipe context, and thus the same stream, from two different threads of execution. However, etna_cmd_stream_flush{,2}() must not be called on the same stream from two different threads of execution as that would mess up the etna_bo refcounting and likely have other ugly side effects. Fix this by using a reentrant screen lock around the flush callback. Signed-off-by: Marek Vasut <marex@denx.de> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
-
Add Valgrind support for etnaviv to track BO leaks. Signed-off-by: Marek Vasut <marex@denx.de> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
-
Use hash table instead of ad-hoc arrays. Signed-off-by: Marek Vasut <marex@denx.de> Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
-
The following situation can happen in a multithreaded OpenGL application. A BO is submitted from etna_cmd_stream #1 with flags set for read. A BO is submitted from etna_cmd_stream #2 with flags set for write. This triggers a flush on stream #1 and clears the BO's current_stream pointer. If at this point, stream #2 attempts to queue BO again, which does happen, the BO will be added to the submit list twice. The Linux kernel driver correctly detects this and warns about it with "BO at index %u already on submit list" kernel message. However, when cleaning the BO cache in etna_bo_cache_free(), the BO which was submitted twice will also be free()d twice, this triggering a glibc double free detector. The fix is easy, even if the BO does not have current_stream set, iterate over current streams' list of BOs before adding the BO to it and verify that the BO is not yet there. Signed-off-by: Marek Vasut <marex@denx.de> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
-
Roman Stratiienko authored
Signed-off-by: Roman Stratiienko <roman.stratiienko@globallogic.com> Reviewed-by: Rob Herring <robh@kernel.org>
-
Enables and passes piglits: spec/ARB_drivative_control/ dfdx-coarse dfdx-dfdy dfdx-fine dfdy-coarse dfdy-fine Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>
-
Vasily Khoruzhick authored
Now we have an accessors for ppir src, so it's possible to easily print all srcs and dests while dumping ppir representation. Reviewed-by: Qiang Yu <yuq825@gmail.com> Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
-
Vasily Khoruzhick authored
Get rid of most switch/case by using src accessors Reviewed-by: Qiang Yu <yuq825@gmail.com> Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
-
Vasily Khoruzhick authored
We'll need it if we want to walk through node sources Reviewed-by: Qiang Yu <yuq825@gmail.com> Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
-
Vasily Khoruzhick authored
Sometimes we need to walk through ppir_node sources, common accessor for all node types will simplify code a lot. Reviewed-by: Qiang Yu <yuq825@gmail.com> Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
-
- Aug 13, 2019
-
-
Jordan Justen authored
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
-
Jordan Justen authored
The DRI interface for modifiers with aux data treats the aux data as a separate plane of the main surface. When the dri layer requests the plane associated with the aux data, we save the required information into the dri aux plane image. Later when the image is used, the dri plane image will be available in the pipe_resource structure's `next` field. Therefore in iris, we reconstruct the aux setup from this separate dri plane image when the image is used. Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
-
We need to ensure that the DRI image format supports CCS. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
-
Jordan Justen authored
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
-
Jordan Justen authored
Reworks: * If the aux-state is not ISL_AUX_STATE_AUX_INVALID, then use memset even when memset_value is zero. The hiz buffer initial aux-state will be set to invalid, and therefore we can skip the memset. But, for CCS it will be set to ISL_AUX_STATE_PASS_THROUGH, and therefore the aux data must be cleared to 0 with the memset. Previously we would use BO_ALLOC_ZEROED with the CCS aux data, so this memset wasn't required. Now, the CCS aux data may be part of the main surface. We prefer to not use BO_ALLOC_ZEROED excessively, so the memset is needed for the CCS case. (Nanley) Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
-
Jordan Justen authored
This is not currently required because the hiz buffer is in a separate buffer, and therefore the offset is 0. If we combine the aux buffer with the main surface buffer, then the hiz offset may become non-zero. Suggested-by: Nanley Chery <nanley.g.chery@intel.com> Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
-
Marek Olšák authored
Nine uses GENERIC slots > 31. Trivial.
-