Commits · lima-vertex-fixes-2 · Andreas Baierl / mesa

Dec 04, 2019

Some fixes · ce4bb4f4
Andreas Baierl authored 5 years ago

ce4bb4f4

lima: split draw calls on 64k vertices · f07e49cc

Erico Nunes authored 5 years ago and

Andreas Baierl committed 5 years ago


The Mali400 only supports draws with up to 64k vertices per command.
To handle this, break the draw_vbo call into multiple commands.
Indexed drawing is left to a separate code path.
This implementation was ported from vc4_draw_vbo.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>

f07e49cc
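
The splitting idea in the commit above can be sketched as follows. This is a minimal illustration, not the code from this commit or from vc4: the `prim_mode` enum, `draw_chunk` struct, and `split_draw` function are hypothetical names, and the vertex limit is passed in as a parameter rather than taken from the hardware. Full chunks are rounded down so that no primitive is cut apart, and strips re-read the vertices they share with the previous chunk.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical primitive types for this sketch (not Mesa's enum). */
enum prim_mode {
   PRIM_POINTS,
   PRIM_LINES,
   PRIM_LINE_STRIP,
   PRIM_TRIANGLES,
   PRIM_TRIANGLE_STRIP,
};

/* One hardware draw command: a contiguous vertex range. */
struct draw_chunk {
   uint32_t start;
   uint32_t count;
};

/*
 * Split a non-indexed draw into chunks of at most max_verts vertices.
 * Returns the number of chunks written to out (at most max_chunks).
 */
uint32_t
split_draw(enum prim_mode mode, uint32_t start, uint32_t count,
           uint32_t max_verts, struct draw_chunk *out, uint32_t max_chunks)
{
   uint32_t chunk = max_verts; /* vertices in each full chunk */
   uint32_t overlap = 0;       /* vertices shared with the next chunk */
   uint32_t n = 0;

   assert(max_verts >= 4); /* keeps every step below positive */

   switch (mode) {
   case PRIM_POINTS:
      break;
   case PRIM_LINES:
      chunk -= chunk % 2; /* never cut a line in half */
      break;
   case PRIM_LINE_STRIP:
      overlap = 1; /* next chunk restarts at the last vertex */
      break;
   case PRIM_TRIANGLES:
      chunk -= chunk % 3; /* never cut a triangle apart */
      break;
   case PRIM_TRIANGLE_STRIP:
      chunk -= chunk % 2; /* even step keeps the winding order */
      overlap = 2;        /* next chunk reuses the last two vertices */
      break;
   }

   while (count > 0 && n < max_chunks) {
      uint32_t this_count = count;
      uint32_t step = count;

      if (count > max_verts) {
         this_count = chunk;
         step = chunk - overlap;
      }

      out[n].start = start;
      out[n].count = this_count;
      n++;

      start += step;
      count -= step;
   }

   return n;
}
```

Assuming a limit of 65535 vertices, a 150000-vertex triangle list would come out as three chunks of 65535, 65535, and 18930 vertices. Indexed draws are not covered here, matching the commit's note that they take a separate code path.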

vc4: move the draw splitting routine to shared code · 12755df4

Erico Nunes authored 5 years ago and

Andreas Baierl committed 5 years ago


This can also be useful for other hardware which has similar limitations
on vertex count per single draw.
The Mali400 has a similar limitation and can reuse this.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>

12755df4

lima: refactor indexed draw indices upload · accee424

Erico Nunes authored 5 years ago and

Andreas Baierl committed 5 years ago


As of this commit this is just a refactor in preparation to enable
support for more than 64k vertices.
To support splitting the draw_vbo call, indices shouldn't be re-uploaded
every time.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>

accee424

lima: allocate separate bo to store varyings · 5e3c2f18

Erico Nunes authored 5 years ago and

Andreas Baierl committed 5 years ago


The current strategy using the suballocator with a fixed size doesn't
scale and causes some programs with a large number of vertices (like some
glmark2 scenes) to crash.
Change it to dynamically allocate a separate bo to accommodate an
arbitrary number of vertices.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>

5e3c2f18

lima: enable tiling · 3f807fee

Vasily Khoruzhick authored 5 years ago and

Andreas Baierl committed 5 years ago

Now that the tiled format modifier is merged into Linux, we can enable tiling.

That should improve overall performance and also work around broken mipmapping
for linear textures, since we now prefer tiled textures.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>

3f807fee

gitlab-ci: Run piglit glslparser & quick_shader tests separately · 5585b8ea

Michel Dänzer authored 5 years ago and

Michel Dänzer committed 5 years ago


And only use --process-isolation false for the quick_gl tests.

This will hopefully avoid variance in the test results that we've been
seeing lately. But even if it doesn't, it should at least help narrow
down the cause of the variance.

Tested-by: Vasily Khoruzhick <anarsoul@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>

5585b8ea

intel/perf: fix improper pointer access · ddacd3d4

Lionel Landwerlin authored 5 years ago


This expression was unused by the macro, which is probably why it didn't
register in the compilation.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

ddacd3d4

intel/perf: simplify the processing of OA reports · 8c0b0582

Lionel Landwerlin authored 5 years ago


This is a more accurate description of what happens in processing the
OA reports.

Previously we only had a somewhat difficult to parse state machine
tracking the context ID.

To decide whether the delta between 2 reports (r0 & r1) should be
accumulated in the query result, we only need to check:

   * whether r0 is tagged with the context ID relevant to us

   * if r0 is not tagged with our context ID and r1 is: does r0 have an
     invalid context ID? If not then we're in a case where i915 has
     resubmitted the same context for execution through the execlist
     submission port

v2: Update comment (Ken)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

8c0b0582

intel/perf: take into account that reports read can be fairly old · b364e920

Lionel Landwerlin authored 5 years ago


If we read the OA reports late enough after the query happens, we can
get a timestamp in the report that is significantly in the past
compared to the start timestamp of the query. The current code must
deal with the wraparound of the timestamp value (every ~6 minutes). So
consider that if the difference is greater than half that wraparound
period, we're probably dealing with an old report and make the caller
aware it should read more reports when they're available.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

b364e920
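
The half-period check described in the commit above can be sketched with wrapping unsigned arithmetic. This is an illustration only, not the code from this commit: it assumes a 32-bit timestamp counter, and `report_is_old` and `TS_HALF_PERIOD` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Half of the wraparound period, assuming a 32-bit timestamp counter. */
#define TS_HALF_PERIOD (UINT32_C(1) << 31)

/*
 * Returns true when the report timestamp should be treated as older
 * than the query start rather than as a point after it.
 */
bool
report_is_old(uint32_t query_start_ts, uint32_t report_ts)
{
   /* Unsigned subtraction wraps modulo 2^32, so this is the forward
    * distance from the query start to the report. */
   uint32_t delta = (uint32_t)(report_ts - query_start_ts);

   /* A forward distance of more than half the period is more plausibly
    * a report from shortly before the query started. */
   return delta > TS_HALF_PERIOD;
}
```

With this rule a report stamped slightly before the start is classified as old even when the counter has wrapped in between, and a caller could use that result to decide to read more reports.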

intel/perf: set read buffer len to 0 to identify empty buffer · 9d0a5c81

Lionel Landwerlin authored 5 years ago


We always add an empty buffer in the list when creating the query.
Let's set the len appropriately so that we can recognize it when we
read OA reports up to the end of a query.

We were using a 0 timestamp value associated with the empty buffer
and incorrectly assuming this was a valid value. In turn that led to
not reading enough reports and resulted in deltas added to our counter
values which should have been discarded because those would be flagged
for a different context.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

9d0a5c81
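
One way to picture the fix in the commit above: treat a buffer with `len == 0` as a placeholder whose timestamp must not be trusted. The sketch below is hypothetical; the `sample_buf` layout, the array ordering (oldest first), and `newest_timestamp` are assumptions made for illustration and are not taken from the driver.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sample buffer; len == 0 marks the empty placeholder. */
struct sample_buf {
   uint32_t len;            /* bytes of report data held */
   uint32_t last_timestamp; /* only meaningful when len > 0 */
};

/*
 * Find the timestamp of the newest buffer that actually holds data.
 * bufs is assumed to be ordered oldest first. Returns false when no
 * buffer holds data, leaving *ts untouched.
 */
bool
newest_timestamp(const struct sample_buf *bufs, size_t n, uint32_t *ts)
{
   for (size_t i = n; i > 0; i--) {
      if (bufs[i - 1].len == 0)
         continue; /* empty placeholder: its timestamp is not valid */
      *ts = bufs[i - 1].last_timestamp;
      return true;
   }

   return false; /* nothing read yet */
}
```

Checking the length instead of comparing the timestamp against 0 avoids mistaking the placeholder's zeroed field for a real report time.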

intel/perf: fix invalid hw_id in query results · acea59db

Lionel Landwerlin authored 5 years ago


Accumulation happens between 2 reports; it can be between a start/end
report from another context. So only consider updating the hw_id of
the results when it's not already valid and we have a valid value
to put in there.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 41b54b5f ("i965: move OA accumulation code to intel/perf")
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

acea59db
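
The guard described in the commit above amounts to a two-sided validity check. A minimal sketch, assuming a sentinel value marks an invalid ID; `HW_ID_INVALID`, `query_result`, and `update_hw_id` are hypothetical names, not the driver's:

```c
#include <stdint.h>

/* Hypothetical sentinel meaning "no valid hardware context ID". */
#define HW_ID_INVALID UINT32_MAX

/* Hypothetical, reduced query result holding only the field of interest. */
struct query_result {
   uint32_t hw_id;
};

/*
 * Record the hardware ID from a report, but only when the result does
 * not already hold a valid ID and the report provides a valid one.
 */
void
update_hw_id(struct query_result *result, uint32_t report_hw_id)
{
   if (result->hw_id == HW_ID_INVALID && report_hw_id != HW_ID_INVALID)
      result->hw_id = report_hw_id;
}
```

Once a valid ID is stored, later reports cannot overwrite it, and invalid IDs never replace the initial sentinel.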

radeonsi: display cs blit count for AMD_DEBUG=testdma · a7bbebcf

Pierre-Eric Pelloux-Prayer authored 5 years ago


Reviewed-by: Marek Olšák <marek.olsak@amd.com>

a7bbebcf

radeonsi: implement sdma for GFX9 · 082d1c16

Pierre-Eric Pelloux-Prayer authored 5 years ago


Reviewed-by: Marek Olšák <marek.olsak@amd.com>

082d1c16

radv/gfx10: fix the vertex order for triangle strips emitted by a GS · 4cacba0c

Samuel Pitoiset authored 5 years ago


My fix wasn't totally correct as pointed out by Marek.
Ported from RadeonSI.

Fixes: deafe4cc ("radv/gfx10: fix primitive indices orientation for NGG GS")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

4cacba0c

radv: simplify a check in radv_fixup_vertex_input_fetches() · dac6bd29

Samuel Pitoiset authored 5 years ago


The number of loaded channels should always be > 0 now.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

dac6bd29

radv: remove dead shader input/output variables · 3b51259f

Samuel Pitoiset authored 5 years ago


No pipeline-db changes.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

3b51259f

iris: Stop setting up fake params · 0604768a

Faith Ekstrand authored 5 years ago

In d1c4e64a, we added a parameter to tell the back-end compiler to
ignore the param array and just push however many constants you ask it
to push. Iris doesn't want to push anything so it gives a bogus number
of parameters and trusts the back-end compiler to dead-code all of them.
Now that we can tell the back-end compiler to stop re-arranging things,
delete the hack and enable the new simpler code path.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

0604768a

gallium/scons: fix graw-xlib build on OSX. · 71363676

Dave Airlie authored 5 years ago


Fixes: 44a6b010 (gallivm: add nir->llvm translation (v2))

Tested-by: Vinson Lee <vlee@freedesktop.org>

71363676

llvmpipe: enable texcoord semantics · 3263c982

Dave Airlie authored 5 years ago


To make NIR transitioning easier, move the driver to using
texcoord semantics.

Reviewed-by: Eric Anholt <eric@anholt.net>

3263c982

Dec 03, 2019

anv: Respect the always_flush_cache driconf option · 178a2946

Faith Ekstrand authored 5 years ago


Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

178a2946

gallium/swr: Fix crash when use GL_TDFX_texture_compression_FXT1 format. · 07adc474

Krzysztof Raszkowski authored 5 years ago


Reject the new formats in swr to prevent crashes, because swr doesn't
know how to handle them.

Reviewed-by: Jan Zielinski <jan.zielinski@intel.com>

07adc474

gitlab-ci: disable junit results for deqp · b31637c4

Rob Clark authored 5 years ago


They don't seem to be hugely useful, and seem to be bogging down gitlab.

Signed-off-by: Rob Clark <robdclark@chromium.org>

b31637c4

anv: Set up SBE_SWIZ properly for gl_Viewport · b1f37688

Faith Ekstrand authored 5 years ago


gl_Viewport is also in the VUE header so we need to whack the read
offset to 0 and emit a default (no overrides) SBE_SWIZ entry in that
case as well.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

b1f37688

gitlab-ci: Update to current ci-templates master · 0c88d595

Michel Dänzer authored 5 years ago and

Michel Dänzer committed 5 years ago


Fixes skopeo copy failures.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>

0c88d595

ac/llvm: fix atomic var operations if source isn't a deref · f63a3132

Samuel Pitoiset authored 5 years ago


Fixes some CTS regressions.

Fixes: e61a826f ("ac/llvm: fix pointer type for global atomics")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

f63a3132

Add support for T820 CI Jobs · dde73403

Neil Armstrong authored 5 years ago and

Tomeu Vizoso committed 5 years ago


Tomeu: - Small rebase fixups

Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>

dde73403

gallivm/llvmpipe: add support for front facing in sysval. · 502548a0

Dave Airlie authored 5 years ago


This wires up the front facing value as a sysval. I'd like to
remove the other facing code, but I'd need to confirm VMware
doesn't use it first.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

502548a0

llvmpipe/images: handle undefined atomic without crashing · f52cdaa5

Dave Airlie authored 5 years ago


just return 0 for unbound atomic operations.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

f52cdaa5

panfrost: Remove blend shader hack · 71dd52e0

Alyssa Rosenzweig authored 5 years ago and

Tomeu Vizoso committed 5 years ago


This is no longer used.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

71dd52e0

gitlab-ci: Test Panfrost on T720 GPUs · c707b4d0

Tomeu Vizoso authored 5 years ago and

Tomeu Vizoso committed 5 years ago


Now that the Mali T720 GPU is supported at the same level as the T760,
test it on PINE64 H64 boards.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

c707b4d0

gitlab-ci: Remove non-default skips from Panfrost · 6d05e38a

Alyssa Rosenzweig authored 5 years ago and

Tomeu Vizoso committed 5 years ago


Over the past months, Panfrost has matured considerably and several
tests have stopped being flaky or failing altogether.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

6d05e38a

panfrost: White list the Mali T720 · b655be72

Tomeu Vizoso authored 5 years ago and

Tomeu Vizoso committed 5 years ago

Support for this GPU is now equal to that of the T760, so whitelist it.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

b655be72

pan/midgard: Splatter on fragment out · 8555bffa

Alyssa Rosenzweig authored 5 years ago and

Tomeu Vizoso committed 5 years ago


Make sure that the fragment is complete when writing it out.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>

8555bffa

panfrost: Simplify shader patching · ab81a23d

Tomeu Vizoso authored 5 years ago and

Tomeu Vizoso committed 5 years ago


We need to always upload anyway.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

ab81a23d

panfrost: Simplify draw_flags · 6ddaa555

Alyssa Rosenzweig authored 5 years ago and

Tomeu Vizoso committed 5 years ago

Fixes dEQP-GLES3.functional.primitive_restart.*. Note the 0x18000 value
is somehow accidentally enabling primitive restart.
I'm not sure where this value came from, but let's not.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>

6ddaa555

panfrost: Implement pan_tiler for non-hierarchy GPUs · 9fb09047

Alyssa Rosenzweig authored 5 years ago and

Tomeu Vizoso committed 5 years ago


The algorithm is as described. Nothing fancy here, just need to add some
new code paths depending on which model we're running on.

Tomeu:
- Also disable tiling when !hierarchy and !vertex_count
- Avoid creating polygon lists smaller than the minimum when
  vertex_count > 0 but the tile size is smaller than 16 bytes
- Take into account tile size when calculating polygon list size for
  !hierarchy
- Allow 0-sized tiles in a single dimension

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>

9fb09047

panfrost: Add information about T720 tiling · 63cd5b81

Alyssa Rosenzweig authored 5 years ago and

Tomeu Vizoso committed 5 years ago


We've figured out most of the big pieces, and though it looks faintly
like other Midgards, it's much simpler.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

63cd5b81

panfrost: Add quirks system to cmdstream · 6887ff4e

Tomeu Vizoso authored 5 years ago and

Tomeu Vizoso committed 5 years ago


Similarly to how it's already done in the compiler, add a way to express
differences between GPU models that need to be taken into account when
assembling the cmdstream.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>

6887ff4e

nir/algebraic: Rearrange bcsel sequences generated by nir_opt_peephole_select · fbd5359a

Ian Romanick authored 5 years ago


Reviewed-by: Matt Turner <mattst88@gmail.com>

All Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 14660366 -> 14653437 (-0.05%)
instructions in affected programs: 316166 -> 309237 (-2.19%)
helped: 905
HURT: 10
helped stats (abs) min: 1 max: 36 x̄: 7.67 x̃: 6
helped stats (rel) min: 0.13% max: 18.75% x̄: 4.28% x̃: 3.60%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.10% max: 1.33% x̄: 0.70% x̃: 0.97%
95% mean confidence interval for instructions value: -7.91 -7.23
95% mean confidence interval for instructions %-change: -4.46% -3.99%
Instructions are helped.

total cycles in shared programs: 228571646 -> 228549759 (<.01%)
cycles in affected programs: 56239919 -> 56218032 (-0.04%)
helped: 681
HURT: 216
helped stats (abs) min: 1 max: 5156 x̄: 45.49 x̃: 10
helped stats (rel) min: <.01% max: 10.45% x̄: 1.29% x̃: 0.65%
HURT stats (abs)   min: 1 max: 320 x̄: 42.09 x̃: 14
HURT stats (rel)   min: <.01% max: 37.04% x̄: 1.38% x̃: 0.49%
95% mean confidence interval for cycles value: -41.51 -7.29
95% mean confidence interval for cycles %-change: -0.80% -0.49%
Cycles are helped.

LOST:   1
GAINED: 0

fbd5359a
