Commits · lima-gpir-branch-opt-v4 · Connor Abbott / mesa

Dec 01, 2019
- fix 3 · c4b19ff7
  Connor Abbott authored 5 years ago
  
  c4b19ff7
- fix 2 · 8e4cf087
  Connor Abbott authored 5 years ago
  
  8e4cf087
Nov 24, 2019

fix · 116b8360
Connor Abbott authored 5 years ago

116b8360

lima/gpir: Prevent infinite spill/unspill loops · 1618dada

Connor Abbott authored 5 years ago

While it's guaranteed that we can always eventually schedule any node,
there were some rare cases where we keep unspilling other nodes so that
we never actually get anything done. Particularly in a scenario like
this:

1. The only fully-ready node would increase value register pressure by 2,
   but there are 11 live value registers (the other 10 are partially
   ready).
2. We try to speculatively schedule the fully-ready node, but we can only
   spill one other node so it fails.
3. Because there's now one slot free, we schedule a register store.
4. Now we're back to 2.

Fix this by disallowing any register stores from when spilling fails
until the next node is scheduled.
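
The loop and the fix can be illustrated with a toy model. This is only a sketch, not the gpir scheduler: the function name, the counters, and the rule that each speculative attempt spills at most one node are assumptions read off the scenario above.

```python
MAX_VALUE_REGS = 11  # value-register limit taken from the scenario above


def schedule_one_node(block_stores_after_failed_spill, max_steps=20):
    """Return the step at which the fully-ready node gets scheduled, or
    None if the toy scheduler is still looping after max_steps."""
    live = 11       # live value registers, all slots occupied
    needed = 2      # the fully-ready node raises pressure by 2
    spilled = 0     # spilled nodes still waiting for their register store
    stores_blocked = False

    for step in range(1, max_steps + 1):
        # Step 3 of the scenario: a free slot lets a pending register
        # store be scheduled, which makes the spilled value live again.
        if spilled > 0 and live < MAX_VALUE_REGS and not stores_blocked:
            spilled -= 1
            live += 1
            continue

        # Step 2: speculatively schedule the node, spilling at most one
        # other node per attempt (an assumption of this model).
        if live + needed > MAX_VALUE_REGS:
            live -= 1
            spilled += 1
        if live + needed <= MAX_VALUE_REGS:
            return step  # the node fits, so progress is made

        # Spilling failed. With the fix, hold back register stores until
        # some node has been scheduled (the function returns at that point,
        # so the flag is never reset in this sketch).
        stores_blocked = block_stores_after_failed_spill

    return None


print(schedule_one_node(False))  # None: spill and store alternate forever
print(schedule_one_node(True))   # 2: the second attempt frees enough slots
```

Without the flag, the freed slot is immediately consumed by the register store, so pressure never drops far enough for the node to fit.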

1618dada

Nov 23, 2019

lima/gpir: Rewrite register allocation for value registers · 481e219c

Connor Abbott authored 5 years ago

The usual linear-scan register allocation algorithm can't handle
preallocated registers, since we might be forced to choose a color for
a non-preallocated variable that overlaps with a pre-allocated variable.
But in such cases we can simply split the live range of the offending
variable when we reach the beginning of the pre-allocated variable's
live range. This is still optimal in the sense that it always finds a
coloring whenever one is possible, but we may not insert the smallest
possible number of moves. However, since it's actually the scheduler
which splits live ranges afterwards, we can simply fold in the move
while keeping its fake dependencies, and then everything still works! In
other words, inserting a live range split for a value register during
register allocation is pretty much free.

This means that we can split register allocation in two. First globally
allocate the cross-block registers accessed through load_reg and
store_reg instructions, which is still done via graph coloring, and then
run a linear scan algorithm over each block, treating the load_reg and
store_reg nodes as referring to pre-allocated registers. This makes the
existing RA more complicated, but it has two benefits: first, using
round-robin with the linear scan allocator results in far fewer fake
dependencies, resulting in around 15 fewer instructions in the glmark2
jellyfish shader and fixing a regression in instruction count since
branching support went in. Second, it will simplify handling spilling.
With just graph coloring for everything, every time we spill a node, we
have to create new value registers which become new nodes in the graph
and re-run RA. This is worsened by the fact that when writing a value to
a temporary, we need to have an extra register available to load the
write address with a load_const node. With the new scheme, we can ignore
this entirely in the first part and then in the second part we can just
reserve an extra register in sections where we know we have to spill. So
no re-running RA many times, and we can get a good result quickly.

The current implementation does linear scan backwards, so that we can
insert the fake dependencies while allocating and avoid creating any
move nodes at all when we have to split a live range. However, it turns
out that this makes handling schedule_first nodes a bit more
complicated, so it's not clear if that was worth it.
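
The idea of splitting a live range when a pre-allocated register's range begins can be sketched as follows. This is a simplified illustration, not the code from this commit: it scans forwards rather than backwards, records explicit moves instead of folding them into the scheduler, and the interval format, function name, and error handling are all hypothetical.

```python
def linear_scan(intervals, num_regs):
    """intervals: list of (name, start, end, fixed_reg), live over [start, end).
    fixed_reg is None for an ordinary value, or a register index for a
    pre-allocated one. Returns (segments, moves)."""
    active = {}      # register -> (name, end, is_fixed)
    segments = []    # (name, register, from_point)
    moves = []       # (point, name, old_register, new_register)
    cursor = 0       # round-robin starting point for the next allocation

    def take_free_reg():
        nonlocal cursor
        for i in range(num_regs):
            reg = (cursor + i) % num_regs
            if reg not in active:
                cursor = (reg + 1) % num_regs
                return reg
        raise RuntimeError("no free register: a spill would be needed")

    # Pre-allocated intervals sort first when start points tie.
    ordered = sorted(intervals, key=lambda iv: (iv[1], iv[3] is None))
    for name, start, end, fixed in ordered:
        # Expire intervals that ended at or before this point.
        for reg in [r for r, (_, e, _) in active.items() if e <= start]:
            del active[reg]

        if fixed is None:
            reg = take_free_reg()
        else:
            reg = fixed
            if reg in active:
                other, other_end, other_fixed = active.pop(reg)
                if other_fixed:
                    raise RuntimeError("pre-allocated registers overlap")
                # Reserve the register, then split the offending live
                # range: it continues in a new register from here on.
                active[reg] = (name, end, True)
                new_reg = take_free_reg()
                active[new_reg] = (other, other_end, False)
                moves.append((start, other, reg, new_reg))
                segments.append((other, new_reg, start))
        active[reg] = (name, end, fixed is not None)
        segments.append((name, reg, start))
    return segments, moves


segments, moves = linear_scan([("a", 0, 4, None), ("p", 2, 5, 0)], num_regs=2)
print(moves)  # [(2, 'a', 0, 1)]
```

In the example, `a` first lands in register 0 and is moved to register 1 at point 2, where the pre-allocated `p` claims register 0. The round-robin cursor means a freshly freed register is not reused immediately, which is the property the message credits with reducing fake dependencies.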

481e219c

lima: Add a NIR load duplicating pass · cae50b8d
Connor Abbott authored 5 years ago
```
and use it with vertex shaders.
```
cae50b8d

lima/gpir: Optimize nots created from branch lowering · 21b3e0c9

Connor Abbott authored 5 years ago

We also add a DCE pass to clean up the result of this pass, which turns
out to also be necessary to clean up the result of nir->gpir in some
cases that we didn't hit until the next commit.

21b3e0c9

Nov 03, 2019
- lima/gpir: Optimize conditional break/continue · 16a050b6
  Connor Abbott authored 5 years ago
  
  16a050b6
Oct 13, 2019

lima/gpir: Make lima_gpir_node_insert_child() useful · 1a629b94

Connor Abbott authored 5 years ago

We weren't using this function before. The name is confusing, but it
changes the child while also fixing up the dependence link, if you don't
have access to it already. Or at least, I think that's what the
intention is, and what we'll need to change the branch condition in the
next commit. Adding a dependency between the new and old source doesn't
make any sense for this, and we also need to change the actual source.

1a629b94

Oct 09, 2019

aco: don't reorder instructions in order to lower boolean phis · f584c427
Daniel Schürmann authored 5 years ago
```
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
```
f584c427
aco: re-use existing phi instruction when lowering boolean phis · 10be9067
Daniel Schürmann authored 5 years ago
```
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
```
10be9067
aco: Cleanup insert_before_logical_end · a607ea51
Michael Schellenberger Costa authored 5 years ago and Daniel Schürmann committed 5 years ago
```
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
```
a607ea51

lima/ppir: don't clone texture loads · c8554f84

Vasily Khoruzhick authored 5 years ago


Cloning texture loads isn't a good idea since we may move it into
a block that is not shared between all the invocations of the shader.
We'd like to avoid that since it may result in undefined behavior.

Reviewed-by: Andreas Baierl <ichgeh@imkreisrum.de>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>

c8554f84

gitlab-ci/lava: Add needs: for container image to test jobs · 94cfe590

Michel Dänzer authored 5 years ago and Michel Dänzer committed 5 years ago

Without this, the test jobs could spuriously run after the container
job failed or was cancelled, even if the build job didn't run at all.

Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>

94cfe590

radv: bump minTexelBufferOffsetAlignment to 4 · 030e67fa

Samuel Pitoiset authored 5 years ago

The spec has probably been misinterpreted during RADV bringup.

This fixes GPU hangs with dEQP-VK.binding_model.*offset_nonzero*.

Fixes: f4e499ec ("radv: add initial non-conformant radv vulkan driver")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

030e67fa

meta: leak of shader program when decompressing tex-images · 1b21b975

Sergii Romantsov authored 5 years ago and Danylo Piliaiev committed 5 years ago


CC: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com>

1b21b975

mesa/main: prefer R8-textures instead of A8 for glBitmap in display lists · bbdbb02a

Erik Faye-Lund authored 5 years ago


This allows drivers to communicate that they prefer R8 textures rather
than A8 for glBitmap usage.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

bbdbb02a

st/mesa: Prefer R8 for bitmap textures · f9222693

Dave Airlie authored 6 years ago and Erik Faye-Lund committed 5 years ago


If it's not available, we fall back to A8. This should work on all drivers,
because we depend on it in the display-list code already.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

f9222693

drirc: enable vk_x11_override_min_image_count for DOOM · ad96c498

Samuel Pitoiset authored 5 years ago

DOOM fails to handle more images than expected when the adaptive
sync mode is enabled.

Closes: mesa/mesa#1902


Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

ad96c498

radv: implement VK_KHR_shader_clock · cbd6f0a0

Samuel Pitoiset authored 5 years ago


NIR->LLVM and ACO already support nir_intrinsic_shader_clock.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

cbd6f0a0

iris: Implement the Broadwell NP Z PMA Stall Fix · 0b7ecfdd

Kenneth Graunke authored 5 years ago

This should help avoid stalls in the pixel mask array in certain
non-promoted depth cases.  It especially helps for Z16, as each bit
in the PMA corresponds to two pixels when using Z16, as opposed to
the usual one pixel.

Improves performance in GFXBench5 TRex by 22% (n=1).

0b7ecfdd

Oct 08, 2019

docs: Update recently enabled VK extensions on Intel · 4327837b
Caio Oliveira authored 5 years ago

4327837b

anv: Enable VK_EXT_shader_subgroup_{ballot,vote} · 9560c9b4

Caio Oliveira authored 5 years ago


Anvil now supports and passes Vulkan CTS tests matching

    dEQP-VK.subgroups.*.ext_shader_subgroup_ballot.*
    dEQP-VK.subgroups.*.ext_shader_subgroup_vote.*

and crucible tests matching

    func.shader-ballot.*
    func.shader-subgroup-vote.*

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

9560c9b4

st/mesa: Fix inverted polygon stipple condition · b453b29f

Kenneth Graunke authored 5 years ago


Fixes Piglit's gl-2.1-polygon-stipple-fs on iris.

Fixes: 63f24c3c ("gallium: Enable MESA_framebuffer_flip_y")
Reviewed-by: Fritz Koenig <frkoenig@google.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>

b453b29f

gallium: Enable MESA_framebuffer_flip_y · 63f24c3c

Fritz Koenig authored 5 years ago


Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>

63f24c3c

mesa: Allow MESA_framebuffer_flip_y for GLES 3 · 66937abe

Fritz Koenig authored 5 years ago


Implement glFramebufferParameteriMESA on GLES 3 so
that the extension is not dependent on GLES 3.1

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>

66937abe

mesa: GetFramebufferParameteriv spelling · 9fb76392

Fritz Koenig authored 5 years ago


GetFramebufferParameteriv was incorrectly spelled as
GetFramebufferParameteri.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>

9fb76392

include/GLES2: Sync GLES2 headers with Khronos · ab8e5a15

Fritz Koenig authored 5 years ago


Bring in glFramebufferParameteriMESA/glGetFramebufferParameterivMESA

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>

ab8e5a15

radeonsi: enable zerovram for Rocket League · 5afbe87d

Clément Guérin authored 5 years ago

Fixes corruption on game startup.

Closes: mesa/mesa#1888



Cc: 19.1 19.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>

5afbe87d

iris: Properly unreference extra VBOs for draw parameters · face2212

Kenneth Graunke authored 5 years ago

bound_vertex_buffers doesn't include extra draw parameters buffers.
Tracking this correctly is kind of complicated, and iris_destroy_state
isn't exactly in a hot path, so just loop over all VBO bindings.

Fixes: 4122665d (iris: Enable ARB_shader_draw_parameters support)
Reported-by: Sergii Romantsov <sergii.romantsov@globallogic.com>

face2212

meson: fix sys/mkdev.h detection on Solaris · 6f26eae0

Eric Engestrom authored 5 years ago


On Solaris, sys/sysmacros.h has long-deprecated copies of major() & minor()
but not makedev().
sys/mkdev.h has all three and is the preferred choice.

Let's make sure we check for all three: major(), minor() and makedev().

Reported-by: Alan Coopersmith <alan.coopersmith@oracle.com>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Alan Coopersmith <alan.coopersmith@oracle.com>
Tested-by: Alan Coopersmith <alan.coopersmith@oracle.com>

6f26eae0

include: update drm-uapi · 02b3aa3c

Eric Engestrom authored 5 years ago

`drm.h` was missing a `#include <stdint.h>`, which was completely
breaking the non-linux builds after 272f9cfe ("dri: Use DRM_FORMAT_*
instead of defining our own copy.") started making use of it.

Fixes: 272f9cfe ("dri: Use DRM_FORMAT_* instead of defining our own copy.")
Closes: #950


Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

02b3aa3c

loader: Simplify handling of the radeonsi driver · 3b8aeb09

Michel Dänzer authored 5 years ago and Michel Dänzer committed 5 years ago


The list of AMD/ATI devices supported by radeon/r200/r300/r600 is
complete, so anything else must use radeonsi.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

3b8aeb09

amd/llvm: Fix warning due to asserted-only variable. · a0c930d2

Bas Nieuwenhuizen authored 5 years ago


[212/893] Compiling C object 'src/amd/llvm/ce8261c@@amd_common_llvm@sta/ac_nir_to_llvm.c.o'.
../mesa/src/amd/llvm/ac_nir_to_llvm.c: In function ‘visit_image_atomic’:
../mesa/src/amd/llvm/ac_nir_to_llvm.c:2636:17: warning: unused variable ‘format’ [-Wunused-variable]
 2636 |    const GLenum format = nir_intrinsic_format(instr);
      |                 ^~~~~~

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

a0c930d2

panfrost: Draw the wallpaper when only depth/stencil bufs are cleared · 71eda74f

Boris Brezillon authored 5 years ago


When only the depth/stencil bufs are cleared, we should make sure the
color content is reloaded into the tile buffers if we want to preserve
their content.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

71eda74f

panfrost: Make sure a clear does not re-use a pre-existing batch · c138ca80

Boris Brezillon authored 5 years ago

glClear()s are expected to be the first thing GL apps do before drawing
new things. If there's already an existing batch targeting the same
FBO that has draws attached to it, we should make sure the new clear
gets a new batch assigned to guarantee that the FB content is actually
cleared with the requested color/depth/stencil values.

We create a panfrost_get_fresh_batch_for_fbo() helper for that and
call it from panfrost_clear().

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
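
The behaviour described in this message can be modelled in a few lines. This is a hedged sketch of the idea only: the `Batch` and `Context` classes, the `retired` list, and the way an old batch is retired are assumptions for illustration; only the helper's name and its use from the clear path come from the message.

```python
class Batch:
    def __init__(self, fbo):
        self.fbo = fbo
        self.draws = []      # draw calls queued on this batch
        self.clear = None    # pending clear values, if any


class Context:
    def __init__(self):
        self.batches = {}    # fbo -> current batch
        self.retired = []    # batches set aside, oldest first

    def get_batch_for_fbo(self, fbo):
        """Return the current batch for fbo, creating one if needed."""
        if fbo not in self.batches:
            self.batches[fbo] = Batch(fbo)
        return self.batches[fbo]

    def get_fresh_batch_for_fbo(self, fbo):
        """Like get_batch_for_fbo, but never return a batch with draws."""
        batch = self.get_batch_for_fbo(fbo)
        if batch.draws:
            # Set the old batch aside so the new clear cannot be merged
            # into a batch that already has draws attached.
            self.retired.append(batch)
            del self.batches[fbo]
            batch = self.get_batch_for_fbo(fbo)
        return batch

    def draw(self, fbo, what):
        self.get_batch_for_fbo(fbo).draws.append(what)

    def clear(self, fbo, values):
        # The clear path asks for a fresh batch, as in the message above.
        self.get_fresh_batch_for_fbo(fbo).clear = values


ctx = Context()
ctx.clear("fb", "red")
ctx.draw("fb", "triangle")
ctx.clear("fb", "blue")      # lands in a new batch, not the one with draws
print(len(ctx.retired))      # 1
```

A clear issued before any draw still reuses the existing empty batch, so only a clear that follows draws pays for a new one.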

c138ca80

iris: Update comment about 3-component formats and buffer textures · 016c19bc

Kenneth Graunke authored 5 years ago

You can't render to PIPE_BUFFER so there's no reason to prefer RGBX.
PBO upload would like to use proper RGB textures as source data.

016c19bc

iris: Allow packed RGB pbo uploads · 64207ebe

Chris Wilson authored 5 years ago and Kenneth Graunke committed 5 years ago

Hitting any fallback path on Broxton is costly, as we require clflushing the
whole buffer even for an upload of a subtexture. However, since gallium
provides a pbo upload path, allow it to sample packed RGB if supported.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

64207ebe

anv/android: fix images created with external format support · e4a826b2

Tapani Pälli authored 5 years ago


This fixes a case where user first creates image and then later binds it
with memory created from AHW buffer.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

e4a826b2

meson: Always add LLVM coroutines module. · 72665a0f

Bas Nieuwenhuizen authored 5 years ago

It gets used by the gallium auxiliary draw module, which gets used
pretty much always when LLVM is used as JIT.

At the same time most builds don't hit the issue here because the
shared library of LLVM contains all modules.

Fixes: d32690b4 ("gallivm: add coroutine pass manager support")
Closes: mesa/mesa#951


Reviewed-by: Gert Wollny <gert.wollny@collabora.com>

72665a0f
