Commits · lima-abs-fix · Andreas Baierl / mesa

Sep 24, 2019

WIP: lima/ppir: Introduce late abs modifier lowering · fbfe42b8

Andreas Baierl authored 5 years ago


Some ops can't deal with abs sources directly, so we have to lower
the abs modifier. Lower abs like sqrt(mul(x, x)).

This is what the blob does and we pass some more piglit tests.
This ppir lowering pass is executed at the very end when lowering
of all the other alu instructions is already done but before
lowering the consts.

Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>

fbfe42b8

lima/gpir: Fix 64-bit shift in scheduler spilling · fed5b605

Connor Abbott authored 5 years ago


There are 64 physical registers so the shift must be 64 bits.
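
A minimal sketch of the kind of bug this describes, assuming a bitmask with one bit per physical register. The helper names are hypothetical and are not taken from the gpir scheduler.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: bit for physical register `reg` (0..63).
 * The constant must be 64 bits wide. A plain `1 << reg` is an int
 * shift, which is undefined for reg >= 32 where int is 32 bits. */
static inline uint64_t physreg_bit(unsigned reg)
{
    return UINT64_C(1) << reg;
}

/* Hypothetical helper: tests whether `reg` is set in a liveness mask. */
static inline bool physreg_is_live(uint64_t live_mask, unsigned reg)
{
    return (live_mask & physreg_bit(reg)) != 0;
}
```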

Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>

fed5b605

lima/gpir: Don't emit movs when translating from NIR · ef38a659

Connor Abbott authored 5 years ago

The scheduler doesn't expect them. To do this, I had to refactor the
registration part of gpir_node_create_dest() to be separate from
creating and inserting the node, since the last two now aren't done when
handling moves. This adds more code but creates the possibility of
automatically inserting input dependencies when inserting nodes, similar
to what's done in NIR with the use-def lists (this isn't done yet).

Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>

ef38a659

lima/gpir: Fix postlog2 fixup handling · 96c31d9a

Connor Abbott authored 5 years ago


We guarantee that a complex1 op is always used by postlog2 directly by
rewriting the postlog2 op to be a move when there would be a move
inserted between them. But we weren't doing this in all circumstances
where there might be a move. Move the logic to place_move() so that it
always happens. Fixes a few log tests that happened to start failing due
to changes in the register allocator leading to a different scheduling
order.

Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>

96c31d9a

lima/gpir: Use registers for values live in multiple blocks · 1cd1cce0

Connor Abbott authored 5 years ago

This commit adds the framework for cross-basic-block register
allocation. Like ARM's compiler, we assume that the value registers
aren't usable across branches, which means we have to use physical
registers to store any value that crosses a basic block. There are three
parts to this:

1. When translating from NIR, we rely on the NIR out-of-ssa pass to
coalesce values into registers. We insert store_reg instructions for
values used in more than one basic block, and load_reg instructions for
values not defined in the same basic block (or defined after their use,
for loops). So by the time we've translated out of NIR we've already
split things into values (which are only used in the same basic block)
and registers (which are only used in different basic blocks than where
they're defined).

2. We allocate the registers at the same time that we allocate the
values, before the final scheduler. Unlike the values, where the
assigned color is fake, we assign the actual physical index & component
to physregs at this stage. load_reg and store_reg are treated as moves
in the allocator and when creating write-after-read dependencies.

3. Finally, in the main scheduler we have to avoid overwriting existing
live physregs when spilling. First, we have to tell the scheduler which
physical registers are live at the end of each block, to avoid
overwriting those. If a register is only live at the beginning, we can
reuse it for spilling after the last original use in the final program
happens, i.e. before any original use is scheduled, but we have to be
careful to add the proper dependencies so that the spill write is
scheduled before the original reads. To handle this we repurpose
reg_link for uses to be used by the scheduler.

A few register-related things copied over from NIR or from other
drivers can be dropped.

Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>

1cd1cce0

lima/gpir: Support branch instructions · 7594ef6e

Connor Abbott authored 5 years ago

Because branch conditions have to be in the pass slot, there is no
unconditional branch, and realistically the pass slot has to contain a
move when branching (there's nothing it does that would be useful for
operating on booleans, so we can't use it for anything when computing
the branch condition), we put the branch instruction in the pass slot
and at codegen time turn it into a move of the branch condition. This
means that it doesn't have to be special-cased like store instructions
are in the scheduler. Because of this decision we can remove the
half-implemented BRANCH codegen slot. Finally, we (ab)use the existing
schedule_first mechanism to make sure that branches are always last in
the basic block.

Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>

7594ef6e

lima/gpir: Only try to place actual children · 2df2e081

Connor Abbott authored 5 years ago


When picking a node to be scheduled, we try to schedule its children as
well. But we shouldn't try to schedule nodes which only have a fake
dependency on the original node, since this isn't the point of
scheduling children at the same time and can break some expectations of
the rest of the code.

Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>

2df2e081

lima/gpir: Fix compiler warning · f989a024

Connor Abbott authored 5 years ago

Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>

f989a024

glx: Implement GLX_EXT_no_config_context · 0d635ccc

Adam Jackson authored 7 years ago

This is the GLX counterpart to EGL_KHR_no_config_context. Contexts may
now be created without reference to an fbconfig, in which case it is
treated as compatible with any fbconfig (and thus any GLX drawable).

Khronos: https://github.com/KhronosGroup/OpenGL-Registry/pull/102

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

0d635ccc

glx: Lift sending the MakeCurrent request to top-level code · 999c2aed

Adam Jackson authored 7 years ago

Somewhat terrifyingly, we never sent this for direct contexts, which
means the server never knew the context/drawable bindings. To handle
this sanely, pull the request code up out of the indirect backend, and
rewrite the context switch path to call it as appropriate.  This
attempts to preserve the existing behavior of not calling unbind() on
the context if its refcount would not drop to zero.

Of course, you can't just do this indiscriminately, because this is GLX
and extant X servers have bugs and everything is terrible. To wit:

- For 1.20.x prior to 1.20.6, you can bind a direct context once, but
the second time you try to modify the context's binding you will get
GLXBadContextTag. This includes unbinding the context. And "deleting"
the context will leak memory, because it will still appear to be
current.

- For 1.19 and earlier, glXMakeCurrent(dpy, None, ctx) should be legal
for GL 3.0+ contexts, but the server will throw BadMatch.

To guard against this, we only send the request for indirect contexts
unless the server is known good, and only mention one context at a time
in such a request; if switching between contexts, we first unbind the
old, and then bind the new. Note that the second VendorRelease() version
is to catch XFree86 4.x and Xorg [67].x, which almost certainly have the
above bugs. Other servers might report different version numbers here,
but we can't do direct rendering against them, so this should be safe.
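
The guard described above can be sketched as a small planning function. Everything here is a hypothetical stand-in (the `sketch_ctx` and `sketch_request` types, the `server_known_good` flag, and the function names); it is not Mesa's GLX code, and rebinding the same context to a different drawable is not modeled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not Mesa's real GLX types. */
struct sketch_ctx {
    int  id;         /* non-zero context identifier */
    bool is_direct;  /* true for direct-rendering contexts */
};

struct sketch_request {
    int old_ctx_id;  /* 0 means "none" */
    int new_ctx_id;  /* 0 means "none" */
};

/* Send for indirect contexts always; for direct ones only if the
 * server is known to handle the request correctly. */
static bool should_send(const struct sketch_ctx *ctx, bool server_known_good)
{
    return ctx != NULL && (!ctx->is_direct || server_known_good);
}

/* Fills out[] with at most two requests, each mentioning one context:
 * first unbind the old context, then bind the new one.
 * Returns the number of requests produced. */
static size_t plan_make_current(const struct sketch_ctx *old_ctx,
                                const struct sketch_ctx *new_ctx,
                                bool server_known_good,
                                struct sketch_request out[2])
{
    size_t n = 0;

    if (old_ctx == new_ctx)
        return 0; /* same-context rebinds are outside this sketch */

    if (should_send(old_ctx, server_known_good)) {
        out[n].old_ctx_id = old_ctx->id;
        out[n].new_ctx_id = 0;
        n++;
    }
    if (should_send(new_ctx, server_known_good)) {
        out[n].old_ctx_id = 0;
        out[n].new_ctx_id = new_ctx->id;
        n++;
    }
    return n;
}
```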

Fixes glx-make-context, glx-multi-window-single-context and
glx-query-drawable-glx_fbconfig_id-window. Sufficiently old piglit will
regress on glx-make-glxdrawable-current (throwing BadMatch), which is
fixed by mesa/piglit!116.

999c2aed

glx: Move vertex array protocol state into the indirect backend · 01e43798

Adam Jackson authored 7 years ago

Only relevant for indirect contexts, so let's get that code out of the
common path.

01e43798

Sep 23, 2019

intel: Increase Gen11 compute shader scratch IDs to 64. · b9e93db2

Kenneth Graunke authored 5 years ago


From the MEDIA_VFE_STATE docs:

   "Starting with this configuration, the Maximum Number of Threads must
    be set to (#EU * 8) for GPGPU dispatches.

    Although there are only 7 threads per EU in the configuration, the
    FFTID is calculated as if there are 8 threads per EU, which in turn
    requires a larger amount of Scratch Space to be allocated by the
    driver."

It's pretty clear that we need to increase this for scratch address
calculations, because the FFTID has a certain bit-pattern.  The quote
above seems to indicate that we should increase the actual thread count
programmed in MEDIA_VFE_STATE as well, but we think the intention is to
only bump the scratch space.
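
A rough sketch of the sizing arithmetic, assuming 8 EUs per subslice (reading "8x8" as 8 subslices of 8 EUs) and a simple per-thread scratch model. The macro and function names are hypothetical; only the "8 threads per EU for FFTID purposes" rule comes from the quoted docs.

```c
#include <stdint.h>

/* Assumed figures for this sketch. */
#define EUS_PER_SUBSLICE       8u  /* assumption: 8 EUs in each subslice */
#define ACTUAL_THREADS_PER_EU  7u  /* threads that really exist per EU */
#define FFTID_THREADS_PER_EU   8u  /* FFTID is computed as if there were 8 */

/* Scratch slots if sized by the real thread count: 8 * 7 = 56. */
static inline uint32_t naive_ids_per_subslice(void)
{
    return EUS_PER_SUBSLICE * ACTUAL_THREADS_PER_EU;
}

/* Scratch slots sized to match the FFTID numbering: 8 * 8 = 64. */
static inline uint32_t scratch_ids_per_subslice(void)
{
    return EUS_PER_SUBSLICE * FFTID_THREADS_PER_EU;
}

/* Total scratch allocation under the assumed per-thread model. */
static inline uint64_t total_scratch_bytes(uint32_t per_thread_bytes,
                                           uint32_t num_subslices)
{
    return (uint64_t)per_thread_bytes * scratch_ids_per_subslice() * num_subslices;
}
```

Sizing by the real thread count would leave the highest FFTID values pointing past the end of the allocation, which is consistent with the hangs the commit reports.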

Fixes GPU hangs in Bioshock Infinite and Synmark's CSDof on Icelake 8x8.

Fixes: 5ac804bd ("intel: Add a preliminary device for Ice Lake")
Reviewed-by: Matt Turner <mattst88@gmail.com>

b9e93db2

Revert "intel/gen11+: Enable Hardware filtering of Semi-Pipelined State in WM" · 50c0dd86

Kenneth Graunke authored 5 years ago

This reverts commit 729de148.

It turns out that, although the register is in the logical context,
it isn't whitelisted, so we can't actually write it from userspace
batch buffers.  The write just becomes a noop, which is why we saw
no performance changes.

I manually whitelisted it, and still observed no performance gains, but
it did regress KHR-GL46.texture_cube_map_array.color_depth_attachments
on the iris driver.  So we might need to fix something before enabling
this.  To prevent it randomly getting turned on should the kernel ever
whitelist this register, we revert the patch for now.

50c0dd86

util/rb_tree: Replace useless ifs with asserts · 03911195

Faith Ekstrand authored 5 years ago

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

03911195

broadcom/genxml: Stop manually scrubbing 'α' -> "alpha" · a733423d

Kenneth Graunke authored 5 years ago


'α' has never appeared in any genxml files, so there's no need to
replace it with the word "alpha".

Reviewed-by: Eric Anholt <eric@anholt.net>

a733423d

intel/genxml: Stop manually scrubbing 'α' -> "alpha" · 8489206e

Kenneth Graunke authored 5 years ago


'α' has never appeared in any genxml files, so there's no need to
replace it with the word "alpha".

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>

8489206e

freedreno/a6xx: do streamout only in binning pass · d8cbf1ad

Rob Clark authored 5 years ago and Rob Clark committed 5 years ago


Use VPC_SO_OVERRIDE to control whether we do streamout in binning or
draw pass.  Normally we want to do streamout in binning pass, except
when there is a single tile and the binning pass is skipped.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>

d8cbf1ad

freedreno/a6xx: fix binning pass vs. xfb · b9bf3745

Rob Clark authored 5 years ago and Rob Clark committed 5 years ago


We could be doing streamout from binning pass.  In this case we want to
use the full VS which doesn't have (potentially streamed out) varyings
stripped out.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>

b9bf3745

freedreno/a6xx: un-open-code PC_PRIMITIVE_CNTL_1.PSIZE · 331f89a9

Rob Clark authored 5 years ago and Rob Clark committed 5 years ago


Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>

331f89a9

ac/nir: force unnormalized coordinates for RECT · 05d32850

Marek Olšák authored 5 years ago

This fixes VAAPI.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>

05d32850

ac/nir: port Z compare value clamping from radeonsi · 500181b2

Marek Olšák authored 5 years ago

This fixes some dEQP tests.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>

500181b2

tgsi_to_nir: fix 2-component system values like tess_level_inner_default · 09447ccc

Marek Olšák authored 5 years ago

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>

09447ccc

tgsi_to_nir: fix masked out image loads · 3906fce8

Marek Olšák authored 5 years ago


This caused a failure in NIR validation.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>

3906fce8

nir: define 8-byte size and alignment for bindless variables · 780eeaf2

Marek Olšák authored 5 years ago

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>

780eeaf2

nir: don't add bindless variables to num_textures and num_images · f5c103ce

Marek Olšák authored 5 years ago

It confuses radeonsi.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>

f5c103ce

amd: remove all PCI IDs supported by amdgpu · 150f6ffb

Marek Olšák authored 5 years ago

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>

150f6ffb

loader: always map the "amdgpu" kernel driver name to radeonsi (v2) · 5a545e35

Sonny Jiang authored 5 years ago


v2: cleanup

Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>

5a545e35

ac: stop using PCI IDs for chip identification · 94297142

Marek Olšák authored 5 years ago


PCI IDs for amdgpu will be removed from Mesa.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>

94297142

ac/addrlib: fix chip identification for Vega10, Arcturus, Raven2, Renoir · 48742de6

Marek Olšák authored 5 years ago

Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>

48742de6

amd: add more PCI IDs for Navi14 · 65b69813

Marek Olšák authored 5 years ago

trivial and urgent

Cc: 19.2 <mesa-stable@lists.freedesktop.org>

65b69813

meson: split compiler warnings one per line · c29c4101

Eric Engestrom authored 5 years ago


Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

c29c4101

nir/repair_ssa: Replace the unreachable check with the phi builder · d63162cf

Faith Ekstrand authored 5 years ago

In a3268599, I attempted to fix nir_repair_ssa for unreachable
blocks. However, that commit missed the possibility that the use is in
a block which, itself, is unreachable. In this case, we can end up in
an infinite loop trying to replace a def with itself. Even though a
no-op replacement is a fine operation, it keeps extending the end of the
uses list as we're walking it. Instead of explicitly checking for the
group of conditions, just check if the phi builder gives us a different
def. That's guaranteed to be 100% reliable and, while it lacks symmetry
with the is_valid checks, should be more reliable.

Fixes: a3268599 "nir/repair_ssa: Repair dominance for unreachable..."
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

d63162cf

aco: only emit waitcnt on loop continues if there was some load or export · 2c050b49

Daniel Schürmann authored 5 years ago

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>

2c050b49

nv50/ir/nir: comparison of integer expressions of different signedness warning · 70e39294

Karol Herbst authored 5 years ago

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Rhys Kidd <rhyskidd@gmail.com>

70e39294

nv50/ir: fix unnecessary parentheses warning · 61ccca12

Karol Herbst authored 5 years ago


Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Rhys Kidd <rhyskidd@gmail.com>

61ccca12

lima: remove partial clear support from pipe->clear() · ab49a0e7

Erico Nunes authored 5 years ago


pipe->clear() is not called for partial clears, which mesa emulates by
drawing a quad.
Furthermore, drivers should not use rasterizer state information for
scissor information (which was being used to handle the partial clears).
So, remove the partial clear support since it was not supposed to be
handled by pipe->clear() anyway.
This fixes issues with clearing after switching to different sized
framebuffers.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>

ab49a0e7

dEQP-GLES2.functional.buffer.write.use.index_array.* are passing now. · 0c6ca0a6

Boris Brezillon authored 5 years ago

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>

0c6ca0a6

panfrost: Fix indexed draws · 055497fa

Boris Brezillon authored 5 years ago


->padded_count should be large enough to cover all vertices pointed by
the index array. Use the local vertex_count variable that contains the
updated vertex_count value for the indexed draw case.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

055497fa

clover/nir: fix compilation with g++-5.5 and maybe earlier · 697eb8f9

Karol Herbst authored 5 years ago and Karol Herbst committed 5 years ago

fixes "sorry, unimplemented: non-trivial designated initializers not supported"

Fixes: deb04adf ("clover: add support for passing kernels as nir to the driver")
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>

697eb8f9

st/mesa: Bail on incomplete attachments in discard_framebuffer · ec81f19b

Kenneth Graunke authored 5 years ago

Incomplete attachments don't have an associated pipe_surface, so
this would crash.

Fixes a WebGL conformance test that uses incomplete attachments:
https://www.khronos.org/registry/webgl/sdk/tests/conformance2/renderbuffers/invalidate-framebuffer.html?webglVersion=2&quiet=0&quick=1

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111756

Reviewed-By: Tapani Pälli <tapani.palli@intel.com>

ec81f19b
