Commits · 21.1-branchpoint · Jocelyn Falempe / mesa

Apr 14, 2021

anv: bump internal descriptor index fields to 32bits · 23c4b59b


Prior to supporting VK_EXT_descriptor_indexing all of our descriptor
limits were below 64k, which fit in a uint16_t. Now all of those can
go up to 2^20 entries, so we need 32-bit indexes to keep track of them.

This change leaves the dynamic indexes at 16bits. We could arguably
bump them too, up to the reviewer's taste.
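
As a rough illustration of the problem described above (not the anv code itself), the sketch below shows how an index near the 2^20 limit is silently truncated when stored in a 16-bit field. The helper name `index_fits_u16` and the `MAX_DESCRIPTORS` macro are assumptions made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed limit, taken from the "2^20 entries" figure in the commit message. */
#define MAX_DESCRIPTORS (1u << 20)

/* Hypothetical helper: returns true if `index` survives being stored
 * in a 16-bit field, false if the high bits would be lost. */
static bool index_fits_u16(uint32_t index)
{
    uint16_t narrow = (uint16_t)index; /* keeps only the low 16 bits */
    return narrow == index;
}
```

Any index of 65536 or above fails this check, which is why the internal fields need to be 32 bits wide once the limits exceed 64k.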

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 6e230d76 ("anv: Implement VK_EXT_descriptor_indexing")
Closes: mesa/mesa#4636


Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <mesa/mesa!10228>

23c4b59b

ac: add missing BUF_DATA_FORMAT_10_11_11 vertex format on GFX10+ · 97e7b21c

Samuel Pitoiset authored 3 years ago

This format is supported by the driver.

Fixes vertex explosion in Dirt 5.

Closes: mesa/mesa#4635


Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <mesa/mesa!10226>

97e7b21c

ir3/sched: Don't schedule too many tex/SFU instructions · 2deead18

Connor Abbott authored 4 years ago

Consider a simple loop that does a series of texture instructions and
then reduces the results:

vec4 sum = vec4(0);
for (int i = 0; i < N; i++) {
   sum += texture(...);
}

Assume that the loop is unrolled and we schedule the resulting basic
block. Right now, after we schedule the first texture instruction, the
only instructions available to schedule that don't incur a sync are the
instructions to set up the second texture instruction. So we keep picking
the texture instructions, no matter how large N is, resulting in a
pathological schedule for register pressure when N is very large:

sum1 = texture(...);
sum2 = texture(...);
sum3 = texture(...);
...
sum = sum1 + sum2 + sum3 + ...;

In particular this happens with some CTS tests for VK_EXT_robustness2,
where a loop like that with many iterations is marked as [[unroll]],
forcing NIR to unroll it.

This solution is a balance between the current approach and always
scheduling for register pressure (and ignoring syncs). We only allow a
certain number of texture fetches to be in flight before considering
textures to "sync", even though they don't really, both because they
likely *will* sync in reality (overflowing the internal queue of waiting
texture instructions) and because at some point we need the normal
algorithm to kick in and start lowering register pressure.
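
The idea can be sketched roughly as a counter of outstanding texture fetches. This is a minimal illustration, not the ir3 scheduler: the struct, the function names, and the threshold of 8 are all assumptions chosen for the example.

```c
#include <stdbool.h>

/* Assumed threshold for illustration only. */
#define MAX_TEX_IN_FLIGHT 8u

/* Hypothetical scheduler state: texture fetches issued since the last
 * point where a texture result was consumed. */
struct sched_state {
    unsigned tex_in_flight;
};

/* Once enough fetches are outstanding, treat the next one as if it
 * would sync, so the register-pressure heuristics take over. */
static bool tex_counts_as_sync(const struct sched_state *s)
{
    return s->tex_in_flight >= MAX_TEX_IN_FLIGHT;
}

/* Call when a texture instruction is scheduled. */
static void note_tex_scheduled(struct sched_state *s)
{
    s->tex_in_flight++;
}

/* Call when a use of a texture result is scheduled: that use waits for
 * the fetch and all earlier ones, so nothing is outstanding anymore. */
static void note_tex_result_used(struct sched_state *s)
{
    s->tex_in_flight = 0;
}
```

With this shape, a long unrolled run of fetches is broken into groups, and the uses of one group can be interleaved with the next group's fetches.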

Part-of: <mesa/mesa!7571>

2deead18

ir3/sched: Don't penalize uses of already-waited tex/SFU · 7821e5a3

Connor Abbott authored 4 years ago

Once we insert a use of a given tex or SFU instruction, then we must
wait for that tex/SFU instruction (as well as all earlier ones) to
complete, so we shouldn't penalize further uses, even if a subsequent
tex/SFU instruction gets scheduled after the first use. This especially
matters after the next commit when we start forcibly breaking up long
sequences of texture instructions, since if we schedule a group of 8
texture instructions then we want to schedule the uses of those
instructions in parallel with the next 8 texture instructions to reduce
register pressure.

Part-of: <!7571>

7821e5a3

zink: verify that source-format support linear-filter · 5362adf6

Erik Faye-Lund authored 3 years ago


Similar to the previous commit, we should also verify that the
source format supports linear filtering if we try to blit with it.

Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <!10234>

5362adf6

zink: verify that src/dst support blitting · 0ba3cf1f

Erik Faye-Lund authored 3 years ago


Some Vulkan-drivers don't support blitting between all formats and
layouts. So let's verify this while blitting, and fall back to the
normal rendering code-path instead.

This fixes a crash on start-up in OpenArena on V3DV.
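
A minimal sketch of the kind of check this commit and the linear-filter commit above describe, assuming stand-in feature bits. A real driver would read `VkFormatFeatureFlags` from `vkGetPhysicalDeviceFormatProperties`; the enum values and the function name `can_use_hw_blit` here are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in feature bits for illustration; these are NOT the Vulkan values. */
enum {
    FEAT_BLIT_SRC      = 1u << 0,
    FEAT_BLIT_DST      = 1u << 1,
    FEAT_FILTER_LINEAR = 1u << 2,
};

/* Hypothetical check: returns true if a hardware blit is allowed,
 * false if the caller should fall back to the rendering path. */
static bool can_use_hw_blit(uint32_t src_feats, uint32_t dst_feats,
                            bool linear_filter)
{
    /* Both ends must support blitting at all. */
    if (!(src_feats & FEAT_BLIT_SRC) || !(dst_feats & FEAT_BLIT_DST))
        return false;
    /* Linear filtering additionally requires support on the source. */
    if (linear_filter && !(src_feats & FEAT_FILTER_LINEAR))
        return false;
    return true;
}
```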

Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <mesa/mesa!10234>

0ba3cf1f

radv/winsys: Remove use_local_bos · 8ddbac03

Bas Nieuwenhuizen authored 3 years ago


Now that perftest is stored in the winsys.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <mesa/mesa!10198>

8ddbac03

radv: Use VRAM cmdbuffers in more situations. · 284bc57a

Bas Nieuwenhuizen authored 3 years ago


In most games I tested we use 32 MiB of cmdbuffers+cmd upload buffers
at most. Especially since we have mutable descriptors it seems
somewhat unlikely anything else will eat it up so be a bit more
aggressive allocating them in VRAM.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <mesa/mesa!10198>

284bc57a

radv: Refactor cs_domain to be a winsys function. · 057ec395

Bas Nieuwenhuizen authored 3 years ago


Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <mesa/mesa!10198>

057ec395

zink: do not dereference NULL pointer · 9de05fd3

Erik Faye-Lund authored 3 years ago


If first_frame_done isn't set, but fence is NULL, we end up dereferencing
that NULL-pointer.

This can happen in the case where the first submitted batch has no work,
and pfence was passed as a NULL-pointer.

While we're at it, simplify the check with the surrounding code, which
also checks for a NULL-pointer here.

Fixes: e93ca92d ("zink: force explicit fence only on first frame flush")
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <!10235>

9de05fd3

aco: Add a simple heuristic to decide early or late primitive export. · f3e004cb

Timur Kristóf authored 3 years ago


Late export is theoretically better if used with LATE_ALLOC,
but in practice, the early export has an advantage of
lower register usage, therefore more concurrent waves.

The idea of this commit is that "small" shaders benefit from early
primitive export more, due to being able to launch many more waves.

Let's consider a NIR shader "small" when it has only 1 block.
This yields both better performance, and better stats, than always
using late export.
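
The heuristic as stated in this message is small enough to sketch directly. The struct and function below are hypothetical names for illustration, not ACO's actual interface.

```c
#include <stdbool.h>

/* Hypothetical summary of a NIR shader; only the block count matters here. */
struct shader_summary {
    unsigned num_blocks;
};

/* A shader counts as "small" when it has exactly one block; small
 * shaders use early primitive export, all others use late export. */
static bool use_early_prim_export(const struct shader_summary *info)
{
    return info->num_blocks == 1;
}
```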

Fossil DB on Sienna:

Totals from 12807 (8.76% of 146265) affected shaders:
VGPRs: 609128 -> 620216 (+1.82%); split: -0.01%, +1.83%
SpillSGPRs: 1458 -> 1538 (+5.49%)
CodeSize: 37028204 -> 37019320 (-0.02%); split: -0.17%, +0.14%
MaxWaves: 282902 -> 278516 (-1.55%)
Instrs: 7163142 -> 7162925 (-0.00%); split: -0.18%, +0.18%
VClause: 169285 -> 169547 (+0.15%); split: -1.15%, +1.30%
SClause: 267373 -> 267151 (-0.08%); split: -0.24%, +0.16%
Copies: 446442 -> 444567 (-0.42%); split: -2.68%, +2.26%
Branches: 156245 -> 156195 (-0.03%); split: -0.30%, +0.26%
PreSGPRs: 434701 -> 447396 (+2.92%)
PreVGPRs: 527783 -> 540527 (+2.41%)

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <mesa/mesa!10106>

f3e004cb

aco: Emit fewer branches for NGG VS/TES with late primitive export. · 5dbab03a

Timur Kristóf authored 3 years ago


Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <mesa/mesa!10106>

5dbab03a

aco: Set block_kind_export_end in create_vs/fs_exports. · af7d5f5b

Timur Kristóf authored 3 years ago


Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <!10106>

af7d5f5b

aco: Extract ngg_nogs_export_prim_id to a separate function. · 2b312a4f

Timur Kristóf authored 3 years ago


Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <mesa/mesa!10106>

2b312a4f

aco: Use s_setprio 3 at the beginning of every VS and TES. · 231ef14b

Timur Kristóf authored 3 years ago


The user-set priority of shaders matters very little, but we hope
this might still help speed up VS input loads especially.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <mesa/mesa!10106>

231ef14b

aco: Remove useless s_setprio near gs_alloc_req. · 4c86c7aa

Timur Kristóf authored 3 years ago


We learned that the gs_alloc_req is not actually when the export
space allocation happens. So it makes no sense to prioritize it.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <!10106>

4c86c7aa

zink: fall back from cached to non-cached memory · 1cf6b8d4

Erik Faye-Lund authored 3 years ago


This fixes basic rendering on top of V3DV, which doesn't seem to expose
the cached memory we expect and love.
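
The fallback can be sketched as a two-pass search over the available memory types. This is an illustration under assumptions, not zink's code: the flag values are stand-ins (not the Vulkan `VK_MEMORY_PROPERTY_*` values) and both function names are hypothetical.

```c
#include <stdint.h>

/* Stand-in property bits for illustration only. */
enum {
    MEM_HOST_VISIBLE = 1u << 0,
    MEM_HOST_CACHED  = 1u << 1,
};

/* Returns the index of the first memory type that has all `wanted`
 * bits set, or -1 if none does. */
static int find_memory_type(const uint32_t *type_flags, unsigned count,
                            uint32_t wanted)
{
    for (unsigned i = 0; i < count; i++) {
        if ((type_flags[i] & wanted) == wanted)
            return (int)i;
    }
    return -1;
}

/* Prefer cached host-visible memory, but accept non-cached memory
 * when the device exposes no cached type. */
static int pick_host_memory_type(const uint32_t *type_flags, unsigned count)
{
    int idx = find_memory_type(type_flags, count,
                               MEM_HOST_VISIBLE | MEM_HOST_CACHED);
    if (idx < 0)
        idx = find_memory_type(type_flags, count, MEM_HOST_VISIBLE);
    return idx;
}
```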

Fixes: 598dc3dc ("zink: use cached memory for all resources when possible")
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <mesa/mesa!10230>

1cf6b8d4

aco: Align NGG scratch size to 16 so a single ds_read can always read it. · 75cd4374

Timur Kristóf authored 3 years ago


Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <mesa/mesa!10155>

75cd4374

aco: Optimize workgroup exclusive scan to better avoid bank conflicts. · c1346e5c

Timur Kristóf authored 3 years ago


Previously, every wave had multiple active lanes read the LDS, and
the data was processed by VALU DPP instructions.

Now, only the first lane reads the LDS in order to avoid bank
conflicts, and the results are processed by SALU.
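
For readers unfamiliar with the operation, an exclusive scan writes to each output slot the combination of all inputs before it. The serial reference below uses addition and says nothing about how ACO distributes the work across lanes or waves; the function name is an assumption for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Serial reference for an exclusive add-scan:
 * out[i] = in[0] + ... + in[i-1], with out[0] = 0. */
static void exclusive_scan_add(const uint32_t *in, uint32_t *out, size_t n)
{
    uint32_t running = 0;
    for (size_t i = 0; i < n; i++) {
        out[i] = running;   /* sum of everything before element i */
        running += in[i];
    }
}
```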

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <mesa/mesa!10155>

c1346e5c

panfrost: Fix pan_blitter_get_blit_shader() · c8c6e0ff

Boris Brezillon authored 3 years ago

The key passed to _mesa_hash_table_search() is wrong, fix it.

Fixes: 8ba2f9f6 ("panfrost: Create a blitter library to replace the existing preload helpers")
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <mesa/mesa!10232>

c8c6e0ff

zink: do not clear on cpu · 1c51c533

Erik Faye-Lund authored 3 years ago


This seems to simply be a mixup of what utility function to use.
util_clear_render_target clears on the CPU, whereas
util_blitter_clear_render_target clears on the GPU. Because we do the
zink_blit_begin dance, it seems reasonable to assume the latter was
intended.

Fixes: 622f8f6e ("zink: add a pipe_context::clear_texture hook")
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <!10211>

1c51c533

ci: Update to latest ci-templates · 3c15dba4

Michel Dänzer authored 3 years ago

This is possible again thanks to !9955, and
this MR requires rebuilding all template-based docker images anyway,
so we can pull in the latest templates for free.

We need to exclude /dev/* when unpacking rootfs tarballs for the
arm_test image, since x86 container build jobs do not allow mknod
anymore with current templates. The baremetal test jobs have another
filesystem mounted on /dev anyway.

Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <!9833>

3c15dba4

ci: Do not append ci-templates commit hash to Windows docker image tag · a436b276

Michel Dänzer authored 3 years ago

We're not using the templates for the Windows image.

Fixes needless rebuild of the Windows image when the ci-templates
commit is changed.

Part-of: <!9833>

a436b276

ci: Install Rust & cargo from Debian for x86_test* images · db4ddced

Michel Dänzer authored 3 years ago


Also build deqp-runner once in x86_test-base instead of separately in
x86_test-{gl,vk}.

Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <!9833>

db4ddced

ci: Install GLVND from Debian bullseye · b0ab534c

Michel Dänzer authored 3 years ago


Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <mesa/mesa!9833>

b0ab534c

ci: Install llvm-spirv from Debian bullseye · 0155881d

Michel Dänzer authored 3 years ago


While we're at it, use a tag instead of whatever happens to be the
current main branch for building libclc.

Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <mesa/mesa!9833>

0155881d

ci: Install spirv-tools from Debian bullseye · 711b8945

Michel Dänzer authored 3 years ago

v2:
* Drop local build from x86_test-gl image as well (Eric Anholt)

Reviewed-by: Eric Anholt <eric@anholt.net> # v1
Part-of: <mesa/mesa!9833>

711b8945

ci: Install librenderdoc from Debian bullseye · c743421f

Michel Dänzer authored 3 years ago

Debian bullseye has a separate command-line-only renderdoc package, so
no need to install Qt packages and build renderdoc anymore.

Closes: #3125


Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <!9833>

c743421f

ci: Move docker images from Debian buster to bullseye · af0fde95

Michel Dänzer authored 3 years ago

Among other things, this gets us GCC 10 (was 6).

Requires some changes to third party components we use:

* Install apitrace (& waffle) from Debian; was hitting issues with the
  local build, and it's the same version 9.0 anyway.
* Update Fossilize to a newer commit which builds with GCC 10.
* apt.llvm.org repositories are no longer needed.
* Use an SPIRV-LLVM-Translator commit which builds with LLVM 11.0.1.
* Install XCB packages from Debian, 1.13 fails to build with Python 3.9.
* Install wayland-protocols from Debian, 1.12 is too old for
  libgtk-3-dev in bullseye.

LLVM 7/8 packages are no longer available.

Also adapt expected test results to Xvfb now exposing multi-sample
GLXFBConfigs.

v2:
* Install clang instead of clang-11.

Closes: #3124
Reviewed-by: Eric Anholt <eric@anholt.net> # v1
Part-of: <!9833>

af0fde95

ci: Bump LLVM/clang from 10 to 11 · a3e38e0b

Michel Dänzer authored 3 years ago


Preparation for moving to Debian bullseye, which has packages for LLVM
9 & 11, but not 10.

Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <!9833>

a3e38e0b

ci: Do not install armhf LLVM packages · bc8e866b

Michel Dänzer authored 3 years ago


LLVM support has been disabled in the meson-armhf job for some time, so
they were unused.

Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <!9833>

bc8e866b

intel/blorp: Initialize texture_data[0] · efcdc7f7

Michel Dänzer authored 3 years ago


Avoids warning with GCC 10:

../src/intel/blorp/blorp_blit.c: In function 'blorp_nir_combine_samples':
../src/intel/blorp/blorp_blit.c:702:25: error: 'texture_data[0]' may be used uninitialized in this function [-Werror=maybe-uninitialized]
  702 |       texture_data[0] = nir_fmul(b, texture_data[0],
      |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
  703 |                                  nir_imm_float(b, 1.0 / tex_samples));
      |                                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <mesa/mesa!9833>

efcdc7f7

r600/sb: Use assignments for resetting struct r600_sb::literal · 8ad26e67

Pierre-Eric Pelloux-Prayer authored 3 years ago


Avoids warning with newer GCC:

../src/gallium/drivers/r600/sb/sb_sched.cpp: In member function 'void r600_sb::literal_tracker::reset()':
../src/gallium/drivers/r600/sb/sb_sched.cpp:1953:26: error: 'void* memset(void*, int, size_t)' clearing an object of non-trivial type 'struct r600_sb::literal'; use assignment or value-initialization instead [-Werror=class-memaccess]
 1953 |  memset(lt, 0, sizeof(lt));
      |                          ^
In file included from ../src/gallium/drivers/r600/sb/sb_sched.cpp:35:
../src/gallium/drivers/r600/sb/sb_bc.h:409:8: note: 'struct r600_sb::literal' declared here
  409 | struct literal {
      |        ^~~~~~~

[ Michel Dänzer:
* Expanded commit log
v2:
* Clear all 4 members of lt[4] (Eric Anholt)
]

Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Part-of: <mesa/mesa!9833>

8ad26e67

ci: Fix HTML summary path for piglit OpenCL job artifacts · cf3d4ea5

Michel Dänzer authored 3 years ago


Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <!9833>

cf3d4ea5

ci/v3dv: skip Vulkan waiver tests · 7c6bcc8e

Juan A. Suárez authored 3 years ago


Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <mesa/mesa!10231>

7c6bcc8e

radv: fix conditions for running nir_opt_vectorize · e3ebc1ca

Rhys Perry authored 3 years ago


No fossil-db changes, probably because all fp16 shaders have at least one
16-bit mov or vec2 somewhere.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <mesa/mesa!10227>

e3ebc1ca

tu: Expose VK_KHR_relaxed_block_layout · 271c18f4

Connor Abbott authored 4 years ago

This was absorbed into Vulkan 1.1, but we forgot to expose it
separately. It's a subset of what's allowed by
VK_EXT_scalar_block_layout.

Part-of: <mesa/mesa!8695>

271c18f4

tu: Expose VK_KHR_spirv_1_4 and VK_EXT_scalar_block_layout · 765c3b85

Connor Abbott authored 4 years ago

VK_KHR_spirv_1_4 is trivial because vtn already supports all the added
SPIR-V features that aren't gated behind Vulkan extensions. I've
observed some robustness2 CTS tests requiring this. However there are
a few tests currently failing due to lacking spilling.

VK_EXT_scalar_block_layout should also be trivial, since support for
"straddling" UBO loads was added recently for other reasons. This is
used by every robustness2 CTS test.

Part-of: <mesa/mesa!8695>

765c3b85

v3d: do not emit attribute if has no resource · 45ae0e9f

Juan A. Suárez authored 3 years ago

When emitting the GL shader state, verify the attribute has a resource
bound; otherwise just skip it.

v2 (chema):
 - Move comment
 - Set num_elements_to_emit = 1 if it is 0

Cc: mesa-stable
Closes: #4205


Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <!8826>

45ae0e9f

v3dv/pipeline: reduce descriptor_map size · fc17231b

Alejandro Piñeiro authored 3 years ago


64 was a temporary and conservative "big enough" value, but we can do
better.

Note that as mentioned in the FIXME, we could be even more detailed,
adding a descriptor map allocate method based on the descriptor
type. That would mean more individual allocations, and slightly more
complexity.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <mesa/mesa!10207>

fc17231b
