Commits · main · The Mitchell Special / mesa

Jun 26, 2023

anv: fix to set predicted weight tables correctly. · 9f4299d6

Hyunjun Ko authored 1 year ago


Fixes: 8d519eb5 ("anv: add initial video decode support for h265")
Closes: mesa/mesa#9214

Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <mesa/mesa!23790>

9f4299d6

intel/genxml: changes the type for predicted weight to unsigned. · b8dc7675

Hyunjun Ko authored 1 year ago


It turned out to be unsigned here after some experiments.

Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <mesa/mesa!23790>

b8dc7675

vulkan/video: keep delta weight and offsets of predicted weight tables in h265 slice parsing · e2f95ad2

Hyunjun Ko authored 1 year ago

Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <mesa/mesa!23790>

e2f95ad2

Jun 25, 2023

vulkan: Update XML and headers to 1.3.255 · c421ecea

Caio Oliveira authored 1 year ago and Marge Bot committed 1 year ago

Acked-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <mesa/mesa!23837>

c421ecea

vulkan: Add NV suffix to VK_NV_cooperative_matrix feature names · 73af0475

Caio Oliveira authored 1 year ago and Marge Bot committed 1 year ago


In the new Vulkan Headers, VK_KHR_cooperative_matrix is added, and its feature
names are the same as those of VK_NV_cooperative_matrix.

Acked-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <mesa/mesa!23837>

73af0475

rusticl/program: skip linking compiled binaries · 07597596

Karol Herbst authored 1 year ago


Applications can do their own caching, but are in any case required to
properly compile the binaries via clBuildProgram or clCompileProgram +
clLinkProgram.

Either way, there is no point in building something if we already have the
result.

Signed-off-by: Karol Herbst <git@karolherbst.de>
Reviewed-by: Nora Allen <blackcatgames@protonmail.com>
Part-of: <mesa/mesa!23847>

07597596

Jun 24, 2023

rusticl: bump bindgen requirement · 18f1087a

Karol Herbst authored 1 year ago and Marge Bot committed 1 year ago


Apparently, older bindgen versions crash on some ARM systems.

Signed-off-by: Karol Herbst <git@karolherbst.de>
Reviewed-by: Nora Allen <blackcatgames@protonmail.com>
Part-of: <mesa/mesa!23840>

18f1087a

nir: Add function nir_function_set_impl · 5b294637

Yonggang Luo authored 1 year ago and Marge Bot committed 1 year ago


This function is added to create a strong relationship between
nir_function_impl and nir_function, so that
nir_function->impl->function == nir_function always holds when
(nir_function->impl != NULL && nir_function->impl != NIR_SERIALIZE_FUNC_HAS_IMPL).

This invariant is already enforced by validate_function and
validate_function_impl in nir_validate.
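As a rough illustration of the invariant described above, here is a minimal C sketch. It uses hypothetical `struct fn` / `struct fn_impl` types in place of the real `nir_function` / `nir_function_impl`, and a made-up `FN_HAS_IMPL_SENTINEL` standing in for `NIR_SERIALIZE_FUNC_HAS_IMPL`. It is not the actual `nir_function_set_impl` code, only the idea of setting both sides of the link in a single helper so they cannot drift apart.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for nir_function / nir_function_impl. */
struct fn_impl;

struct fn {
   struct fn_impl *impl;
};

struct fn_impl {
   struct fn *function;
};

/* Hypothetical sentinel: "has an impl, but it is not materialized here". */
#define FN_HAS_IMPL_SENTINEL ((struct fn_impl *)(uintptr_t)1)

/* The invariant: a real impl always points back to its function. */
static bool fn_link_is_valid(const struct fn *f)
{
   if (f->impl == NULL || f->impl == FN_HAS_IMPL_SENTINEL)
      return true;
   return f->impl->function == f;
}

/* Set both directions of the link in one place. */
static void fn_set_impl(struct fn *f, struct fn_impl *impl)
{
   f->impl = impl;
   if (impl != NULL && impl != FN_HAS_IMPL_SENTINEL)
      impl->function = f;
   assert(fn_link_is_valid(f));
}
```

Routing every assignment through one setter means a validator only has to check the invariant, never repair it.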

Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <mesa/mesa!23820>

5b294637

vtn: Do not assign main_entry_point->impl twice · 9fa38cf1

Yonggang Luo authored 1 year ago and Marge Bot committed 1 year ago


Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <mesa/mesa!23820>

9fa38cf1

draw: Update the comment and function name to match the type · 0d9f4743

Yonggang Luo authored 1 year ago


Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <mesa/mesa!23845>

0d9f4743

draw: Replace usage of ubyte/ushort/uint with uint8_t/uint16_t/uint32_t in draw_pt_vsplit.c · e7f0dd27

Yonggang Luo authored 1 year ago


This cannot be done with tools, so it is done manually.

Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <mesa/mesa!23845>

e7f0dd27

draw: Replace usage of boolean/TRUE/FALSE with bool/true/false in draw_pt_vsplit* · f35ebd22

Yonggang Luo authored 1 year ago


These changes cannot be done with tools, so they are done manually.

Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <mesa/mesa!23845>

f35ebd22

rusticl/mesa: create proper build-id hash for the disk cache · fbe9a7ca

Karol Herbst authored 1 year ago and Marge Bot committed 1 year ago


Without a proper timestamp for the disk cache, we pull old binaries out
of it, which may be buggy or simply outdated.
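A minimal sketch of why the build identity matters, assuming a cache key computed as a hash over a build identifier plus the program source. The FNV-1a hash and the `cache_key()` helper are hypothetical stand-ins, not the disk-cache API Mesa actually uses; the point is only that two different builds yield different keys, so a new build never picks up binaries produced by an old one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One FNV-1a (64-bit) pass over `len` bytes, continuing from state `h`. */
static uint64_t fnv1a64(uint64_t h, const void *data, size_t len)
{
   const unsigned char *p = data;
   for (size_t i = 0; i < len; i++) {
      h ^= p[i];
      h *= 0x100000001b3ULL; /* FNV 64-bit prime */
   }
   return h;
}

/* Hypothetical cache key: the build identity is hashed first, so changing
 * the build changes every key even when the source is identical. */
static uint64_t cache_key(const unsigned char *build_id, size_t build_id_len,
                          const char *source)
{
   assert(build_id != NULL || build_id_len == 0);
   uint64_t h = 0xcbf29ce484222325ULL; /* FNV 64-bit offset basis */
   h = fnv1a64(h, build_id, build_id_len);
   h = fnv1a64(h, source, strlen(source));
   return h;
}
```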

Once meson 1.2 lands we can easily pull in LLVM functions.

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Nora Allen <blackcatgames@protonmail.com>
Part-of: <mesa/mesa!21612>

fbe9a7ca

rusticl/meson: extract common bindgen rust args · 29b93251

Karol Herbst authored 1 year ago and Marge Bot committed 1 year ago


Signed-off-by: Karol Herbst <git@karolherbst.de>
Reviewed-by: Nora Allen <blackcatgames@protonmail.com>
Part-of: <mesa/mesa!21612>

29b93251

rusticl: generate bindings for build-id stuff · c8963738

Karol Herbst authored 1 year ago and Marge Bot committed 1 year ago


Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Nora Allen <blackcatgames@protonmail.com>
Part-of: <mesa/mesa!21612>

c8963738

rusticl: structurize and reorder mesa binding args · d14af004

Karol Herbst authored 1 year ago and Marge Bot committed 1 year ago

It became quite a mess, and I had had enough.

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Nora Allen <blackcatgames@protonmail.com>
Part-of: <mesa/mesa!21612>

d14af004

v3dv: replace boolean and uint with bool and size_t · 33790844

Eric Engestrom authored 1 year ago and Marge Bot committed 1 year ago

There's no reason to use the gallium `p_compiler.h` types in vulkan code.

Inspired by mesa/mesa!23577, but using `size_t` for `ulist_data_size`
because its two users are `blob_read_bytes()` and `memcpy()`, both of
which expect a `size_t`.

Signed-off-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <mesa/mesa!23795>

33790844

docs/coding-style: add pre-commit hook fallback for clang-format · fa8a2326

Eric Engestrom authored 1 year ago and Marge Bot committed 1 year ago

Signed-off-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <mesa/mesa!23722>

fa8a2326

docs/coding-style: add example emacs config for clang-format · 270d898e

Eric Engestrom authored 1 year ago and Marge Bot committed 1 year ago

Signed-off-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <mesa/mesa!23722>

270d898e

docs/coding-style: add example vim config for clang-format · 342196f7

Eric Engestrom authored 1 year ago and Marge Bot committed 1 year ago

Signed-off-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <mesa/mesa!23722>

342196f7

r300: properly count maximum used register index · 89873e5e

Pavel Ondračka authored 1 year ago and Marge Bot committed 1 year ago

The problem is when we have DP2 or DP3 instruction that writes a w
channel like here:

DP3 temp[148].w, -temp[147].xyz_, temp[57].xyz_;

will get pair-converted to

src0.xyz = temp[147], src1.xyz = temp[57]
DP3, -src0.xyz, src1.xyz
DP3 temp[148].w, -src0._, src0._

where the alpha instruction is basically just a replicate of the
result from the RGB sub-instruction. However, the destination register
index in the RGB slot is also 148. Now we pair-schedule and regalloc

src0.xyz = temp[13], src1.xyz = temp[3]
DP3, -src0.xyz, src1.xyz
DP3 temp[3].w, -src0._, src0._

We properly regalloc the alpha channel, but we obviously skip the RGB
one, because its writemask is empty. However, when we emit the shader
later, we determine the number of used registers from the maximum used
register index without considering the writemasks, so we would think
we use 149 temps. AFAIK the shader would still be completely OK, but
we would think it hits the HW limits and use a dummy one instead.

Fix this by checking for empty writemasks when marking the registers as
used.
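The fix can be pictured with a simplified, hypothetical model of a paired instruction: an RGB slot and an alpha slot, each with a destination index and a writemask. The struct and function names are illustrative only, not r300 compiler code. Only slots that actually write something contribute to the register count, so the stale index 148 left in the empty RGB slot no longer inflates it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified view of a paired instruction slot. */
struct slot {
   unsigned dst_index;
   unsigned writemask; /* 0 means the slot writes nothing */
};

struct pair_inst {
   struct slot rgb;
   struct slot alpha;
};

/* Number of temporaries needed: one more than the highest index that is
 * actually written. Slots with an empty writemask are skipped, so a stale
 * dst_index in an unused slot cannot inflate the count. */
static unsigned count_used_temps(const struct pair_inst *insts, size_t n)
{
   assert(insts != NULL || n == 0);
   unsigned count = 0;
   for (size_t i = 0; i < n; i++) {
      const struct slot *slots[2] = { &insts[i].rgb, &insts[i].alpha };
      for (int s = 0; s < 2; s++) {
         if (slots[s]->writemask == 0)
            continue;
         if (slots[s]->dst_index + 1 > count)
            count = slots[s]->dst_index + 1;
      }
   }
   return count;
}
```

For the regalloc'd DP3 above (RGB slot index 148 with an empty mask, alpha slot writing temp[3].w), this reports 4 temps instead of 149.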

GAINED: shaders/glmark/1-22.shader_test FS

This is also needed to prevent another lost Trine shader from
mesa/mesa!23089



Reviewed-by: Filip Gawin <filip.gawin@collabora.com>
Part-of: <mesa/mesa!23838>

89873e5e

anv: Only expose video decode bits with KHR_video_decode_queue · 561cce32

Matt Turner authored 1 year ago and Marge Bot committed 1 year ago

This fixes dEQP-VK.api.info.format_properties.g8_b8r8_2plane_420_unorm
in combination with the CTS fix from
https://gerrit.khronos.org/c/vk-gl-cts/+/12191

Fixes: 93614817 ("anv: add video format features for the one supported video output format")
Closes: mesa/mesa#8263
Part-of: <mesa/mesa!23776>

561cce32

anv: Pipe anv_physical_device to anv_get_image_format_features2 · 72733504

Matt Turner authored 1 year ago and Marge Bot committed 1 year ago

Part-of: <mesa/mesa!23776>

72733504

nv50/ir/nir: set numBarriers if we emit an OP_BAR · 02aaf589

Karol Herbst authored 1 year ago and Marge Bot committed 1 year ago

Even though the field is called `numBarriers`, we set it to 1, just like
we do with TGSI. It's unknown what the proper behavior here is, but
without this set the GPU complains loudly, so this at least silences
that.

Fixes: a2d7a4f9 ("nv50/ir: convert to scoped_barrier")
Signed-off-by: Karol Herbst <git@karolherbst.de>
Reviewed-by: M Henning <drawoc@darkrefraction.com>
Part-of: <mesa/mesa!23749>

02aaf589

nvc0: fix printing shaders · 69c45278

Karol Herbst authored 1 year ago and Marge Bot committed 1 year ago


Signed-off-by: Karol Herbst <git@karolherbst.de>
Reviewed-by: M Henning <drawoc@darkrefraction.com>
Part-of: <mesa/mesa!23749>

69c45278

rusticl/program: add debugging option to disable SPIR-V validation · 45d86b41

Karol Herbst authored 1 year ago and Marge Bot committed 1 year ago


This is useful for running applications known to pass in invalid SPIR-V.

Signed-off-by: Karol Herbst <git@karolherbst.de>
Reviewed-by: Nora Allen <blackcatgames@protonmail.com>
Part-of: <mesa/mesa!23818>

45d86b41

rusticl/program: add debugging for OpenCL C compilation · 2b2a5138

Karol Herbst authored 1 year ago and Marge Bot committed 1 year ago


Signed-off-by: Karol Herbst <git@karolherbst.de>
Reviewed-by: Nora Allen <blackcatgames@protonmail.com>
Part-of: <mesa/mesa!23818>

2b2a5138

docs: document CLC_DEBUG · 2362fd50

Karol Herbst authored 1 year ago and Marge Bot committed 1 year ago


Signed-off-by: Karol Herbst <git@karolherbst.de>
Reviewed-by: Nora Allen <blackcatgames@protonmail.com>
Part-of: <mesa/mesa!23818>

2362fd50

intel: Initialize FF_MODE2 on all Gfx12 platforms · 1b3669a1

Kenneth Graunke authored 1 year ago and Marge Bot committed 1 year ago

On Alchemist, the FF_MODE2 documentation says that we must set the
FF_MODE2 timer values for GS and HS to 224.  The hardware performance
tuning guide also recommends setting the TDS timer to 4.

On Tigerlake, i915 applies workarounds to set the GS timer to 224
(failing to do so can cause HS/DS unit hangs), and the TDS timer to 4
(for performance).  It doesn't currently apply an HS timer there, and
I'm not sure if it's strictly necessary, but given that Alchemist
needed it, and the other two settings matched, let's assume that it
ought to match as well.

Unfortunately, there has been a bug in the i915 workarounds
infrastructure for non-masked context registers where writing one
field of the register zeroes out all the others.  So, I believe the
Tigerlake TDS timer value of 4 isn't being applied correctly there,
though the register is also not readable on that platform which
makes it hard to verify.  So, this may also speed up tessellation.
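To illustrate the class of bug described above, the sketch below contrasts writing a single field into a non-masked register with a read-modify-write. The two-field layout is made up for the example and does not reflect the real FF_MODE2 bit positions, and both helpers are hypothetical, not i915 code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: two 8-bit fields (not the real FF_MODE2 layout). */
#define FIELD_A_SHIFT 0
#define FIELD_B_SHIFT 16
#define FIELD_MASK(shift) (0xffu << (shift))

/* Buggy pattern: write only the new field value, so every other field of
 * a non-masked register ends up zeroed. */
static uint32_t write_field_clobber(uint32_t old, unsigned shift, uint32_t val)
{
   (void)old; /* previous contents are ignored, which is the bug */
   return (val << shift) & FIELD_MASK(shift);
}

/* Read-modify-write: clear only the target field and keep the others. */
static uint32_t write_field_rmw(uint32_t old, unsigned shift, uint32_t val)
{
   assert(val <= 0xffu);
   return (old & ~FIELD_MASK(shift)) | ((val << shift) & FIELD_MASK(shift));
}
```

With field A already holding 224, writing 4 into field B through the clobbering path loses the 224, while the read-modify-write keeps both values.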

Closes: mesa/mesa#9233


Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Cc: mesa-stable
Part-of: <mesa/mesa!23839>

1b3669a1

Jun 23, 2023

intel/gfx12.5: Enable L3 partial write merging for compressible surfaces among other cases. · 427fee35

Francisco Jerez authored 1 year ago and Marge Bot committed 1 year ago


This enables L3 partial write merging for a number of cases that seem
to be getting accidentally disabled by the kernel, which was causing a
serious performance bottleneck on DG2 and MTL platforms.  The
"Compressible Partial Write Merge Enable", "Coherent Partial Write
Merge Enable" and "Cross-Tile Partial Write Merge Enable" bits in
L3SQCREG5 were expected to be enabled by default (and confusingly,
they even read off as enabled if you ran 'intel_reg read 0xb158' on an
idle system), but they are getting clobbered during 3D context
initialization by an i915 workaround.

Enabling L3 partial write merging of compressible surfaces in
particular seems to increase rendering fillrate by over 3x in some
cases (e.g. the
"VulkanFillRate/FillRateGPU/resolution:1[0-3]/format:*/blend:0"
fillrate-bound microbenchmarks).  Significant improvements can also be
reproduced in most real-world workloads we've tested so far,
e.g. Counter Strike GO improves by ~11%, Shadow Of the Tomb Raider
improves by ~5.5%, and AztecRuins-VK improves by ~6.5% on DG2-512 --
Thanks a lot to Caleb Callaway for these figures.  No regressions have
been observed so far.

Even though this patch might strike one as surprisingly simple for such a
large payoff, it's the result of Felix DeGrood and me trying to
root-cause the rendering performance gap of DG2 on Linux vs Windows on
and off during the last year, and some of the OA statistics captured
by Felix early this month were greatly helpful for me to connect the
last few dots, so Felix deserves a big chunk of the credit for this
work.

Cc: mesa-stable
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <mesa/mesa!23783>

427fee35

ci/fastboot: use gzipped Image to avoid compressing on the runner · d7ec6f17

David Heidelberg authored 1 year ago and Marge Bot committed 1 year ago

Faster download, one less step. Win-win.

Signed-off-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <mesa/mesa!23816>

d7ec6f17

frontends/va: fix some coverity scan reported issues · 7d3c29dc

Thong Thai authored 1 year ago and Marge Bot committed 1 year ago


Added some checks for NULL pointer dereferencing and loop bounds.
v2: Use ARRAY_SIZE instead of magic numbers (@jenatali)

Signed-off-by: Thong Thai <thong.thai@amd.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <mesa/mesa!23598>

7d3c29dc

meson: Explicitly add "check : false" to a couple instances of run_command · dc93f205

Caio Oliveira authored 1 year ago and Marge Bot committed 1 year ago

In both cases there's code right after the execution to check the result and
give a proper message.

This gets rid of the meson warning:

```
WARNING: You should add the boolean check kwarg to the run_command call.
         It currently defaults to false,
         but it will default to true in future releases of meson.
         See also: https://github.com/mesonbuild/meson/issues/9300
```

Reviewed-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <mesa/mesa!23821>

dc93f205

amd/drm-shim: use fixed-width types · d3e5e04a

Rhys Perry authored 1 year ago and Marge Bot committed 1 year ago


Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Closes: #9221
Part-of: <!23725>

d3e5e04a

agx: Implement vector live range splitting · 766535c8

Alyssa Rosenzweig authored 2 years ago and Marge Bot committed 1 year ago


The SSA killer feature is that, under an "optimal" allocator, the number of
registers used (register demand) is *equal* to the number of registers required
(register pressure, the maximum number of variables simultaneously live at any
point in the program). I put "optimal" in scare quotes, because we don't need to
use the exact minimum number of registers as long as we don't sacrifice thread
count or introduce spilling, and using a few extra registers when possible can
help coalesce moves. Details-shmetails.

The problem is that, prior to this commit, our register allocator was not
well-behaved in certain circumstances, and would require an arbitrarily large
number of registers. In particular, since different variables have different
sizes and require contiguous allocation, in large programs the register file may
become fragmented, causing the RA to use arbitrarily many registers despite
having lots of registers free.

The solution is vector live range splitting. First, we calculate the register
pressure (the minimum number of registers that it is theoretically possible to
allocate successfully), and round up to the maximum number of registers we will
actually use (to give some wiggle room to coalesce moves). Then, we will treat
this maximum as a *bound*, requiring that we don't use more registers than
chosen. In the event that register file fragmentation prevents us from finding a
contiguous sequence of registers to allocate a variable, rather than giving up
or using registers we don't have, we shuffle the register file around
(defragmenting it) to make room for the new variable. That lets us use a
few moves to avoid sacrificing thread count or introducing spilling, which is
usually a great choice.

Android GLES3.1 shader-db results are as expected: some noise / small
regressions for instruction count, but a bunch of shaders with improved thread
count. The massive increase in register demand may seem weird, but this is the
RA doing exactly what it's supposed to: using more registers if and only if they
would not hurt thread count. Notice that no programs whatsoever are hurt for
thread count, which is the salient part.

   total instructions in shared programs: 1781473 -> 1781574 (<.01%)
   instructions in affected programs: 276268 -> 276369 (0.04%)
   helped: 1074
   HURT: 463
   Inconclusive result (value mean confidence interval includes 0).

   total bytes in shared programs: 12196640 -> 12201670 (0.04%)
   bytes in affected programs: 1987322 -> 1992352 (0.25%)
   helped: 1060
   HURT: 513
   Bytes are HURT.

   total halfregs in shared programs: 488755 -> 529651 (8.37%)
   halfregs in affected programs: 295651 -> 336547 (13.83%)
   helped: 358
   HURT: 9737
   Halfregs are HURT.

   total threads in shared programs: 18875008 -> 18885440 (0.06%)
   threads in affected programs: 64576 -> 75008 (16.15%)
   helped: 82
   HURT: 0
   Threads are helped.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <!23832>

766535c8

agx/lower_parallel_copy: Lower 64-bit copies · 72e6b683

Alyssa Rosenzweig authored 1 year ago and Marge Bot committed 1 year ago


To 32-bit. This way we don't get into bad situations where we need to e.g. swap
unaligned 64-bit values or something funny like that.
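A minimal sketch of the lowering, assuming a hypothetical `struct copy` whose registers are indexed in 32-bit units (not the real agx IR). Because all entries in a parallel copy take effect simultaneously, splitting one 64-bit entry into two independent 32-bit entries preserves its meaning, and the later sequentialization only ever has to move or swap 32-bit values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical parallel-copy entry; registers indexed in 32-bit units. */
struct copy {
   unsigned dst;
   unsigned src;
   unsigned bits; /* 32 or 64 */
};

/* Split every 64-bit copy into two 32-bit copies (low half, high half).
 * Writes at most 2 * n entries to `out` and returns how many were written. */
static size_t lower_64bit_copies(const struct copy *in, size_t n,
                                 struct copy *out)
{
   size_t m = 0;
   for (size_t i = 0; i < n; i++) {
      assert(in[i].bits == 32 || in[i].bits == 64);
      if (in[i].bits == 32) {
         out[m++] = in[i];
      } else {
         out[m++] = (struct copy){ in[i].dst, in[i].src, 32 };
         out[m++] = (struct copy){ in[i].dst + 1, in[i].src + 1, 32 };
      }
   }
   return m;
}
```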

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <!23832>

72e6b683

agx: Validate predecessor information · bfdaab65

Alyssa Rosenzweig authored 1 year ago and Marge Bot committed 1 year ago

Including the new loop header? flag.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <!23832>

bfdaab65

agx: Add loop header? flag · 923b9667

Alyssa Rosenzweig authored 1 year ago and Marge Bot committed 1 year ago


This is useful for deciding whether we need to fix up phis in RA.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <!23832>

923b9667

agx: Recollect stored vectors at their use · a2dbe6b6

Alyssa Rosenzweig authored 2 years ago and Marge Bot committed 1 year ago


This is Timur's cheesy solution to split-hell.shader_test. Seems to work ok
here.

Before: 94 inst, 588 bytes, 165 halfregs, 1 threads, 0 loops, 0:0 spills:fills
After: 63 inst, 454 bytes, 129 halfregs, 1 threads, 0 loops, 0:0 spills:fills

On Android GLES3.1 shader-db, a few shaders are helped a lot:

   total instructions in shared programs: 1781706 -> 1781473 (-0.01%)
   instructions in affected programs: 4284 -> 4051 (-5.44%)
   helped: 16
   HURT: 2
   Instructions are helped.

   total bytes in shared programs: 12197854 -> 12196640 (<.01%)
   bytes in affected programs: 29526 -> 28312 (-4.11%)
   helped: 20
   HURT: 2
   Bytes are helped.

   total halfregs in shared programs: 489007 -> 488755 (-0.05%)
   halfregs in affected programs: 945 -> 693 (-26.67%)
   helped: 7
   HURT: 0
   Halfregs are helped.

   total threads in shared programs: 18873216 -> 18875008 (<.01%)
   threads in affected programs: 5376 -> 7168 (33.33%)
   helped: 7
   HURT: 0
   Threads are helped.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <!23832>

a2dbe6b6

agx: Extract coordinate register size calculation · 91d98975

Alyssa Rosenzweig authored 1 year ago and Marge Bot committed 1 year ago

It will be used for image writes too, not just reads.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <!23832>

91d98975
