Commits · native-context-iris · Dmitry Osipenko / mesa

Apr 03, 2025

iris: Skip 2MB BO size alignment optimization for virtio-gpu native context · be4de2a6

Dmitry Osipenko authored 10 months ago and

Dmitry Osipenko committed 4 days ago


Prefer smaller BO sizes for virtio context. Larger BOs take much longer
to allocate and map in a VM, resulting in too much performance
overhead.

Signed-off-by: Dmitry Osipenko <digetx@gmail.com>

be4de2a6
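
The size policy described in be4de2a6 could be sketched roughly as follows. This is not the iris code: `choose_bo_size`, the 4 KiB base alignment, and the rule that only BOs of 2 MiB or more get padded are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define SIZE_4K (4ull * 1024)
#define SIZE_2M (2ull * 1024 * 1024)

/* Round v up to a multiple of a; a must be a power of two. */
static uint64_t align_up(uint64_t v, uint64_t a)
{
   return (v + a - 1) & ~(a - 1);
}

/* Hypothetical helper: pick the allocation size for a buffer object. */
static uint64_t choose_bo_size(uint64_t requested, bool is_virtio)
{
   uint64_t size = align_up(requested, SIZE_4K);

   /* Assumed policy: pad large BOs to a 2 MiB multiple, but skip the
    * padding under virtio native context, where bigger BOs are slower
    * to allocate and map. */
   if (!is_virtio && size >= SIZE_2M)
      size = align_up(size, SIZE_2M);

   return size;
}
```

With these assumptions, a 3 MiB request becomes 4 MiB on bare metal and stays at 3 MiB under virtio.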

intel: Add virtio-gpu native context · 7ae0458b

Dmitry Osipenko authored 8 months ago


Support virtio-intel native DRM context. Virtio-intel works by passing
ioctls from guest to host for execution, utilizing the available VirtIO-GPU
infrastructure.

This patch adds initial experimental native context support for TigerLake+
GPUs using i915 KMD UAPI.

Compile Mesa with -Dintel-virtio-experimental=true to enable virtio-intel
native context support.

Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>

7ae0458b

intel: Check for userptr UAPI presence · 7d7b6c7d

Dmitry Osipenko authored 10 months ago and

Dmitry Osipenko committed 4 days ago

Check whether the userptr UAPI is present and disable userptr features if
it is not. The kernel i915 driver has a config option that disables the
userptr ioctl. The ioctl may also be absent with the virtio native context
driver.

Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>

7d7b6c7d

virtio/vdrm: Add vtest backend · 9780c1ba

Rob Clark authored 1 year ago and

Dmitry Osipenko committed 4 days ago


This allows for testing drm native ctx support without spinning up a VM.

Signed-off-by: Rob Clark <robdclark@chromium.org>

9780c1ba

vulkan: Use syncobj shim · 4eebbc8d

Rob Clark authored 5 months ago and

Dmitry Osipenko committed 4 days ago


This will allow syncobj use in cases where the process does not have
direct rendernode access (ex, vtest).

An alternative would be an alternate vk_sync_type implementation, but
the WSI code was also directly using drm syncobjs.

Signed-off-by: Rob Clark <robdclark@chromium.org>

4eebbc8d

util: Add drmSyncobj shim · 9cecd4b9
Rob Clark authored 5 months ago and Dmitry Osipenko committed 4 days ago
```
Signed-off-by: Rob Clark <robdclark@chromium.org>
```
9cecd4b9

tu: disable logic operations for float and sRGB formats · 335cc960

Zan Dobersek authored 1 week ago and

Marge Bot committed 5 days ago


Per spec, logic operations between fragment values and color attachments
should be disabled when attachments are using float or sRGB formats.
Regardless of attachment's format, enabled logic operations should keep
blending disabled.

Fixes: dEQP-VK.pipeline.*.logic_op_na_formats.*

Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Part-of: <!34212>

335cc960

etnaviv: add context flush sw query · d9176252

Lucas Stach authored 6 days ago and

Marge Bot committed 5 days ago


Context flushes can be caused by all kinds of operations that aren't
obvious to a GL API user. As those are quite heavy-weight operations
it is nice to have some insight into how many of those are happening
per frame. Add a sw query to make this information easily accessible.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <!34350>

d9176252

radv: video: rework maxActiveReferenceSlot/MaxDpbSlots · ee535aa0

Stéphane Cerveau authored 2 months ago and

Marge Bot committed 5 days ago


For pReferenceSlots.slotIndex, the max value should be
maxDpbSlots, which is:
h264: 16 + 1
h265: 15 + 2
av1: 7 + 2

Fixes the SVA_CL1_E test vector in the JVT-AVC_V1
fluster test suite.

Reviewed-by: David Rosca <david.rosca@amd.com>
Part-of: <!33094>

ee535aa0
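
The per-codec limits listed in ee535aa0 can be written as a small lookup. The enum and function names below are hypothetical and do not come from radv. Only the sums are taken from the commit message.

```c
#include <stdint.h>

/* Hypothetical codec identifiers, for illustration only. */
enum video_codec {
   CODEC_H264,
   CODEC_H265,
   CODEC_AV1,
};

/* Return the maximum DPB slot count quoted in the commit message. */
static uint32_t max_dpb_slots(enum video_codec codec)
{
   switch (codec) {
   case CODEC_H264:
      return 16 + 1;
   case CODEC_H265:
      return 15 + 2;
   case CODEC_AV1:
      return 7 + 2;
   }
   return 0; /* unknown codec */
}
```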

spirv: clamp/sign-extend non 32bit ldexp exponents · c21a5344

Georg Lehmann authored 3 weeks ago and

Marge Bot committed 5 days ago


GLSL.std.450 allows any integer size here.
OpenCL only allows i32.

Cc: mesa-stable

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <!34071>

c21a5344
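
As an illustration of what c21a5344 describes, a narrower exponent can be sign-extended and a wider one clamped before it reaches a 32-bit ldexp. The helper names are hypothetical and this is plain C rather than the NIR the SPIR-V frontend emits. Clamping should be harmless because float and double exponent ranges are far smaller than the int32 range.

```c
#include <stdint.h>

/* Sign-extend a 16-bit exponent to 32 bits. */
static int32_t widen_exp_i16(int16_t e)
{
   return (int32_t)e;
}

/* Clamp a 64-bit exponent into the int32 range. */
static int32_t clamp_exp_i64(int64_t e)
{
   if (e > INT32_MAX)
      return INT32_MAX;
   if (e < INT32_MIN)
      return INT32_MIN;
   return (int32_t)e;
}
```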

ir3/ra: create merge sets for splits/collects inserted for shared RA · 45a5ccbf

Job Noorman authored 3 weeks ago and

Marge Bot committed 5 days ago


Since shared RA happens after creating merge sets, newly inserted
splits/collects did not have merge sets created for them. Fix this by
creating merge sets for new instructions after shared RA.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <!33319>

45a5ccbf

ir3: add ir3_aggressive_coalesce helper · 0cafd07b

Job Noorman authored 3 weeks ago and

Marge Bot committed 5 days ago


To allow us to create merge sets outside of ir3_merge_regs.c.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <!33319>

0cafd07b

ir3/ra: assign interval offsets to new defs after shared RA · a0db2f97

Job Noorman authored 3 weeks ago and

Marge Bot committed 5 days ago


Shared RA might insert new defs to be handled by regular RA (e.g.,
shared spills). However, their interval offsets were not initialized
which caused their intervals to sometimes be mistakenly matched with
those containing offset 0. Fix this by calling index_merge_sets after
shared RA and modifying that function to only index new defs in that
case.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Fixes: fa22b090 ("ir3/ra: Add specialized shared register RA/spilling")
Part-of: <!33319>

a0db2f97

ci: rename ci-tron priority tag to avoid conflict with the generic fdo runners · 6331441e

Eric Engestrom authored 5 days ago and

Marge Bot committed 5 days ago

Otherwise, ci-tron runners with that tag could pick up jobs meant for the fdo
runners, as happened here:
https://gitlab.freedesktop.org/mesa/mesa/-/jobs/73883719

The inverse (fdo runners picking up a job meant for a ci-tron runner) is not
possible though, as ci-tron jobs always include a `farm:$RUNNER_FARM_LOCATION`
tag, so the problem only exists in the other direction.

Part-of: <!34358>

6331441e

ci/build: drop LTO from fedora build · f84578e3

Eric Engestrom authored 1 week ago and

Marge Bot committed 5 days ago

It's been broken for a few months by now and nobody has been interested
in fixing it, so let's drop LTO so that we get the rest of the benefits
from having that build at all.

Part-of: <!34318>

f84578e3

radv: rework suspend/resume user conditional rendering · ef3363ef
Samuel Pitoiset authored 6 days ago and Marge Bot committed 5 days ago
```
Better to suspend/resume in the top level function.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <!34338>
```
ef3363ef
radv: add new helper to suspend/resume user conditional rendering · 4bc971a0
Samuel Pitoiset authored 6 days ago and Marge Bot committed 5 days ago
```
Instead of duplicating same code everywhere.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <!34338>
```
4bc971a0

radv: fix ignoring conditional rendering with vkCmdResolveImage() · 4d1d6d41

Samuel Pitoiset authored 6 days ago and

Marge Bot committed 5 days ago


This command isn't supposed to be affected by conditional rendering.

This fixes new VKCTS coverage
dEQP-VK.conditional_rendering.conditional_ignore.resolve_image*.

Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <!34338>

4d1d6d41

ir3: make shpe a terminator · dd1ba747

Job Noorman authored 1 week ago and

Marge Bot committed 5 days ago


shpe is a bit of a special instruction: it's not really a terminator
(i.e., it does not perform a jump) but it does have to stay at the end
of its block. Up to now, we tried to enforce this by creating const
write barriers on shpe; the assumption being that everything that
happens in the preamble ends in a write to the const file so shpe stays
at the end. Alas, it turns out this is not true: things like sampler
prefetches do not write the const file and nothing was preventing those
from being scheduled after shpe.

Instead of trying to create even more barrier dependencies, fix this by
making shpe a terminator. Both sched and postsched treat terminators
specially to make sure they always stay at the end of their block.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <!34290>

dd1ba747

ir3: Fix shaders that write only color classified as empty · f5019ee0

Danylo Piliaiev authored 6 days ago and

Marge Bot committed 5 days ago


A shader may have zero instructions and no prefetches but still have
inputs that are passed through unmodified as outputs.

Fixed vkd3d test:
 test_depth_bias_behaviour

Fixes: b0a98d3b ("ir3: Detect empty fragment shaders")

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <mesa/mesa!34348>

f5019ee0

tu: Implement VK_QCOM_fragment_density_map_offset · 75178c46
Connor Abbott authored 1 month ago and Marge Bot committed 5 days ago
```
Part-of: <!33500>
```
75178c46

tu/fdm: Skip some patchpoints when binning · 7351f8d5

Connor Abbott authored 1 month ago and

Marge Bot committed 5 days ago

In order to implement FDM offset, we will have to offset the viewport
and scissor in the binning pass. In order to do this, we have to pass a
bin with nonsensical negative offsets to the patchpoint function, which
would result in asserts when patching the load/store sequences. But we
don't really need to patch these anyways as they are unused during
binning, so add the ability to skip them when binning. FS params and
some implementations of CmdClearAttachments (that don't contribute to
visibility) can similarly be skipped.

Part-of: <!33500>

7351f8d5

tu: Fix CmdClearAttachments with fragment density map · df0c17f7

Connor Abbott authored 2 months ago and

Marge Bot committed 5 days ago

The clear may be a partial clear, in which case we need to make sure
that the clear rectangle is transformed into GMEM space so that it is
clipped correctly.

Part-of: <!33500>

df0c17f7

tu: Split out part of tiling config to vsc config · 0d4eed0e

Connor Abbott authored 2 months ago and

Marge Bot committed 5 days ago

For FDM offset, we will need to expand the number of bins by 1, which
can change how pipes are allocated. We don't necessarily know whether
FDM offset will be used when creating the VkFramebuffer, so we'll have
to create two different configs when FDM is enabled. Split out the parts
that are affected by the number of bins into a separate "VSC config"
struct that will be duplicated with FDM offset.

Part-of: <!33500>

0d4eed0e

tu: Only allow power-of-two fragment areas · 304af47b

Connor Abbott authored 3 weeks ago and

Marge Bot committed 5 days ago

Non-power-of-two fragment areas can result in precision loss and missed
fragments, which was seen in an upcoming CTS test.

Part-of: <!33500>

304af47b
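
A minimal sketch of a power-of-two restriction like the one in 304af47b. The commit message does not say how turnip enforces it, so rounding down to the nearest power of two is an assumption, and both helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if v has exactly one bit set. */
static bool is_pow2(uint32_t v)
{
   return v != 0 && (v & (v - 1)) == 0;
}

/* Largest power of two that is <= v. Expects v >= 1. */
static uint32_t round_down_pow2(uint32_t v)
{
   uint32_t p = 1;
   while (p <= v / 2)
      p *= 2;
   return p;
}
```

Under this assumption a requested fragment width of 3 would be reduced to 2, while 4 would be kept as is.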

intel/compiler: fix lingering i965 references · 5ad00bae
Caleb Callaway authored 5 days ago and Marge Bot committed 5 days ago
```
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <!34351>
```
5ad00bae

ir3: run opt_if after opt_vectorize · 02ff26be

Job Noorman authored 1 week ago and

Marge Bot committed 5 days ago


nir_opt_vectorize could replace swizzled movs with vectorized movs in a
different block. If this happens with swizzled movs in a then block, it
could leave this block empty. ir3 assumes only the else block can be
empty (e.g., when lowering predicates) so make sure ifs are in that
canonical form again.

This fixes empty predication blocks in some shaders, for example:

predt
predf
...
prede

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <!34272>

02ff26be

Apr 02, 2025

ir3: don't sync every TCS/GEOM block · ee0ee2a3

Job Noorman authored 1 week ago and

Marge Bot committed 5 days ago


TCS/GEOM shaders need (sy)(ss) on their first instruction but we
accidentally set it on the first instruction of every block.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <!34257>

ee0ee2a3

ir3: Split mad with scalar ALU · 3ba315f2

Connor Abbott authored 2 weeks ago and

Marge Bot committed 5 days ago

At least on all a6xx/a7xx, mad.f32 and mad.f16 are not fused. This means
that when the sources of a NIR ffma are all uniform we can split it in
two to execute it on the scalar ALU. This is important to reduce
register pressure and allow more preambles to be executed early.

On fossil-db the statistics are mostly a wash as expected, but with
early preambles increasing dramatically:

Totals:
MaxWaves: 2249180 -> 2249230 (+0.00%); split: +0.01%, -0.01%
Instrs: 49668884 -> 49662951 (-0.01%); split: -0.12%, +0.11%
CodeSize: 103662656 -> 103831154 (+0.16%); split: -0.22%, +0.38%
NOPs: 8502571 -> 8495568 (-0.08%); split: -0.61%, +0.53%
MOVs: 1554442 -> 1538804 (-1.01%); split: -2.01%, +1.01%
Full: 1820906 -> 1814292 (-0.36%); split: -0.39%, +0.03%
(ss): 1168628 -> 1165868 (-0.24%); split: -1.01%, +0.78%
(sy): 616751 -> 616521 (-0.04%); split: -0.52%, +0.49%
(ss)-stall: 4384397 -> 4361662 (-0.52%); split: -1.44%, +0.93%
(sy)-stall: 17850227 -> 17858949 (+0.05%); split: -0.58%, +0.63%

Early-preamble: 102262 -> 115702 (+13.14%)
Cat0: 9375820 -> 9367978 (-0.08%); split: -0.57%, +0.48%
Cat1: 2470212 -> 2454318 (-0.64%); split: -1.28%, +0.64%
Cat2: 18673655 -> 18707106 (+0.18%)
Cat3: 14227810 -> 14211106 (-0.12%)
Cat5: 1424184 -> 1424150 (-0.00%)
Cat7: 1404718 -> 1405808 (+0.08%); split: -0.39%, +0.47%
Part-of: <!34115>

3ba315f2
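
The reasoning in 3ba315f2 relies on an unfused mad being equivalent to a multiply followed by an add, each rounded separately. The plain C sketch below (not ir3 code, with a hypothetical function name) shows that two-step form.

```c
/* Unfused multiply-add: the product is rounded to float before the add. */
static float mad_split(float a, float b, float c)
{
   /* volatile keeps the compiler from contracting the two steps
    * into a single fused multiply-add. */
   volatile float product = a * b;
   return product + c;
}
```

For example, with a = b = 1 + 2^-12 and c = -(1 + 2^-11), the exact product 1 + 2^-11 + 2^-24 rounds to 1 + 2^-11, so the split form returns 0. A fused multiply-add would keep the extra bit and return 2^-24.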

vulkan/wsi/headless: Remove unnecessary wsi_configure_image() · 64980c4f

Sviatoslav Peleshko authored 1 week ago and

Marge Bot committed 5 days ago

wsi_configure_image() with the same info is already called by
configure_image() in wsi_swapchain_init(), so this second call is
unnecessary. Furthermore, calling it a second time leaked the queue
family indices array.

Fixes: d4a2c0fc ("vulkan/wsi: add a headless swapchain implementation/option")
Closes: mesa/mesa#12811


Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <mesa/mesa!34194>

64980c4f

ci: Re enable fd-farm · 2b2bcbb9
Rob Clark authored 2 weeks ago and Marge Bot committed 5 days ago
```
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <mesa/mesa!34263>
```
2b2bcbb9

intel/decoder: free memory in error case · ff4b1b1e

Dylan Baker authored 2 weeks ago and

Marge Bot committed 5 days ago


This was handled in other instances in a previous patch, but this
instance remains, as the zlib decompression routine is slightly
different.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <!34118>

ff4b1b1e

intel/tools: move ascii85_decode to common code · da14c0af

Dylan Baker authored 2 weeks ago and

Marge Bot committed 5 days ago


We have 3 copies of this function, so put it in the shared static
library.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <!34118>

da14c0af

intel/tools: deduplicate zlib_inflate function · 7b791cd0

Dylan Baker authored 2 weeks ago and

Marge Bot committed 5 days ago


There are three copies of this function, and all of them have the same
memory leak. Instead of fixing them one by one, just use a common
implementation for all three, since they already share a helper lib.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <mesa/mesa!34118>

7b791cd0

docs: add sha sum for 25.0.3 · a1a9c7cd
Eric Engestrom authored 6 days ago and Marge Bot committed 5 days ago
```
Part-of: <!34349>
```
a1a9c7cd
docs: add release notes for 25.0.3 · d9d8a584
Eric Engestrom authored 6 days ago and Marge Bot committed 5 days ago
```
Part-of: <mesa/mesa!34349>
```
d9d8a584
docs: update calendar for 25.0.3 · 6f1d502a
Eric Engestrom authored 6 days ago and Marge Bot committed 5 days ago
```
Part-of: <mesa/mesa!34349>
```
6f1d502a

radeonsi/vcn: Disable AV1 unidir compound with rate control · a5edb9fa

David Rosca authored 1 week ago and

Marge Bot committed 6 days ago


It causes significant bitrate overshoot currently.

Cc: mesa-stable
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <!34237>

a5edb9fa

virgl: fix typo inverting a condition · 20630849
Eric Engestrom authored 6 days ago and Marge Bot committed 6 days ago
```
Fixes: 8513bcbd ("virtio: Remove virglrenderer_hw.h entirely")
Part-of: <!34340>
```
20630849

tu: Fix layer_count with dynamic rendering + multiview · 15660caa

Connor Abbott authored 3 weeks ago and

Marge Bot committed 6 days ago

With "classic" renderpasses, the VkFramebuffer's layerCount must be 1 if
multiview is enabled. We accidentally rely on this to not disable GMEM
for multiview, and possibly for other things too. Apparently the dynamic
rendering equivalent, VkRenderingInfo::layerCount, can be anything when
multiview is enabled, and some CTS tests set it to the number of views.
Sanitize it when constructing the internal framebuffer for dynamic
rendering.

Cc: mesa-stable
Part-of: <mesa/mesa!34080>

15660caa
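
One way to read the sanitizing step in 15660caa is sketched below. It assumes that "sanitize" means forcing the layer count to 1 whenever multiview is enabled, which matches the classic-renderpass rule quoted in the message. The function name and the use of a view mask to detect multiview are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: layer count to store in the internal framebuffer. */
static uint32_t sanitize_layer_count(uint32_t layer_count, uint32_t view_mask)
{
   /* Assumption: a non-zero view mask means multiview is enabled, in
    * which case the application-provided layer count is ignored. */
   return view_mask != 0 ? 1 : layer_count;
}
```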
