- Apr 03, 2025
-
-
Prefer smaller BO sizes for virtio context. Larger BOs take much more time to allocate and map in a VM, resulting in too much performance overhead. Signed-off-by:
Dmitry Osipenko <digetx@gmail.com>
-
Support virtio-intel native DRM context. Virtio-intel works by passing ioctls from the guest to the host for execution, utilizing the available VirtIO-GPU infrastructure. This patch adds initial experimental native context support for TigerLake+ GPUs using the i915 KMD UAPI. Compile Mesa with -Dintel-virtio-experimental=true to enable virtio-intel native context support. Signed-off-by:
Dmitry Osipenko <dmitry.osipenko@collabora.com>
-
Check whether the userptr UAPI is present and disable userptr features if it is not. The i915 kernel driver has a config option that disables the userptr ioctl. The ioctl may also be absent in the case of the virtio native context driver. Signed-off-by:
Dmitry Osipenko <dmitry.osipenko@collabora.com>
-
This allows for testing drm native ctx support without spinning up a VM. Signed-off-by:
Rob Clark <robdclark@chromium.org>
-
This will allow syncobj use in cases where the process does not have direct rendernode access (e.g., vtest). An alternative would be an alternate vk_sync_type implementation, but the WSI code was also directly using drm syncobjs. Signed-off-by:
Rob Clark <robdclark@chromium.org>
-
Signed-off-by:
Rob Clark <robdclark@chromium.org>
-
Per spec, logic operations between fragment values and color attachments should be disabled when attachments are using float or sRGB formats. Regardless of the attachment's format, enabled logic operations should keep blending disabled. Fixes: dEQP-VK.pipeline.*.logic_op_na_formats.* Signed-off-by:
Zan Dobersek <zdobersek@igalia.com> Part-of: <!34212>
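The rule described in this commit can be sketched as a small decision function. This is an illustration only: the function name, the format labels, and the returned tuple are hypothetical and are not taken from the driver.

```python
def resolve_color_ops(fmt_kind, logic_op_enabled, blend_enabled):
    """Return (logic_op_active, blend_active) for one color attachment.

    fmt_kind is a hypothetical label: "float", "srgb", or anything else
    (e.g. "unorm", "uint") for formats where logic ops can apply.
    """
    # Logic ops are not applied to float or sRGB attachments.
    logic_op_active = logic_op_enabled and fmt_kind not in ("float", "srgb")
    # Enabled logic ops keep blending disabled, whatever the format is.
    blend_active = blend_enabled and not logic_op_enabled
    return logic_op_active, blend_active
```

With logic ops enabled on a float attachment, both results are False: the logic op is skipped and blending stays off.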
-
Context flushes can be caused by all kinds of operations that aren't obvious to a GL API user. As those are quite heavy-weight operations it is nice to have some insight into how many of those are happening per frame. Add a sw query to make this information easily accessible. Signed-off-by:
Lucas Stach <l.stach@pengutronix.de> Reviewed-by:
Christian Gmeiner <cgmeiner@igalia.com> Part-of: <!34350>
-
For pReferenceSlots.slotIndex, the max value should be maxDpbSlots, which is h264: 16 + 1, h265: 15 + 2, av1: 7 + 2. This fixes the SVA_CL1_E test vector in the JVT-AVC_V1 fluster test suite. Reviewed-by:
David Rosca <david.rosca@amd.com> Part-of: <!33094>
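The per-codec limits quoted in this commit message work out as follows. This is a minimal sketch: the table and function names are hypothetical, and only the numbers come from the message.

```python
# Per-codec maxDpbSlots values as written in the commit message.
MAX_DPB_SLOTS = {
    "h264": 16 + 1,
    "h265": 15 + 2,
    "av1": 7 + 2,
}

def max_dpb_slots(codec):
    """Look up the slot limit for a codec name (hypothetical helper)."""
    return MAX_DPB_SLOTS[codec]
```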
-
GLSL.std.450 allows any integer size here. OpenCL only allows i32. Cc: mesa-stable Reviewed-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <!34071>
-
Since shared RA happens after creating merge sets, newly inserted splits/collects did not have merge sets created for them. Fix this by creating merge sets for new instructions after shared RA. Signed-off-by:
Job Noorman <jnoorman@igalia.com> Part-of: <!33319>
-
To allow us to create merge sets outside of ir3_merge_regs.c. Signed-off-by:
Job Noorman <jnoorman@igalia.com> Part-of: <!33319>
-
Shared RA might insert new defs to be handled by regular RA (e.g., shared spills). However, their interval offsets were not initialized which caused their intervals to sometimes be mistakenly matched with those containing offset 0. Fix this by calling index_merge_sets after shared RA and modifying that function to only index new defs in that case. Signed-off-by:
Job Noorman <jnoorman@igalia.com> Fixes: fa22b090 ("ir3/ra: Add specialized shared register RA/spilling") Part-of: <!33319>
-
Otherwise, ci-tron runners with that tag could pick up jobs meant for the fdo runners, as happened here: https://gitlab.freedesktop.org/mesa/mesa/-/jobs/73883719 The inverse (fdo runners picking up a job meant for a ci-tron runner) is not possible though, as ci-tron jobs always include a `farm:$RUNNER_FARM_LOCATION` tag, so the problem only exists in the other direction. Part-of: <!34358>
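The asymmetry described here follows from how tag matching works: a runner is eligible for a job only if it carries every tag the job requests. A minimal sketch, assuming plain set-based matching; the tag names are hypothetical, and only the `farm:` tag shape comes from the message.

```python
def runner_can_pick(runner_tags, job_tags):
    """A runner is eligible only if it has every tag the job asks for."""
    return set(job_tags) <= set(runner_tags)

# Hypothetical tag sets for illustration.
fdo_job = {"shared-tag"}
fdo_runner = {"shared-tag"}
citron_job = {"shared-tag", "farm:example"}
citron_runner = {"shared-tag", "farm:example"}

# A ci-tron runner that also has the shared tag matches the fdo job...
print(runner_can_pick(citron_runner, fdo_job))
# ...but an fdo runner lacks the farm tag that every ci-tron job requires.
print(runner_can_pick(fdo_runner, citron_job))
```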
-
It's been broken for a few months by now and nobody has been interested in fixing it, so let's drop LTO so that we get the rest of the benefits from having that build at all. Part-of: <!34318>
-
Better to suspend/resume in the top level function. Signed-off-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <!34338>
-
Instead of duplicating the same code everywhere. Signed-off-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <!34338>
-
This command isn't supposed to be affected by conditional rendering. This fixes new VKCTS coverage dEQP-VK.conditional_rendering.conditional_ignore.resolve_image*. Cc: mesa-stable Signed-off-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <!34338>
-
shpe is a bit of a special instruction: it's not really a terminator (i.e., it does not perform a jump) but it does have to stay at the end of its block. Up to now, we tried to enforce this by creating const write barriers on shpe; the assumption being that everything that happens in the preamble ends in a write to the const file so shpe stays at the end. Alas, it turns out this is not true: things like sampler prefetches do not write the const file and nothing was preventing those from being scheduled after shpe. Instead of trying to create even more barrier dependencies, fix this by making shpe a terminator. Both sched and postsched treat terminators specially to make sure they always stay at the end of their block. Signed-off-by:
Job Noorman <jnoorman@igalia.com> Part-of: <!34290>
-
A shader may have zero instructions and no prefetches but still have inputs that are used as outputs without modification. Fixes vkd3d test: test_depth_bias_behaviour Fixes: b0a98d3b ("ir3: Detect empty fragment shaders") Signed-off-by:
Danylo Piliaiev <dpiliaiev@igalia.com> Part-of: <mesa/mesa!34348>
-
Part-of: <!33500>
-
In order to implement FDM offset, we will have to offset the viewport and scissor in the binning pass. In order to do this, we have to pass a bin with nonsensical negative offsets to the patchpoint function, which would result in asserts when patching the load/store sequences. But we don't really need to patch these anyways as they are unused during binning, so add the ability to skip them when binning. FS params and some implementations of CmdClearAttachments (that don't contribute to visibility) can similarly be skipped. Part-of: <!33500>
-
The clear may be a partial clear, in which case we need to make sure that the clear rectangle is transformed into GMEM space so that it is clipped correctly. Part-of: <!33500>
-
For FDM offset, we will need to expand the number of bins by 1, which can change how pipes are allocated. We don't necessarily know whether FDM offset will be used when creating the VkFramebuffer, so we'll have to create two different configs when FDM is enabled. Split out the parts that are affected by the number of bins into a separate "VSC config" struct that will be duplicated with FDM offset. Part-of: <!33500>
-
Non-power-of-two fragment areas can result in precision loss and missed fragments, which was seen in an upcoming CTS test. Part-of: <!33500>
-
Reviewed-by:
Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <!34351>
-
nir_opt_vectorize could replace swizzled movs with vectorized movs in a different block. If this happens with swizzled movs in a then block, it could leave this block empty. ir3 assumes only the else block can be empty (e.g., when lowering predicates) so make sure ifs are in that canonical form again. This fixes empty predication blocks in some shaders, for example: predt predf ... prede Signed-off-by:
Job Noorman <jnoorman@igalia.com> Part-of: <!34272>
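One way to restore the canonical form is to swap the branches and invert the condition whenever only the then block is empty. The message does not say how the fix is implemented, so this is an assumption; the function and its tuple representation are hypothetical.

```python
def canonicalize_if(cond_inverted, then_block, else_block):
    """Return (cond_inverted, then_block, else_block) such that only the
    else block may be empty. Blocks are modeled as plain lists."""
    if not then_block and else_block:
        # Empty then, non-empty else: swap them and flip the condition.
        return (not cond_inverted, else_block, then_block)
    return (cond_inverted, then_block, else_block)
```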
-
- Apr 02, 2025
-
-
TCS/GEOM shaders need (sy)(ss) on their first instruction but we accidentally set it on the first instruction of every block. Signed-off-by:
Job Noorman <jnoorman@igalia.com> Part-of: <!34257>
-
At least on all a6xx/a7xx, mad.f32 and mad.f16 are not fused. This means that when the sources of a NIR ffma are all uniform, we can split it in two to execute it on the scalar ALU. This is important to reduce register pressure and to allow more preambles to be executed early.

On fossil-db the statistics are mostly a wash, as expected, but with early preambles increasing dramatically:

Totals:
MaxWaves: 2249180 -> 2249230 (+0.00%); split: +0.01%, -0.01%
Instrs: 49668884 -> 49662951 (-0.01%); split: -0.12%, +0.11%
CodeSize: 103662656 -> 103831154 (+0.16%); split: -0.22%, +0.38%
NOPs: 8502571 -> 8495568 (-0.08%); split: -0.61%, +0.53%
MOVs: 1554442 -> 1538804 (-1.01%); split: -2.01%, +1.01%
Full: 1820906 -> 1814292 (-0.36%); split: -0.39%, +0.03%
(ss): 1168628 -> 1165868 (-0.24%); split: -1.01%, +0.78%
(sy): 616751 -> 616521 (-0.04%); split: -0.52%, +0.49%
(ss)-stall: 4384397 -> 4361662 (-0.52%); split: -1.44%, +0.93%
(sy)-stall: 17850227 -> 17858949 (+0.05%); split: -0.58%, +0.63%
Early-preamble: 102262 -> 115702 (+13.14%)
Cat0: 9375820 -> 9367978 (-0.08%); split: -0.57%, +0.48%
Cat1: 2470212 -> 2454318 (-0.64%); split: -1.28%, +0.64%
Cat2: 18673655 -> 18707106 (+0.18%)
Cat3: 14227810 -> 14211106 (-0.12%)
Cat5: 1424184 -> 1424150 (-0.00%)
Cat7: 1404718 -> 1405808 (+0.08%); split: -0.39%, +0.47%

Part-of: <!34115>
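The percentages in the totals are relative changes from the old value to the new one. A small sketch that recomputes two of them; the helper name is hypothetical, and the input numbers are taken from the statistics above.

```python
def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0

# Early-preamble: 102262 -> 115702
print(f"{percent_change(102262, 115702):+.2f}%")
# MOVs: 1554442 -> 1538804
print(f"{percent_change(1554442, 1538804):+.2f}%")
```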
-
wsi_configure_image() with the same info is already called by configure_image() in wsi_swapchain_init(), so this second call is unnecessary. Furthermore, calling it a second time leaked the queue family indices array. Fixes: d4a2c0fc ("vulkan/wsi: add a headless swapchain implementation/option") Closes: mesa/mesa#12811 Signed-off-by:
Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com> Reviewed-by:
Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <mesa/mesa!34194>
-
Signed-off-by:
Rob Clark <robdclark@chromium.org> Part-of: <mesa/mesa!34263>
-
This was handled in other instances in a previous patch, but this instance remains, as the zlib decompression routine is slightly different. Reviewed-by:
Tapani Pälli <tapani.palli@intel.com> Part-of: <!34118>
-
We have 3 copies of this function, so put it in the shared static library. Reviewed-by:
Tapani Pälli <tapani.palli@intel.com> Part-of: <!34118>
-
There are three copies of this function, and all of them have the same memory leak. Instead of fixing them one by one, just use a common implementation for all three, since they already share a helper lib. Reviewed-by:
Tapani Pälli <tapani.palli@intel.com> Part-of: <mesa/mesa!34118>
-
Part-of: <!34349>
-
Part-of: <mesa/mesa!34349>
-
Part-of: <mesa/mesa!34349>
-
It causes significant bitrate overshoot currently. Cc: mesa-stable Reviewed-by:
Ruijing Dong <ruijing.dong@amd.com> Part-of: <!34237>
-
With "classic" renderpasses, the VkFramebuffer's layerCount must be 1 if multiview is enabled. We accidentally rely on this to not disable GMEM for multiview, and possibly for other things too. Apparently the dynamic rendering equivalent, VkRenderingInfo::layerCount, can be anything when multiview is enabled, and some CTS tests set it to the number of views. Sanitize it when constructing the internal framebuffer for dynamic rendering. Cc: mesa-stable Part-of: <mesa/mesa!34080>
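The sanitization can be sketched as below. This assumes the layer count is forced to 1 when multiview is enabled, mirroring the classic renderpass requirement. The message does not state the exact value used, and the function name is hypothetical.

```python
def sanitize_layer_count(layer_count, view_mask):
    """Layer count to use for the internal framebuffer.

    Assumption: a non-zero view mask means multiview is enabled, in which
    case the framebuffer is treated as single-layer.
    """
    return 1 if view_mask != 0 else layer_count
```

For example, an application that passes layerCount = 2 with a two-view mask (0b11) would get an internal framebuffer with one layer, while a non-multiview pass keeps its layer count unchanged.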
-