mesa merge requests
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests
2024-03-28T16:36:50Z

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28445
turnip: Implement CCHE invalidation
2024-03-28T16:36:50Z
Connor Abbott

We were missing a necessary `CP_CCHE_INVALIDATE` between writes and reads of UCHE. I observed some failures in ray query tests without this.

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28407
Draft: vtn: stop validating caps
2024-03-27T13:06:51Z
Alyssa Rosenzweig

When we -- and as usual, by "we" I mean Faith -- first added SPIR-V support to
Mesa, the landscape looked pretty different:

* there were very few caps
* there were very few drivers consuming SPIR-V
* the VVL was young and not used universally

As such, it made a reasonable amount of sense to collect cap lists in every
Vulkan driver and validate them in vtn, to help check for app bugs.

Things look different today. We now have lots of caps, lots of drivers, and a
competent VVL that has the responsibility of making sure apps don't declare any
caps that the underlying driver doesn't actually support.

So, rip out the validation so we can stop burdening every driver individually
with this. The alternative would be to auto-generate the validation code -- and I
do have patches to get Mesa most of the way there as a fallback -- but you know
who else has that code? The VVL. It's not Mesa's job to validate Vulkan usage,
and we're totally justified in dropping this pile of stuff.

OpenGL SPIR-V seems to me to be DOA so I'm not at all concerned about the loss
of coverage there.

The one open question I have about this plan is the implications for OpenCL.
That said, I'm not too worried in practice... we have just one source of
in-the-wild CL-flavour SPIR-V (Rusticl), with just a few relevant caps compared
to the larger number of graphics caps. If we need to for Rusticl only, we can
add back just a bit of validation as a treat.

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28360
Draft: tu: Implement VK_KHR_maintenance6
2024-03-25T08:06:25Z
Valentine Burley

### What does this MR do and why?
<!-- Describe in detail what your merge request does and why. -->
Based on RADV.
This is missing support for version 2 of all descriptor binding commands, which [RADV](https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26757/diffs?commit_id=744cb98bc689225add12881ff59ce574afed8531) and [ANV](https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26842/diffs?commit_id=ce6899d804dfbeab9e05a6e203ec2ed9160979a4) added as part of their maintenance6 MRs. I've asked about it in the NVK maintenance6 MR.
`maxCombinedImageSamplerDescriptorCount` indicates the maximum number of descriptors needed for any of the formats that require a sampler Y′CBCR conversion supported by the implementation. I've set that to 3, but I haven't yet checked whether that's correct.

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28307
tu: Fix missing implementation of creating images from swapchains
2024-03-26T17:09:02Z
Valentine Burley

### What does this MR do and why?
<!-- Describe in detail what your merge request does and why. -->
On top of https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28236.
This is my attempt at fixing the missing implementation of creating images from swapchains,
so that we can enable a couple of extensions that are implemented in common code.
EXT_swapchain_maintenance1 is needed for Gamescope. (It's based on my NVK patch at https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28203)
Turnip is missing `VkImageSwapchainCreateInfoKHR` and `VkBindImageMemorySwapchainInfoKHR` which causes the CTS to segfault.
I haven't been able to test the patch myself, and @frog said they'd look at it, but I figured I'd open a draft MR to get more eyes on the problem.
The relevant tests are `dEQP-VK.wsi.*.maintenance1.*`, and there was a CTS bug that was recently fixed by https://github.com/KhronosGroup/VK-GL-CTS/commit/a8466bf6ea98f6cd6733849ad8081775318a3e3e.

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28254
Draft: tu: KHR_8bit_storage support
2024-03-26T15:37:22Z
Zan Dobersek

Support for KHR_8bit_storage in Turnip. Addresses #9979.

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28109
ir3: Do not set clip/cull mask if no one writes clip/cull
2024-03-21T18:34:02Z
Danylo Piliaiev

### What does this MR do and why?
<!-- Describe in detail what your merge request does and why. -->
This may happen when an undefined value is written into `gl_ClipDistance`
and the write then gets optimized out by `nir_opt_undef`.
Fixes GPU faults in Tropico 5 (D3D11) on at least A750.

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28054
ci: run vulkaninfo on drm-shim and store the output
2024-03-09T00:12:03Z
Eric Engestrom <eric@engestrom.ch>

This has two benefits:
- it ensures no regression in exposed features can get merged unnoticed
- it provides much more detailed and accurate information than
  `docs/features.txt`, and because this information is auto-generated it
  can no longer get out of date or contain mistakes, while requiring less
  effort from developers.
---
Of all the VK drivers:
~lavapipe, ~turnip, ~ANV, ~hasvk, ~RADV, ~NVK are added.
~v3dv and ~panfrost should be supported, but I couldn't figure out how to get their respective drm-shims to work with Vulkan; please help :)
~dozen and ~powervr don't have a drm-shim so they can't be supported (yet).
~venus I'm not sure how it would work, but I'm guessing it would be a cross-product of venus over each of the possible underlying drivers? I'm not sure how this would be tested in CI, so I'm also leaving it out for now.

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27847
tu: fix memory leaks in tu_shader
2024-03-28T12:08:08Z
Zan Dobersek

### What does this MR do and why?
<!-- Describe in detail what your merge request does and why. -->
```
tu: fix memory leaks in tu_shader
When tu_shader object is destroyed through vk_pipeline_cache, the relevant
destroy callback should relay to the general tu_shader_destroy function
that will also clean up owned resources.
During shader creation, the ir3_shader object should be destroyed once the
shader variants are retrieved. Since those variants are owned by tu_shader
they should be freed up in tu_shader_destroy.
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
```

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27776
WIP: tu, ir3: VK_KHR_shader_atomic_int64 for >a740
2024-03-04T18:34:17Z
Amber Harmonia

Passes CTS on a740 + custom tests.
I have not been able to test this on real applications (UE5, etc) yet, so just mr-ing for review.
Passing all of `dEQP-VK.glsl.atomic_operations.*64bit*`

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27547
Draft: turnip/ci: add WSI testing to all the deqp-vk jobs
2024-02-09T15:54:36Z
Eric Engestrom <eric@engestrom.ch>

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27462
freedreno, turnip, ir3: Early preamble
2024-02-29T17:44:09Z
Connor Abbott

In addition to introducing the scalar ALU, in a650 a copy of the scalar ALU and some other units were added to the HLSQ, which dispatches work to the uSPTPs (shader cores), and it can now execute the preamble part of shaders "early," i.e. before work is dispatched, rather than as part of the first wave dispatched to each uSPTP. This can help hide the latency of executing the preamble. Traditionally, the HLSQ also prefetched various state via the `CP_LOAD_STATE` packet, but recently more and more of this functionality has been moving to the preamble, with the implicit expectation that it is executed in an early preamble:
- Since a730, shared consts (Vulkan push constants) are set up in the preamble.
- Since a730, descriptors are prefetched in the preamble.
- Since a750, `CP_LOAD_STATE` for setting up constants is deprecated and severely limited, so most driver params come from UBOs that are pushed to the constant file in a preamble.
As more and more things are being executed in the preamble, hiding the latency becomes more important.
We can't always execute a preamble early. Early preambles cannot have "normal" (i.e. not shared) registers or predicate registers (so they cannot have control flow). If the preamble contains these, then we have to fall back to using it as a normal "late" preamble.
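The eligibility rule above can be sketched as a small predicate. This is only an illustration: the struct, its fields, and the function are hypothetical names that are not taken from ir3 or from this MR. The sketch encodes just the two constraints stated above and nothing else the real compiler may check.

```c
#include <stdbool.h>

/* Hypothetical summary of what a compiled preamble uses.
 * These names are illustrative, not ir3's actual data structures. */
struct preamble_usage {
   bool uses_normal_regs;    /* any non-shared ("normal") register */
   bool uses_predicate_regs; /* any predicate register, i.e. control flow */
};

enum preamble_mode {
   PREAMBLE_EARLY, /* executed by the HLSQ before work is dispatched */
   PREAMBLE_LATE,  /* executed as part of the first wave on each uSPTP */
};

/* Pick early execution only when neither restricted register kind is used;
 * otherwise fall back to a normal "late" preamble. */
static enum preamble_mode
choose_preamble_mode(const struct preamble_usage *usage)
{
   if (usage->uses_normal_regs || usage->uses_predicate_regs)
      return PREAMBLE_LATE;
   return PREAMBLE_EARLY;
}
```

A preamble that only touches shared registers would get `PREAMBLE_EARLY`; one that needs a branch, and therefore a predicate register, would get `PREAMBLE_LATE`.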
This MR implements early preamble, based on !22075 which implements the scalar ALU. While this doesn't actually depend on that series, without the scalar ALU the cases where we can use early preamble are severely limited.

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26827
vk/util: ignore unsupported feature structs take 2
2023-12-28T07:36:41Z
Chia-I Wu

```
vk/util: ignore unsupported feature structs take 2
This is a second try of commit eb5bb5c784e ("vk/util: ignore unsupported
feature structs"). It makes sure all drivers have initialized
vk_properties::apiVersion.
```

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26800
nir: properly split CS sys vals into API and driver variants (_zero_base)
2024-03-25T22:58:43Z
Karol Herbst <kherbst@redhat.com>

It has always annoyed me that drivers have to deal with both. Just make the "API" variants always lower to the `_zero_base` ones to make it easier on drivers. This also fixes range analysis trying to optimize the "API" sys vals with hardware limits, even though they are unbounded in e.g. OpenCL.
I also don't like the `_zero_base` naming, but this can be kept until we have a better name.
I think I've figured out all regressions, and hopefully this makes handling of compute sysvals more sane in the future.

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26578
turnip: VK_EXT_host_image_copy
2024-02-27T12:06:08Z
Connor Abbott

This extension will be used to accelerate uploads and downloads of tiled images, especially block-compressed images, by avoiding an extra staging buffer and copy on the GPU. It is already used in zink, dxvk, and vkd3d-proton.
In order to implement this we need an accelerated implementation of the Adreno tiling scheme. I've added the core tiling/untiling routines to fdl, inspired by `isl_tiled_memcpy.c`, and it could be useful for freedreno too. There is also documentation of the reverse-engineered scheme in a comment. Note that this doesn't implement UBWC compression/decompression, only tiling, as is expected for implementations of this extension.
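To make "tiling/untiling" concrete, here is a minimal sketch of copying a tiled image into a linear buffer. It assumes a deliberately simple layout: row-major tiles of 4x4 pixels, with pixels stored row-major inside each tile. This is not the Adreno scheme, which is more involved, and it is not the `fd6_tiled_memcpy` interface; every name and the tile size are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative tile size; NOT the real Adreno tile dimensions. */
#define TILE_W 4
#define TILE_H 4

/* Byte offset of pixel (x, y) in the simplified tiled layout.
 * width must be a multiple of TILE_W; cpp is bytes per pixel. */
static size_t
tiled_offset(uint32_t x, uint32_t y, uint32_t width, uint32_t cpp)
{
   uint32_t tiles_per_row = width / TILE_W;
   size_t tile_index = (size_t)(y / TILE_H) * tiles_per_row + (x / TILE_W);
   size_t in_tile = (size_t)(y % TILE_H) * TILE_W + (x % TILE_W);
   return (tile_index * TILE_W * TILE_H + in_tile) * cpp;
}

/* Copy a tiled image to a linear one. Within a tile, each row of TILE_W
 * pixels is contiguous, so it can be moved with a single memcpy. */
static void
untile_image(uint8_t *linear, const uint8_t *tiled,
             uint32_t width, uint32_t height, uint32_t cpp)
{
   for (uint32_t y = 0; y < height; y++) {
      for (uint32_t x = 0; x < width; x += TILE_W) {
         memcpy(linear + ((size_t)y * width + x) * cpp,
                tiled + tiled_offset(x, y, width, cpp),
                (size_t)TILE_W * cpp);
      }
   }
}
```

Copying whole tile rows rather than single pixels is the kind of structure an accelerated routine can exploit. A real implementation would also have to handle partial tiles and the hardware's actual address swizzle.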
I have vkoverhead patches to test the performance of `fd6_tiled_memcpy`. There is also a pending VK-GL-CTS CL to more thoroughly test this.
The tiling scheme depends on a parameter called the "highest bank bit" that is programmed into registers by the kernel. This means that we ideally should get the value from the kernel. This series includes a fallback which attempts to guess what value the kernel set, but there will be kernel and virgl-renderer patches to expose it to userspace, and we shouldn't land this MR without using that uABI to avoid accidentally making the value programmed by the kernel uABI. As long as we use the new uABI here, old mesa will not care about the highest bank bit whereas newer mesa will always get it from the kernel first, so it should be safe to change the value in the kernel if we need to (e.g. to fix the a650 bug).

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26203
Replace dup() with os_dupfd_cloexec()
2024-03-15T14:29:31Z
Simon Ser <contact@emersion.fr>

dup() will leak the new FD into any child process after fork().

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26154
turnip: Implement missing partial depth/stencil binding
2023-12-05T22:03:38Z
Connor Abbott

We missed this when bringing up VK_KHR_dynamic_rendering. Fortunately there's a HW feature to make this possible without fiddling with depth/stencil state.

Assignee: Connor Abbott

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25916
nir: I/O vector access improvements
2023-11-15T02:53:46Z
Faith Ekstrand

This MR does two things:
1. On NVIDIA, we can indirect access anything, including within a vector and I'd like to avoid lowering to if-ladders if we can. The first 4 commits of this MR make it so that we can indirect on compact variables such as tess levels and clip/cull distances. The annoying bit is that this means changing the interface of the `type_size` callback to `nir_lower_io()` which involves touching a lot of drivers.
2. We also need to get SPIR-V doing the right thing on TCS outputs. Right now, if the SPIR-V has a write to a single component of a vector, `spirv_to_nir` emits a load/insert/store pattern which is potentially racy. On NVIDIA, there are CTS tests which actually hit this race so I need this for passing CTS. We have a NIR pass which lowers writes of this form to an if-ladder with write-masks which is what we use for most drivers.

Assignee: MR Label Maker

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25718
turnip: Store serialized nir for LTO
2023-10-17T15:45:22Z
Connor Abbott

Storing the actual shaders has an inordinate memory cost. Serialize and deserialize them instead.
This should hopefully help get CS2 running with FEX+turnip.
Based on !25679 to avoid rebase troubles.

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25680
Draft: turnip: VK_EXT_shader_object
2023-10-14T06:43:24Z
Connor Abbott

This isn't ready to land yet because there are a few CTS bugs, including one without an open MR yet. However other than that the CTS results seem good.
There is also going to be a problem with CI runtimes as this adds a *lot* of new tests.
Once all the reworks land, the remaining challenge to enable shader objects is support for "separable" vertex and tessellation evaluation shaders, that may or may not have a following shader. Thankfully there is hardware support for this so that we don't need to compile multiple variants, we just need to enable it.
This includes !25679.

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25454
tu: Support AHardwareBuffer
2024-02-22T08:23:19Z
tarsin

Depends on: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25360 !25455 !25410
Tested with u_gralloc IMapper4 API on Android 13
https://gitlab.freedesktop.org/mesa/mesa/-/issues/9691 https://gitlab.freedesktop.org/mesa/mesa/-/issues/9874