mesa merge requests
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests
Updated: 2022-08-01T10:23:39Z

Draft:include/drm-uapi: Add out parameters to drm_amdgpu_sched
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16262
Updated: 2022-08-01T10:23:39Z
Author: arunpravin24

Add drm_amdgpu_sched_info as an out parameter to assert that the
context is scheduled on the high-priority gfx pipe/queue for
execution.
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>

Draft: amdgpu: native context support for virtio
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21658
Updated: 2024-03-25T18:59:42Z
Author: Pierre-Eric Pelloux-Prayer

This MR implements native context support for amdgpu: this makes it possible to use native drivers (radeonsi, radeonsi_drv_video and radv) in a guest VM (QEMU+kvm is the only supported setup currently).
Besides performance, which *seems* better than virgl/venus (but I only tested 1 Vulkan game and a couple of GL ones so...), the main advantage I see is maintenance: the guest uses the same drivers, except that they don't speak directly to libdrm(_amdgpu) but instead go through a virtio/qemu transport layer. This can be seen in the enablement patches (the last 2 of the series), which are quite small.
This is largely based on @robclark's work on freedreno/msm (see https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14900 for the Mesa part or the XDC presentation https://www.youtube.com/watch?v=yTO8QRIfOjA&t=29065s).
To be able to test this some patches are currently needed for virglrenderer, qemu and Linux. I'll push them soon.
This MR is marked as draft because it's meant for discussion rather than review - the current code has flaws, quirks and all kinds of issues; it's correct enough to be able to run Unigine Superposition and Rocket League though :smile:
The items I'd like to discuss:
* Would the RADV team be interested in supporting native context in their driver?
* Is the current approach regarding libdrm_amdgpu wrapping OK (see be8d35ac7d1 ("amd: port to libdrm_amdgpu virtual table"))? Instead of adding a new virtio winsys, I wrapped libdrm_amdgpu functions because radeonsi/radv amdgpu winsys provide lots of features that I didn't want to reimplement.

Draft: ac/nir/ngg: Improve analysis of shaders before culling
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22073
Updated: 2023-07-28T13:05:40Z
Author: Timur Kristóf
Assignee: Timur Kristóf

Several improvements to `analyze_shader_before_culling` and expose it so that drivers can call it. This helps more effectively get rid of unnecessary LDS use and gets rid of the hack that is `cleanup_culling_shader_after_dce`.
This is also preparation for more upcoming work.

Draft: ac/nir/ngg: Add ability to reuse some output variables.
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22074
Updated: 2023-11-02T08:37:22Z
Author: Timur Kristóf

Based on !22073
`ac_nir_lower_ngg` will try to find outputs
(other than the position) which are already calculated as part of the
position.
It will then repack these variables after culling (using LDS), and
reuse them to reduce the execution time of the deferred shader part.
This prevents wasting ALU on computing things that have already been
computed as part of the position.
Each repacked-reused dword costs 4 bytes of LDS per vertex and an
additional VGPR used (due to the case when culling is off and
repacking doesn't happen).
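
To make the stated LDS cost concrete, here is a minimal sketch of the arithmetic. The helper name `extra_lds_bytes` and the example numbers are hypothetical and chosen only for illustration; they are not taken from `ac_nir_lower_ngg`.

```c
/* Hypothetical helper, not part of Mesa: estimates the extra LDS needed
 * per workgroup when `num_reused_dwords` output dwords are repacked after
 * culling. As stated above, each repacked-reused dword costs 4 bytes of
 * LDS per vertex. */
static unsigned
extra_lds_bytes(unsigned num_reused_dwords, unsigned vertices_per_workgroup)
{
   return num_reused_dwords * 4u * vertices_per_workgroup;
}
```

Under these assumptions, reusing 3 dwords in a workgroup that handles 128 vertices would take `extra_lds_bytes(3, 128)`, i.e. 1536 additional bytes of LDS, on top of one extra VGPR per reused dword.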
**Marked as WIP until the prerequisite is merged. Also need to generate Fossil DB stats.**

ac/llvm: Some vkcts fixes
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25293
Updated: 2024-01-17T10:29:46Z
Author: Konstantin Seurer

radeonsi/sqtt: small fixes
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25397
Updated: 2024-03-22T08:21:05Z
Author: Pierre-Eric Pelloux-Prayer

Commits 1 and 2 fix regressions.
Commit 3 reformats the si_sqtt file (it was using 2-space indentation).
This is prep work for an MR to port the recent radv RGP fixes to radeonsi.

ac/surface: relax custom pitch requirements to gfx8
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25537
Updated: 2023-10-05T00:17:06Z
Author: shansheng wang

### What does this MR do and why?
ac/surface: relax custom pitch requirements to gfx8
Signed-off-by: shanshengwang <shansheng.wang@amd.com>

radv,nir: optimize vkd3d-proton's MSAD instruction
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26251
Updated: 2024-03-22T11:13:17Z
Author: Rhys Perry

Apparently this DXIL instruction is used by FSR 3. The unoptimized sequence is horrible, but I don't know how much of an effect this has on actual FSR 3 shaders.

nir: properly split CS sys vals into API and driver variants (_zero_base)
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26800
Updated: 2024-03-25T22:58:43Z
Author: Karol Herbst <kherbst@redhat.com>

It has always annoyed me that drivers have to deal with both. Just make the "API" variants always lower to the `_zero_base` ones to make it easier on drivers. This also fixes range analysis trying to optimize the "API" sys vals with hardware limits, even though they are unbounded in e.g. OpenCL.
I also don't like the `_zero_base` naming, but this can be kept until we have a better name.
I think I've figured out all regressions, and hopefully this makes handling of compute sysvals more sane in the future.

nir, ac/nir: Add workgroup divergence analysis pass and use it for mesh shader output counts
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27797
Updated: 2024-03-27T10:47:44Z
Author: Timur Kristóf

Based on https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27680
Adds a pass (similar to vertex divergence analysis) which deals with workgroup-divergence (as opposed to the default subgroup-divergence).
Then, use this pass in `ac_nir_lower_ngg_ms` to handle the output counts more optimally.

Lower subsampled image formats correctly in gfx940
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28012
Updated: 2024-03-28T16:14:17Z
Author: Ganesh Belgur Ramachandra

### What does this MR do and why?
Lower subsampled image formats correctly in gfx940.
In particular, packing and unpacking of certain YUV formats are emulated in NIR.

ac/gpu_info: fix regression in vulkan hw decode
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28252
Updated: 2024-03-28T18:07:55Z
Author: Sathishkumar S
Assignee: Sathishkumar S

ac/gpu_info: fix regression in vulkan hw decode
Fixes: f3ab454f074 ("ac/gpu_info: query the number of ip instance")

radv/video: expose one video decode ring.
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28358
Updated: 2024-03-25T06:26:43Z
Author: Dave Airlie

### What does this MR do and why?
```
radv/video: expose one video decode ring.
The changes in
commit f3ab454f074938ec89b245ad3166c69e0330ca8c
Author: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Date: Wed Feb 28 18:58:29 2024 +0530
ac/gpu_info: query the number of ip instance
query the number of ip instances for VCN and JPEG
which causes a regression on radv with video decode.
I'm not sure how to expose instances here, so for now just
fix it.
Fixes: f3ab454f0749 ("ac/gpu_info: query the number of ip instance")
```

drm-shim: Optimize handle allocation
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28372
Updated: 2024-03-27T12:23:52Z
Author: Rob Clark

### What does this MR do and why?
```
drm-shim: Optimize handle allocation
I noticed that we were spending a *bunch* of time doing hashtable
lookups when trying to find an available "handle", making drm-shim
somewhat useless for draw-overhead profiling. Keeping track of the
last allocated handle to avoid re-searching for handles which are
probably still in use avoids that.
Signed-off-by: Rob Clark <robdclark@chromium.org>
```

radv: Delete TCS epilogs entirely
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28408
Updated: 2024-03-28T19:30:44Z
Author: Timur Kristóf

Based on !28371
Changes `hs_finale` so that it can dynamically check the primitive type and whether TES reads the tess factors. This makes TCS epilogs unnecessary, so we can remove all code related to them. Thanks to Marek for this idea.

Draft: radeonsi, radv, aco: Delete TCS epilogs entirely, and some cleanup.
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28425
Updated: 2024-03-27T16:54:21Z
Author: Timur Kristóf

Based on !28371 and !28408
Removes the TCS epilog entirely from RadeonSI and ACO, and makes RADV and RadeonSI use the same bitfields for `tcs_offchip_layout`.