mesa merge requests
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests
2024-01-29T20:18:07Z

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27247
mesa,gallium,hud: move L3 thread scheduling to util, optimize, add mesa_pin_threads env var for static affinity mask
2024-01-29T20:18:07Z
Marek Olšák

This is a rewrite of what we have by moving it to util and changing how it's used.
There are a few functional changes:
* setting the affinity mask is skipped if the kernel hasn't moved the app thread to a different L3. This adds +1% to CPU-bound FPS consistently, which is bigger than expected.
* fixed an oversight in the previous implementation that the thread scheduling policy wasn't applied:
  * when the context was created
  * when glthread was synchronized
Finally, the `mesa_pin_threads` environment variable is added to reduce randomness in benchmark results, which is useful for evaluating the effect of optimizations.
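The "skip if the L3 hasn't changed" idea above can be illustrated with a minimal sketch. It is not the code from this MR: `struct l3_pin_state`, `pin_to_l3_if_moved`, and the update counter are hypothetical names, and the actual affinity call is left as a comment.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-context state; not Mesa's actual data structure. */
struct l3_pin_state {
   int current_l3;        /* L3 index the worker threads are pinned to, -1 if none */
   unsigned num_updates;  /* how many times the affinity mask was really changed */
};

/* Re-pin only when the app thread is observed on a different L3.
 * Returns true if the affinity mask had to be updated. */
static bool
pin_to_l3_if_moved(struct l3_pin_state *state, int app_thread_l3)
{
   if (state->current_l3 == app_thread_l3)
      return false; /* nothing moved: skip the comparatively costly update */

   /* A real implementation would set the thread affinity mask here. */
   state->current_l3 = app_thread_l3;
   state->num_updates++;
   return true;
}
```

Calling this repeatedly with an unchanged L3 index is then a cheap integer compare, and the affinity update only happens when the index changes.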
The HUD change to output CSV is for easier gathering of performance data.
Marge Bot

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27198
st/mesa: decrease overhead of st_update_arrays using C++ templates
2024-02-13T05:45:31Z
Marek Olšák

The last 4 commits decrease overhead. The last commit is the most impactful one: `struct tc_set_vertex_buffers` is filled by st/mesa directly. The rest are prerequisites and cleanups.
Marge Bot

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26584
mesa,gallium: decrease CPU overhead for glDrawElements by filling TC batch from glDraw* directly
2024-01-08T23:23:16Z
Marek Olšák

The last commit implements the fast path for `glDraw(Range)Elements(Instanced)(BaseVertex)(BaseInstance)`.
The second last commit optimizes `tc_draw_vbo` for the case when the fast path can't be used.
All preceding commits are just refactoring required by the last commit.
Marge Bot

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25624
tc: implement unsynchronized texture uploads
2023-10-26T23:04:14Z
Mike Blumenkrantz

At present there are three modes for texture subdata calls:
* **small subdatas** are enqueued
* **large subdatas** are either
  * sequenced into strided copies and enqueued (rp-tracked path)
  * called directly after a `tc_sync`
The **large subdatas** part is problematic for the rp-tracked path as it can result in dozens/hundreds of enqueued copies. In spite of the batch overhead, this is still preferable to incurring a `tc_sync` on some implementations (when rp-tracking is enabled), as a sync call is likely to void any of the tracked renderpass data, which obliterates performance on tilers. But having tons of copies is bad for other reasons, not least of which is the overhead incurred by having tc serialize that many calls.
An optimization here which works for many cases is to enable unsynchronized subdata calls. This lets the frontend pass the subdata call directly to the driver without any synchronization/serialization as long as the texture is known to be unused.
Usage is detected by a mechanism similar to what zink uses:
* tag textures on use
* special tag for persistent use to block all unsynchronized access
* check texture usage data when attempting to promote a subdata call to unsynchronized
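As a rough illustration of the tagging scheme listed above, here is a minimal sketch. It is not the tc or zink code: `struct tex_usage`, the batch-id convention (ids start at 1 and only grow), and all function names are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TEX_USAGE_NONE       0u
#define TEX_USAGE_PERSISTENT UINT32_MAX /* special tag: blocks all unsynchronized access */

/* Hypothetical usage record attached to a texture. */
struct tex_usage {
   uint32_t last_used_batch; /* id of the last batch that referenced the texture */
};

/* Tag the texture when a batch uses it; a persistent tag is never overwritten. */
static void
tex_tag_use(struct tex_usage *t, uint32_t batch_id)
{
   if (t->last_used_batch != TEX_USAGE_PERSISTENT)
      t->last_used_batch = batch_id;
}

/* Mark the texture as persistently in use. */
static void
tex_tag_persistent(struct tex_usage *t)
{
   t->last_used_batch = TEX_USAGE_PERSISTENT;
}

/* A subdata call may be promoted to unsynchronized only if the texture was
 * never used, or if the last batch that used it has already completed. */
static bool
tex_can_upload_unsync(const struct tex_usage *t, uint32_t last_completed_batch)
{
   if (t->last_used_batch == TEX_USAGE_PERSISTENT)
      return false;
   return t->last_used_batch == TEX_USAGE_NONE ||
          t->last_used_batch <= last_completed_batch;
}
```

The check is a couple of integer compares, so it is cheap enough to run on every subdata call before deciding whether to enqueue or go direct.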
Driver requirements to use/test:
* handle `pipe_context::is_resource_busy` with `TC_TRANSFER_MAP_THREADED_UNSYNC`
* handle `texture_subdata` with `TC_TRANSFER_MAP_THREADED_UNSYNC`
* set `unsynchronized_texture_subdata=true`
https://gitlab.freedesktop.org/zmike/mesa/-/pipelines/1002913 in ci, though currently only lavapipe can access this path since nobody else supports HIC
Marge Bot

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25227
tc/zink: fix some sync issues
2023-09-25T03:02:45Z
Mike Blumenkrantz
Marge Bot

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25206
aux/tc: fix renderpass tracking fb state clobber scenario
2023-09-25T03:02:47Z
Mike Blumenkrantz

### What does this MR do and why?
aux/tc: fix renderpass tracking fb state clobber scenario
in a stream like:
* set fb state (A)
* flush
* set fb state (B)
* draw -> driver query
* flush
the "driver query" should return the tc info corresponding to the most
recent fb state (B). previously this would increment to C because
the flag for incrementing at the start of a batch was set
Fixes: 07017aa137b ("util/tc: implement renderpass tracking")
Marge Bot

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25203
aux/tc: Add ASSERTED to unreferenced release build variable
2023-09-25T03:02:47Z
Sil Vilerino

### What does this MR do and why?
aux/tc: Add ASSERTED to unreferenced release build variable
Fixes MSVC build error `src/gallium/auxiliary/util/u_threaded_context.c(3184): error C4189: 'size': local variable is initialized but not referenced`
Fixes: 51ad269198e ("aux/tc: handle stride mismatch during rp-optimized subdata")
@zmike @mareko
Marge Bot

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25180
tc: more fixes for renderpass texsubimage
2023-09-24T21:17:12Z
Mike Blumenkrantz
Marge Bot

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24849
aux/tc: handle stride mismatch during rp-optimized subdata
2023-09-24T20:20:41Z
Mike Blumenkrantz

### What does this MR do and why?
aux/tc: handle stride mismatch during rp-optimized subdata
to avoid splitting renderpasses, this subdata optimization handles the usual
driver dance of staging buffer -> gpu copy
if the pbo stride doesn't match the image format's stride, however, then
a direct copy will yield broken pixels and the image will misrender. to avoid this,
detect stride mismatch and translate the single subdata call into a sequence
of non-overlapping subdata calls that the driver can magically figure out
while continuing to not split renderpasses
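The splitting described above can be pictured with a CPU-side sketch: when the source and destination strides differ, one large copy is replaced by one copy per row. This is only an analogy using `memcpy`; the real change issues subdata calls through the driver, and `upload_rows` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy `height` rows of `row_bytes` bytes from `src` to `dst`.
 * Returns how many separate copies were issued. */
static unsigned
upload_rows(uint8_t *dst, size_t dst_stride,
            const uint8_t *src, size_t src_stride,
            size_t row_bytes, unsigned height)
{
   if (height == 0)
      return 0;

   if (src_stride == dst_stride) {
      /* Strides agree: a single direct copy produces the right layout. */
      memcpy(dst, src, dst_stride * (height - 1) + row_bytes);
      return 1;
   }

   /* Stride mismatch: a direct copy would shear the image, so issue one
    * non-overlapping copy per row instead. */
   for (unsigned y = 0; y < height; y++)
      memcpy(dst + y * dst_stride, src + y * src_stride, row_bytes);
   return height;
}
```

The per-row path costs more calls, which is the trade-off the description accepts in exchange for not splitting the renderpass.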
fixes #9589
cc: mesa-stable
Marge Bot

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23492
Revert "gallium/u_threaded: buffer sharedness tracking"
2023-06-08T08:03:13Z
Pierre-Eric Pelloux-Prayer

Revert "gallium/u_threaded: buffer sharedness tracking"
This reverts commit 8f159a8576efbb7bb3755d215a54b87c4c99a0d2.
This commit is correct but it exposes an existing bug: DISCARD_RANGE doesn't
work well with shared buffers.
So for now revert this commit as it's causing hangs on some APUs (see
https://gitlab.freedesktop.org/drm/amd/-/issues/2447) and flickering in
Metro Last Light Redux.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9108
Marge Bot

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21801
aux/tc: use renderpass tracking to optimize texture_subdata calls
2023-03-14T01:30:23Z
Mike Blumenkrantz

if it's known that a renderpass is active and the driver wants to do
renderpass optimizing, help out by not forcing a sync and instead doing
what the driver would do: create a staging buffer and copy it to the
image
this requires that the driver already handles buffer -> image copies
with resource_copy_region
Marge Bot

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21800
tc: some renderpass tracking fixes
2023-04-20T23:05:23Z
Mike Blumenkrantz

mostly to avoid corner cases where the driver might be able to accidentally optimize out the zsbuf when it shouldn't
Needs merge
Marge Bot

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21729
aux/tc: fix rp info resizing clobbering current info
2023-03-23T20:56:35Z
Mike Blumenkrantz

the recording rp_info may be a pointer to a member of the array being
reallocated, so test for this and re-set it to avoid invalid memory
access
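The general pattern behind this fix (a pointer into an array that is about to be reallocated) can be sketched as follows. The types and names here are hypothetical stand-ins, not the tc structures, and the sketch assumes `current` is either NULL or points into `infos`, and that the array only grows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct rp_info { int data; };

/* Hypothetical tracker: a growable array plus a pointer to the entry
 * currently being recorded (NULL, or a pointer into `infos`). */
struct rp_tracker {
   struct rp_info *infos;
   size_t count;
   struct rp_info *current;
};

/* Grow the array to `new_count` entries (new_count > 0) and re-set
 * `current` so it never dangles into the old allocation. */
static bool
rp_tracker_resize(struct rp_tracker *t, size_t new_count)
{
   bool has_current = t->current != NULL;
   size_t cur_idx = 0;

   /* The index must be computed before realloc: afterwards the old
    * pointer value may no longer be used. */
   if (has_current)
      cur_idx = (size_t)(t->current - t->infos);

   struct rp_info *grown = realloc(t->infos, new_count * sizeof(*grown));
   if (!grown)
      return false;

   t->infos = grown;
   t->count = new_count;
   if (has_current)
      t->current = &t->infos[cur_idx];
   return true;
}
```

Storing an index instead of a pointer would avoid the problem entirely; the sketch keeps the pointer to mirror the situation the description talks about.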
found with this caselist:
KHR-GL46.texture_gather.offset-gather-unorm-2darray
KHR-GL46.texture_view.view_sampling
cc: mesa-stable
Marge Bot

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21533
aux/tc: track whether queries have been terminated in a renderpass
2023-03-05T14:17:20Z
Mike Blumenkrantz

on tilers it's important to know whether a query is ended mid-renderpass
so that the query begin can occur inside/outside of the renderpass
Marge Bot

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21365
aux/tc: add a 'has_resolve' member to tc_renderpass_info
2023-02-23T16:01:27Z
Mike Blumenkrantz

this indicates that the first color buffer gets resolved
Marge Bot

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19077
tc: implement renderpass info tracking
2022-10-29T20:44:32Z
Mike Blumenkrantz

## Problem
on tiling gpus, it's crucial for performance to be able to determine how renderpass attachments will be utilized
gallium, however, does not provide any of the info needed to make these determinations, and managing the tracking for it in-driver is prohibitive in both cost and complexity
to solve this problem, threaded context can be augmented to track usage for framebuffer attachments and then make that usage available to drivers
## Implementation
when drivers opt-in to this behavior, tc will track metadata for framebuffer attachments into this struct:
```c
struct {
   uint8_t cbuf_clear;
   uint8_t cbuf_load;
   uint8_t cbuf_invalidate;
   bool zsbuf_clear : 1;
   bool zsbuf_clear_partial : 1;
   bool zsbuf_load : 1;
   bool zsbuf_invalidate : 1;
   bool has_draw : 1;
   uint8_t pad : 3;
   uint8_t cbuf_fbfetch;
   bool zsbuf_write_fs : 1;
   bool zsbuf_write_dsa : 1;
   bool zsbuf_read_dsa : 1;
   bool zsbuf_fbfetch : 1;
   uint8_t pad2 : 4;
   uint16_t pad3;
};
```
drivers can then access this data at any point outside of internal meta operations to know exactly how all framebuffer attachments will be used
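As a hedged illustration of how a driver might consume such data, the sketch below picks a per-attachment load operation. It assumes that `cbuf_clear` and `cbuf_load` are bitmasks with one bit per color buffer, which the description does not spell out; `struct rp_info_sketch`, `enum load_op`, and `color_load_op` are hypothetical names, not driver or tc API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of the tracked fields (assumed: bit i refers to
 * color buffer i). */
struct rp_info_sketch {
   uint8_t cbuf_clear;
   uint8_t cbuf_load;
};

enum load_op {
   LOAD_OP_DONT_CARE, /* previous contents are not needed */
   LOAD_OP_CLEAR,     /* attachment is cleared at renderpass start */
   LOAD_OP_LOAD,      /* previous contents must be loaded into tile memory */
};

/* Choose the load op for color buffer `idx` (idx < 8). */
static enum load_op
color_load_op(const struct rp_info_sketch *info, unsigned idx)
{
   uint8_t bit = (uint8_t)(1u << idx);

   if (info->cbuf_clear & bit)
      return LOAD_OP_CLEAR;
   if (info->cbuf_load & bit)
      return LOAD_OP_LOAD;
   return LOAD_OP_DONT_CARE;
}
```

On a tiler, avoiding `LOAD_OP_LOAD` where it isn't required is the kind of saving this tracking is meant to enable.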
## Performance
this has been observed to yield ~10% performance gains on turnip for some glmark2 cases (with some variant of !18762 or !18736 also applied)
when inactive, this yields no observable changes to drawoverhead (tested on radeonsi)
when active, causes ~5% perf loss on base drawoverhead draw cases
Marge Bot

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18927
gallium/u_threaded: Add some atrace/perfetto
2022-10-10T18:11:08Z
Rob Clark

A couple things I've hacked up locally before for better TC visibility in perfetto traces
Marge Bot

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18774
tc: cpu_storage fixes
2022-09-28T17:15:18Z
Pierre-Eric Pelloux-Prayer
Marge Bot

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18731
gallium/u_threaded: add unsychronized create_fence_fd
2022-09-23T04:13:28Z
Marek Olšák

This may be needed by Android.
Marge Bot

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17741
gallium/u_threaded: buffer subdata merging (v2)
2022-08-10T15:14:21Z
Jonathan Strobl

In a scenario where a sequence of calls happens like:
* subdata(buffer_a, offset=0, size=64)
* subdata(buffer_a, offset=64, size=64)
* subdata(buffer_a, offset=128, size=64)
* subdata(buffer_a, offset=192, size=64)
and the buffer can't be directly mapped (e.g., because it has bindings), the
subdata calls will now be merged together into one larger subdata call.
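The merging condition can be sketched as below. This tracks ranges only and leaves out the payload bytes, which a real merge would also have to append; `struct pending_subdata` and `subdata_try_merge` are hypothetical names, and the sketch only handles the exactly-contiguous case shown in the list above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of the most recently enqueued subdata call. */
struct pending_subdata {
   int buffer_id;
   size_t offset;
   size_t size;
};

/* Try to fold a new write into the pending one. This succeeds only when
 * both target the same buffer and the new range starts exactly where the
 * pending range ends. Returns true if merged. */
static bool
subdata_try_merge(struct pending_subdata *pending,
                  int buffer_id, size_t offset, size_t size)
{
   if (pending->buffer_id != buffer_id)
      return false;
   if (offset != pending->offset + pending->size)
      return false;

   pending->size += size;
   return true;
}
```

With the four calls from the example, the pending record ends up describing a single 256-byte write at offset 0.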
This is a replacement for https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17597.
This is still lacking the ability to do invalidations + unsynchronized uploads in case a merged subdata call overwrites an entire buffer. This turned out to be much harder than I thought. However, it achieves sizeable performance gains even without this particular optimization.
```
Test: KHR-GL46.CommonBugs.CommonBug_SparseBuffersWithCopyOps on radeonsi
Before:
real 0m1,923s
user 0m1,017s
sys 0m0,051s
After:
real 0m0,686s
user 0m0,502s
sys 0m0,071s
```
Marge Bot