mesa merge requests
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests
2021-05-12T16:18:52Z

gallium/u_threaded: fix 32-bit breakage due to incorrect pointer arithmetic
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10746
2021-05-12T16:18:52Z
Marek Olšák
Fixes: 1233c90ab4a - gallium/u_threaded: rewrite slot layout to reduce wasted space
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4755
Marge Bot

gallium/u_threaded: Add some extra debugging
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10753
2021-05-12T18:54:42Z
Rob Clark
Some extra debugging to help track down issues like #4755 / #4758 (fixed by !10746)
Marge Bot

gallium/u_threaded, radeonsi: track busy buffers in TC, merge draws in execute callbacks
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10662
2021-06-25T15:21:41Z
Marek Olšák
This enables busy buffer tracking in `u_threaded_context`, which allows promoting buffer mappings to `UNSYNCHRONIZED` for idle buffers, improving scalability especially for glBufferSubData. RadeonSI is the only driver that enables it currently. TC tracks buffer lists of referenced buffers in unflushed TC batches and unflushed driver command buffers, and they use hashing of unique buffer IDs without atomics for lower overhead instead of pipe_resource pointers. TC also tracks all currently bound buffers in terms of buffer IDs, which is required for reconstructing the buffer list when starting a new TC batch where the buffer list starts empty.
Now TC has all the synchronization prevention optimizations that drivers should have (or even more than drivers have), and can be safely enabled by default.
Draw merging is also moved to tc_call_draw_single to facilitate more complex merging in the future. The first 2 commits do it.
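The buffer-list idea in this description (hashing unique buffer IDs into a per-batch list, without atomics) could be sketched roughly as below. This is an illustrative sketch only, not the code from this merge request: every `sketch_*` name and the 14-bit table size are hypothetical, and it assumes a single thread writes to a given list, which is why plain stores are enough.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes: 2^14 slots packed into 32-bit words. */
#define SKETCH_ID_BITS 14u
#define SKETCH_ID_MASK ((1u << SKETCH_ID_BITS) - 1u)
#define SKETCH_WORDS   ((SKETCH_ID_MASK + 1u) / 32u)

/* One list per unflushed batch: a bitset indexed by hashed buffer ID. */
struct sketch_buffer_list {
   uint32_t bits[SKETCH_WORDS];
};

/* A new batch starts with an empty list. */
void sketch_list_clear(struct sketch_buffer_list *list)
{
   assert(list != NULL);
   memset(list->bits, 0, sizeof(list->bits));
}

/* Record that the batch references the buffer with this unique ID.
 * The "hash" here is simply the low bits of the ID (an assumption). */
void sketch_list_add(struct sketch_buffer_list *list, uint32_t buffer_id)
{
   uint32_t slot = buffer_id & SKETCH_ID_MASK;

   assert(list != NULL);
   list->bits[slot / 32u] |= 1u << (slot % 32u);
}

/* Returns true if the batch may reference the buffer. Two IDs can share
 * a slot, so a true result can be a false positive; a false result is
 * exact, which is the direction that matters for treating a buffer as idle. */
bool sketch_list_may_reference(const struct sketch_buffer_list *list,
                               uint32_t buffer_id)
{
   uint32_t slot = buffer_id & SKETCH_ID_MASK;

   assert(list != NULL);
   return (list->bits[slot / 32u] >> (slot % 32u)) & 1u;
}
```

In this sketch a collision only makes a buffer look busy when it is not, so the cost of hashing is a missed optimization rather than a missed synchronization.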
Issues:
* The `pipe->flush` for the buffer list ring might have to be handled the same as an internal flush in drivers. (for maximum efficiency) - radeonsi might need changes here.
* Invalidated buffers are not re-bound in other instances of u_threaded_context. This affects synchronization, but I don't know yet how.
Marge Bot

freedreno: Add support for TC is_resource_busy
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10883
2021-05-21T16:55:32Z
Rob Clark
This helps avoid staging blits by letting TC know when it can promote transfers to UNSYNCHRONIZED
Marge Bot

zink: implement new tc buffer features
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10882
2021-05-26T08:39:17Z
Mike Blumenkrantz
I think this is right, but there's no longer a reference implementation, so ???
Marge Bot

aux/tc: fix ubo unbinding
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11187
2021-06-04T17:25:41Z
Mike Blumenkrantz
unsetting a ubo requires an unbind
Marge Bot

Revert "st/mesa: execute glFlush asynchronously if no image has been imported/exported"
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11278
2021-06-16T14:48:50Z
Rob Clark
A number of the piglit glx tests use multiple contexts on a single
thread, and previously the flush in MakeCurrent() was enforcing the
ordering between draws on those different contexts. When that flush
was made ASYNC, now there is nothing ordering the draws because we have
two (or more) driver threads for a single frontend thread which is
using nothing more than glxMakeCurrent() to enforce the ordering.
This reverts commit 057a702a3f6a78a8bcd347a74e5a79d70dfc4153.
Marge Bot

aux/tc: pass rebind count and rebind bitmask with replace_buffer_storage func
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11245
2021-06-14T21:06:30Z
Mike Blumenkrantz
tc already calculates all the rebinding that needs to be done on a given
context, so (some of) this info can be passed on to drivers to enable
optimizations
#4800
Marge Bot

gallium/u_threaded: clear valid buffer range only if it's not bound for write
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11335
2021-06-14T23:38:34Z
Marek Olšák
We can't invalidate the range if a buffer is bound for write because we
would need to add the range that is bound, which we don't track.
This fixes buffer mappings incorrectly promoted to unsynchronized because
the valid range was cleared while the buffers were bound for write.
It also clears the valid range if the invalidation is allowed but skipped.
Marge Bot

util, gallium/u_threaded: merge draws faster by merging indexbuf unreferencing (add p_atomic_sub_return)
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11349
2021-06-15T04:32:30Z
Marek Olšák
Instead of N times decrementing the index buffer refcount by 1, decrement
it by N once.
Marge Bot

util/tc: rename tc_replace_buffer_storage_func::num_rebinds and document
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12898
2021-09-17T13:24:57Z
Mike Blumenkrantz
this parameter is only a hint, as tc provides no method for tracking cases
when a buffer is bound multiple times to the same site (e.g., multiple vertex
buffer slots will be counted as 1 bind), so rename to "minimum" to be more clear
Marge Bot

gallium/u_threaded: Get reset status without sync
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13163
2021-10-04T20:24:04Z
Rob Clark
GPU hangs are asynchronous already, there should not be an expectation
that this is synchronized with driver thread.
-----
I noticed chrome(ium) regularly calls `glGetGraphicsResetStatus()`, causing TC syncs. It does appear to at least ratelimit the calls to 5ms. But I'm not convinced that we need to sync with driver thread here, as GPU resets are already async.
For freedreno, this call basically just boils down to an ioctl call to query the reset count from the kernel, which should be safe to do from frontend thread instead of driver thread. I'm less sure about other drivers. Possibly we should let drivers opt-in to no-sync reset status queries?
Marge Bot

gallium/u_threaded: Split out options struct
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13207
2021-10-07T17:52:57Z
Rob Clark
Need more testing to decide if the 2nd patch is a good idea or not. (And maybe there should be some thresholds, like min # of draws before `set_framebuffer_state()` triggers a flush?) But the 2nd patch motivated the first one, which I do think is a reasonable idea, rather than continuing to add more `threaded_context_create()` parameters.
Marge Bot

u_threaded,radeonsi: fix draw_vertex_state with multi draws, fixes
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13235
2021-10-08T02:52:41Z
Marek Olšák
Marge Bot

d3d12: Enable threaded context
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13670
2021-11-09T01:32:03Z
Jesse Natalie <jenatali@microsoft.com>
The only interesting thing here is the addition of new logic to handle buffer rebinding.
Everything else is pretty mechanical.
Marge Bot

aux/tc: add tc_buffer_write to replace pipe_buffer_write usage
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14770
2022-01-28T15:52:04Z
Mike Blumenkrantz
tc_buffer_write is the tc-safe version of this function which will
avoid accidental invalidations that break behavior
Acked-by: Marek Olšák <marek.olsak@amd.com>
split from !14745
Marge Bot

d3d12: Performance improvements
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14933
2022-03-04T22:55:33Z
Jesse Natalie <jenatali@microsoft.com>
This series improves fps of the GfxBench T-Rex benchmark by 50% (22fps -> 34fps, GPU utilization 15% -> 75% on my NVIDIA Quadro).
There's three primary perf improvements here:
* Fix the buffer suballocators to actually suballocate - the size/alignment values were all wrong resulting in a bunch of 64KiB buffers (D3D12's allocation granularity) instead of chunks of buffers. The actual sub-buffer alignment should've been 512 (the least common denominator across buffer alignments, which is the alignment for buffer <-> texture copies).
* In order to do this, a bunch of places needed to be fixed to actually handle suballocation. Specifically for TBOs, they were broken, so there's a fix in here for them, but unfortunately we can't suballocate TBOs because D3D's offsets for them are specified in elements, and there's no reasonable alignment for `R32G32B32` buffers that results in a whole number of elements.
* Fix the buffer cache to actually return cached buffers - the `DONOTWAIT` map check doesn't work for buffers that aren't mappable, so use the explicit busy check callback.
* Let buffer readback for `DEFAULT` buffers go through the TC CPU storage path, so vbuf/primconvert doesn't have to do copy+stall in order to read back.
* As part of this, also report no support for R8 index buffers so the R8 -> R16 conversion can be done at the vbuf level (above TC). We're still doing actual primconvert for non-supported primitives in the driver though, but eventually this should move to the vbuf level too.
Marge Bot

radeonsi / tc: enable cpu_storage by default
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15074
2022-04-22T04:03:31Z
Pierre-Eric Pelloux-Prayer
Marge Bot

util, d3d12: Refcount tracking fixes, a TC leak fix, and a D3D12 leak fix
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16182
2022-05-02T17:43:22Z
Jesse Natalie <jenatali@microsoft.com>
I'm happy to split out and land any of these individually. The most important one here is ebf4e9f8 (with 8873ab68 as a prereq), to fix the leak I was chasing down. The rest is either speculative leak fixes or leak tracking improvements.
/cc @zmike for the primconvert/TC fix.
Marge Bot

u_threaded: clear non-async debug callback correctly
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16300
2022-05-10T11:34:20Z
Pierre-Eric Pelloux-Prayer
The following sequence:
glEnable(GL_DEBUG_OUTPUT_KHR);
glEnable(GL_DEBUG_OUTPUT_SYNCHRONOUS_KHR);
glDebugMessageCallbackKHR(my_callback, NULL);
Will cause the 2nd call to be ignored - but since the callback
function used by _mesa_update_debug_callback is always the
same (_debug_message), this means we'll keep using it, causing
"my_callback" to be called from driver-internal threads.
So instead of skipping the 2nd call, make sure we pass the
information to the driver.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5206
Marge Bot
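
The failure mode described in the last entry can be modeled with a small stand-alone sketch. It is not the u_threaded_context code: all `sketch_*` names are hypothetical, and the "forwarding" variant shows just one way that "pass the information to the driver" might look, here assumed to mean clearing the driver's callback when a non-async one is requested.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*sketch_message_fn)(void *data, const char *msg);

struct sketch_debug_callback {
   sketch_message_fn message;
   void *data;
   bool async; /* true if the callback may be invoked from any thread */
};

/* Stand-in for the driver: it remembers the last callback it was given. */
struct sketch_driver {
   struct sketch_debug_callback installed;
};

/* Placeholder message handler used only to have a non-NULL pointer. */
void sketch_noop_message(void *data, const char *msg)
{
   (void)data;
   (void)msg;
}

static void sketch_driver_set(struct sketch_driver *drv,
                              const struct sketch_debug_callback *cb)
{
   assert(drv != NULL);
   if (cb)
      drv->installed = *cb;
   else
      drv->installed = (struct sketch_debug_callback){0};
}

/* Modeled old behavior: a non-async request is skipped entirely, so the
 * driver keeps whatever async callback it received earlier. */
void sketch_wrapper_set_skipping(struct sketch_driver *drv,
                                 const struct sketch_debug_callback *cb)
{
   if (cb && !cb->async)
      return;
   sketch_driver_set(drv, cb);
}

/* Modeled fix (an assumption): a non-async request still reaches the
 * driver, as a cleared callback, so no stale async callback survives. */
void sketch_wrapper_set_forwarding(struct sketch_driver *drv,
                                   const struct sketch_debug_callback *cb)
{
   if (cb && !cb->async) {
      sketch_driver_set(drv, NULL);
      return;
   }
   sketch_driver_set(drv, cb);
}
```

With the skipping variant, installing an async callback and then a synchronous one leaves the async callback in place, which matches the symptom above of "my_callback" being reached from driver-internal threads. With the forwarding variant the driver ends up with no callback installed.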