- 13 Apr, 2022 40 commits
-
-
Jason Ekstrand authored
-
Jason Ekstrand authored
This version takes both a source and destination size and repeats the source as many times as needed to fill the destination.
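As a rough CPU-side illustration of the repeat-to-fill behaviour described above (not the actual GPU implementation, which this log does not show), the sketch below uses a hypothetical helper name `fill_with_pattern`. It also assumes that a trailing partial copy is acceptable when the destination size is not a multiple of the source size; the real code may require otherwise.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: fill dst (dst_size bytes) by repeating the
 * src_size-byte pattern in src. If dst_size is not a multiple of
 * src_size, the final copy is truncated (an assumption of this sketch). */
static void
fill_with_pattern(void *dst, size_t dst_size, const void *src, size_t src_size)
{
   assert(src_size > 0);

   unsigned char *out = dst;
   size_t done = 0;

   while (done < dst_size) {
      size_t remaining = dst_size - done;
      size_t chunk = remaining < src_size ? remaining : src_size;
      memcpy(out + done, src, chunk);
      done += chunk;
   }
}
```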
-
Jason Ekstrand authored
We don't need any of the fancy CCS handling stuff that blorp_copy does and this will give us a bit more control.
-
Jason Ekstrand authored
-
No fossil-db changes.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <mesa/mesa!14124>
-
fossil-db (Sienna Cichlid):
Totals from 400 (0.30% of 134621) affected shaders:
VGPRs: 18696 -> 18688 (-0.04%)
CodeSize: 2031348 -> 1946640 (-4.17%)
Instrs: 374703 -> 360226 (-3.86%)
Latency: 4200727 -> 4108628 (-2.19%); split: -2.20%, +0.01%
InvThroughput: 1059935 -> 1029441 (-2.88%); split: -2.88%, +0.00%
VClause: 5777 -> 5771 (-0.10%)
SClause: 11890 -> 10891 (-8.40%); split: -8.57%, +0.17%
Copies: 34035 -> 33259 (-2.28%); split: -2.98%, +0.70%
Branches: 11108 -> 11100 (-0.07%); split: -0.08%, +0.01%
PreSGPRs: 15999 -> 15942 (-0.36%); split: -0.44%, +0.08%
PreVGPRs: 16994 -> 16970 (-0.14%)

fossil-db (Polaris10):
Totals from 400 (0.29% of 135668) affected shaders:
SGPRs: 23799 -> 22919 (-3.70%); split: -4.30%, +0.61%
VGPRs: 18480 -> 18472 (-0.04%)
CodeSize: 2090316 -> 2041592 (-2.33%)
Instrs: 395461 -> 385747 (-2.46%); split: -2.46%, +0.00%
Latency: 5045768 -> 5020196 (-0.51%); split: -0.53%, +0.02%
InvThroughput: 2694320 -> 2689886 (-0.16%); split: -0.23%, +0.07%
VClause: 5982 -> 5968 (-0.23%)
SClause: 12064 -> 10823 (-10.29%); split: -10.33%, +0.04%
Copies: 48233 -> 48322 (+0.18%); split: -0.47%, +0.65%
PreSGPRs: 16409 -> 16358 (-0.31%); split: -0.39%, +0.08%

fossil-db (Pitcairn):
Totals from 400 (0.29% of 135668) affected shaders:
SGPRs: 22431 -> 22215 (-0.96%); split: -2.60%, +1.64%
VGPRs: 18776 -> 18560 (-1.15%); split: -1.21%, +0.06%
CodeSize: 2104440 -> 2017708 (-4.12%)
MaxWaves: 2363 -> 2367 (+0.17%)
Instrs: 413099 -> 397446 (-3.79%)
Latency: 5507707 -> 5450251 (-1.04%); split: -1.12%, +0.07%
InvThroughput: 2838867 -> 2786903 (-1.83%); split: -1.83%, +0.00%
VClause: 10334 -> 10097 (-2.29%)
SClause: 12346 -> 11005 (-10.86%); split: -10.89%, +0.02%
Copies: 54034 -> 52065 (-3.64%); split: -3.99%, +0.35%
PreSGPRs: 17916 -> 17857 (-0.33%); split: -0.40%, +0.07%
PreVGPRs: 16917 -> 16893 (-0.14%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <!14124>
-
The callback now supports this. This shouldn't have any effect yet except on GFX6 with 12 byte loads.

fossil-db (Pitcairn):
Totals from 246 (0.18% of 135668) affected shaders:
VGPRs: 14684 -> 14768 (+0.57%); split: -0.44%, +1.01%
CodeSize: 1765792 -> 1738040 (-1.57%)
Instrs: 344605 -> 340055 (-1.32%)
Latency: 4892904 -> 4861942 (-0.63%)
InvThroughput: 2479599 -> 2446070 (-1.35%)
VClause: 8782 -> 8735 (-0.54%)
SClause: 9854 -> 9853 (-0.01%)
Copies: 47327 -> 45401 (-4.07%); split: -4.08%, +0.01%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <mesa/mesa!14124>
-
fossil-db (Sienna Cichlid):
Totals from 7 (0.01% of 134621) affected shaders:
VGPRs: 760 -> 776 (+2.11%)
CodeSize: 222000 -> 222044 (+0.02%); split: -0.01%, +0.03%
Instrs: 40959 -> 40987 (+0.07%); split: -0.01%, +0.08%
Latency: 874811 -> 886609 (+1.35%); split: -0.00%, +1.35%
InvThroughput: 437405 -> 443303 (+1.35%); split: -0.00%, +1.35%
VClause: 1242 -> 1240 (-0.16%)
SClause: 1050 -> 1049 (-0.10%); split: -0.19%, +0.10%
Copies: 4953 -> 4973 (+0.40%); split: -0.04%, +0.44%
Branches: 1947 -> 1957 (+0.51%); split: -0.05%, +0.56%
PreVGPRs: 741 -> 747 (+0.81%)

fossil-db changes seem to be noise.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <mesa/mesa!14124>
-
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <mesa/mesa!14124>
-
These are the same as the normal ones, but they take an unsigned 32-bit offset in BASE and another unsigned 32-bit offset in the last source.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <mesa/mesa!14124>
-
For example, dwordx3->dwordx4 or ubyte3->dwordx2. Global loads don't have the bounds checking that buffer loads have that makes this safe.

The alignment checks are added to global_load_callback() in case byte_align_loads=false, align=1 and bytes_needed=3. Without them, the callback will create a dword load.

fossil-db (Sienna Cichlid):
Totals from 267 (0.20% of 134621) affected shaders:
CodeSize: 1603352 -> 1606568 (+0.20%)
Instrs: 294946 -> 295482 (+0.18%); split: -0.00%, +0.18%
Latency: 2997003 -> 2997052 (+0.00%); split: -0.02%, +0.02%
InvThroughput: 526645 -> 526659 (+0.00%)
SClause: 9179 -> 9185 (+0.07%); split: -0.02%, +0.09%
Copies: 25363 -> 25375 (+0.05%); split: -0.08%, +0.13%
Branches: 8298 -> 8299 (+0.01%)

fossil-db (Polaris10):
Totals from 267 (0.20% of 135668) affected shaders:
CodeSize: 1636672 -> 1638756 (+0.13%); split: -0.00%, +0.13%
Instrs: 308484 -> 308733 (+0.08%); split: -0.01%, +0.09%
Latency: 3446045 -> 3446904 (+0.02%); split: -0.00%, +0.03%
InvThroughput: 1206722 -> 1206828 (+0.01%); split: -0.00%, +0.01%
SClause: 9308 -> 9311 (+0.03%); split: -0.08%, +0.11%
Copies: 36933 -> 36921 (-0.03%); split: -0.08%, +0.05%

fossil-db (Pitcairn):
Totals from 275 (0.20% of 135668) affected shaders:
SGPRs: 17616 -> 17520 (-0.54%); split: -0.64%, +0.09%
VGPRs: 15428 -> 15540 (+0.73%); split: -0.23%, +0.96%
CodeSize: 1885792 -> 1929120 (+2.30%); split: -0.00%, +2.30%
MaxWaves: 1284 -> 1285 (+0.08%)
Instrs: 368963 -> 376095 (+1.93%); split: -0.00%, +1.94%
Latency: 5122922 -> 5168398 (+0.89%); split: -0.01%, +0.90%
InvThroughput: 2562866 -> 2604279 (+1.62%)
VClause: 9268 -> 9296 (+0.30%); split: -0.13%, +0.43%
SClause: 10702 -> 10705 (+0.03%); split: -0.05%, +0.07%
Copies: 48620 -> 50629 (+4.13%); split: -0.08%, +4.21%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <mesa/mesa!14124>
-
fossil-db (Sienna Cichlid):
Totals from 38 (0.03% of 134621) affected shaders:
CodeSize: 237196 -> 237060 (-0.06%); split: -0.09%, +0.03%
Instrs: 43895 -> 43894 (-0.00%); split: -0.02%, +0.01%
Latency: 914633 -> 916263 (+0.18%); split: -0.01%, +0.19%
InvThroughput: 468215 -> 468971 (+0.16%); split: -0.02%, +0.18%
SClause: 1239 -> 1242 (+0.24%)
PreSGPRs: 997 -> 1003 (+0.60%)
PreVGPRs: 936 -> 923 (-1.39%); split: -1.50%, +0.11%

Regression seems to be RA noise, creating a waitcnt.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <mesa/mesa!14124>
-
fossil-db (Sienna Cichlid):
Totals from 229 (0.17% of 134621) affected shaders:
CodeSize: 1520192 -> 1517644 (-0.17%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <mesa/mesa!14124>
-
Robust vectorization exists to prevent loads at a near-maximum offset from being vectorized with loads at offset 0. Global loads can't read from offset 0 (NULL) anyway, so this isn't necessary.

No fossil-db changes.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <!14124>
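To make the wraparound concern concrete, here is a minimal sketch under the assumption that offsets are 32-bit and wrap modulo 2^32. The helper names `range_wraps_u32` and `can_merge_loads` are hypothetical and do not reflect how the vectorizer pass is actually written; they only show why a load near the maximum offset must not be merged with the load that follows it when robustness is required.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if the byte range [offset, offset + size)
 * wraps around the 32-bit offset space. */
static bool
range_wraps_u32(uint32_t offset, uint32_t size)
{
   return size != 0 && (uint32_t)(offset + (size - 1)) < offset;
}

/* Hypothetical merge check: with robustness enabled, refuse to merge two
 * adjacent loads if the combined range would wrap to offset 0, since the
 * low part may be out of bounds while the wrapped part is in bounds. */
static bool
can_merge_loads(uint32_t low_offset, uint32_t low_size, uint32_t high_size,
                bool robust)
{
   if (robust && range_wraps_u32(low_offset, low_size + high_size))
      return false;
   return true;
}
```

With robustness disabled, as the commit argues is safe for global loads, the wrap check is simply skipped.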
-
Fixes: 4d219b0e ("iris: implement scratch space!")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <!15897>
-
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <!15837>
-
When not in passthrough mode, the NGG shader needs to calculate the primitive export value from the input primitive's vertex indices. So, GS vertex offset 2 is needed when NGG has triangles and isn't in passthrough mode.

Fixes: 7ad69e2f "radv: stop loading invocation ID for NGG vertex shaders"
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <!15837>
-
Replace out of bounds loads with undef. Then, delete instructions with out of bounds access.

Fixes: f5adf27f "nir,radv: add and use nir_vectorize_tess_levels()"
Closes: #6264
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <!15775>
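The policy described above can be summarised as a small decision function. This is only a sketch: the enum, the function name `classify_access`, and the idea of a single constant index checked against an array length are assumptions made for illustration, not the structure of the actual NIR pass.

```c
#include <stdbool.h>

/* Hypothetical outcome of checking one constant-indexed array access. */
enum oob_action {
   KEEP_ACCESS,        /* index is in bounds, leave the instruction alone */
   REPLACE_WITH_UNDEF, /* out-of-bounds load: its result becomes undef */
   DELETE_INSTR,       /* out-of-bounds access of any other kind: remove it */
};

static enum oob_action
classify_access(unsigned index, unsigned array_length, bool is_load)
{
   if (index < array_length)
      return KEEP_ACCESS;

   return is_load ? REPLACE_WITH_UNDEF : DELETE_INSTR;
}
```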
-
There was a v_or_b32 that accidentally used SOP2. It should use VOP2.

Issue found by looking at a gfxreconstruct trace posted by a user in this bug: mesa/mesa#5838

Cc: mesa-stable
Fixes: 93c8ebfa "aco: Initial commit of independent AMD compiler"
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <mesa/mesa!15923>
-
anv sets the default EDSC flag; do the same for iris.

Fixes: 5ae278da ("iris: use vtbl to avoid multiple symbols, fix state base address")
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <mesa/mesa!15905>
-
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <mesa/mesa!15910>
-
c78be5da ("intel/fs: lower ray query intrinsics") introduced a helper function using nir_(push|pop)_if which invalidated dominance & block_index for the replacement of nir_intrinsic_rt_trace_ray. We can still keep dominance/block_index metadata for the lowering of nir_intrinsic_rt_execute_callable though.

This change uses two different lowering functions, each with correct metadata preservation.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: c78be5da ("intel/fs: lower ray query intrinsics")
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <mesa/mesa!15910>
-
this is illegal

affects:
KHR-GL46.shader_storage_buffer_object.advanced-unsizedArrayLength-cs-packed-matC

cc: mesa-stable
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <mesa/mesa!15894>
-
this may or may not be 1

cc: mesa-stable
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <mesa/mesa!15894>
-
Gert Wollny authored
The transformation must come before the code emission.

Fixes: 6a264e70
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <mesa/mesa!15919>
-
Suggested by Francisco Jerez. Although including VF invalidation in the flush bits is strange, we believe this is the only way to guarantee that stream output has finished.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <!15275>
-
FLUSH_HDC is sufficient to flush things out to L3, so we'd rather use that where possible. It's also emulated via DATA_CACHE_FLUSH on platforms where it isn't supported, so we can use it unconditionally.

We still use DATA_CACHE_FLUSH for invalidating the data cache, and to flush the DC-tagged cachelines in L3 to be globally-observable.

Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <mesa/mesa!15275>
-
Push constant loading is not coherent with L3 according to the document that describes the hardware change for the vertex buffer L3 Bypass Disable field. If we've updated a push constant buffer with, say, a blorp_buffer_copy, we may need to flush both the render cache and the tile cache.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <mesa/mesa!15275>
-
We should be using the cache tracker for this. We can consider this access IRIS_DOMAIN_OTHER_READ now that it's the catch-all non-L3-coherent read-only access domain.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <mesa/mesa!15275>
-
When stream output is active, we need to let the cache tracker know about any SO buffers, which we access via IRIS_DOMAIN_OTHER_WRITE.

In particular, we may have written to those buffers via another mechanism, such as BLORP buffer copies. In that case, the previous writes happened via IRIS_DOMAIN_RENDER_WRITE, so we'd need to flush both the render cache and the tile cache to make that data globally-observable before we begin writing via streamout, which is incoherent with the earlier mechanism.

Fixes misrendering in Ryujinx.

Closes: mesa/mesa#6085
Fixes: d8cb7621 ("iris: Fix MOCS for buffer copies")
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <mesa/mesa!15275>
-
Most clients are L3-coherent these days. However, there are some notable exceptions, such as push constants, stream output, and command streamer memory reads and writes.

With the advent of the tile cache, flushing the render or depth cache alone is no longer sufficient for memory to become globally-observable. For those, we need to flush the tile cache as well. However, we'd like to avoid that for L3-coherent clients, as it shouldn't be necessary, and is expensive.

Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <mesa/mesa!15275>
-
This will let us use it without performing a VF cache invalidation, should we want to do that.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <mesa/mesa!15275>
-
The render, depth, sampler, and data (HDC) caches are all coherent with L3. We consider OTHER_READ and OTHER_WRITE to be non-coherent, as they're kitchen-sink domains which include non-L3-clients.

Starting with Tigerlake, the VF cache is coherent with L3 (because we set the L3BypassDisable bit in the vertex/index buffer packets).

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <mesa/mesa!15275>
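The coherence rules stated above can be read as a simple predicate over cache domains. The sketch below is an illustration only: the enum and the function name `domain_is_l3_coherent` are hypothetical stand-ins rather than the driver's real definitions, and Tigerlake is assumed to correspond to graphics version 12.

```c
#include <stdbool.h>

/* Hypothetical cache domains, loosely following the names in the text. */
enum cache_domain {
   DOMAIN_RENDER_WRITE,
   DOMAIN_DEPTH_WRITE,
   DOMAIN_DATA_WRITE,
   DOMAIN_SAMPLER_READ,
   DOMAIN_VF_READ,
   DOMAIN_OTHER_READ,
   DOMAIN_OTHER_WRITE,
};

/* Returns true if accesses through this domain are coherent with L3. */
static bool
domain_is_l3_coherent(int gfx_ver, enum cache_domain domain)
{
   switch (domain) {
   case DOMAIN_RENDER_WRITE:
   case DOMAIN_DEPTH_WRITE:
   case DOMAIN_DATA_WRITE:
   case DOMAIN_SAMPLER_READ:
      return true;
   case DOMAIN_VF_READ:
      /* Coherent only once L3BypassDisable is set (Tigerlake onward). */
      return gfx_ver >= 12;
   case DOMAIN_OTHER_READ:
   case DOMAIN_OTHER_WRITE:
      /* Kitchen-sink domains that include non-L3 clients. */
      return false;
   }
   return false;
}
```

A cache tracker could use such a predicate to decide whether a tile cache flush is needed in addition to a render or depth cache flush.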
-
On Tigerlake, we use the data cache for reading indirect UBOs instead of the sampler. But we still use the constant cache for direct UBO access, so unfortunately we may access it through two different domains.

To work around this, we add a new domain for pull constants (UBOs), which will be either constant+texture or constant+data.

Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <mesa/mesa!15275>
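One way to picture the "constant+texture or constant+data" choice is as a function that returns the set of caches to invalidate before reading pull constants. The flag names and `pull_constant_invalidate_bits` below are hypothetical; the real driver expresses these as PIPE_CONTROL bits whose exact spelling is not shown in this log.

```c
#include <stdbool.h>

/* Hypothetical invalidation flags for this sketch. */
#define INVALIDATE_CONSTANT_CACHE (1u << 0)
#define INVALIDATE_TEXTURE_CACHE  (1u << 1)
#define INVALIDATE_DATA_CACHE     (1u << 2)

/* Direct UBO access always goes through the constant cache. Indirect UBO
 * access goes through the data cache on hardware where that is used
 * (Tigerlake, per the text), and through the sampler otherwise. */
static unsigned
pull_constant_invalidate_bits(bool indirect_ubos_use_data_cache)
{
   unsigned bits = INVALIDATE_CONSTANT_CACHE;

   bits |= indirect_ubos_use_data_cache ? INVALIDATE_DATA_CACHE
                                        : INVALIDATE_TEXTURE_CACHE;
   return bits;
}
```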
-
The bulk of IRIS_DOMAIN_OTHER_READ domain usage was the 3D sampler, but there were also a few oddball cases like command streamer reads, blitter access, and so on. The sampler is definitely L3 coherent, but some of the more esoteric reads may not be, so I'd like to separate them, so that OTHER_READ can become a non-L3-coherent kitchen-sink domain.

The sampler cases only need TEXTURE_CACHE_INVALIDATE, and can skip the CONSTANT_CACHE_INVALIDATE we had on IRIS_DOMAIN_OTHER_READ.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <!15275>
-
We were using IRIS_DOMAIN_OTHER_READ for read-only depth/stencil access in an attempt to avoid unnecessary flushing; IRIS_DOMAIN_DEPTH_WRITE could indicate read-write access.

However, IRIS_DOMAIN_OTHER_READ is clearly the wrong domain. Depth and stencil data is read via the depth cache, while IRIS_DOMAIN_OTHER_READ currently corresponds to the sampler cache and constant cache together (although this will change in future patches).

It's unclear whether this hack was useful. For now, just drop it and use the correct depth cache domain, even if it's marked as read-write.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <mesa/mesa!15275>
-
For texture fetches and buffer loads the fix is not needed, and the override creates faulty TGSI. In addition, remove all modifiers from the src in the additional mov instruction.

Fixes: d1c7a7b1 virgl: Add an extra mov for int outputs from constant and immediate inputs

v2: Move workaround after the use of virgl_tgsi_rewrite_src_for_input_temp (Emma)

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <mesa/mesa!15896>
-
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <mesa/mesa!15898>
-
This patch adds support for transcoding ASTC to BC7 (BPTC) and prefers it over BC3 (DXT5) when hardware supports that format.

BC7 is a much newer format (~2009 vs. ~1999) and offers higher quality than the older BC3 format. Furthermore, our encoder seems to be faster.

Tapani put together a small benchmark for transcoding a 1024x1024 ASTC texture, and switching from BC3 to BC7 improves performance of that microbenchmark by 25% on my Tigerlake NUC (with hardware ASTC disabled so we can test this path). Presumably, this isn't fundamental to the formats, but rather reflects the speed of our in-tree compressors.

So, we should use BC7 where possible.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <mesa/mesa!15875>
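The preference order described above amounts to a small selection function. The sketch below is hypothetical throughout: the enum, the name `pick_astc_transcode_target`, and the two boolean capability inputs are invented for illustration and are not the identifiers used in the actual patch.

```c
#include <stdbool.h>

/* Hypothetical transcode targets for an ASTC texture on hardware
 * without native ASTC support. */
enum transcode_target {
   TRANSCODE_NONE, /* no suitable compressed format available */
   TRANSCODE_BC3,  /* DXT5: the older fallback */
   TRANSCODE_BC7,  /* BPTC: preferred when supported */
};

static enum transcode_target
pick_astc_transcode_target(bool hw_supports_bc7, bool hw_supports_bc3)
{
   if (hw_supports_bc7)
      return TRANSCODE_BC7;
   if (hw_supports_bc3)
      return TRANSCODE_BC3;
   return TRANSCODE_NONE;
}
```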
-
This is probably unnecessary in that all drivers which support the sRGB format likely also support the non-sRGB format. But we may as well check both the formats we use, for documentation if nothing else.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <mesa/mesa!15875>
-