- Dec 12, 2024
-
-
Alyssa Rosenzweig authored
Signed-off-by:
Alyssa Rosenzweig <alyssa@rosenzweig.io>
-
Alyssa Rosenzweig authored
Signed-off-by:
Alyssa Rosenzweig <alyssa@rosenzweig.io>
-
Alyssa Rosenzweig authored
hk will use this, it's a pretty obvious thing to want. Signed-off-by:
Alyssa Rosenzweig <alyssa@rosenzweig.io>
-
Also moves the src argument before dst which is more consistent. Reviewed-by:
Friedrich Vock <friedrich.vock@gmx.de> Part-of: <mesa/mesa!32488>
-
As part of the migration to deqp-runner suites, remove these definitions to prevent the introduction of additional piglit jobs without test suites. Signed-off-by:
Valentine Burley <valentine.burley@collabora.com> Part-of: <mesa/mesa!32461>
-
Convert the panfrost-g52-piglit-gles2:arm64 job to a deqp-runner suite. Signed-off-by:
Valentine Burley <valentine.burley@collabora.com> Part-of: <mesa/mesa!32461>
-
Convert the vmware-vmx-piglit:x86_64 job to a deqp-runner suite. Signed-off-by:
Valentine Burley <valentine.burley@collabora.com> Part-of: <!32461>
-
This reverts commit 3b010a9e. This should be fixed properly now. Part-of: <mesa/mesa!32583>
-
This injects a MRTZ export with only the alpha channel to select it with COVERAGE_TO_MASK_ENABLE for alpha-to-coverage. Co-Authored-by:
Rhys Perry <pendingchaos02@gmail.com> Signed-off-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <mesa/mesa!32583>
-
Signed-off-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <mesa/mesa!32583>
-
mesa/piglit@468221c7...4c0fd15f Part-of: <mesa/mesa!32478>
-
Displayable DCC should also be disabled, otherwise it's asserting somewhere in ac_surface.c Fixes: e3d1f27b ("radv: add radv_disable_dcc_stores and enable for Indiana Jones: The Great Circle") Signed-off-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <mesa/mesa!32584>
-
This way, linear predecessors and successors better reflect the actual control flow, which improves wait state insertion and hazard mitigation.
Totals from 10252 (12.91% of 79395) affected shaders: (Navi31)
Instrs: 18824540 -> 18803823 (-0.11%); split: -0.11%, +0.00%
CodeSize: 99025464 -> 98942028 (-0.08%); split: -0.08%, +0.00%
Latency: 169291854 -> 165781877 (-2.07%); split: -2.07%, +0.00%
InvThroughput: 29701086 -> 29228602 (-1.59%); split: -1.59%, +0.00%
SClause: 510587 -> 510586 (-0.00%)
Part-of: <mesa/mesa!32389>
-
No fossil changes. Part-of: <mesa/mesa!32389>
-
instr is the branch instruction, its opcode won't ever be writelane. We should check inst instead. Found by inspection. Cc: mesa-stable Part-of: <mesa/mesa!32389>
-
This optimization now gets applied during postRA optimization. No fossil changes. Part-of: <mesa/mesa!32330>
-
Totals from 196 (0.25% of 79206) affected shaders: (Navi31)
Instrs: 534343 -> 534438 (+0.02%); split: -0.00%, +0.02%
CodeSize: 2774852 -> 2775420 (+0.02%); split: -0.00%, +0.02%
Latency: 7103512 -> 7103021 (-0.01%); split: -0.01%, +0.00%
InvThroughput: 959477 -> 959447 (-0.00%)
Copies: 42646 -> 42648 (+0.00%)
Part-of: <mesa/mesa!32330>
-
Part-of: <mesa/mesa!32330>
-
Part-of: <!32330>
-
Signed-off-by:
Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by:
Kenneth Graunke <kenneth@whitecape.org> Part-of: <mesa/mesa!32347>
-
Signed-off-by:
Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: 17096f87 ("intel: Switch to COMPUTE_WALKER_BODY") Reviewed-by:
Kenneth Graunke <kenneth@whitecape.org> Part-of: <mesa/mesa!32347>
-
Borderlands 3 (both DX11 and DX12 renderers) has a common pattern across many shaders:
    con 32x4 %510 = (uint32)txf %2 (handle), %1191 (0x10) (coord), %1 (0x0) (lod), 0 (texture)
    con 32x4 %512 = (uint32)txf %2 (handle), %1511 (0x11) (coord), %1 (0x0) (lod), 0 (texture)
    ...
    con 32x4 %550 = (uint32)txf %2 (handle), %1549 (0x25) (coord), %1 (0x0) (lod), 0 (texture)
    con 32x4 %552 = (uint32)txf %2 (handle), %1551 (0x26) (coord), %1 (0x0) (lod), 0 (texture)
A single basic block contains piles of texelFetches from a 1D buffer texture, with constant coordinates. In most cases, only the .x channel of the result is read. So we have something on the order of 28 sampler messages, each asking for...a single uint32_t scalar value.
Because our sampler doesn't have any support for convergent block loads (like the untyped LSC transpose messages for SSBOs)...this means we were emitting SIMD8/16 (or SIMD16/32 on Xe2) sampler messages for every single scalar, replicating what's effectively a SIMD1 value to the entire register. This is hugely wasteful, both in terms of register pressure and in back-and-forth sending and receiving of memory messages.
The good news is we can take advantage of our explicit SIMD model to handle this more efficiently. This patch adds a new optimization pass that detects a series of SHADER_OPCODE_TXF_LOGICAL, in the same basic block, with constant offsets, from the same texture. It constructs a new divergent coordinate where each channel is one of the constants (i.e. <0x10, 0x11, 0x12, ..., 0x26> in the above example). It issues a new NoMask divergent texel fetch which loads N useful channels in one go, and replaces the rest with expansion MOVs that splat the SIMD1 result back to the full SIMD width. (These get copy propagated away.)
We can pick the SIMD size of the load independently of the native shader width as well. On Xe2, those 28 convergent loads become a single SIMD32 ld message. On earlier hardware, we use 2 SIMD16 messages. Or we can use a smaller size when there aren't many to combine.
In fossil-db, this cuts 27% of send messages in affected shaders, 3-6% of cycles, 2-3% of instructions, and 8-12% of live registers. On A770, this improves performance of Borderlands 3 by roughly 2.5-3.5%. Reviewed-by:
Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <mesa/mesa!32573>
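The batching described in this message can be illustrated with a small standalone sketch. It is not the code from the merge request: `combined_fetch`, `pick_simd_width`, `plan_combined_fetches`, and `splat_channel` are hypothetical names, and the minimum width of 8 used in the usage note is an assumption. The sketch only shows how N convergent fetches might be packed into as few messages as possible, and how one channel of a combined result is splatted back to full width.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of one combined texel fetch message. */
struct combined_fetch {
   unsigned first;      /* index of the first original fetch it covers */
   unsigned count;      /* number of useful channels in this message */
   unsigned simd_width; /* SIMD width chosen for this message */
};

/* Smallest power-of-two width that holds `count` channels,
 * clamped to [min_width, max_width]. Widths are assumed to be
 * powers of two with min_width <= max_width.
 */
static unsigned
pick_simd_width(unsigned count, unsigned min_width, unsigned max_width)
{
   unsigned width = min_width;
   while (width < count && width < max_width)
      width *= 2;
   return width;
}

/* Greedily pack `num_fetches` scalar fetches into combined messages.
 * Returns the number of messages written to `out`.
 */
static unsigned
plan_combined_fetches(unsigned num_fetches, unsigned min_width,
                      unsigned max_width, struct combined_fetch *out,
                      unsigned out_capacity)
{
   unsigned num_msgs = 0;
   unsigned first = 0;

   while (first < num_fetches && num_msgs < out_capacity) {
      unsigned remaining = num_fetches - first;
      unsigned width = pick_simd_width(remaining, min_width, max_width);
      unsigned count = remaining < width ? remaining : width;

      out[num_msgs].first = first;
      out[num_msgs].count = count;
      out[num_msgs].simd_width = width;
      num_msgs++;
      first += count;
   }
   return num_msgs;
}

/* Model of an expansion MOV: replicate one channel of the combined
 * result to every channel of a full-width destination.
 */
static void
splat_channel(const uint32_t *combined, unsigned channel,
              uint32_t *dst, unsigned dst_width)
{
   for (unsigned i = 0; i < dst_width; i++)
      dst[i] = combined[channel];
}
```

With a maximum width of 32, 28 fetches fit into one message in this model; with a maximum of 16 they split into a 16-channel and a 12-channel message, both SIMD16, which matches the counts quoted in the message.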
-
- Dec 11, 2024
-
-
This could erroneously cause an assertion to fail if the target block index was larger than UINT16_MAX. Fixes: cab5639a ('aco/assembler: chain branches instead of emitting long jumps') Part-of: <mesa/mesa!32599>
-
On a single runner, this job currently times out due to taking over 5 hours. The estimate from dEQP runner itself suggests a full run might take over 8 hours with the current configuration. We can't really work with runs that long, even if they are manual. We currently have 7 vim3 runners, so we can actually afford to parallelize the run a bit, to make this a bit more manageable. If we choose 4, we take up a bit more than half of the runners, but we leave two runners (plus a spare) for the pre-merge CI. With this, each job takes about 2.5 hours. We leave the timeout at 3 hours for now, to have some headroom for new tests being enabled. Acked-by:
Daniel Stone <daniels@collabora.com> Part-of: <mesa/mesa!32591>
-
Acked-by:
Daniel Stone <daniels@collabora.com> Part-of: <mesa/mesa!32591>
-
Acked-by:
Daniel Stone <daniels@collabora.com> Part-of: <mesa/mesa!32591>
-
Acked-by:
Matt Turner <mattst88@gmail.com> Part-of: <!32534>
-
This enables support for GFX version 11.5.3. Signed-off-by:
Tim Huang <tim.huang@amd.com> Reviewed-by:
Marek Olšák <marek.olsak@amd.com> Part-of: <mesa/mesa!32567>
-
While nr_channels is defined with 3 bits, which allows up to 7 channels, the actual number of channels is less than or equal to 4. This adds an assertion that helps static analyzers avoid several false positives related to this. Reviewed-by:
Erik Faye-Lund <erik.faye-lund@collabora.com> Signed-off-by:
Juan A. Suarez Romero <jasuarez@igalia.com> Part-of: <!32589>
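As a rough illustration of why such an assertion helps, consider the sketch below. The struct and function names are hypothetical and not taken from the driver. A 3-bit field can hold values up to 7, so an analyzer that only sees the field width assumes a 4-element array may be indexed out of bounds. The assertion states the real invariant.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical format description: the field is 3 bits wide, so its
 * type allows 0..7 even though valid formats never exceed 4 channels.
 */
struct format_desc {
   unsigned nr_channels : 3;
};

/* Sum the per-channel bit sizes of a format with at most 4 channels. */
static unsigned
total_channel_bits(const struct format_desc *desc, const uint8_t bits[4])
{
   /* Documents the invariant and lets static analyzers prove that
    * the loop below never reads past bits[3].
    */
   assert(desc->nr_channels <= 4);

   unsigned total = 0;
   for (unsigned i = 0; i < desc->nr_channels; i++)
      total += bits[i];
   return total;
}
```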
-
This is the default but the option wasn't completely removed. Signed-off-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <mesa/mesa!32590>
-
This updates the entry for 14017823839, which fixes issues on BMG with: dEQP-VK.compute.pipeline.zero_initialize_workgroup_memory.max_workgroup_memory.1 Signed-off-by:
Tapani Pälli <tapani.palli@intel.com> Reviewed-by:
José Roberto de Souza <jose.souza@intel.com> Part-of: <mesa/mesa!32550>
-
We know we have a broken Vulkan driver, so it's debatable whether it's a broken Vulkan 1.0 or broken 1.1. Advertising 1.1 lets us run more tests, and this patch does this. We also bump the instance version id to 1.4, which seems appropriate since the overall Vulkan infrastructure within Mesa is at that level. Reviewed-by:
Erik Faye-Lund <erik.faye-lund@collabora.com> Part-of: <mesa/mesa!32464>
-
We were using the same routine to find the device and instance version numbers. This isn't correct; the device version may vary based on the physical hardware we are using, but the instance version should always be the same. Reviewed-by:
Erik Faye-Lund <erik.faye-lund@collabora.com> Part-of: <mesa/mesa!32464>
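The split between the two version queries might look roughly like the sketch below. It is not the driver's code: the struct, the function names, and especially the architecture threshold are invented for illustration. The macros mirror the standard Vulkan version packing (major at bit 22, minor at bit 12) so the example builds without the Vulkan headers.

```c
#include <assert.h>
#include <stdint.h>

/* Same bit layout as Vulkan's packed API version, with variant 0. */
#define MAKE_API_VERSION(major, minor, patch) \
   ((((uint32_t)(major)) << 22) | (((uint32_t)(minor)) << 12) | \
    ((uint32_t)(patch)))
#define API_VERSION_MAJOR(v) (((uint32_t)(v) >> 22) & 0x7fu)
#define API_VERSION_MINOR(v) (((uint32_t)(v) >> 12) & 0x3ffu)

/* Hypothetical physical device: only the field this sketch needs. */
struct physical_device {
   unsigned arch;
};

/* The instance version does not depend on any hardware. */
static uint32_t
get_instance_version(void)
{
   return MAKE_API_VERSION(1, 4, 0);
}

/* The device version may differ per physical device. The threshold
 * here is purely illustrative and is not the driver's actual rule.
 */
static uint32_t
get_device_version(const struct physical_device *pdev)
{
   return pdev->arch >= 10 ? MAKE_API_VERSION(1, 1, 0)
                           : MAKE_API_VERSION(1, 0, 0);
}
```

Keeping the two functions separate means a change to the per-device rule cannot accidentally change what the instance reports.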
-
Turn on `imageCubeArray` and `fragmentStoresAndAtomics`, which we already support (the latter only on v10 and later). Reviewed-by:
Erik Faye-Lund <erik.faye-lund@collabora.com> Part-of: <!32464>
-
As with mad, it's sometimes useful to swap the srcs of sad since not all flags are allowed on all srcs. However, unlike mad, sad is 3-src commutative, so more srcs can be swapped. Signed-off-by:
Job Noorman <jnoorman@igalia.com> Part-of: <mesa/mesa!32501>
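A minimal model of this swapping, under stated assumptions: sad is treated as a three-input integer add whose 2nd src (index 1 here) is the only one that may carry a negate flag, as the related commit messages describe. The types and function names are hypothetical and much simpler than the real ir3 data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical source operand: a value plus an optional negate flag. */
struct sad_src {
   uint32_t value;
   bool neg;
};

/* Reference semantics assumed here: a three-input wrapping add. */
static uint32_t
eval_sad(const struct sad_src srcs[3])
{
   uint32_t sum = 0;
   for (unsigned i = 0; i < 3; i++)
      sum += srcs[i].neg ? 0u - srcs[i].value : srcs[i].value;
   return sum;
}

/* Assumed constraint: only the 2nd src (index 1) may be negated. */
static bool
sad_is_legal(const struct sad_src srcs[3])
{
   return !srcs[0].neg && !srcs[2].neg;
}

static void
swap_srcs(struct sad_src srcs[3], unsigned a, unsigned b)
{
   struct sad_src tmp = srcs[a];
   srcs[a] = srcs[b];
   srcs[b] = tmp;
}

/* Try swapping src 0 or src 2 into slot 1. Because the add is
 * commutative in all three srcs, either swap preserves the result.
 * A failed attempt is undone so the srcs are left unchanged.
 */
static bool
try_legalize_sad(struct sad_src srcs[3])
{
   if (sad_is_legal(srcs))
      return true;

   for (unsigned i = 0; i < 3; i += 2) {
      swap_srcs(srcs, i, 1);
      if (sad_is_legal(srcs))
         return true;
      swap_srcs(srcs, i, 1); /* undo */
   }
   return false;
}
```

A negated 1st or 3rd src can be moved into the 2nd slot without changing the sum. Two negated srcs cannot be fixed by swapping alone, so the helper reports failure and leaves the operands as they were.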
-
In preparation for supporting sad, rename to try_swap_cat3_two_srcs and add argument for src n. Signed-off-by:
Job Noorman <jnoorman@igalia.com> Part-of: <mesa/mesa!32501>
-
In preparation for supporting sad (which, like mad, may benefit from swapping some of its srcs), extract the swapping from try_swap_mad_two_srcs so that it can be reused for sad. This is necessary since, unlike mad, sad might also benefit from swapping srcs 1->2 (instead of only 2->1) or 3->2. Signed-off-by:
Job Noorman <jnoorman@igalia.com> Part-of: <mesa/mesa!32501>
-
We would mark mad srcs as swapped once we tried swapping them, even if it would not succeed. However, it might happen (especially after running ir3_shared_folding) that a new opportunity for swapping comes up later. Therefore, we should only mark the srcs as swapped when it actually succeeded. Signed-off-by:
Job Noorman <jnoorman@igalia.com> Part-of: <mesa/mesa!32501>
-
Turns out that sad is just iadd3. I assume it's an acronym for "Sum of Absolute Differences" which may make sense since its 2nd src supports (neg) which would allow SAD to be implemented using this instruction. NIR already supports algebraic patterns for selecting iadd3 so adding codegen support in ir3 is trivial. However, sad seems to have the same hardware limitation as mad and doesn't support the scalar ALU so we have to make sure to disable it when emitting iadd3. Signed-off-by:
Job Noorman <jnoorman@igalia.com> Part-of: <mesa/mesa!32501>
-
It only supports (neg) in its 2nd src but other than that has the same properties as mad. Signed-off-by:
Job Noorman <jnoorman@igalia.com> Part-of: <mesa/mesa!32501>
-