- 21 Jan, 2021 40 commits
-
Erik Faye-Lund authored
If we want syntax-highlighting to actually work here, we should make sure the code actually parses. This fixes a warning during docs build. Reviewed-by:
Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <mesa/mesa!8243>
-
Erik Faye-Lund authored
Reviewed-by:
Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <mesa/mesa!8243>
-
Erik Faye-Lund authored
There are a few more cases that need proper quoting for Sphinx: asterisks and ticks at the start of words, as well as underscores at the end of symbols, even when they have trailing escaped characters. We should really find a way to robustly escape these things when generating them. Reviewed-by:
Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <!8243>
-
Alejandro Piñeiro authored
Although I assume that this should be caught by the validation layers, while recently triaging the dEQP-VK.ycbcr.query.*r8g8b8a8_unorm* tests we found that they were setting a descriptorCount of zero, because the differences between Vulkan 1.0 and Vulkan 1.1 were not being handled correctly. So let's just assert, in case it happens again, as that would make bug fixing far easier. Reviewed-by:
Iago Toral Quiroga <itoral@igalia.com> Part-of: <!8614>
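A minimal sketch of the kind of check described above, not the actual v3dv change. The structure and function names are hypothetical stand-ins for whichever Vulkan structure carries the descriptorCount field; the point is only that a zero count trips an assertion in debug builds instead of silently propagating.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an entry that carries a descriptorCount. */
struct example_descriptor_entry {
   uint32_t descriptor_count;
};

/* Sums the descriptor counts, asserting that no entry asks for zero
 * descriptors. A zero here would point at a bug in the caller. */
static uint32_t
example_total_descriptors(const struct example_descriptor_entry *entries,
                          uint32_t entry_count)
{
   uint32_t total = 0;
   for (uint32_t i = 0; i < entry_count; i++) {
      assert(entries[i].descriptor_count > 0);
      total += entries[i].descriptor_count;
   }
   return total;
}
```

With NDEBUG defined the assertion compiles out, so this only helps while triaging with debug builds.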
-
Arcady Goldmints-Orlov authored
Reviewed-by:
Iago Toral Quiroga <itoral@igalia.com> Part-of: <!8570>
-
Iago Toral authored
The documentation states that if we disable Early Z for the whole frame in the RCL Tile Rendering Mode packet, then we should not emit any draw calls with it enabled (which we can do by enabling it in the CFG_BITS packet). Since we emit our RCL after recording our draw calls in the BCL, and we were not considering there whether any condition for a global disable would be met, we could end up with an incorrect configuration when we decided on a global disable in the RCL, which can cause rendering artifacts. This can be easily observed by forcing the RCL bit to disable early Z in applications that are known to enable it in CFG_BITS (such as the UE Shooter demo). With this change we keep track of this scenario when we record draw calls in the BCL, and if we decide that we need to disable EZ for the entire job, we make sure we never enable it for any draw calls in the frame. Reviewed-by:
Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <!8589>
-
Iago Toral authored
This is an optimization that should make Z/S clears faster. To enable this we can't have any Z/S loads or stores in the job. Also, it seems that enabling early Z/S clearing is independent of whether early Z/S testing is enabled. Reviewed-by:
Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <!8589>
-
Iago Toral authored
There was a misunderstanding regarding the scope of some hardware bugs that led us to think that:
  1. The Clear Tile Buffer Z/S bit was broken.
  2. The Clear Tile Buffer RTs bit would also clear Z/S.
1) is not really true: what happened was that some other bugs, for which we need workarounds anyway, would have that effect. 2) was only true for V3D 4.1, so it doesn't affect v3dv. This change makes proper use of the Z/S bit instead of falling back to clearing all tile buffers every time we have a Z/S clear. This also allows us to do color clears on the tile store (which is faster) rather than falling back to the clear all RTs bit every time we have a Z/S clear. v2: rewrite the original comment describing the hardware bug to include recent discussions with Broadcom instead of keeping it as is and amending it with an update note. Reviewed-by:
Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <!8589>
-
Iago Toral authored
Reviewed-by:
Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <!8589>
-
Iago Toral authored
Reviewed-by:
Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <!8589>
-
Iago Toral authored
Reviewed-by:
Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <!8589>
-
Rhys Perry authored
Signed-off-by:
Rhys Perry <pendingchaos02@gmail.com> Reviewed-by:
Daniel Schürmann <daniel@schuermann.dev> Gitlab: #3993 Part-of: <!8163>
-
Rhys Perry authored
Matches SPIR-V -> NIR implementation of OpArrayLength. Signed-off-by:
Rhys Perry <pendingchaos02@gmail.com> Reviewed-by:
Daniel Schürmann <daniel@schuermann.dev> Part-of: <!8163>
-
Daniel Schürmann authored
Currently, these cannot be vectorized because in NIR shift operands are 32bit, while for 16bit-vectorization they need to be 16bit. No fossildb changes. Fixes: fcd2ef23 ('radv: vectorize 16bit instructions') Reviewed-by:
Rhys Perry <pendingchaos02@gmail.com> Part-of: <!8612>
-
Erik Faye-Lund authored
Because Gallium and Vulkan disagree on what kind of state strides are, we need to wrangle this state a bit, and up until now we've simply been fixing this up while binding the vertex-buffers. But this isn't robust, because the vertex element state might be bound after the vertex-buffer state was bound. We also need to take the binding-map into account, which we're currently missing as well. Instead, we need to deal with this at a place where we know what's being used for both of these. So let's do this during draw instead. Ideally, we'd also do some dirty-tracking to know if this is needed or not, but I believe Mike has some patches in this area lined up, so it might be easier to wait for those. Fixes: 8d46e35d ("zink: introduce opengl over vulkan") Closes: #3661 Closes: #4125 Reviewed-By:
Mike Blumenkrantz <michael.blumenkrantz@gmail.com> Part-of: <!8588>
-
Daniel Schürmann authored
Totals from 273 (0.20% of 139391) affected shaders (Navi10):
VGPRs: 11600 -> 11792 (+1.66%)
CodeSize: 1389304 -> 1383152 (-0.44%); split: -0.53%, +0.08%
MaxWaves: 3848 -> 3752 (-2.49%)
Instrs: 240228 -> 239478 (-0.31%); split: -0.37%, +0.06%
Cycles: 20637708 -> 20580024 (-0.28%); split: -0.46%, +0.18%
VMEM: 39164 -> 38831 (-0.85%); split: +0.06%, -0.91%
SMEM: 21743 -> 22204 (+2.12%)
VClause: 4787 -> 4783 (-0.08%)
Copies: 39057 -> 38308 (-1.92%); split: -2.28%, +0.37%
Branches: 6556 -> 6557 (+0.02%)
Reviewed-by:
Rhys Perry <pendingchaos02@gmail.com> Part-of: <!8260>
-
Daniel Schürmann authored
No fossildb changes. Reviewed-by:
Rhys Perry <pendingchaos02@gmail.com> Part-of: <!8260>
-
Daniel Schürmann authored
This patch relaxes copy-propagation for PSEUDO instructions with subdword Operands / Definitions:
general:
  - only propagate VGPR temps if the Definition is VGPR (or on p_as_uniform)
parallelcopy/create_vector/phis:
  - size has to be the same
extract_vector/split_vector:
  - propagate SGPR temps on GFX9+ or if the Definitions are not subdword
  - split_vector: size must not increase
Totals from 282 (0.20% of 140985) affected shaders (Polaris10):
VGPRs: 14520 -> 14408 (-0.77%)
CodeSize: 2693956 -> 2694316 (+0.01%); split: -0.20%, +0.21%
Instrs: 512874 -> 512864 (-0.00%); split: -0.16%, +0.16%
Cycles: 26338860 -> 26320652 (-0.07%); split: -0.36%, +0.29%
VMEM: 49460 -> 49634 (+0.35%); split: +0.47%, -0.12%
SMEM: 10035 -> 10036 (+0.01%)
VClause: 7675 -> 7674 (-0.01%)
Copies: 66012 -> 65943 (-0.10%); split: -1.31%, +1.20%
Branches: 17265 -> 17281 (+0.09%); split: -0.10%, +0.19%
PreVGPRs: 12211 -> 12124 (-0.71%)
Reviewed-by:
Rhys Perry <pendingchaos02@gmail.com> Part-of: <!8260>
-
Daniel Schürmann authored
This affects constants/SGPRs on GFX6-8 and the operand regClass of SDWA instructions. Reviewed-by:
Rhys Perry <pendingchaos02@gmail.com> Part-of: <!8260>
-
Daniel Schürmann authored
Reviewed-by:
Rhys Perry <pendingchaos02@gmail.com> Part-of: <!8260>
-
Daniel Schürmann authored
Reviewed-by:
Rhys Perry <pendingchaos02@gmail.com> Part-of: <!8260>
-
Daniel Schürmann authored
This allows propagating and emitting sub-register constants on all hardware generations. It also fixes GFX8 constant emission to not use SDWA. Reviewed-by:
Rhys Perry <pendingchaos02@gmail.com> Part-of: <!8260>
-
Daniel Schürmann authored
Due to inconsistent copy-propagation, v1 = p_parallelcopy v2b instructions could be left over after optimization on GFX8. Cc: 20.3 Cc: 21.0 Reviewed-by:
Rhys Perry <pendingchaos02@gmail.com> Part-of: <!8260>
-
Daniel Schürmann authored
This is dangerous w.r.t. LCSSA-phis.
Totals from 746 (0.54% of 139391) affected shaders (Navi10):
CodeSize: 8592160 -> 8568156 (-0.28%); split: -0.30%, +0.02%
MaxWaves: 5172 -> 5171 (-0.02%); split: +0.02%, -0.04%
Instrs: 1653949 -> 1648489 (-0.33%); split: -0.36%, +0.03%
Cycles: 49474892 -> 49329224 (-0.29%); split: -0.33%, +0.03%
VMEM: 137574 -> 137421 (-0.11%); split: +0.18%, -0.29%
SMEM: 42391 -> 42439 (+0.11%); split: +0.12%, -0.01%
VClause: 26946 -> 26943 (-0.01%)
Copies: 130902 -> 126176 (-3.61%); split: -4.05%, +0.43%
Branches: 54891 -> 54556 (-0.61%); split: -0.64%, +0.03%
PreVGPRs: 53941 -> 53939 (-0.00%)
This has a slight effect on RA due to affinity changes. Cc: 20.3 Cc: 21.0 Reviewed-by:
Rhys Perry <pendingchaos02@gmail.com> Part-of: <!8260>
-
Samuel Pitoiset authored
When NGG is used, the hw can't know the number of geometry shader primitives. To fix that, the NGG geometry shader accumulates the number of primitives itself by using an atomic operation directly to GDS. Then, begin/query copy the start/stop values from GDS to the query pool buffer using a PS_DONE event. This was actually wrong because PS_DONE is completely asynchronous to everything and is executed when the preceding draws finish their pixel shaders. Fix this by using a COPY_DATA packet, which is synced with CP. This fixes random failures on Sienna Cichlid with dEQP-VK.query_pool.statistics_query.*.geometry_shader_primitives.*. Cc: <mesa-stable@lists.freedesktop.org> Signed-off-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by:
Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <!8590>
-
Tapani Pälli authored
This makes glGetInternalformati64v queries work correctly. Closes: #4120 Signed-off-by:
Tapani Pälli <tapani.palli@intel.com> Reviewed-by:
Marek Olšák <marek.olsak@amd.com> Reviewed-by:
Adam Jackson <ajax@redhat.com> Part-of: <!8575>
-
Vinson Lee authored
Fix defect reported by Coverity Scan. Uninitialized pointer field (UNINIT_CTOR) member_not_init_in_gen_ctor: The compiler-generated constructor for this class does not initialize targ. Signed-off-by:
Vinson Lee <vlee@freedesktop.org> Reviewed-by:
Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by:
Pierre Moreau <dev@pmoreau.org> Reviewed-by:
Karol Herbst <kherbst@redhat.com> Part-of: <!8541>
-
Icecream95 authored
Cc: mesa-stable Reviewed-by:
Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <!8583>
-
Marek Olšák authored
Remove the done variable and return directly from switches. Reviewed-by:
Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <!8297>
-
Marek Olšák authored
Reviewed-by:
Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <!8297>
-
Marek Olšák authored
This changes glCallLists from using a switch inside a loop to using loops inside a switch. Also fix the comments that didn't tell WHY something was important. Reviewed-by:
Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <!8297>
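The restructuring described above (loops inside a switch rather than a switch inside a loop) can be sketched as follows. This is an illustration only, not the Mesa code: the enum, the callback type, and the function names are hypothetical, and only three element types are modeled.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical element types for the list-id array. */
enum example_type { EXAMPLE_U8, EXAMPLE_U16, EXAMPLE_U32 };

/* Hypothetical callback invoked once per list id. */
typedef void (*example_call_fn)(uint32_t list, void *data);

/* The type is decoded once, outside the loop, so each loop body is a
 * plain typed array walk with no per-element branching on the type. */
static void
example_call_lists(size_t n, enum example_type type, const void *lists,
                   example_call_fn call, void *data)
{
   switch (type) {
   case EXAMPLE_U8: {
      const uint8_t *ids = lists;
      for (size_t i = 0; i < n; i++)
         call(ids[i], data);
      break;
   }
   case EXAMPLE_U16: {
      const uint16_t *ids = lists;
      for (size_t i = 0; i < n; i++)
         call(ids[i], data);
      break;
   }
   case EXAMPLE_U32: {
      const uint32_t *ids = lists;
      for (size_t i = 0; i < n; i++)
         call(ids[i], data);
      break;
   }
   }
}

/* Example callback: accumulates the ids it is called with. */
static void
example_sum(uint32_t list, void *data)
{
   *(uint64_t *)data += list;
}
```

The trade-off is some duplicated loop code in exchange for hoisting the type dispatch out of the hot loop.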
-
Marek Olšák authored
vbo_save implements this better by putting the vertices into a VBO. The SET functions removed here were overriding it to use the slower version that uploads the vertices on every call. Acked-by:
Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <!8297>
-
Marek Olšák authored
The glapi scripts are fully capable of generating this correctly for all GL APIs if we don't set exec="dynamic". exec="dynamic" should only be used for glBegin, glEnd, and all functions that are legal inside Begin/End. Acked-by:
Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <!8297>
-
Marek Olšák authored
The existing display list code is reused to call display lists from the app thread. The util_queue_fence_wait call waits for the last call that modifies display lists (such as glEndList and glDeleteLists), which ensures that accessing display lists from a non-mesa thread is thread safe because the wait guarantees that display lists are immutable during the asynchronous display list execution. Display lists are executed just like normal display lists except that they call glthread functions instead of the default GL dispatch. Many calls in display lists are skipped because glthread only tracks a few states. Acked-by:
Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <!8297>
-
Marek Olšák authored
glthread will use it. Reviewed-by:
Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <!8297>
-
Marek Olšák authored
Acked-by:
Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <!8297>
-
Marek Olšák authored
Reviewed-by:
Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <!8297>
-
Marek Olšák authored
for viewperf Acked-by:
Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <!8297>
-
Marek Olšák authored
This just tracks matrix stack depths in MatrixStackDepth and everything else here is needed to make it correct. Matrix stack depths will be returned by glGetIntegerv without synchronizing. Display lists will be handled by a separate commit. Acked-by:
Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <!8297>
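As a rough illustration of the idea above, the sketch below tracks a matrix stack depth on the application thread so that a depth query can be answered without waiting for the driver thread. All names and the fixed limit are hypothetical; the real code tracks this in MatrixStackDepth and the real limits depend on the stack. The sketch assumes the GL rule that a push on a full stack or a pop on a stack with a single entry raises an error and leaves the state unchanged.

```c
#include <stdbool.h>

/* Hypothetical limit; real limits are per matrix stack. */
#define EXAMPLE_MAX_STACK_DEPTH 32

struct example_matrix_stack {
   int depth; /* number of matrices on the stack, starts at 1 */
};

static void
example_stack_init(struct example_matrix_stack *s)
{
   s->depth = 1;
}

/* Mirrors a push: an overflowing push leaves the depth unchanged. */
static bool
example_stack_push(struct example_matrix_stack *s)
{
   if (s->depth >= EXAMPLE_MAX_STACK_DEPTH)
      return false;
   s->depth++;
   return true;
}

/* Mirrors a pop: an underflowing pop leaves the depth unchanged. */
static bool
example_stack_pop(struct example_matrix_stack *s)
{
   if (s->depth <= 1)
      return false;
   s->depth--;
   return true;
}

/* Answered from the tracked value, with no synchronization needed. */
static int
example_stack_query_depth(const struct example_matrix_stack *s)
{
   return s->depth;
}
```

The tracked value is only correct if every call that can change the depth is mirrored, which is why the commit notes that everything else in it is needed for correctness.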
-
Marek Olšák authored
This decreases CPU time spent in the unmarshal_DrawElements function from 0.44% to 0.26% if no user buffers are present. Instead of converting all calls to either unmarshal_DrawArraysInstancedBaseInstance or unmarshal_DrawElementsInstancedBaseVertexBaseInstance, which both also conditionally bind uploaded user buffers if needed and call one of:
  - DrawArraysInstancedBaseInstance
  - DrawElementsInstancedBaseVertexBaseInstance
  - DrawRangeElementsBaseVertex
add 3 unmarshal draw variants that are specialized versions of the above and never bind uploaded user buffers. This removes all conditionals from the unmarshal functions for the common case when there are no user buffers. Unused function enums are used for the various draw variants. For example, CMD_DrawArrays is used to dispatch DrawArraysInstancedBaseInstance without user buffers, while CMD_DrawArraysInstancedBaseInstance is used to dispatch the same with user buffers. glthread isn't flexible enough to do it cleanly. Acked-by:
Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <!8297>
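The specialization described above can be sketched as choosing the command id at marshal time, so the common-case unmarshal function contains no conditionals. This is a simplified illustration, not the glthread code: every name below is hypothetical, and the "bind" and "draw" steps are reduced to counters.

```c
#include <stdbool.h>

/* Hypothetical command ids: one per unmarshal variant. */
enum example_cmd_id {
   EXAMPLE_CMD_DRAW,              /* no user buffers: nothing to bind */
   EXAMPLE_CMD_DRAW_USER_BUFFERS, /* binds uploaded buffers, then draws */
   EXAMPLE_CMD_COUNT
};

struct example_cmd {
   enum example_cmd_id id;
   int first;
   int count;
};

/* Counters standing in for the real bind and draw work. */
struct example_stats {
   int binds;
   int draws;
};

typedef void (*example_unmarshal_fn)(const struct example_cmd *cmd,
                                     struct example_stats *stats);

/* Common case: no branches, just the draw. */
static void
example_unmarshal_draw(const struct example_cmd *cmd,
                       struct example_stats *stats)
{
   (void)cmd;
   stats->draws++;
}

/* Slow case: bind the uploaded user buffers first. */
static void
example_unmarshal_draw_user_buffers(const struct example_cmd *cmd,
                                    struct example_stats *stats)
{
   (void)cmd;
   stats->binds++;
   stats->draws++;
}

static const example_unmarshal_fn example_dispatch[EXAMPLE_CMD_COUNT] = {
   [EXAMPLE_CMD_DRAW] = example_unmarshal_draw,
   [EXAMPLE_CMD_DRAW_USER_BUFFERS] = example_unmarshal_draw_user_buffers,
};

/* The decision is made once, when the command is recorded. */
static struct example_cmd
example_marshal_draw(bool has_user_buffers, int first, int count)
{
   struct example_cmd cmd = {
      .id = has_user_buffers ? EXAMPLE_CMD_DRAW_USER_BUFFERS
                             : EXAMPLE_CMD_DRAW,
      .first = first,
      .count = count,
   };
   return cmd;
}

static void
example_execute(const struct example_cmd *cmd, struct example_stats *stats)
{
   example_dispatch[cmd->id](cmd, stats);
}
```

The commit reuses existing, otherwise unused function enums for the variants; the sketch uses dedicated ids only to keep the example short.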
-