Commits on Source (89)
-
Ever since 4246c286 and 7d85dc4f, loop unrolling can no longer depend on inot being eliminated from the loop terminator condition, so we need to be able to handle it. Here we simply check whether the inot contains a simple terminator condition that we previously handled. We also update the previous users of this function to use a newly named copy of the previous behaviour, nir_is_terminator_condition_with_two_inputs(). Reviewed-by:
Ian Romanick <ian.d.romanick@intel.com> Part-of: <!18006>
-
Ever since 4246c286 and 7d85dc4f, loop unrolling can no longer depend on inot being eliminated from the loop terminator condition, so we need to be able to handle it. This change avoids 292 loop unrolling regressions with shader-db once the following patch is applied. Reviewed-by:
Ian Romanick <ian.d.romanick@intel.com> Part-of: <!18006>
-
This also prevents some small regressions in "glsl: remove GLSL IR inverse comparison optimisations".

shader-db results:

All Sandy Bridge and newer Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 19941025 -> 19940805 (<.01%)
instructions in affected programs: 52431 -> 52211 (-0.42%)
helped: 188 / HURT: 6
total cycles in shared programs: 858451784 -> 858431633 (<.01%)
cycles in affected programs: 2119134 -> 2098983 (-0.95%)
helped: 183 / HURT: 12
LOST: 2
GAINED: 0

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8364668 -> 8364670 (<.01%)
instructions in affected programs: 753 -> 755 (0.27%)
helped: 2 / HURT: 4
total cycles in shared programs: 248752572 -> 248752238 (<.01%)
cycles in affected programs: 87290 -> 86956 (-0.38%)
helped: 2 / HURT: 4

fossil-db results:

Skylake, Ice Lake, and Tiger Lake had similar results. (Ice Lake shown)
Instructions in all programs: 144909184 -> 144909130 (-0.0%)
Instructions helped: 6
Cycles in all programs: 9138641740 -> 9138640984 (-0.0%)
Cycles helped: 8

Reviewed-by:
Timothy Arceri <tarceri@itsqueeze.com> Part-of: <!18006>
-
As per 7d85dc4f, GLSL IR is not smart enough to handle this correctly for NaNs.

Shader-db radeonsi (RX 6800):

Totals from affected shaders:
SGPRS: 26848 -> 26848 (0.00 %)
VGPRS: 13552 -> 13552 (0.00 %)
Spilled SGPRs: 134 -> 134 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 635000 -> 630988 (-0.63 %) bytes
Max Waves: 5474 -> 5474 (0.00 %)

Shader-db iris (BDW):

total instructions in shared programs: 17538859 -> 17539018 (<.01%)
instructions in affected programs: 29369 -> 29528 (0.54%)
helped: 3
HURT: 126
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.49% max: 0.49% x̄: 0.49% x̃: 0.49%
HURT stats (abs) min: 1 max: 2 x̄: 1.29 x̃: 1
HURT stats (rel) min: 0.27% max: 1.32% x̄: 0.61% x̃: 0.54%
95% mean confidence interval for instructions value: 1.13 1.33
95% mean confidence interval for instructions %-change: 0.54% 0.63%
Instructions are HURT.

total loops in shared programs: 4866 -> 4866 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0

total cycles in shared programs: 858548230 -> 858548915 (<.01%)
cycles in affected programs: 1331737 -> 1332422 (0.05%)
helped: 0
HURT: 92
HURT stats (abs) min: 2 max: 49 x̄: 7.45 x̃: 6
HURT stats (rel) min: 0.01% max: 1.90% x̄: 0.12% x̃: 0.05%
95% mean confidence interval for cycles value: 5.72 9.17
95% mean confidence interval for cycles %-change: 0.05% 0.19%
Cycles are HURT.

Note: With the addition of "nir/comparison_pre: See through an inot to apply the optimization", idr's shader-db results are:

All Broadwell and newer Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 19940805 -> 19940802 (<.01%)
instructions in affected programs: 582 -> 579 (-0.52%)
helped: 3 / HURT: 0
total cycles in shared programs: 858431633 -> 858431747 (<.01%)
cycles in affected programs: 4938 -> 5052 (2.31%)
helped: 0 / HURT: 3

All older Intel platforms had similar results. (Haswell shown)
total instructions in shared programs: 16715626 -> 16715670 (<.01%)
instructions in affected programs: 9496 -> 9540 (0.46%)
helped: 0 / HURT: 44
total cycles in shared programs: 881224396 -> 881232314 (<.01%)
cycles in affected programs: 600610 -> 608528 (1.32%)
helped: 6 / HURT: 44

Reviewed-by:
Ian Romanick <ian.d.romanick@intel.com> Part-of: <!18006>
-
Rob Clark authored
Looks like wgl doesn't have much display state to protect. But its ref_count should be atomic before we start removing locking from eglapi, to protect against MakeCurrent being called in parallel on multiple threads. Signed-off-by:
Rob Clark <robdclark@chromium.org> Acked-by:
Eric Engestrom <eric@igalia.com> Reviewed-by:
Adam Jackson <ajax@redhat.com> Part-of: <!18050>
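The atomic reference count described in this commit can be illustrated with a minimal C11 sketch. The `struct display` type and the function names below are hypothetical stand-ins, not the actual wgl or EGL structures; the only point is that increments and decrements stay correct when several threads call them at once.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a display object; not the real wgl/EGL struct. */
struct display {
   atomic_int ref_count;
};

/* A freshly created object starts with one reference held by its creator. */
static void display_init(struct display *d)
{
   atomic_init(&d->ref_count, 1);
}

/* Take a reference; safe to call from several threads concurrently. */
static void display_ref(struct display *d)
{
   atomic_fetch_add(&d->ref_count, 1);
}

/* Drop a reference. Returns true when the caller released the last one
 * and is therefore responsible for destroying the object. */
static bool display_unref(struct display *d)
{
   /* atomic_fetch_sub returns the value held before the subtraction. */
   return atomic_fetch_sub(&d->ref_count, 1) == 1;
}
```

Because the decrement and the "was this the last reference" check happen in one atomic operation, two threads dropping references at the same time cannot both conclude that they own the teardown.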
-
Rob Clark authored
In particular, MakeCurrent can be called on multiple threads in parallel. Signed-off-by:
Rob Clark <robdclark@chromium.org> Reviewed-by:
Eric Engestrom <eric@igalia.com> Reviewed-by:
Adam Jackson <ajax@redhat.com> Part-of: <!18050>
-
Rob Clark authored
In preparation for relaxing eglapi to not hold a lock across driver calls, but instead only for protecting its own state, add our own lock to protect code paths that need locking or have not been audited yet. Calls that block (ClientWaitSyncKHR), or that are on the critical path and/or block (MakeCurrent, SwapBuffers*), are lockless, as they have already been audited for thread safety. Signed-off-by:
Rob Clark <robdclark@chromium.org> Acked-by:
Eric Engestrom <eric@igalia.com> Reviewed-by:
Adam Jackson <ajax@redhat.com> Part-of: <mesa/mesa!18050>
-
Rob Clark authored
Once we relax the locking, we will be doing _eglPutFoo() outside of the big display lock. Signed-off-by:
Rob Clark <robdclark@chromium.org> Acked-by:
Eric Engestrom <eric@igalia.com> Reviewed-by:
Adam Jackson <ajax@redhat.com> Part-of: <!18050>
-
Rob Clark authored
eglTerminate() must be serialized against all other EGL calls. But in most cases, other EGL calls do not need to be serialized against each other, which fits rather well with a rwlock. One would be tempted to simply replace the existing BDL with a rwlock, but several portability and debuggability limitations of the rwlock implementation prevent that, as described in the TerminateLock comment block. Signed-off-by:
Rob Clark <robdclark@chromium.org> Acked-by:
Eric Engestrom <eric@igalia.com> Reviewed-by:
Adam Jackson <ajax@redhat.com> Part-of: <!18050>
-
Rob Clark authored
Now that we have the rwlock TerminateLock protecting us against eglTerminate() yanking the rug from under us, drop the BDL across calls to driver (or at least the main ones that can potentially block). Closes: #7039 Signed-off-by:
Rob Clark <robdclark@chromium.org> Acked-by:
Eric Engestrom <eric@igalia.com> Reviewed-by:
Adam Jackson <ajax@redhat.com> Part-of: <!18050>
-
Otherwise running the CTS emits lots of warnings about these formats missing in the drivers format table. Signed-off-by:
Gert Wollny <gert.wollny@collabora.com> Part-of: <!18462>
-
Current code doesn't handle this, however it is easy to make it work by moving the negate to the presubtract source. Minor win in shader-db, mostly with Unigine shaders. Shader-db RV530: total instructions in shared programs: 136382 -> 136236 (-0.11%) instructions in affected programs: 9911 -> 9765 (-1.47%) total temps in shared programs: 18939 -> 18942 (0.02%) temps in affected programs: 37 -> 40 (8.11%) Reviewed-by:
Filip Gawin <filip@gawin.net> Signed-off-by:
Pavel Ondračka <pavel.ondracka@gmail.com> Part-of: <!18289>
-
The kernel driver accepts only a limited range of priority values; submitting any priority value outside these bounds will result in `-EINVAL`. To avoid this, the priority value is now clamped to the range that the kernel supports. Fixes: 0c6fbfca Signed-off-by:
Mark Collins <mark@igalia.com> Part-of: <!18389>
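The clamping described above amounts to the following minimal sketch. The function name and the way the bounds are passed in are assumptions for illustration; the commit message does not say how the driver obtains the kernel's valid range.

```c
#include <stdint.h>

/* Clamp a requested priority into [min_prio, max_prio] so that the value
 * handed to the kernel is always one it accepts. The bounds are parameters
 * here; how the real driver learns them is not covered by this sketch. */
static uint32_t clamp_priority(uint32_t requested,
                               uint32_t min_prio, uint32_t max_prio)
{
   if (requested < min_prio)
      return min_prio;
   if (requested > max_prio)
      return max_prio;
   return requested;
}
```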
-
Signed-off-by:
Mark Collins <mark@igalia.com> Part-of: <!18390>
-
Set the flag if the descriptor set is updated. Set the fragment descriptor dirty flag if blend consts are updated. Signed-off-by:
Karmjit Mahil <Karmjit.Mahil@imgtec.com> Reviewed-by:
Frank Binns <frank.binns@imgtec.com> Part-of: <!18429>
-
Signed-off-by:
Karmjit Mahil <Karmjit.Mahil@imgtec.com> Reviewed-by:
Frank Binns <frank.binns@imgtec.com> Part-of: <!18431>
-
As of clang 16, __has_trivial_destructor is deprecated. Use the replacement __is_trivially_destructible if it is available. Fixes new warnings with clang 16 like:
  ../src/compiler/glsl/list.h:58:4: warning: builtin __has_trivial_destructor is deprecated; use __is_trivially_destructible instead [-Wdeprecated-builtins]
  ../src/util/ralloc.h:551:4: note: expanded from macro 'DECLARE_RZALLOC_CXX_OPERATORS'
     DECLARE_ALLOC_CXX_OPERATORS_TEMPLATE(type, rzalloc_size)
     ^
  ../src/util/ralloc.h:542:12: note: expanded from macro 'DECLARE_ALLOC_CXX_OPERATORS_TEMPLATE'
     if (!HAS_TRIVIAL_DESTRUCTOR(TYPE)) \
     ^
  ../src/util/macros.h:233:44: note: expanded from macro 'HAS_TRIVIAL_DESTRUCTOR'
Reviewed-by:
Eric Engestrom <eric@igalia.com> Part-of: <!18423>
-
Signed-off-by:
Frank Binns <frank.binns@imgtec.com> Reviewed-by:
Karmjit Mahil <Karmjit.Mahil@imgtec.com> Part-of: <!18437>
-
When an OP_UNION def takes part in a vector source, e.g. for a tex instruction, we failed to clean up the OP_UNION instruction as rep() points towards the coalesced value instead. This fixes a regression on nv50 moving to NIR, but also potentially issues with nvc0. The main reason this is common in nv50 is that we lower OP_SLCT to a set, predicated movs and a union. Closes: #6406 Closes: #7117 Cc: mesa-stable Signed-off-by:
Karol Herbst <kherbst@redhat.com> Reviewed-by:
M Henning <drawoc@darkrefraction.com> Part-of: <!18377>
-
The change didn't make any sense. `s` will always be `NV50_SHADER_STAGE_COMPUTE`, because it's used to loop over all shader stages. And the TSC cache on the compute side is already flushed in `nv50_compute_validate_samplers`. Fixes spurious `CACHE_ERROR` dmesg messages. Fixes: ba6ba8c9 ("nv50: adapt texture and constbuf paths for compute shaders") Signed-off-by:
Karol Herbst <kherbst@redhat.com> Reviewed-by:
M Henning <drawoc@darkrefraction.com> Part-of: <!18382>
-
Push descriptors are part of VK_KHR_push_descriptor. Not supporting it for now. Signed-off-by:
Karmjit Mahil <Karmjit.Mahil@imgtec.com> Reviewed-by:
Frank Binns <frank.binns@imgtec.com> Part-of: <!18430>
-
Uprev it to: d5aa3941aa03 ("freebsd: Move from 13.0 to 13.1") Reviewed-by:
David Heidelberg <david.heidelberg@collabora.com> Part-of: <!18467>
-
Some packages that are being installed via recommends are conflicting with already installed packages, causing this error: E: Packages need to be removed but remove is disabled. We don't need these packages, so don't install them. Reviewed-by:
David Heidelberg <david.heidelberg@collabora.com> Part-of: <!18467>
-
As it conflicts now with some packages already installed but not necessary, such as: libpam-systemd packagekit packagekit-tools policykit-1 systemd-sysv Reviewed-by:
David Heidelberg <david.heidelberg@collabora.com> Part-of: <!18467>
-
ci-templates will now pass all env vars to the command. Reviewed-by:
David Heidelberg <david.heidelberg@collabora.com> Part-of: <!18467>
-
Foz-DB Navi21: Totals from 7 (0.01% of 134913) affected shaders: CodeSize: 213364 -> 213028 (-0.16%) Instrs: 38347 -> 38319 (-0.07%) Latency: 780148 -> 779776 (-0.05%) InvThroughput: 520098 -> 519851 (-0.05%) Signed-off-by:
Georg Lehmann <dadschoorse@gmail.com> Reviewed-by:
Rhys Perry <pendingchaos02@gmail.com> Reviewed-by:
Emma Anholt <emma@anholt.net> Part-of: <!18435>
-
When loading a TCS or GS input, we generate some code to read the URB handle for a particular input control point (ICP handle), which often involves indirect addressing due to a non-constant vertex. For example:

   mov(8) vgrf148+0.0:UW, 76543210V
   shl(8) vgrf149:UD, vgrf148+0.0:UW, 2u
   shl(8) vgrf150:UD, vgrf145:UD, 5u
   add(8) vgrf151:UD, vgrf150:UD, vgrf149:UD
   mov_indirect(8) vgrf147:UD, g2:UD, vgrf151:UD, 96u

Unfortunately, the first load with 76543210V is considered a partial write because the 8 channels of 16-bit UW data don't fill an entire register, and we can't allocate VGRFs at sub-register granularity. This causes none of the above math to be CSE'd, even though the first two instructions are common to *all* input loads, and the rest may be reused sometimes as well.

To work around this, we stop emitting 76543210V to a temporary, and instead use nir_system_values[SYSTEM_VALUE_SUBGROUP_INVOCATION], which already contains this value, and is unconditionally set up for us. With all input loads using the same register for the sequence, our CSE pass is able to eliminate the rest of the common math.

shader-db results on Tigerlake:

total instructions in shared programs: 20748243 -> 20744844 (-0.02%)
instructions in affected programs: 73410 -> 70011 (-4.63%)
helped: 242 / HURT: 21
helped stats (abs) min: 1 max: 37 x̄: 14.17 x̃: 15
helped stats (rel) min: 0.17% max: 19.58% x̄: 6.13% x̃: 6.32%
HURT stats (abs) min: 1 max: 4 x̄: 1.38 x̃: 1
HURT stats (rel) min: 0.18% max: 1.31% x̄: 0.58% x̃: 0.58%
95% mean confidence interval for instructions value: -13.73 -12.12
95% mean confidence interval for instructions %-change: -6.00% -5.19%
Instructions are helped.

total cycles in shared programs: 785828951 -> 785788480 (<.01%)
cycles in affected programs: 597593 -> 557122 (-6.77%)
helped: 227 / HURT: 13
helped stats (abs) min: 6 max: 624 x̄: 182.19 x̃: 185
helped stats (rel) min: 0.24% max: 18.22% x̄: 7.85% x̃: 7.80%
HURT stats (abs) min: 2 max: 153 x̄: 68.08 x̃: 36
HURT stats (rel) min: 0.03% max: 7.79% x̄: 2.97% x̃: 1.25%
95% mean confidence interval for cycles value: -182.55 -154.71
95% mean confidence interval for cycles %-change: -7.84% -6.69%
Cycles are helped.

Reviewed-by:
Caio Oliveira <caio.oliveira@intel.com> Part-of: <!18455>
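As a rough model of the address math in the instruction listing above, the per-channel offset fed to the indirect move can be written in plain C. This is only a sketch: it assumes that vgrf145 holds a per-channel vertex index and that the 76543210V immediate yields the channel numbers 0 through 7, and it does not model the final `mov_indirect` itself. The function name is made up.

```c
#include <stdint.h>

#define SIMD_WIDTH 8

/* Compute, for each of the 8 channels, the byte offset built by the
 * shl/shl/add sequence: (vertex << 5) + (channel << 2). */
static void icp_offsets(const uint32_t vertex[SIMD_WIDTH],
                        uint32_t offset[SIMD_WIDTH])
{
   for (uint32_t ch = 0; ch < SIMD_WIDTH; ch++) {
      /* shl ..., 2u: depends only on the channel number, so it is
       * identical for every input load in the shader. */
      uint32_t channel_bytes = ch << 2;
      /* shl ..., 5u: depends on the (possibly non-constant) vertex. */
      uint32_t vertex_bytes = vertex[ch] << 5;
      /* add: the sum is the indirect offset. */
      offset[ch] = vertex_bytes + channel_bytes;
   }
}
```

The `channel_bytes` term is the part that every load recomputed before this change; once all loads read the channel sequence from the same register, a CSE pass can share it.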
-
Make current check robust to incremental linking. Compare JMP targets if the first byte is opcode 0xE9. Reviewed-by:
Jesse Natalie <jenatali@microsoft.com> Part-of: <!18400>
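With incremental linking, a function's address may point at a small thunk that consists of a single `JMP rel32` instruction (opcode 0xE9) rather than at the function body. A minimal sketch of resolving such a thunk follows; the function name is hypothetical, and it works on a byte buffer plus a nominal load address rather than on live code so that it stays self-contained. The commit's actual check is not shown in the source, so this only illustrates the encoding.

```c
#include <stdint.h>

#define X86_OPCODE_JMP_REL32 0xE9

/* Given the first bytes of a function and the address they are loaded at,
 * return the address the code effectively starts at: the jump target if
 * the first instruction is JMP rel32, otherwise the address itself. */
static uint64_t resolve_jmp_thunk(const uint8_t *code, uint64_t addr)
{
   if (code[0] != X86_OPCODE_JMP_REL32)
      return addr;

   /* The 32-bit displacement is stored little-endian after the opcode. */
   uint32_t raw = (uint32_t)code[1] | ((uint32_t)code[2] << 8) |
                  ((uint32_t)code[3] << 16) | ((uint32_t)code[4] << 24);

   /* Sign-extend the displacement to 64 bits without relying on
    * implementation-defined signed conversions. */
   uint64_t disp = raw;
   if (raw & 0x80000000u)
      disp |= UINT64_C(0xFFFFFFFF00000000);

   /* The displacement is relative to the end of the 5-byte instruction. */
   return addr + 5 + disp;
}
```

Comparing the resolved targets of two pointers, instead of the pointers themselves, makes an equality check insensitive to whether either side goes through a thunk.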
-
this ensures types which consume more than 1 slot are effectively tagged so that the next stage inputs are also assigned properly fixes: spec@arb_enhanced_layouts@execution@component-layout@vs-fs-array-dvec3 cc: mesa-stable Reviewed-by:
Timur Kristóf <timur.kristof@gmail.com> Part-of: <mesa/mesa!18444>
-
Trust the host vulkan driver to load it through whatever way it finds to be most efficient. Saves upload on the frontend when other uniforms change. Reviewed-by:
Mike Blumenkrantz <michael.blumenkrantz@gmail.com> Part-of: <!18374>
-
nir_to_spirv can handle it. Cuts instructions in a turnip CS shader on Aztec Ruins from 36k to 3k. Part of #6115 Reviewed-by:
Mike Blumenkrantz <michael.blumenkrantz@gmail.com> Part-of: <!18374>
-
Let's be consistent. Also needed for parsing device and traces yml file from logs. Reviewed-by:
Guilherme Gallo <guilherme.gallo@collabora.com> Signed-off-by:
David Heidelberg <david.heidelberg@collabora.com> Part-of: <!18422>
-
Let's be consistent. Also needed for parsing device and traces yml file from logs. Reviewed-by:
Guilherme Gallo <guilherme.gallo@collabora.com> Signed-off-by:
David Heidelberg <david.heidelberg@collabora.com> Part-of: <!18422>
-
Implement natively by always returning invalid feedback. This is a legal (but useless) implementation according to the spec. In the future, I want to return the real feedback values from the host, but that requires changes to the venus protocol. The protocol does not know that the VkPipelineCreationFeedback structs in the VkGraphicsPipelineCreateInfo pNext are output parameters. Before VK_EXT_pipeline_creation_feedback, the pNext chain was input-only. Tested with `dEQP-VK.pipeline.*.creation_feedback.*`. The tests in vulkan-cts-1.3.3.0 are buggy. I submitted a fix to dEQP upstream; see below. Results with the bug: Passed: 0/30 ( 0.0%) Failed: 12/30 (40.0%) Not supported: 18/30 (60.0%) Warnings: 0/30 ( 0.0%) Results with bugfix: Passed: 12/30 (40.0%) Failed: 0/30 ( 0.0%) Not supported: 18/30 (60.0%) Warnings: 0/30 ( 0.0%) See: https://gerrit.khronos.org/c/vk-gl-cts/+/10086 See: virgl/virglrenderer!909 Reviewed-by:
Yiwei Zhang <zzyiwei@chromium.org> Signed-off-by:
Chad Versace <chadversary@chromium.org> Part-of: <!18035>
-
Android requires this to enable zink. Signed-off-by:
Yiwei Zhang <zzyiwei@chromium.org> Reviewed-By:
Mike Blumenkrantz <michael.blumenkrantz@gmail.com> Part-of: <!18453>
-
- use legacy EGL driver interface - use Android Vulkan loader - avoid unrelated kopper source files v2: update false #elif to #else Signed-off-by:
Yiwei Zhang <zzyiwei@chromium.org> Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com> (v1) Part-of: <!18453>
-
This change fixes the following:
1. Dup the fence fd. Otherwise, since external semaphore import takes ownership of the fd, the non-Vulkan side touching the fd afterwards leads to undefined behavior. This can be hit on implementations that defer the processing of the passed fd.
2. Use VK_SEMAPHORE_IMPORT_TEMPORARY_BIT for importing, since that is required for the SYNC_FD handle type because of its copy transference. Meanwhile, doing a temporary import for opaque fd is fine in this path.
Fixes: 32597e11 ("zink: implement GL semaphores") Signed-off-by:
Yiwei Zhang <zzyiwei@chromium.org> Reviewed-By:
Mike Blumenkrantz <michael.blumenkrantz@gmail.com> Part-of: <!18453>
-
For in-fence handling, dri2 performs the following sequence back to back:
1. create_fence_fd: import external fence fd
2. fence_server_sync: import the pipe fence into the driver ctx
3. fence_reference: deref the created pipe fence
Before this change, zink pushed the wrapped external semaphore to the wait semaphores of the next batch, but the fence_reference that follows destroys the imported semaphore immediately. Instead of extending the lifecycle of the pipe fence throughout the batch state, we can simply transfer the semaphore ownership to the batch and destroy it upon batch reset. Fixes: 32597e11 ("zink: implement GL semaphores") Signed-off-by:
Yiwei Zhang <zzyiwei@chromium.org> Reviewed-By:
Mike Blumenkrantz <michael.blumenkrantz@gmail.com> Part-of: <!18453>
-
fence_get_fd is required for any kind of surface flush or native fence sync export on Android. The typical scenarios are: - eglDupNativeFenceFDANDROID - eglSwapBuffers* - eglMakeCurrent - glFlush/glFinish for front buffer rendering This change updates zink_flush to handle PIPE_FLUSH_FENCE_FD via a forced submit to signal an external sync_fd semaphore. fence_get_fd is implemented to export the sync file from that semaphore. Signed-off-by:
Yiwei Zhang <zzyiwei@chromium.org> Reviewed-By:
Mike Blumenkrantz <michael.blumenkrantz@gmail.com> Part-of: <!18453>
-
When the workgroup is 1 dimensional, simply use a vec3 filled with zeroes and the local invocation index. This is better than lower_id_to_index + constant folding, because this way we don't leave behind extra ALU instrs. Note, this is relevant to mesh shaders on RDNA2 because it enables us to better detect cross-invocation output access. Signed-off-by:
Timur Kristóf <timur.kristof@gmail.com> Reviewed-by:
Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by:
Marcin Ślusarz <marcin.slusarz@intel.com> Part-of: <!18464>
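To make the 1-dimensional shortcut concrete, here is a small sketch comparing the general index-to-id conversion with the direct construction. It assumes the non-trivial dimension is x and that the local invocation index is laid out x-fastest; the type and function names are made up for illustration and are not NIR API.

```c
#include <stdint.h>

struct uvec3 {
   uint32_t x, y, z;
};

/* General case: recover the 3D local invocation id from the linear
 * index, which costs divisions and modulos per component. */
static struct uvec3 id_from_index(uint32_t index, struct uvec3 size)
{
   struct uvec3 id;
   id.x = index % size.x;
   id.y = (index / size.x) % size.y;
   id.z = index / (size.x * size.y);
   return id;
}

/* 1D case (size.y == size.z == 1): the id is just the index followed
 * by zeroes, with no arithmetic left to fold away afterwards. */
static struct uvec3 id_from_index_1d(uint32_t index)
{
   struct uvec3 id = { index, 0, 0 };
   return id;
}
```

Both functions agree whenever the workgroup size is (N, 1, 1) and the index is below N, but the second form also makes it obvious to later passes that the id is the index itself.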
-
Similar to how other I/O info is cleared at the beginning of gather_info we should also clear the cross-invocation mesh shader output mask. Fixes: 112a8568 Signed-off-by:
Timur Kristóf <timur.kristof@gmail.com> Acked-by:
Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by:
Marcin Ślusarz <marcin.slusarz@intel.com> Part-of: <mesa/mesa!18464>
-
and ensure all submissions are synchronous upon NO_ASYNC_QUEUE_SUBMIT No other intended change in behavior. Signed-off-by:
Yiwei Zhang <zzyiwei@chromium.org> Part-of: <!18475>
-
This is to ensure semaphore export under globalFencing represents the correct submission. Signed-off-by:
Yiwei Zhang <zzyiwei@chromium.org> Part-of: <!18475>
-
libX11 has a perfectly good XID-based hash table we can be using, let's. Reviewed-by:
Emma Anholt <emma@anholt.net> Part-of: <!18474>
-
This reports RT commands like vkCmdTraceRaysKHR and vkCmdBuildAccelerationStructuresKHR in RGP. Signed-off-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by:
Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <!18496>
-
They were just not recorded. Signed-off-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by:
Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <!18496>
-
RGP expects a compute bind point. This allows it to show ISA of RT shaders and also enables instruction timing. Closes: #7213 Signed-off-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by:
Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <!18496>
-
When building the main FS with GPL we don't know the color export formats. Signed-off-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by:
Daniel Schürmann <daniel@schuermann.dev> Reviewed-by:
Timur Kristóf <timur.kristof@gmail.com> Part-of: <!18255>
-
Signed-off-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by:
Daniel Schürmann <daniel@schuermann.dev> Reviewed-by:
Timur Kristóf <timur.kristof@gmail.com> Part-of: <!18255>
-
The color format would be zero and all exports would be removed. Signed-off-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by:
Daniel Schürmann <daniel@schuermann.dev> Reviewed-by:
Timur Kristóf <timur.kristof@gmail.com> Part-of: <!18255>
-
This will be used to prefetch PS epilogs. Signed-off-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by:
Timur Kristóf <timur.kristof@gmail.com> Part-of: <!18255>
-
Signed-off-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by:
Timur Kristóf <timur.kristof@gmail.com> Part-of: <!18255>
-
Long jumps seem to be slow and prefetching might help. Signed-off-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by:
Timur Kristóf <timur.kristof@gmail.com> Part-of: <!18255>
-
This enables using PS epilogs with GPL. Signed-off-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by:
Timur Kristóf <timur.kristof@gmail.com> Part-of: <!18255>
-
If a color attachment is used in a render pass but not exported by the FS, cb_shader_mask would be non-zero for this MRT. Though, to make sure the hw remapping of SPI_SHADER_COL_FORMAT<->CB_SHADER_MASK works as expected, we should also clear the unused color attachment in CB_SHADER_MASK. Otherwise, the hw will remap to the wrong MRT. Closes: #7221 Fixes: 8fcb4aa0 ("radv: compact MRTs to save PS export memory space") Signed-off-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by:
Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <!18491>
-
With EXT_gpu_shader4 the support is already in place, we just have to allow it in glsl and expose the extension name. v2: Check whether the extension is enabled in the shader (Adam Jackson) v3: Don't check GLES version in lexer (mareko) Signed-off-by:
Gert Wollny <gert.wollny@collabora.com> Reviewed-by:
Adam Jackson <ajax@redhat.com> Part-of: <!18460>
-
It was determined that a significant part of queue submission overhead came from constantly allocating and freeing CSes inside `tu_autotune_on_submit`. This has been reduced by retaining instances of `tu_submission_data` with their corresponding CSes; this eliminates that overhead entirely, as resetting a CS is a very cheap operation compared to allocating it or even freeing it wholly. Signed-off-by:
Mark Collins <mark@igalia.com> Part-of: <!18461>
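The general idea of retaining objects instead of reallocating them can be sketched as a simple free list. This is not the turnip code: `struct submission` is a hypothetical stand-in for `tu_submission_data`, and the "command stream" is reduced to a single fill counter so that a reset is visibly cheaper than an allocation.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a submission record and its command stream. */
struct submission {
   struct submission *next_free;
   size_t cs_used; /* bytes of command stream written so far */
};

struct submission_pool {
   struct submission *free_list;
};

/* Reuse a retained object when one is available; allocate only on a miss.
 * Returns NULL if an allocation was needed and failed. */
static struct submission *pool_get(struct submission_pool *pool)
{
   struct submission *s = pool->free_list;
   if (s) {
      pool->free_list = s->next_free;
      s->cs_used = 0; /* resetting is just rewinding the stream */
   } else {
      s = calloc(1, sizeof(*s));
   }
   return s;
}

/* Hand the object back to the pool instead of freeing it. */
static void pool_put(struct submission_pool *pool, struct submission *s)
{
   s->next_free = pool->free_list;
   pool->free_list = s;
}

/* Release everything still retained, e.g. at device teardown. */
static void pool_destroy(struct submission_pool *pool)
{
   while (pool->free_list) {
      struct submission *s = pool->free_list;
      pool->free_list = s->next_free;
      free(s);
   }
}
```

After a short warm-up the pool holds as many objects as were ever in flight at once, so steady-state submissions no longer touch the allocator.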
-
The test seems to fail when run in conjunction with other tests. This got revealed after I introduced parallelism in the VKCTS execution on VanGogh. This was caught by pre-merge CI, but idiot me thought this was a flake... and did not try re-running the job to verify...</BrownBag> Reference: #7220 Signed-off-by:
Martin Roukala (né Peres) <martin.roukala@mupuf.org> Reviewed-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <!18480>
-
With dynamic rendering, the concept of framebuffer dimensions goes away so this won't make sense. Even with render passes, the render area is guaranteed to be inside the framebuffer so we may as well clip to the potentially smaller render area. This commit also moves window scissor setup to CmdBeginRenderPass2() time. This should be fine, even for meta ops, as the only meta ops which happen inside a render pass need the same render area as the render pass itself. Reviewed-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <mesa/mesa!15587>
-
Reviewed-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <!15587>
-
If the current layout supports DCC, we initialize it. There's no reason why we can't leave it in that layout and need to stomp it to COLOR_ATTACHMENT_OPTIMAL. If the layout supports DCC, it's effectively identical to COLOR_ATTACHMENT_OPTIMAL anyway. Reviewed-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <!15587>
-
Also, update the list of expected failures. dEQP-VK.image.sample_texture.*_bit_compressed_format_two_samplers_* now reliably pass on Polaris10 (GFX8) and Pitcairn (GFX6). Stoney has new failures, but given there are already a lot of depth/stencil resolve failures, we shouldn't worry about them. Reviewed-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <!15587>
-
On dbenable, depth bias is enabled, so we need to write the depth bias data into the depth_bias_array (which gets uploaded to the device) and also set up the depth bias index (used in the control stream). Signed-off-by:
Karmjit Mahil <Karmjit.Mahil@imgtec.com> Reviewed-by:
Frank Binns <frank.binns@imgtec.com> Part-of: <!18438>
-
Reviewed-by:
Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <!18387>
-
Iago Toral authored
We don't have any special requirements for this, so we can just expose the extension. The tests in CTS have an issue where they only check if a format is supported for sampling but don't check if an image with that format can be created for sampling. In our case, since we can't sample 1D depth/stencil images, this causes affected tests to crash in the simulator (they pass on the device though). There is an issue with a fix here: https://gitlab.khronos.org/Tracker/vk-gl-cts/-/issues/3923 Reviewed-by:
Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <!18489>
-
The heap size is a 64-bit value. Reviewed-by:
Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <!18483>
-
GPU addresses are 32-bit, so we can't address more than 4GB. Reviewed-by:
Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by:
Eric Engestrom <eric@igalia.com> Part-of: <!18483>
-
Reviewed-by:
Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <!18483>
-
This is mostly based on Turnip's implementation. Reviewed-by:
Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by:
Eric Engestrom <eric@igalia.com> Part-of: <!18483>
-
Tomeu Vizoso authored
For those drivers that don't make full use of the 64 bits in pipe_query_result.u64. Applications will make use of it via GL_QUERY_COUNTER_BITS to handle when the value rolls over. Signed-off-by:
Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by:
Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-By:
Mike Blumenkrantz <michael.blumenkrantz@gmail.com> Reviewed-by:
Emma Anholt <emma@anholt.net> Part-of: <!10770>
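On the application side, knowing the counter width lets a wrapped counter still produce a correct difference. The sketch below assumes the application has already read the width (the value reported for GL_QUERY_COUNTER_BITS) into `bits`; the function name is made up, and the result is only meaningful if the counter wrapped at most once between the two samples.

```c
#include <assert.h>
#include <stdint.h>

/* Difference between two samples of a counter that is only `bits` wide
 * and wraps back to zero after reaching 2^bits - 1. */
static uint64_t counter_delta(uint64_t begin, uint64_t end, unsigned bits)
{
   assert(bits >= 1 && bits <= 64);

   /* Shifting a 64-bit value by 64 is undefined, so handle it apart. */
   uint64_t mask = bits == 64 ? UINT64_MAX : ((UINT64_C(1) << bits) - 1);

   /* Unsigned subtraction wraps modulo 2^64; masking reduces the result
    * modulo 2^bits, which undoes a single rollover. */
   return (end - begin) & mask;
}
```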
-
Danilo Krummrich authored
Reviewed-by:
Karol Herbst <kherbst@redhat.com> Signed-off-by:
Danilo Krummrich <dakr@redhat.com> Part-of: <!18109>
-
Danilo Krummrich authored
The 'set' instruction does distinguish between signed and unsigned, but always treats values as 32 bit. For signed values < 0 with a bit width smaller than 32 bit, this falsely results in treating them as positive values. Reviewed-by:
Karol Herbst <kherbst@redhat.com> Signed-off-by:
Danilo Krummrich <dakr@redhat.com> Part-of: <!18109>
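The failure mode described here can be reproduced in plain C by modelling a register that holds a 16-bit value in its low half with the upper half zero. The sketch below is an illustration of the problem and of sign extension as one possible remedy; the commit message does not state which fix was chosen, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* A compare that always interprets the full register as a signed 32-bit
 * value only looks at bit 31 to decide whether the value is negative. */
static bool is_negative_as_s32(uint32_t reg)
{
   return (reg & 0x80000000u) != 0;
}

/* Sign-extend the low 16 bits of the register to 32 bits so that a
 * 32-bit signed compare sees the value the 16-bit type meant. */
static uint32_t sign_extend_16(uint32_t reg)
{
   uint32_t low = reg & 0xFFFFu;
   return (low & 0x8000u) ? (low | 0xFFFF0000u) : low;
}
```

For example, the 16-bit value -1 is 0xFFFF; held as 0x0000FFFF it reads as 65535 to a 32-bit signed compare, whereas after extension to 0xFFFFFFFF it reads as -1.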
-
Danilo Krummrich authored
Instructions like mov u16 %r78s 0x00ff (0) are dropped, since they're not supported by the HW, hence avoid using 8/16 bit destination registers for OP_MOV and use the full width of the register instead. Reviewed-by:
Karol Herbst <kherbst@redhat.com> Signed-off-by:
Danilo Krummrich <dakr@redhat.com> Part-of: <!18109>
-
Danilo Krummrich authored
Add helper functions to check whether a DataType is an unsigned integer type and whether a DataType is either an unsigned or signed integer type. Reviewed-by:
Karol Herbst <kherbst@redhat.com> Signed-off-by:
Danilo Krummrich <dakr@redhat.com> Part-of: <!18109>
-
Danilo Krummrich authored
Converting signed and unsigned integers from 8/16 bit sources to a 64 bit floating point destination (i2f64 / u2f64) isn't possible, hence convert the source to 32 bit first. Reviewed-by:
Karol Herbst <kherbst@redhat.com> Signed-off-by:
Danilo Krummrich <dakr@redhat.com> Part-of: <!18109>
-
Danilo Krummrich authored
Conversions to integers must be rounded towards zero, hence, actually do this for all integers including 8/16 bit sources. Reviewed-by:
Karol Herbst <kherbst@redhat.com> Signed-off-by:
Danilo Krummrich <dakr@redhat.com> Part-of: <!18109>
-
Danilo Krummrich authored
Directly converting from a float to an 8 bit integer and from a 64 bit float to an integer smaller than 32 bit is not supported, therefore add an intermediate conversion to a 32 bit integer. Reviewed-by:
Karol Herbst <kherbst@redhat.com> Signed-off-by:
Danilo Krummrich <dakr@redhat.com> Part-of: <!18109>
-
Danilo Krummrich authored
We can't convert from a 64 bit integer to any integer smaller than 64 bit directly, hence split the value first and then cvt / mov to the target type. Reviewed-by:
Karol Herbst <kherbst@redhat.com> Signed-off-by:
Danilo Krummrich <dakr@redhat.com> Part-of: <!18109>
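The "split first, then convert" approach can be mirrored in C as follows. This is a sketch of the data flow only, with hypothetical names; it does not reproduce the codegen's SPLIT, cvt or mov instructions.

```c
#include <stdint.h>

struct u32_pair {
   uint32_t lo, hi;
};

/* Split a 64-bit value into its two 32-bit halves. */
static struct u32_pair split64(uint64_t v)
{
   struct u32_pair p = { (uint32_t)(v & 0xFFFFFFFFu), (uint32_t)(v >> 32) };
   return p;
}

/* Narrow a 64-bit integer to 16 bits in two steps: take the low 32-bit
 * half first, then narrow that half to the target width. */
static uint16_t u64_to_u16(uint64_t v)
{
   struct u32_pair p = split64(v);
   return (uint16_t)(p.lo & 0xFFFFu);
}
```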
-
Danilo Krummrich authored
We can't convert directly from signed integers smaller than 64 bit to signed 64 bit integers. For 32 bit integers this is handled with SHR and MERGE. In order to also support 8/16 bit signed integers, convert them to 32 bit first. Reviewed-by:
Karol Herbst <kherbst@redhat.com> Signed-off-by:
Danilo Krummrich <dakr@redhat.com> Part-of: <!18109>
-
Danilo Krummrich authored
We can't directly convert from unsigned integers smaller than 64 bit to unsigned 64 bit integers. Hence, converting from 32 bit to 64 bit is handled by just merging with 0. To support U8/U16 integers handle them just the same way. Reviewed-by:
Karol Herbst <kherbst@redhat.com> Signed-off-by:
Danilo Krummrich <dakr@redhat.com> Part-of: <!18109>
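The signed and unsigned widening schemes from this commit and the previous one can be modelled on raw bit patterns in C. This is a sketch with made-up function names: `merge32` plays the role of MERGE, and the high word of the signed case is what an arithmetic shift right by 31 would produce, written here as a conditional to stay portable.

```c
#include <stdint.h>

/* Combine two 32-bit halves into one 64-bit value. */
static uint64_t merge32(uint32_t lo, uint32_t hi)
{
   return ((uint64_t)hi << 32) | lo;
}

/* Unsigned 32 -> 64: the high half is simply zero. */
static uint64_t widen_u32(uint32_t v)
{
   return merge32(v, 0);
}

/* Signed 32 -> 64, on the raw bit pattern: the high half is all copies
 * of the sign bit, i.e. the result of an arithmetic shift right by 31. */
static uint64_t widen_s32_bits(uint32_t v)
{
   uint32_t hi = (v & 0x80000000u) ? 0xFFFFFFFFu : 0u;
   return merge32(v, hi);
}

/* Signed 16 -> 64: sign-extend to 32 bits first, then reuse the
 * 32-bit path, matching the two-step approach described above. */
static uint64_t widen_s16_bits(uint16_t v)
{
   uint32_t v32 = (v & 0x8000u) ? (0xFFFF0000u | v) : v;
   return widen_s32_bits(v32);
}
```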
-
Caio Oliveira authored
Allow us to implement this in brw_fs_visitor.cpp, which then will let us deduplicate code between the CS-like barrier and the TCS barrier in a later patch. Reviewed-by:
Marcin Ślusarz <marcin.slusarz@intel.com> Reviewed-by:
Kenneth Graunke <kenneth@whitecape.org> Part-of: <!18362>
-
Caio Oliveira authored
CS-like and TCS control barriers converged in gfx >= 125, so use a common helper for the message payload setup. Reviewed-by:
Marcin Ślusarz <marcin.slusarz@intel.com> Reviewed-by:
Kenneth Graunke <kenneth@whitecape.org> Part-of: <!18362>
-
To avoid crash during cleanup if lima_context_create fails. Closes: #7196 Signed-off-by:
Roman Stratiienko <r.stratiienko@gmail.com> Reviewed-by:
Vasily Khoruzhick <anarsoul@gmail.com> Part-of: <!18407>
-
taking too long Part-of: <!18523>
-
There aren't too many. Signed-off-by:
Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by:
Eric Engestrom <eric@igalia.com> Part-of: <!18508>
-
This flag controls virgl's behavior when buffers are accessed on the guest through Mesa's GBM interface. As such, this flag needs to be consistent in both the resource creation and fd import case. Previously, the fd import resource's flag value would be inconsistent with the original resource's value. This patch fixes this by inferring the value of this flag based on the resource's size. Signed-Off By: Isaac Bosompem <mrisaacb@google.com> Part-of: <!18477>
-
Fixes: 7662a5e9 mesa: Remove PIPE_CAP_CS_DERIVED_SYSTEM_VALUES_SUPPORTED/lower_cs_derived. Signed-off-by:
Gert Wollny <gert.wollny@collabora.com> Part-of: <!18518>
-
GL_MAX_FRAGMENT_UNIFORM_COMPONENTS may not report a size that is useful to calculate the supported UBO size. Use the value GL_MAX_UNIFORM_BLOCK_SIZE instead when the host supports this. Related: virgl/virglrenderer#286 Fixes: 5b683ba1 virgl: Only progagate the uniform numbers if the numbers are actually right Signed-off-by:
Gert Wollny <gert.wollny@collabora.com> Part-of: <mesa/mesa!18512>
Showing
- .gitlab-ci.yml 2 additions, 2 deletions
- .gitlab-ci/container/cross_build.sh 1 addition, 1 deletion
- .gitlab-ci/container/debian/x86_test-base.sh 3 additions, 1 deletion
- .gitlab-ci/container/debian/x86_test-vk.sh 1 addition, 1 deletion
- .gitlab-ci/container/gitlab-ci.yml 3 additions, 3 deletions
- .gitlab-ci/image-tags.yml 5 additions, 5 deletions
- .gitlab-ci/lava/lava-submit.sh 2 additions, 0 deletions
- .gitlab-ci/test/gitlab-ci.yml 2 additions, 0 deletions
- docs/envvars.rst 38 additions, 0 deletions
- docs/features.txt 5 additions, 5 deletions
- docs/gallium/screen.rst 2 additions, 0 deletions
- src/amd/ci/radv-pitcairn-aco-fails.txt 0 additions, 4 deletions
- src/amd/ci/radv-polaris10-aco-fails.txt 0 additions, 4 deletions
- src/amd/ci/radv-stoney-aco-fails.txt 12 additions, 0 deletions
- src/amd/ci/radv-vangogh-aco-flakes.txt 8 additions, 0 deletions
- src/amd/common/ac_sqtt.h 8 additions, 0 deletions
- src/amd/vulkan/layers/radv_sqtt_layer.c 129 additions, 18 deletions
- src/amd/vulkan/meson.build 0 additions, 1 deletion
- src/amd/vulkan/radv_cmd_buffer.c 379 additions, 948 deletions
- src/amd/vulkan/radv_meta.c 6 additions, 14 deletions