- 02 Feb, 2023 24 commits
-
-
Martin Roukala authored
Fixes: f6c06ef2 ("ci: Add manual rules variations to disable.") Reviewed-by:
Emma Anholt <emma@anholt.net> Signed-off-by:
Martin Roukala (né Peres) <martin.roukala@mupuf.org> Part-of: <mesa/mesa!21036>
-
Martin Roukala authored
Quoting a condition is apparently an effective way of working around YAML parsing weirdness. However, the quotes need to surround the whole expression, not just parts of it. Fixes: f6c06ef2 ("ci: Add manual rules variations to disable.") Suggested-by:
Michel Dänzer <mdaenzer@redhat.com> Reviewed-by:
Eric Engestrom <eric@igalia.com> Signed-off-by:
Martin Roukala (né Peres) <martin.roukala@mupuf.org> Part-of: <mesa/mesa!21036>
-
Lowering buffer textures will interact with multiple of our existing lowerings, and it's convenient to have it all in one place. This also keeps the pass ordering dependencies centralized. Signed-off-by:
Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <mesa/mesa!21060>
-
This reverts commit 0733aafa. Signed-off-by:
Erico Nunes <nunes.erico@gmail.com> Acked-by:
Vasily Khoruzhick <anarsoul@gmail.com> Part-of: <mesa/mesa!21035>
-
at some point this used to work, but it no longer does what it's supposed to do, which is return a memtype from a heap+flags Fixes: d702a503 ("zink: support multiple heaps per memory type") Part-of: <mesa/mesa!21025>
-
this should be ignored by drivers/layers, but it isn't, and the crashing is immense Part-of: <mesa/mesa!21025>
-
the original code was quite conservative and always created a new layout, but many times this is unnecessary, and the original layout can just be refcounted since it doesn't need to be merged Reviewed-by:
Dave Airlie <airlied@redhat.com> Part-of: <mesa/mesa!21051>
-
this is no longer used Reviewed-by:
Dave Airlie <airlied@redhat.com> Part-of: <mesa/mesa!21051>
-
deferring these can cause memory ballooning and oom Reviewed-by:
Dave Airlie <airlied@redhat.com> Part-of: <mesa/mesa!21051>
-
this should minimize pipeline creation time and make fast-linking "fast" Reviewed-by:
Dave Airlie <airlied@redhat.com> Part-of: <mesa/mesa!21051>
-
there's also now a(n unused) flag to indicate that the csos have been created Reviewed-by:
Dave Airlie <airlied@redhat.com> Part-of: <mesa/mesa!21051>
-
this is just about ownership, not modification, so refcounting saves time Reviewed-by:
Dave Airlie <airlied@redhat.com> Part-of: <mesa/mesa!21051>
-
Reviewed-by:
Dave Airlie <airlied@redhat.com> Part-of: <mesa/mesa!21051>
-
this avoids creating a separate noop fs for every pipeline Reviewed-by:
Dave Airlie <airlied@redhat.com> Part-of: <mesa/mesa!21051>
-
On sway+xwayland, both explicit and implicit modifiers are advertised. While dri3proto says nothing about it, zwp_linux_dmabuf_v1 says:

   A compositor that sends valid modifiers and DRM_FORMAT_MOD_INVALID for a given format supports both explicit modifiers and implicit modifiers.

"glmark2 -b build:model=bunny --fullscreen" goes from 468 to 598 fps on a618 @ 2160x1440. Part-of: <mesa/mesa!20892>
-
Part-of: <mesa/mesa!20892>
-
Fossil DB stats on GFX11:

Totals from 1343 (1.00% of 134913) affected shaders:
   SpillSGPRs: 7145 -> 7137 (-0.11%)
   CodeSize: 20737744 -> 20739148 (+0.01%); split: -0.02%, +0.03%
   Instrs: 4010443 -> 4008449 (-0.05%); split: -0.05%, +0.00%
   Latency: 50021520 -> 50021105 (-0.00%); split: -0.00%, +0.00%
   InvThroughput: 6354371 -> 6354112 (-0.00%); split: -0.00%, +0.00%
   VClause: 63035 -> 63038 (+0.00%); split: -0.01%, +0.01%
   SClause: 121162 -> 121166 (+0.00%)
   Copies: 251354 -> 251058 (-0.12%); split: -0.18%, +0.06%
   PreSGPRs: 137283 -> 137299 (+0.01%)

Signed-off-by:
Timur Kristóf <timur.kristof@gmail.com> Reviewed-by:
Ian Romanick <ian.d.romanick@intel.com> Reviewed-by:
Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <mesa/mesa!20936>
-
We lower NIR's load_constant to load_global_constant, which uses A64 bindless messages. As such, we do the following math to produce the address for each load:

   base_lo@32 <- BRW_SHADER_RELOC_CONST_DATA_ADDR_LOW
   base_hi@32 <- BRW_SHADER_RELOC_CONST_DATA_ADDR_HIGH
   base@64 <- pack_64_2x32_split(base_lo, base_hi)
   addr@64 <- iadd(base@64, u2u64(offset@32))

On platforms that emulate 64-bit math, we have to emit additional code for the 64-bit iadd to handle the possibility of a carry happening and affecting the top bits.

However, NIR constant data is always uploaded adjacent to the shader assembly, in the same buffer. These buffers are required to live in a 4GB region of memory starting at Instruction State Base Address. We always place the base address at a 4GB address. So the constant data always lives in a buffer entirely contained within a 4GB region, which means any offsets from the start of the buffer cannot possibly affect the high bits.

So instead, we can simply do a 32-bit addition between the low bits of the base and the offset, then pack that with the unchanged high bits.

On anv, INSTRUCTION_STATE_POOL_MIN_ADDRESS is 8GB, so the high bits are always 0x2. We don't even need to patch that portion of the address and can just use an immediate value. We do still need to pack, however.

fossil-db on Icelake indicates the following for affected shaders:

   Instrs: 10830023 -> 10750080 (-0.74%)
   Cycles: 1048521282 -> 1046770379 (-0.17%); split: -0.33%, +0.16%
   Subgroup size: 103104 -> 103112 (+0.01%)
   Send messages: 570886 -> 570760 (-0.02%)
   Loop count: 14428 -> 14429 (+0.01%)
   Spill count: 14246 -> 14244 (-0.01%); split: -0.06%, +0.04%
   Fill count: 22802 -> 22794 (-0.04%); split: -0.04%, +0.01%
   Scratch Memory Size: 654336 -> 662528 (+1.25%)

Reviewed-by:
Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <mesa/mesa!20999>
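
The address math described in this commit message can be illustrated with a small host-side sketch in C. It is not the driver's code: the function names below are hypothetical stand-ins that mirror the NIR operations named in the message, and the assertion encodes the stated assumption that the buffer sits entirely inside one 4GB-aligned region, so the low-half add never carries.

```c
#include <assert.h>
#include <stdint.h>

/* Combine two 32-bit halves into one 64-bit value. This plays the role of
 * NIR's pack_64_2x32_split from the commit message. */
static uint64_t pack_64_2x32_split(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Reference path: pack the base first, then do a full 64-bit add. */
static uint64_t const_addr_64bit(uint32_t base_lo, uint32_t base_hi,
                                 uint32_t offset)
{
    return pack_64_2x32_split(base_lo, base_hi) + (uint64_t)offset;
}

/* Cheaper path: 32-bit add on the low half, then pack with the unchanged
 * high half. Only valid when base_lo + offset cannot wrap, which is the
 * assumption that the data stays within a single 4GB region. */
static uint64_t const_addr_32bit(uint32_t base_lo, uint32_t base_hi,
                                 uint32_t offset)
{
    assert(offset <= UINT32_MAX - base_lo); /* no carry into the high bits */
    return pack_64_2x32_split(base_lo + offset, base_hi);
}
```

With a base at 8GB, the high half is the constant 0x2, so `const_addr_32bit(0x1000, 0x2, 0x40)` gives the same result as the 64-bit reference, 0x200001040, without any 64-bit arithmetic.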
-
We lower NIR's load_constant to load_global_constant, which uses A64 bindless messages. As such, we do the following math to produce the address for each load:

   base_lo@32 <- BRW_SHADER_RELOC_CONST_DATA_ADDR_LOW
   base_hi@32 <- BRW_SHADER_RELOC_CONST_DATA_ADDR_HIGH
   base@64 <- pack_64_2x32_split(base_lo, base_hi)
   addr@64 <- iadd(base@64, u2u64(offset@32))

On platforms that emulate 64-bit math, we have to emit additional code for the 64-bit iadd to handle the possibility of a carry happening and affecting the top bits.

However, NIR constant data is always uploaded adjacent to the shader assembly, in the same buffer. These buffers are required to live in a 4GB region of memory starting at Instruction State Base Address. We always place the base address at a 4GB address. So the constant data always lives in a buffer entirely contained within a 4GB region, which means any offsets from the start of the buffer cannot possibly affect the high bits.

So instead, we can simply do a 32-bit addition between the low bits of the base and the offset, then pack that with the unchanged high bits.

On iris, IRIS_MEMZONE_SHADER is at [0, 4GB) so the high bits are always zero. We don't even need to patch that portion of the address and can simply use u2u64 to promote the 32-bit add result to a 64-bit value where the top bits are 0.

shader-db on Icelake indicates that this:
- Helps instructions: -1.13% in 135 affected programs
- Helps spills/fills: -4.08% / -4.18% in 4 affected programs
- Gains us 1 SIMD16 compute shader instead of SIMD8

Reviewed-by:
Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <mesa/mesa!20999>
-
People who use RADV on eGPU have reported poor performance by default. They also noted that the "nosam" option helps. This commit disables placing CS objects in VRAM when the bandwidth is below that of PCIe 3.0 x8. Note that eGPUs are typically PCIe 3.0 x4. Contributes-to: mesa/mesa#7340 Signed-off-by:
Timur Kristóf <timur.kristof@gmail.com> Reviewed-by:
Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <mesa/mesa!20842>
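
As a rough illustration of the threshold mentioned in this commit message, the following C sketch estimates usable link bandwidth from the PCIe generation and lane count and compares it against PCIe 3.0 x8. This is an assumption-laden sketch, not RADV's implementation: the function names are hypothetical, the driver obtains its link information from the kernel rather than computing it this way, and only the raw transfer rate and line encoding are accounted for.

```c
#include <stdbool.h>
#include <stdint.h>

/* Approximate usable bandwidth of a PCIe link in MB/s, or 0 for an
 * unknown generation. Gen 1 and 2 use 8b/10b encoding; gen 3 and later
 * use 128b/130b. Protocol overhead beyond encoding is ignored. */
static uint64_t pcie_bandwidth_mbps(unsigned gen, unsigned lanes)
{
    uint64_t mega_transfers; /* per lane, in MT/s */

    switch (gen) {
    case 1: mega_transfers = 2500;  break;
    case 2: mega_transfers = 5000;  break;
    case 3: mega_transfers = 8000;  break;
    case 4: mega_transfers = 16000; break;
    case 5: mega_transfers = 32000; break;
    default: return 0;
    }

    const uint64_t enc_num = gen <= 2 ? 8 : 128;
    const uint64_t enc_den = gen <= 2 ? 10 : 130;

    /* One transfer carries one bit per lane; divide by 8 for bytes. */
    return lanes * mega_transfers * enc_num / (enc_den * 8);
}

/* Hypothetical policy helper: allow command buffers in VRAM only when the
 * link is at least as fast as PCIe 3.0 x8. */
static bool cs_in_vram_allowed(unsigned gen, unsigned lanes)
{
    return pcie_bandwidth_mbps(gen, lanes) >= pcie_bandwidth_mbps(3, 8);
}
```

Under these assumptions a PCIe 3.0 x4 link, the typical eGPU case named above, falls below the threshold, while PCIe 3.0 x8 and PCIe 4.0 x4 both meet it.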
-
This is so that we can tell whether the current kernel has the PCIe bandwidth info available or not. Signed-off-by:
Timur Kristóf <timur.kristof@gmail.com> Reviewed-by:
Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <mesa/mesa!20842>
-
Reviewed-by:
Giancarlo Devich <gdevich@microsoft.com> Part-of: <mesa/mesa!20945>
-
The Win32 WSI will want to query capabilities of the device to determine what's available. Reviewed-by:
Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <mesa/mesa!20945>
-
Be conservative on Gfx11+ and always stall on a fence, since there are two different fences and a shader might want to synchronize between them. This change also brings back the original code block for the stall between the fences, and its comment, from commit b390ff35.

v2: (Caio)
- Re-arrange code block.
- Adjust comment.

Closes: mesa/mesa#6958 Fixes: f7262462 ("intel/fs: Rework fence handling in brw_fs_nir.cpp") Signed-off-by:
Sagar Ghuge <sagar.ghuge@intel.com> Tested-by:
Mark Janes <markjanes@swizzler.org> Reviewed-by:
Caio Oliveira <caio.oliveira@intel.com> Part-of: <mesa/mesa!20996>
-
- 01 Feb, 2023 16 commits
-
-
We were supposed to be checking that the job had "performance" in the name, not that the user (which we already checked is marge) has "performance" in their name. Fixes: f6c06ef2 ("ci: Add manual rules variations to disable irrelevant driver jobs.") Reviewed-by:
David Heidelberg <david.heidelberg@collabora.com> Part-of: <mesa/mesa!21002>
-
They got accidentally disabled entirely, so they didn't block merges, but once they are re-enabled they'll block us again. The problem was that I moved allow_failure to a .performance-rules section, but we only ever inherit the rules from that location, not the rest of the YAML. This is basically a revert of 67547a04 ("ci: Move the performance jobs' allow_failure:true to the gl rules."), though I still keep the allow_failure in a more common location with comments, since perf jobs are a huge trap. Part-of: <mesa/mesa!21002>
-
This is no longer used. Signed-off-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <mesa/mesa!21048>
-
It's already zeroed in radv_pipeline_layout_init(). Signed-off-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <mesa/mesa!21048>
-
That value is already computed when a descriptor set layout is created. Signed-off-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <mesa/mesa!21048>
-
Log the deviceName and driverInfo gated behind VN_DEBUG=log_ctx_info Signed-off-by:
Yiwei Zhang <zzyiwei@chromium.org> Part-of: <mesa/mesa!21030>
-
When the unused channels were at the end and so no reswizzling was needed, we wouldn't correctly mark the progress. Fixes: 3305c960 Signed-off-by:
Pavel Ondračka <pavel.ondracka@gmail.com> Reviewed-by:
Timur Kristóf <timur.kristof@gmail.com> Reviewed-by:
Emma Anholt <emma@anholt.net> Part-of: <mesa/mesa!21014>
-
When the unused channels were at the end and so no reswizzling was needed, we wouldn't correctly mark the progress. Fixes: cb7f2012 Signed-off-by:
Pavel Ondračka <pavel.ondracka@gmail.com> Reviewed-by:
Timur Kristóf <timur.kristof@gmail.com> Reviewed-by:
Emma Anholt <emma@anholt.net> Part-of: <mesa/mesa!21014>
-
Signed-off-by:
Pavel Ondračka <pavel.ondracka@gmail.com> Reviewed-by:
Timur Kristóf <timur.kristof@gmail.com> Reviewed-by:
Emma Anholt <emma@anholt.net> Part-of: <mesa/mesa!21014>
-
Reviewed-by:
Faith Ekstrand <faith.ekstrand@collabora.com> Reviewer-by:
Alyssa Rosenzweig <alyssa@rosenzweig.io> Signed-off-by:
Amber Amber <amber@igalia.com> Part-of: <mesa/mesa!20813>
-
This is necessary to properly support ARB_shader_texture_image_samples. Fixes a crash in KHR-GL45.shader_texture_image_samples_tests.image_functional_test. Reviewed-by:
Faith Ekstrand <faith.ekstrand@collabora.com> Reviewed-by:
Rob Clark <robclark@freedesktop.org> Reviewer-by:
Alyssa Rosenzweig <alyssa@rosenzweig.io> Signed-off-by:
Amber Amber <amber@igalia.com> Part-of: <mesa/mesa!20813>
-
This can be used by multiple drivers that do not support ms images Reviewed-by:
Faith Ekstrand <faith.ekstrand@collabora.com> Reviewed-by:
Rob Clark <robclark@freedesktop.org> Reviewer-by:
Alyssa Rosenzweig <alyssa@rosenzweig.io> Signed-off-by:
Amber Amber <amber@igalia.com> Part-of: <mesa/mesa!20813>
-
If the buffer hasn't been bound to memory yet, we will dereference a NULL pointer in radv_CreateAccelerationStructureKHR. cc: mesa-stable Closes: #8199 Reviewed-by:
Friedrich Vock <friedrich.vock@gmx.de> Part-of: <mesa/mesa!21019>
-
Fixes some H264 <-> HEVC transcode cases where the wrong level/profile was assigned to the output bitstream Part-of: <mesa/mesa!21043>
-
Complicated CFG and lots of SALU can cause this to take an extremely long time to finish. Fixes dEQP-VK.graphicsfuzz.cov-value-tracking-selection-dag-negation-clamp-loop and Monster Hunter Rise demo compile times.

fossil-db (gfx1100):
Totals from 57 (0.04% of 134574) affected shaders:
   Instrs: 170919 -> 171165 (+0.14%)
   CodeSize: 860144 -> 861128 (+0.11%)
   Latency: 961466 -> 961505 (+0.00%)
   InvThroughput: 127598 -> 127608 (+0.01%)

Signed-off-by:
Rhys Perry <pendingchaos02@gmail.com> Reviewed-by:
Timur Kristóf <timur.kristof@gmail.com> Closes: mesa/mesa#8153 Fixes: 5806f024 ("aco/gfx11: workaround VALUPartialForwardingHazard") Part-of: <mesa/mesa!20941>
-
u_vector_add() doesn't keep the returned pointers valid. Once the initial size allocated in u_vector_init() is reached, it allocates a bigger buffer, copies the data from the old buffer to the new one and frees the old buffer. That invalidates every pointer previously returned by u_vector_add() and crashes the application when one of them is accessed.

This is reproduced when running dEQP-VK.synchronization.signal_order.timeline_semaphore.* on DG2 SKUs that have 4 CCS engines, with INTEL_COMPUTE_CLASS=1 set and, of course, a perfetto-enabled build.

To fix this issue, I'm moving the storage/allocation of struct intel_ds_queue to struct anv_queue/iris_batch and using struct list_head to maintain a chain of the intel_ds_queue entries of the intel_ds_device. This allows us to append or remove queues dynamically in the future if necessary.

Fixes: e760c5b3 ("anv: add perfetto source") Reviewed-by:
Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by:
José Roberto de Souza <jose.souza@intel.com> Part-of: <mesa/mesa!20977>
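
The fix described in this commit message, embedding each queue in its owner and chaining the queues with a linked list, can be sketched in C as follows. All types and functions here are hypothetical stand-ins written for illustration (they are not Mesa's `list_head`, `intel_ds_queue` or `intel_ds_device`); the point is only that a pointer to an element stays valid because adding another element never moves existing storage.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal intrusive doubly linked list node (illustrative only). */
struct list_node {
    struct list_node *prev, *next;
};

/* Stand-in for the per-queue tracing state. */
struct ds_queue {
    struct list_node link;
    int id;
};

/* Stand-in for the device, which only holds the list head. */
struct ds_device {
    struct list_node queues;
};

/* Stand-in for the driver queue that now owns the storage. */
struct owner_queue {
    struct ds_queue ds;
};

static void ds_device_init(struct ds_device *dev)
{
    dev->queues.prev = dev->queues.next = &dev->queues;
}

/* Link caller-owned storage at the tail; nothing is allocated or copied,
 * so pointers handed out earlier remain valid. */
static struct ds_queue *ds_device_add_queue(struct ds_device *dev,
                                            struct ds_queue *q, int id)
{
    assert(dev != NULL && q != NULL);
    q->id = id;
    q->link.prev = dev->queues.prev;
    q->link.next = &dev->queues;
    dev->queues.prev->next = &q->link;
    dev->queues.prev = &q->link;
    return q;
}

/* Unlink a queue without touching any other element. */
static void ds_device_remove_queue(struct ds_queue *q)
{
    q->link.prev->next = q->link.next;
    q->link.next->prev = q->link.prev;
    q->link.prev = q->link.next = &q->link;
}

static size_t ds_device_queue_count(const struct ds_device *dev)
{
    size_t n = 0;
    for (const struct list_node *it = dev->queues.next;
         it != &dev->queues; it = it->next)
        n++;
    return n;
}
```

By contrast, a growable array that reallocates on append may move every element, which is the failure mode the commit message attributes to u_vector_add().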
-