- 23 Feb, 2021 40 commits
-
Lionel Landwerlin authored
If we don't wait on anything, I bet it makes the QueuePresent faster, but also completely wrong... Signed-off-by:
Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: 02f94c33 ("anv: don't wait for completion of work on vkQueuePresent()") Closes: #4276 Reviewed-by:
Jason Ekstrand <jason@jlekstrand.net> Part-of: <!9211> (cherry picked from commit b0b1bf99)
-
Rhys Perry authored
Found in a Death Stranding shader with loop unrolling disabled. Signed-off-by:
Rhys Perry <pendingchaos02@gmail.com> Reviewed-by:
Daniel Schürmann <daniel@schuermann.dev> Fixes: 9a089baf ("aco: optimize boolean phis with uniform selections") Part-of: <!9193> (cherry picked from commit 75c9adf0)
-
Lionel Landwerlin authored
Another mistake which is that we don't use the right wait API. Signed-off-by:
Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: 829699ba ("anv: implement shareable timeline semaphores") Closes: #4276 Reviewed-by:
Jason Ekstrand <jason@jlekstrand.net> Part-of: <!9188> (cherry picked from commit 02f94c33)
-
Jeremy Huddleston Sequoia authored
Cc: 20.3 21.0 <mesa-stable@lists.freedesktop.org> Closes: #4113 Signed-off-by:
Jeremy Huddleston Sequoia <jeremyhu@apple.com> Reviewed-by:
Dylan Baker <dylan.c.baker@intel.com> Part-of: <!8796> (cherry picked from commit 38ae84b8)
-
Dave Airlie authored
When Mesa gets a DRI2 1.1 connection (as experienced with the vmware DDX) we don't get a pointer for this. Don't explode, just keep going. Fixes: 60ebeb46 ("glx: Implement GLX_EXT_swap_control for DRI2 and DRI3") Reviewed-by:
Adam Jackson <ajaX@redhat.com> Part-of: <!9184> (cherry picked from commit 279d1705)
-
Alyssa Rosenzweig authored
We don't support it yet. Fixes: 61d3ae6e ("panfrost: Initial stub for Panfrost driver") Signed-off-by:
Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by:
Boris Brezillon <boris.brezillon@collabora.com> Part-of: <!9164> (cherry picked from commit 5eff64e3)
-
Bas Nieuwenhuizen authored
We have to choose between: 1) Stop handling two identical GPUs 2) Stop having crashes with other layers active. 3) Fix the Vulkan Loader. Since nobody seems to want to spend enough effort to do 3 the effective choice is between 1 and 2. This is choosing 2, as two identical GPUs is pretty uncommon since crossfire doesn't work on Linux anyway. (And it would only work sporadically as the game needs to enable the extension) CC: mesa-stable Closes: #3801 Reviewed-by:
Dave Airlie <airlied@redhat.com> Part-of: <!8414> (cherry picked from commit 38ce8d4d)
-
Bas Nieuwenhuizen authored
Can be used without sharing, so if only the dedicated memory info is set we know it isn't shareable. Use that. Fixes: a639d40f ("radv: add support for local bos. (v3)") Closes: #4330 Reviewed-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <!9176> (cherry picked from commit 2d520b69)
-
Vinson Lee authored
Fix defect reported by Coverity Scan. Uninitialized scalar variable (UNINIT) uninit_use: Using uninitialized value ds_state.front. Field ds_state.front.writeMask is uninitialized. Fixes: d488d0fd ("aco: add framework for testing isel and integration tests") Signed-off-by:
Vinson Lee <vlee@freedesktop.org> Reviewed-by:
Rhys Perry <pendingchaos02@gmail.com> Part-of: <!9033> (cherry picked from commit 7cc83f23)
-
Erico Nunes authored
If this is not defined, mesa may not deallocate sampler views, which can result in memory leaks. Just define it to the same as max texture samplers, like other mesa drivers do. Cc: mesa-stable Signed-off-by:
Erico Nunes <nunes.erico@gmail.com> Reviewed-by:
Vasily Khoruzhick <anarsoul@gmail.com> Part-of: <!9172> (cherry picked from commit f3d47ba0)
-
Ian Romanick authored
fmin(-A, -B) is -fmax(A, B), and fmax(-A, -B) is -fmin(A, B). Therefore the logic joining A and B should toggle between ior and iand for the negated versions.

At the very least, a shader from Euro Truck Simulator 2 in shader-db is affected by this. The KIL instruction in the (ARB assembly) shader ends up with the wrong logic.

This is _probably_ the source of #1346. That said, the issue mentions that Mesa 18.0.5 works, but commit 68420d83 ("nir: Simplify min and max of b2f") was added in 17.3. Moreover, I was not able to reproduce the error in the ETS2 shader from shader-db from any Mesa commit near the time the original fd.o bugzilla was submitted (December 2018). 🤷

In fact, the current error in that shader starts with 9167324a ("nir/algebraic: Mark some logic-joined comparison reductions as exact"). That's a bit of a red herring as 9167324a just sets off a chain of replacements that eventually leads to the incorrect min/max of b2f patterns fixed by this commit.

The other affected shaders in the shader-db results are from Cargo Commander. These are also ARB assembly shaders. I think any ARB assembly shader that uses the pattern SLT r0, ...; ... KIL -r0; will suffer from issues related to this.

This change fixes the piglit tests/spec/arb_fragment_program/kil-of-slt.shader_test test added in piglit!454.

shader-db results:

All Gen6+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 20034604 -> 20034486 (<.01%)
instructions in affected programs: 3885 -> 3767 (-3.04%)
helped: 47
HURT: 2
helped stats (abs) min: 2 max: 4 x̄: 2.64 x̃: 2
helped stats (rel) min: 2.33% max: 8.33% x̄: 3.48% x̃: 3.39%
HURT stats (abs) min: 3 max: 3 x̄: 3.00 x̃: 3
HURT stats (rel) min: 13.64% max: 16.67% x̄: 15.15% x̃: 15.15%
95% mean confidence interval for instructions value: -2.83 -1.99
95% mean confidence interval for instructions %-change: -3.84% -1.60%
Instructions are helped.

total cycles in shared programs: 979881379 -> 979879406 (<.01%)
cycles in affected programs: 119873 -> 117900 (-1.65%)
helped: 46
HURT: 3
helped stats (abs) min: 10 max: 756 x̄: 45.41 x̃: 26
helped stats (rel) min: 0.53% max: 19.72% x̄: 1.67% x̃: 1.26%
HURT stats (abs) min: 28 max: 56 x̄: 38.67 x̃: 32
HURT stats (rel) min: 1.44% max: 3.54% x̄: 2.75% x̃: 3.27%
95% mean confidence interval for cycles value: -70.83 -9.70
95% mean confidence interval for cycles %-change: -2.23% -0.57%
Cycles are helped.

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8115098 -> 8115076 (<.01%)
instructions in affected programs: 2592 -> 2570 (-0.85%)
helped: 32
HURT: 2
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.88% max: 2.70% x̄: 1.35% x̃: 1.31%
HURT stats (abs) min: 5 max: 5 x̄: 5.00 x̃: 5
HURT stats (rel) min: 17.24% max: 18.52% x̄: 17.88% x̃: 17.88%
95% mean confidence interval for instructions value: -1.15 -0.15
95% mean confidence interval for instructions %-change: -1.83% 1.39%
Inconclusive result (%-change mean confidence interval includes 0).

total cycles in shared programs: 238189718 -> 238189802 (<.01%)
cycles in affected programs: 75076 -> 75160 (0.11%)
helped: 3
HURT: 31
helped stats (abs) min: 2 max: 130 x̄: 44.67 x̃: 2
helped stats (rel) min: 0.18% max: 5.70% x̄: 2.02% x̃: 0.19%
HURT stats (abs) min: 2 max: 70 x̄: 7.03 x̃: 4
HURT stats (rel) min: 0.07% max: 6.41% x̄: 0.53% x̃: 0.15%
95% mean confidence interval for cycles value: -7.27 12.21
95% mean confidence interval for cycles %-change: -0.33% 0.94%
Inconclusive result (value mean confidence interval includes 0).

No fossil-db changes on any Intel platform.

Fixes: 68420d83 ("nir: Simplify min and max of b2f")
Closes: #1346
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <!9122> (cherry picked from commit 7e127c1f)
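The identities behind this fix can be checked exhaustively over the four boolean input pairs. The sketch below is only an illustration in Python, not the NIR algebraic pattern itself; `b2f`, `check_b2f_minmax` and `wrong_rule_counterexamples` are hypothetical helper names, and `b2f` is assumed to map true to 1.0 and false to 0.0.

```python
from itertools import product


def b2f(b: bool) -> float:
    """Assumed b2f semantics: true -> 1.0, false -> 0.0."""
    return 1.0 if b else 0.0


def check_b2f_minmax() -> bool:
    """Verify the min/max-of-b2f identities for every boolean pair."""
    for a, b in product((False, True), repeat=2):
        fa, fb = b2f(a), b2f(b)
        # Non-negated operands: min behaves like AND, max like OR.
        assert min(fa, fb) == b2f(a and b)
        assert max(fa, fb) == b2f(a or b)
        # Negated operands: the joining logic toggles.
        assert min(-fa, -fb) == -b2f(a or b)
        assert max(-fa, -fb) == -b2f(a and b)
    return True


def wrong_rule_counterexamples() -> list:
    """Inputs where reusing AND for the negated min gives a wrong value."""
    return [
        (a, b)
        for a, b in product((False, True), repeat=2)
        if min(-b2f(a), -b2f(b)) != -b2f(a and b)
    ]
```

Reusing the non-negated rule for negated operands only goes wrong when the two booleans differ, which may be part of why the problem was hard to reproduce.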
-
Yevhenii Kharchenko authored
The 'nir_tex_src_coord' param was provided to the NIR 'txf' operation as a vec3 for the TEXTURE_1D_ARRAY target, causing an assert. Only the following targets require vec3: TEXTURE_2D_ARRAY, TEXTURE_3D, TEXTURE_CUBE, TEXTURE_CUBE_ARRAY. The rest must use vec2. Pack the layer value into the Y coordinate the same way it was done in 'create_fs' in commit 2bf6dfac. Fixes: a01ad311 ("st/mesa: Add NIR versions of the PBO upload/download shaders.") Signed-off-by:
Yevhenii Kharchenko <yevhenii.kharchenko@globallogic.com> Reviewed-by:
Marek Olšák <marek.olsak@amd.com> Part-of: <!9014> (cherry picked from commit 1516b6bd)
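The coordinate rule described in this commit can be sketched as a small selection function. This is a hypothetical Python illustration of the rule as stated in the message, not the state tracker code; `VEC3_TARGETS` and `txf_coord` are invented names, and targets are plain strings here.

```python
# Targets that, per the commit message, need a three-component coordinate.
VEC3_TARGETS = frozenset(
    {"TEXTURE_2D_ARRAY", "TEXTURE_3D", "TEXTURE_CUBE", "TEXTURE_CUBE_ARRAY"}
)


def txf_coord(target: str, x: int, y: int, layer: int) -> tuple:
    """Build the coordinate tuple for a texel fetch (illustrative only)."""
    if target in VEC3_TARGETS:
        return (x, y, layer)
    if target == "TEXTURE_1D_ARRAY":
        # vec2: the layer is packed into the Y component.
        return (x, layer)
    # Every other target uses a plain vec2.
    return (x, y)
```

For example, `txf_coord("TEXTURE_1D_ARRAY", 5, 0, 3)` yields a two-component coordinate with the layer in the second slot.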
-
Samuel Pitoiset authored
Higher values break tessellation. I was only able to reproduce this by switching back/from AMDVLK which was really weird... According to Marek (1c6eca23), it looks like it's related to register shadowing and PAL enables it, that probably explains a bit. Copied from PAL and RadeonSI. Closes: #4207 Gitlab: #2498 Fixes: 74d69299 ("radv/gfx10: double the number of tessellation offchip buffers per SE") Signed-off-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by:
Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <!9141> (cherry picked from commit e3bdf815)
-
Timur Kristóf authored
Cc: mesa-stable Signed-off-by:
Timur Kristóf <timur.kristof@gmail.com> Reviewed-by:
Rhys Perry <pendingchaos02@gmail.com> Part-of: <!9100> (cherry picked from commit a6e1178f)
-
Timur Kristóf authored
The LLVM backend forgot to set config->lds_size, which is used for reporting LDS stats. Fixes: cf89bdb9 "radv: align the LDS size in calculate_tess_lds_size()" Signed-off-by:
Timur Kristóf <timur.kristof@gmail.com> Reviewed-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <!9098> (cherry picked from commit 72c348f8)
-
Jason Ekstrand authored
They've all supported it since either forever or Iron Lake which is equivalent to forever for Vulkan. From Kenneth Graunke's GitLab review: "Linear blending of depth buffer data is usually fairly nonsense (something's 2 meters away? another thing's 6 meters away? let's just report 4 meters?)...but it's definitely a thing we can do, so we may as well let apps do it, and trust them not when it doesn't make sense." Cc: mesa-stable@lists.freedesktop.org Reviewed-by:
Eric Anholt <eric@anholt.net> Reviewed-by:
Kenneth Graunke <kenneth@whitecape.org> Part-of: <!9110> (cherry picked from commit 56d005c2)
-
Anuj Phogat authored
Test the sampler->conversion for NULL pointer before dereferencing it. Fixes: Regressions in VulkanCTS. Fixes: 22631611 "intel/anv: Fix condition to set MipModeFilter for YUV surface" Signed-off-by:
Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by:
Kenneth Graunke <kenneth@whitecape.org> Reviewed-by:
Jason Ekstrand <jason@jlekstrand.net> (cherry picked from commit 69e94e89)
-
Ian Romanick authored
On Intel platforms before Gen6, there is no min or max instruction. Instead, a comparison instruction (*more on this below) and a SEL instruction are used. Per other IEEE rules, the regular comparison instruction, CMP, will always return false if either source is NaN. A sequence like

    cmp.l.f0.0(16) null<1>F g30<8,8,1>F g22<8,8,1>F
    (+f0.0) sel(16) g8<1>F g30<8,8,1>F g22<8,8,1>F

will generate the wrong result for min if g22 is NaN. The CMP will return false, and the SEL will pick g22.

To account for this, the hardware has a special comparison instruction CMPN. This instruction behaves just like CMP, except if the second source is NaN, it will return true. The intention is to use it for min and max. This sequence will always generate the correct result:

    cmpn.l.f0.0(16) null<1>F g30<8,8,1>F g22<8,8,1>F
    (+f0.0) sel(16) g8<1>F g30<8,8,1>F g22<8,8,1>F

The problem is... for whatever reason, we don't emit CMPN. There was even a comment in lower_minmax that calls out this very issue! The bug is actually older than the "Fixes" below even implies. That's just when the comment was added. That we know of, we never observed a failure until #4254.

If src1 is known to be a number, either because it's not float or it's an immediate number, use CMP. This allows cmod propagation to still do its thing. Without this slight optimization, about 8,300 shaders from shader-db are hurt on Iron Lake.

Fixes the following piglit tests (from piglit!475):

    tests/spec/glsl-1.20/execution/fs-nan-builtin-max.shader_test
    tests/spec/glsl-1.20/execution/fs-nan-builtin-min.shader_test
    tests/spec/glsl-1.20/execution/vs-nan-builtin-max.shader_test
    tests/spec/glsl-1.20/execution/vs-nan-builtin-min.shader_test

Closes: #4254
Fixes: 2f2c00c7 ("i965: Lower min/max after optimization on Gen4/5.")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8115134 -> 8115135 (<.01%)
instructions in affected programs: 229 -> 230 (0.44%)
helped: 0
HURT: 1

Part-of: <!9027> (cherry picked from commit 3c31364f)
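The difference between the two sequences can be modelled in a few lines. This is a simplified Python sketch based only on the behaviour described in the commit message, not on hardware documentation; `cmp_l`, `cmpn_l`, `sel`, `min_with_cmp` and `min_with_cmpn` are hypothetical names standing in for the instructions.

```python
import math


def cmp_l(src0: float, src1: float) -> bool:
    """CMP with the .l condition: false whenever either source is NaN."""
    return src0 < src1  # IEEE ordered compare; NaN makes this False


def cmpn_l(src0: float, src1: float) -> bool:
    """CMPN with the .l condition: like CMP, but true if src1 is NaN."""
    if math.isnan(src1):
        return True
    return src0 < src1


def sel(flag: bool, src0: float, src1: float) -> float:
    """Predicated SEL: pick src0 when the flag is set, otherwise src1."""
    return src0 if flag else src1


def min_with_cmp(a: float, b: float) -> float:
    """The buggy lowering: CMP followed by SEL."""
    return sel(cmp_l(a, b), a, b)


def min_with_cmpn(a: float, b: float) -> float:
    """The fixed lowering: CMPN followed by SEL."""
    return sel(cmpn_l(a, b), a, b)
```

With a NaN in the second source, `min_with_cmp` returns the NaN while `min_with_cmpn` returns the other operand. When the NaN is in the first source, both variants already pick the second operand, so only the second-source case needs the special comparison.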
-
Ian Romanick authored
Since the CMPN builder was never used, there was no reason to make its interface usable. :) Fixes: 2f2c00c7 ("i965: Lower min/max after optimization on Gen4/5.") Reviewed-by:
Jason Ekstrand <jason@jlekstrand.net> Part-of: <!9027> (cherry picked from commit 684ec33c)
-
Ian Romanick authored
v2: Move checks to the EU validator. Suggested by Jason. Fixes: 2f2c00c7 ("i965: Lower min/max after optimization on Gen4/5.") Reviewed-by:
Jason Ekstrand <jason@jlekstrand.net> Part-of: <!9027> (cherry picked from commit 6c8e2e93)
-
Anuj Phogat authored
Mip Mode Filter must be set to MIPFILTER_NONE for Planar YUV surfaces. Add the missing condition to check for planar format. Fixes: b24b93d5 "anv: enable VK_KHR_sampler_ycbcr_conversion" Signed-off-by:
Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by:
Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by:
Jason Ekstrand <jason@jlekstrand.net> (cherry picked from commit 22631611)
-
Mike Blumenkrantz authored
this fixes crashes on startup Fixes: a3512ddf ("st/mesa: don't enable NV_copy_depth_to_color if NIR doesn't support FP64") fixes #4312 Reviewed-by:
Marek Olšák <marek.olsak@amd.com> Part-of: <!9082> (cherry picked from commit 4feca7ec)
-
Vinson Lee authored
Fix defect reported by Coverity Scan. Resource leak (RESOURCE_LEAK) leaked_storage: Variable cs going out of scope leaks the storage it points to. Fixes: c9e8b49b ("etnaviv: gallium driver for Vivante GPUs") Signed-off-by:
Vinson Lee <vlee@freedesktop.org> Reviewed-by:
Christian Gmeiner <christian.gmeiner@gmail.com> Part-of: <!9034> (cherry picked from commit a7a7d25e)
-
Jason Ekstrand authored
On Gen7, we have to split shuffles into two MOVs for 64-bit types so we can't handle source modifiers. On Gen12.5, we have to use integer types all the time so we can't use them there either. Fixing that will be a different commit but it interacts with this one. Fixes: 90c9f295 "i965/fs: Add support for nir_intrinsic_shuffle" Reviewed-by:
Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <!9068> (cherry picked from commit 3ce6ca72)
-
Jason Ekstrand authored
We can't move the shuffle to a new block so this only works if the shuffle and the bcsel are in the same block. Fortunately, in the motivating case, this is true. Also, we have to be careful around discard. We could try really hard to just avoid moving them past discard but we choose to simply bail if we see a discard instead. Fixes: 4ff4d4e5 "nir/opt_intrinsic: Optimize bcsel(b, shuffle..." Reviewed-by:
Ian Romanick <ian.d.romanick@intel.com> Reviewed-by:
Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <!9068> (cherry picked from commit ceb6986d)
-
Mike Blumenkrantz authored
SPIR-V requires that 1-bit values be bool types, not uints. Fixes: 93af0050 ("zink: use uvec for undefs") Reviewed-by:
Jason Ekstrand <jason@jlekstrand.net> Part-of: <!9059> (cherry picked from commit 8300bc1f)
-
Daniel Schürmann authored
VGPRs are now allocated in blocks of 8 normal or 16 shared VGPRs, respectively. Fixes: 14a5021a ('aco/gfx10: Refactor of GFX10 wave64 bpermute.') Reviewed-by:
Rhys Perry <pendingchaos02@gmail.com> Part-of: <!8921> (cherry picked from commit bacc3b36)
-
Bas Nieuwenhuizen authored
Otherwise there might be buffers for which we don't have a type. Fixes: 7262c743 ("radv: Determine memory type for import based on fd.") Closes: #4280 Reviewed-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <!8996> (cherry picked from commit 045a8508)
-
Giovanni Mascellani authored
By the Vulkan specification, and similarly to many other Vulkan calls, it is allowed to destroy a null descriptor update template. Signed-off-by:
Giovanni Mascellani <gmascellani@codeweavers.com> Fixes: af5f13e5 ("anv: add VK_KHR_descriptor_update_template support") Reviewed-by:
Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <!9005> (cherry picked from commit 72b8e643)
-
Giovanni Mascellani authored
Signed-off-by:
Giovanni Mascellani <gmascellani@codeweavers.com> Reviewed-by:
Timothy Arceri <tarceri@itsqueeze.com> Fixes: e2c4435b ("util/disk_cache: add thread queue to disk cache") Part-of: <!8983> (cherry picked from commit c6731daa)
-
Lionel Landwerlin authored
Signed-off-by:
Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: 34f32a6d ("anv: implement VK_KHR_timeline_semaphore") Closes: #4277 Reviewed-by:
Jason Ekstrand <jason@jlekstrand.net> Part-of: <!8987> (cherry picked from commit 6673c400)
-
Timur Kristóf authored
Signed-off-by:
Timur Kristóf <timur.kristof@gmail.com> Reviewed-by:
Eric Anholt <eric@anholt.net> Fixes: f3b33a5a Closes: #4127 Part-of: <!8920> (cherry picked from commit e163f1c9)
-
Dave Airlie authored
asan on llvmpipe with piglit tests/spec/arb_gl_spirv/execution/ssbo/array-indirect.shader_test reported.

=================================================================
==3288325==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 48 byte(s) in 1 object(s) allocated from:
    #0 0x7f5b2d6513cf in __interceptor_malloc (/lib64/libasan.so.6+0xab3cf)
    #1 0x7f5b2a1ae810 in ralloc_size ../src/util/ralloc.c:133
    #2 0x7f5b2a1ae7e1 in ralloc_context ../src/util/ralloc.c:120
    #3 0x7f5b2b210177 in gl_nir_link_uniform_blocks ../src/compiler/glsl/gl_nir_link_uniform_blocks.c:585
    #4 0x7f5b2af7f52d in gl_nir_link_spirv ../src/compiler/glsl/gl_nir_linker.c:614
    #5 0x7f5b2a3b76fa in st_link_nir ../src/mesa/state_tracker/st_glsl_to_nir.cpp:765
    #6 0x7f5b2a3ace7b in st_link_shader ../src/mesa/state_tracker/st_glsl_to_ir.cpp:65
    #7 0x7f5b2a471165 in _mesa_glsl_link_shader ../src/mesa/program/ir_to_mesa.cpp:3122
    #8 0x7f5b2a97a6d8 in link_program ../src/mesa/main/shaderapi.c:1311
    #9 0x7f5b2a97a6d8 in link_program_error ../src/mesa/main/shaderapi.c:1419
    #10 0x7f5b2a97df45 in _mesa_LinkProgram ../src/mesa/main/shaderapi.c:1911
    #11 0x7f5b299b59e5 in stub_glLinkProgram /mnt/devel/gl/piglit/tests/util/piglit-dispatch-gen.c:33956
    #12 0x40a71a in link_and_use_shaders /mnt/devel/gl/piglit/tests/shaders/shader_runner.c:1604
    #13 0x415722 in init_test /mnt/devel/gl/piglit/tests/shaders/shader_runner.c:5225
    #14 0x4164ce in piglit_init /mnt/devel/gl/piglit/tests/shaders/shader_runner.c:5597
    #15 0x7f5b29a214e9 in run_test /mnt/devel/gl/piglit/tests/util/piglit-framework-gl/piglit_winsys_framework.c:73
    #16 0x7f5b29a103fe in piglit_gl_test_run /mnt/devel/gl/piglit/tests/util/piglit-framework-gl.c:229
    #17 0x407847 in main /mnt/devel/gl/piglit/tests/shaders/shader_runner.c:72
    #18 0x7f5b2928f1e1 in __libc_start_main (/lib64/libc.so.6+0x281e1)

SUMMARY: AddressSanitizer: 48 byte(s) leaked in 1 allocation(s).

Fixes: 57239192 ("nir/linker: add gl_nir_link_uniform_blocks.c") Reviewed-by:
Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <!8974> (cherry picked from commit 14b2dc00)
-
Lionel Landwerlin authored
I'm pretty sure this doesn't fix anything because the WSI code only uses a single VkSubmitInfo, but better be safe. Signed-off-by:
Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: ccb7d606 ("anv: Use submit-time implicit sync instead of allocate-time") Reviewed-by:
Jason Ekstrand <jason@jlekstrand.net> Part-of: <!8934> (cherry picked from commit 64cb03a5)
-
Rhys Perry authored
This should no longer be necessary since the mark_block_wqm() we use to flag break conditions as WQM now adds block to the worklist. With them added to the worklist, get_block_needs() will add WQM to block_needs. Adding WQM to block_needs here without adding the block to the worklist (like we do here) can cause issues because it does not ensure that the predecessors' branches are in WQM (needed for it to be possible to transition to WQM in the block). This happened in an Overwatch shader. No fossil-db changes. Signed-off-by:
Rhys Perry <pendingchaos02@gmail.com> Reviewed-by:
Daniel Schürmann <daniel@schuermann.dev> Fixes: 661922f6 ("aco: add block to worklist in mark_block_wqm()") Closes: #4066 Part-of: <!8446> (cherry picked from commit f0074a6f)
-
Bas Nieuwenhuizen authored
Fixes: cf2eebdf ("radv,gallium: Add driconf option to reduce advertised VRAM size.") Reviewed-by:
Mike Blumenkrantz <michael.blumenkrantz@gmail.com> Reviewed-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <!8915> (cherry picked from commit bd7d8a77)
-
Caio Marcelo de Oliveira Filho authored
Reviewed-by:
Jason Ekstrand <jason@jlekstrand.net> Cc: mesa-stable Tested-by:
Tapani Pälli <tapani.palli@intel.com> Part-of: <!8864> (cherry picked from commit 568a6682)
-
Ian Romanick authored
The base mask previously used was 0xffffffff. This is not correct (but should still work) for 16-bit and 8-bit values, but it means the high 32-bits of 64-bit values will get chopped off.

Instead of just restricting the pattern to 32-bits (as was done before 00b28a50), this extends the optimization in two ways:

1. Make it correct for other bit sizes.
2. Make it work for arbitrary shift counts.

This has the added benefit of reducing the number of patterns actually added (7 previously, 4 now).

The "Reassociate for improved CSE" part is just reverted to its pre-00b28a50 behavior. I doubt that pattern is likely to have much impact outside 32-bits.

This change fixes the piglit tests tests/spec/arb_gpu_shader_int64/fs-shl-of-shr-int64.shader_test and tests/spec/arb_gpu_shader_int64/fs-iand-of-iadd-int64.shader_test.

All of the shaders helped in shader-db are vertex shaders on platforms with vector-oriented vertex processing. The shaders contain ((x >> 16) << 16). These platforms set lower_extract_word, so the optimization that transforms (x >> 16) to extract_u16 doesn't trigger. With only ~60 shaders involved, I didn't bother trying to add extract_XYZ versions of these patterns to try to get those cases.

Fixes: 00b28a50 ("nir/algebraic: trivially enable existing 32-bit patterns for all bit sizes")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>

Haswell and earlier Intel GPUs had similar results. (Haswell shown)
total instructions in shared programs: 16397554 -> 16397496 (<.01%)
instructions in affected programs: 7961 -> 7903 (-0.73%)
helped: 58
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.36% max: 1.89% x̄: 0.99% x̃: 0.78%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -1.13% -0.85%
Instructions are helped.

total cycles in shared programs: 1035483770 -> 1035483504 (<.01%)
cycles in affected programs: 75922 -> 75656 (-0.35%)
helped: 44
HURT: 2
helped stats (abs) min: 2 max: 12 x̄: 6.14 x̃: 2
helped stats (rel) min: 0.05% max: 1.67% x̄: 0.87% x̃: 0.72%
HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel) min: 0.06% max: 0.06% x̄: 0.06% x̃: 0.06%
95% mean confidence interval for cycles value: -7.28 -4.29
95% mean confidence interval for cycles %-change: -1.03% -0.63%
Cycles are helped.

Part-of: <!8852> (cherry picked from commit 6b0443a9)
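The effect of a fixed 0xffffffff base mask on 64-bit values can be seen with a small model of the shift-left-of-shift-right case. This Python sketch assumes unsigned values and is not the actual nir_opt_algebraic pattern; `shl_of_shr` and `shl_of_shr_mask` are hypothetical helpers.

```python
def shl_of_shr(x: int, bit_size: int, count: int) -> int:
    """Reference result of ((x >> count) << count) at the given bit size."""
    all_ones = (1 << bit_size) - 1
    return ((x >> count) << count) & all_ones


def shl_of_shr_mask(bit_size: int, count: int) -> int:
    """Mask m such that ((x >> count) << count) == x & m (unsigned x)."""
    all_ones = (1 << bit_size) - 1
    # Derive the mask from the real bit size, not a fixed 32-bit base.
    return (all_ones << count) & all_ones


# A 64-bit example with a shift count of 16.
x64 = 0x123456789ABCDEF0
good = x64 & shl_of_shr_mask(64, 16)  # keeps the high 32 bits
bad = x64 & shl_of_shr_mask(32, 16)   # 32-bit base mask chops them off
```

Computing the mask from the operand's bit size and the shift count is what lets a single rule cover 8-, 16-, 32- and 64-bit values with arbitrary shift counts.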
-
Simon Ser authored
The hardware can only scan-out linear buffers with a pitch aligned to 256. It can only use packed buffers for cursors. Signed-off-by:
Simon Ser <contact@emersion.fr> Reviewed-by:
Ilia Mirkin <imirkin@alum.mit.edu> Cc: mesa-stable Part-of: <!8500> (cherry picked from commit a4c11385)
-
Simon Ser authored
The hardware can only scan-out linear buffers with a pitch aligned to 256. It can only use packed buffers for cursors. Signed-off-by:
Simon Ser <contact@emersion.fr> Reviewed-by:
Ilia Mirkin <imirkin@alum.mit.edu> Closes: drm/nouveau#36 Cc: mesa-stable Part-of: <!8500> (cherry picked from commit 6650c53e)
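The pitch constraint in these two commits amounts to rounding a row size up to a multiple of 256, except for cursors, which stay packed. The Python sketch below assumes the pitch is measured in bytes (the commit message does not state the unit); `align_pot`, `buffer_pitch` and the `is_cursor` flag are hypothetical and do not come from the nouveau code.

```python
def align_pot(value: int, alignment: int) -> int:
    """Round value up to a multiple of a power-of-two alignment."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (value + alignment - 1) & ~(alignment - 1)


def buffer_pitch(width: int, bytes_per_pixel: int, is_cursor: bool = False) -> int:
    """Pitch for a linear buffer under the constraints described above."""
    packed = width * bytes_per_pixel
    if is_cursor:
        # Cursors must be packed: no padding between rows.
        return packed
    # Assumption: scan-out buffers need the pitch aligned to 256 bytes.
    return align_pot(packed, 256)
```

For instance, a 1366-pixel-wide, 4-byte-per-pixel buffer has a packed row size of 5464, which rounds up to 5632 for scan-out.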
-