- Mar 26, 2019
-
-
Timothy Arceri authored
When I implemented opt_if_loop_last_continue() I restricted the pass from moving other if-statements inside the branch opposite the continue. At the time it was causing extra register pressure in some shaders, but that no longer seems to be an issue.

Samuel Pitoiset noticed that making this pass more aggressive significantly improved the performance of Doom on RADV. Below are the statistics he gathered.

28717 shaders in 14931 tests

Totals:
SGPRS: 1267317 -> 1267549 (0.02 %)
VGPRS: 896876 -> 895920 (-0.11 %)
Spilled SGPRs: 24701 -> 26367 (6.74 %)
Code Size: 48379452 -> 48507880 (0.27 %) bytes
Max Waves: 241159 -> 241190 (0.01 %)

Totals from affected shaders:
SGPRS: 23584 -> 23816 (0.98 %)
VGPRS: 25908 -> 24952 (-3.69 %)
Spilled SGPRs: 503 -> 2169 (331.21 %)
Code Size: 2471392 -> 2599820 (5.20 %) bytes
Max Waves: 586 -> 617 (5.29 %)

The code size increase is related to Wolfenstein II.

This gives +10% FPS with Doom on my Vega56.

Rhys Perry also benchmarked Doom on his VEGA64:
Before: 72.53 FPS
After: 80.77 FPS
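To make the transformation concrete, here is a rough Python sketch of the loop shape involved. It is an assumption about what the pass does, based only on the description above: code that follows an if-statement whose one branch ends in `continue` is moved into the opposite branch, and after this change that trailing code may itself contain if-statements. The function names and the even/odd example are hypothetical.

```python
def loop_before(values):
    """Loop shape before the pass: trailing code sits after the if/else."""
    trace = []
    for v in values:
        if v % 2 == 0:
            trace.append(("even", v))
            continue
        else:
            trace.append(("odd", v))
        # Trailing code, including another if-statement.
        if v > 3:
            trace.append(("big", v))
    return trace


def loop_after(values):
    """Loop shape after the pass: trailing code moved opposite the continue."""
    trace = []
    for v in values:
        if v % 2 == 0:
            trace.append(("even", v))
            continue
        else:
            trace.append(("odd", v))
            # Moved here; behavior is unchanged because the other
            # branch never reaches this point.
            if v > 3:
                trace.append(("big", v))
    return trace
```

Both functions produce the same trace for any input, which is the property such a rewrite has to preserve.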
-
Dave Airlie authored
This fixes piglit clearbuffer-mixed-format Reviewed-by:
Brian Paul <brianp@vmware.com>
-
Dave Airlie authored
This gets the basevertex from the draw depending on whether it's an indexed or non-indexed draw. We still fail a transform feedback test for vertex id, as the vertex id is actually an index id and isn't getting translated properly to a vertex id. Suggestions on how/where to fix that are welcome. Reviewed-by:
Brian Paul <brianp@vmware.com>
-
Nicolai Hähnle authored
This field was added in a recent addrlib update, and while there currently seems to be no issue with skipping it, we will have to set it correctly in the future. Reviewed-by:
Marek Olšák <marek.olsak@amd.com>
-
Bas Nieuwenhuizen authored
To preserve the invariant that nir ssa defs are integers or pointers in LLVM. CC: <mesa-stable@lists.freedesktop.org> Reviewed-by:
Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by:
Dave Airlie <airlied@redhat.com>
-
Kristian Høgsberg authored
Most cat5 instructions are constructed using ir3_SAM, which uses regs[1] for the (sampler, tex) src. Not DSX/DSY though, so we look up src1 and src2 differently for those two. Fixes: 1dffb089 ("freedreno/ir3: fix sam.s2en encoding") Signed-off-by:
Kristian H. Kristensen <hoegsberg@chromium.org> Reviewed-by:
Rob Clark <robdclark@gmail.com>
-
Kristian Høgsberg authored
In 1088b788 ("freedreno/ir3: find # of samplers from uniform vars") we started counting number of samplers based on the uniform vars instead of number of cat5 instructions. We used the number of samplers to determine whether to enable derivatives, but when we only use derivatives and no samplers, that now breaks. Track whether we need derivatives explicitly and use that to enable the state. Fixes: 1088b788 ("freedreno/ir3: find # of samplers from uniform vars") Signed-off-by:
Kristian H. Kristensen <hoegsberg@chromium.org> Reviewed-by:
Rob Clark <robdclark@gmail.com>
-
- Mar 25, 2019
-
-
Andre Heider authored
iris is thread safe, so enable csmt for a ~5% performance boost. Signed-off-by:
Andre Heider <a.heider@gmail.com> Reviewed-by:
Kenneth Graunke <kenneth@whitecape.org> Reviewed-by:
Axel Davy <davyaxel0@gmail.com>
-
Faith Ekstrand authored
-
Faith Ekstrand authored
We will need them for a new ACCESS_NON_UNIFORM flag that's about to be added in the next commit. Reviewed-by:
Lionel Landwerlin <lionel.g.landwerlin@intel.com>
-
Faith Ekstrand authored
On Intel, we have both bindless and bindful and we'd like to use them at the same time if we can so we need to be able to distinguish at the NIR level between the two. This also fixes nir_lower_tex to properly handle bindless in its tex_texture_size and get_texture_lod helpers. Reviewed-by:
Lionel Landwerlin <lionel.g.landwerlin@intel.com>
-
Fix the order of src0_alpha and sample mask in fb payload.

From SKL PRM Volume 7, "Data Payload Register Order for Render Target Write Messages":

Type    S0A  oM  sZ  oS  M2      M3      M4
SIMD8   1    1   0   0   s0A     oM      R
SIMD16  1    1   0   0   1/0s0A  3/2s0A  oM

It also fixes alpha to coverage with sample mask on GEN6, since they are now sent in the correct order.
Signed-off-by:
Danylo Piliaiev <danylo.piliaiev@globallogic.com> Signed-off-by:
Francisco Jerez <currojerez@riseup.net> Reviewed-by:
Francisco Jerez <currojerez@riseup.net>
-
From "Alpha Coverage" section of SKL PRM Volume 7:
"If Pixel Shader outputs oMask, AlphaToCoverage is disabled in hardware, regardless of the state setting for this feature."

From OpenGL spec 4.6, "15.2 Shader Execution":
"The built-in integer array gl_SampleMask can be used to change the sample coverage for a fragment from within the shader."

From OpenGL spec 4.6, "17.3.1 Alpha To Coverage":
"If SAMPLE_ALPHA_TO_COVERAGE is enabled, a temporary coverage value is generated where each bit is determined by the alpha value at the corresponding sample location. The temporary coverage value is then ANDed with the fragment coverage value to generate a new fragment coverage value."

Similar wording can be found in Vulkan spec 1.1.100, "25.6. Multisample Coverage".

Thus we need to compute alpha to coverage dithering manually in the shader and replace the sample mask store with the bitwise-AND of the sample mask and the alpha to coverage dithering.

The following formula is used to compute the final sample mask:

  m = int(16.0 * clamp(src0_alpha, 0.0, 1.0))
  dither_mask = 0x1111 * ((0xfea80 >> (m & ~3)) & 0xf) |
                0x0808 * (m & 2) | 0x0100 * (m & 1)
  sample_mask = sample_mask & dither_mask

Credits to Francisco Jerez <currojerez@riseup.net> for creating it. It gives a number of ones proportional to the alpha for the 2, 4, 8 or 16 least significant bits of the result.

GEN6 hardware does not have an issue with simultaneous usage of sample mask and alpha to coverage; however, due to the wrong sending order of oMask and src0_alpha, it is still affected by it.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109743
Signed-off-by:
Danylo Piliaiev <danylo.piliaiev@globallogic.com> Reviewed-by:
Francisco Jerez <currojerez@riseup.net>
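As an illustration of the formula in this commit message, here is a minimal Python sketch. The function names `dither_mask` and `final_sample_mask` are hypothetical and are not taken from the driver; only the arithmetic comes from the message.

```python
def dither_mask(src0_alpha: float) -> int:
    """Evaluate the 16-bit alpha-to-coverage dither mask from the formula."""
    clamped = min(max(src0_alpha, 0.0), 1.0)
    m = int(16.0 * clamped)  # m is in the range 0..16
    return (0x1111 * ((0xFEA80 >> (m & ~3)) & 0xF)
            | 0x0808 * (m & 2)
            | 0x0100 * (m & 1))


def final_sample_mask(sample_mask: int, src0_alpha: float) -> int:
    """AND the shader's sample mask with the dither mask."""
    return sample_mask & dither_mask(src0_alpha)


if __name__ == "__main__":
    for step in range(17):
        alpha = step / 16.0
        print(f"alpha={alpha:.4f} mask=0x{dither_mask(alpha):04x}")
```

Looking at all 16 bits, the number of set bits in the mask equals m, so coverage grows by one sample for each 1/16 step in alpha.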
-
Faith Ekstrand authored
Reviewed-by:
Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
-
Faith Ekstrand authored
Reviewed-by:
Lionel Landwerlin <lionel.g.landwerlin@intel.com>
-
Dave Airlie authored
If the geom shader emits a point size we failed to find it here, use the correct API to look it up. Fixes: tests/spec/glsl-1.50/execution/geometry/point-size-out.shader_test Reviewed-by:
Brian Paul <brianp@vmware.com>
-
Dave Airlie authored
With indirect rendering it's fine to set the instance count parameter to 0 and expect the rendering to be ignored. Fixes assert in KHR-GLES31.core.compute_shader.pipeline-gen-draw-commands on softpipe.

v2: return earlier before changing fpstate
Reviewed-by:
Brian Paul <brianp@vmware.com>
-
Leo Liu authored
The wait here is unnecessary since we have a pool of back buffers. The wait for the swap buffer will happen before the present pixmap; at the same time, the previous back buffer will be put back into the pool for reuse after the check for the PresentIdleNotify event. Signed-off-by:
Leo Liu <leo.liu@amd.com> Reviewed-by:
Michel Dänzer <michel.daenzer@amd.com>
-
v2 (Topi):
- Make bit-size handling order be 16-bit, 32-bit, 64-bit
- Clamp lower exponent range at -28 instead of -30.
Reviewed-by:
Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by:
Jason Ekstrand <jason@jlekstrand.net>
-
And enable it on Intel. v2: - Squash the change to enable it on Intel (Jason) Reviewed-by:
Jason Ekstrand <jason@jlekstrand.net>
-
And enable it on Intel. v2: - Squash the change to enable this lowering on Intel (Jason) Reviewed-by:
Jason Ekstrand <jason@jlekstrand.net>
-
Brian Paul authored
When we destroy a context, we need to temporarily make that context the current one for the thread. That's because during context tear-down we make many calls to _mesa_reference_texobj(&texObj, NULL). Note there's no context parameter. If the texture's refcount goes to zero and we need to delete it, we use the thread's current context. But if that context isn't the context we're tearing down, we get into trouble when deallocating sampler views. See patch 593e36f9 ("st/mesa: implement "zombie" sampler views (v2)") for background information. Also, we need to release any sampler views attached to the fallback textures. Fixes a crash on exit with a glretrace of the Nobel Clinician application. v2: at end of st_destroy_context(), check if save_ctx == ctx and unbind the context if so. Reviewed-by:
Roland Scheidegger <sroland@vmware.com> Reviewed-by:
Neha Bhende <bhenden@vmware.com> Reviewed-by:
Jose Fonseca <jfonseca@vmware.com>
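The save/bind/restore pattern described in this commit can be sketched with a toy Python model. Everything here is hypothetical (the `Context` and `Texture` classes, `make_current`, `release_texture`); it only mirrors the reasoning in the message: the release call takes no context argument, so the context being destroyed must be current while its textures are released.

```python
_current = None  # stand-in for the calling thread's current context


class Texture:
    def __init__(self):
        self.released_with = None  # context that was current on release


class Context:
    def __init__(self, name):
        self.name = name
        self.textures = []


def make_current(ctx):
    global _current
    _current = ctx


def get_current():
    return _current


def release_texture(tex):
    # Takes no context argument, so it uses whatever context is current.
    tex.released_with = get_current()


def destroy_context(ctx):
    saved = get_current()
    make_current(ctx)  # temporarily bind the context being torn down
    for tex in ctx.textures:
        release_texture(tex)
    ctx.textures.clear()
    # v2 behavior: if the saved context is the one just destroyed,
    # unbind instead of restoring a dead context.
    make_current(None if saved is ctx else saved)
```

Destroying a context that is not current leaves the caller's context bound afterwards, while destroying the current context leaves nothing bound.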
-
Brian Paul authored
Reviewed-by:
Jason Ekstrand <jason@jlekstrand.net>
-
In Android O, Mesa needs to statically link libexpat so that it's in the same VNDK namespace.

v2: apply change also to anv driver (Tapani)
v3: use += in anv change (Eric Engestrom)

Change-Id: I82b0be5c817c21e734dfdf5bfb6a9aa1d414ab33
Signed-off-by:
Kishore Kadiyala <kishore.kadiyala@intel.com> Signed-off-by:
Tapani Pälli <tapani.palli@intel.com> Reviewed-by:
Eric Engestrom <eric.engestrom@intel.com>
-
Samuel Iglesias Gonsálvez authored
If VK_QUERY_RESULT_WITH_AVAILABILITY_BIT is set and VK_QUERY_RESULT_WAIT_BIT and VK_QUERY_RESULT_PARTIAL_BIT are both not set, we need to return VK_NOT_READY only and set the availability status field for each query. From Vulkan spec: "If VK_QUERY_RESULT_WAIT_BIT and VK_QUERY_RESULT_PARTIAL_BIT are both not set then no result values are written to pData for queries that are in the unavailable state at the time of the call, and vkGetQueryPoolResults returns VK_NOT_READY. However, availability state is still written to pData for those queries if VK_QUERY_RESULT_WITH_AVAILABILITY_BIT is set." Signed-off-by:
Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com>
-
Samuel Iglesias Gonsálvez authored
If the query is not available and VK_QUERY_RESULT_WAIT_BIT and VK_QUERY_RESULT_PARTIAL_BIT are both not set, the spec doesn't allow modifying its result. From Vulkan spec: "If VK_QUERY_RESULT_WAIT_BIT and VK_QUERY_RESULT_PARTIAL_BIT are both not set then no result values are written to pData for queries that are in the unavailable state at the time of the call, and vkGetQueryPoolResults returns VK_NOT_READY. However, availability state is still written to pData for those queries if VK_QUERY_RESULT_WITH_AVAILABILITY_BIT is set." v2: - Move VK_NOT_READY change to next patch (Samuel Pitoiset) Signed-off-by:
Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com>
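The rule quoted from the Vulkan spec in the two commits above can be modeled with a small Python sketch. This is a toy model, not the driver code: the function name, the `(available, value)` tuples and the `[result, availability]` output slots are hypothetical, and it only covers the case where WAIT and PARTIAL are both unset. The flag and return-code values are the ones defined by Vulkan.

```python
VK_SUCCESS = 0
VK_NOT_READY = 1
VK_QUERY_RESULT_WAIT_BIT = 0x2
VK_QUERY_RESULT_WITH_AVAILABILITY_BIT = 0x4
VK_QUERY_RESULT_PARTIAL_BIT = 0x8


def get_results_no_wait_no_partial(queries, flags, out):
    """Toy model of the quoted rule.

    queries: list of (available, value) tuples
    out: list of [result, availability] slots, modified in place
    """
    if flags & (VK_QUERY_RESULT_WAIT_BIT | VK_QUERY_RESULT_PARTIAL_BIT):
        raise ValueError("sketch only models WAIT and PARTIAL both unset")

    status = VK_SUCCESS
    for slot, (available, value) in zip(out, queries):
        if available:
            slot[0] = value
        else:
            # Leave the result slot untouched and report VK_NOT_READY.
            status = VK_NOT_READY
        if flags & VK_QUERY_RESULT_WITH_AVAILABILITY_BIT:
            # Availability is written even for unavailable queries.
            slot[1] = 1 if available else 0
    return status
```

With one available and one unavailable query, the call returns VK_NOT_READY, the unavailable query's result slot keeps its previous contents, and its availability slot is set to 0 when the availability bit is requested.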
-
Tapani Pälli authored
These enums match but the compiler warns about implicit conversion. Signed-off-by:
Tapani Pälli <tapani.palli@intel.com> Reviewed-by:
Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by:
Dave Airlie <airlied@redhat.com>
-
Tapani Pälli authored
(warning: 'const' type qualifier on return type has no effect) Signed-off-by:
Tapani Pälli <tapani.palli@intel.com> Reviewed-by:
Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by:
Dave Airlie <airlied@redhat.com>
-
Dave Airlie authored
With vkpipelinedb Samuel discovered a regression since we stopped stripping types at the spir-v level. This adds a check to the var splitting for the case where it asserts the type hasn't changed, when it has just created a bare type, and it's different than the original type which has an explicit stride. This also removes a pointless assert that also triggers. Fixes: 3b3653c4 (nir/spirv: don't use bare types, remove assert in split vars for testing) Acked-by:
Jason Ekstrand <jason@jlekstrand.net>
-
- Mar 23, 2019
-
-
Caio Oliveira authored
Also handle GLSL_TYPE_INTERFACE the same way we do GLSL_TYPE_STRUCT in various places. Motivated by ARB_gl_spirv work, that will take advantage of the interface types when handling NIR coming from SPIR-V. Reviewed-by:
Jason Ekstrand <jason@jlekstrand.net>
-
Caio Oliveira authored
Reviewed-by:
Jason Ekstrand <jason@jlekstrand.net>
-
Caio Oliveira authored
Also updates gl_spirv to pick the right one. At the moment nothing uses it, but upcoming functionality part of ARB_gl_spirv will use it, and we also later can be more assertful when handling certain features for each of the execution environments. Reviewed-by:
Alejandro Piñeiro <apinheiro@igalia.com> Acked-by:
Karol Herbst <kherbst@redhat.com>
-
- Mar 22, 2019
-
-
Emma Anholt authored
The CTS requires a 565-no-depth-no-stencil (meaning d/s not-required, not not-present) config for ES 3.0, but at depth 24 of X11 we wouldn't do so. We can satisfy that bad requirement using a pbuffer-only visual with whatever other buffers the driver happens to have given us. I've tried to raise this as an absurd requirement with Khronos and made no progress. v2: Make sure it's single sample, no depth, no stencil. Comment typo fix Reviewed-by:
Adam Jackson <ajax@redhat.com>
-
Caio Oliveira authored
SPIR-V can produce those for SSBO and UBO access. Found when testing the ARB_gl_spirv series. Reviewed-by:
Timothy Arceri <tarceri@itsqueeze.com>
-
Rob Clark authored
Signed-off-by:
Rob Clark <robdclark@gmail.com>
-
Rob Clark authored
Report 320 for a6xx, which isn't *quite* true (no geom/tess, in particular), but other caps keep the reported GL and GLSL versions correct (3.1 / 3.10 es). But reporting 320 will switch on EXT_gpu_shader5, which is the goal. Signed-off-by:
Rob Clark <robdclark@gmail.com>
-
Rob Clark authored
For GLES2+ contexts, enable EXT_gpu_shader5 if the driver exposes a sufficiently high ESSL feature level, even if the GLSL feature level isn't high enough. This allows drivers to support EXT_gpu_shader5 in GLES contexts before they support all the additional features of ARB_gpu_shader5 in GL contexts. Signed-off-by:
Rob Clark <robdclark@gmail.com> Reviewed-by:
Ilia Mirkin <imirkin@alum.mit.edu>
-
Rob Clark authored
Adds a new cap to allow drivers to expose higher shading language versions in GLES contexts, to avoid having to report an artificially low version for the benefit of GL contexts. The motivation is to expose EXT_gpu_shader5 even though a driver may not support all the features needed for the corresponding GL extension (ARB_gpu_shader5). Signed-off-by:
Rob Clark <robdclark@gmail.com> Reviewed-by:
Ilia Mirkin <imirkin@alum.mit.edu>
-
Vinson Lee authored
Fix build error after llvm-9.0svn r352827 ("[opaque pointer types] Add a FunctionCallee wrapper type, and use it.").

In file included from ./rasterizer/jitter/builder.h:158:0,
                 from swr_shader.cpp:35:
./rasterizer/jitter/gen_builder_meta.hpp: In member function ‘llvm::Value* SwrJit::Builder::VGATHERPD(llvm::Value*, llvm::Value*, llvm::Value*, llvm::Value*, llvm::Value*, const llvm::Twine&)’:
./rasterizer/jitter/gen_builder_meta.hpp:51:117: error: no matching function for call to ‘cast(llvm::FunctionCallee)’
     Function* pFunc = cast<Function>(JM()->mpCurrentModule->getOrInsertFunction("meta.intrinsic.VGATHERPD", pFuncTy));
     ^

Suggested-by:
Philip Meulengracht <the_meulengracht@hotmail.com> Signed-off-by:
Vinson Lee <vlee@freedesktop.org> Reviewed-by:
Alok Hota <alok.hota@intel.com>
-
Dylan Baker authored
The previous patch tried to address a bug when DESTDIR is '', however, it introduces a bug when DESTDIR is not '', and fakeroot is used. This patch does fix that, and has been tested with the arch pkg-build to ensure it isn't regressed. Fixes: 093a1ade ("bin/install_megadrivers.py: Correctly handle DESTDIR=''") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110221 Reviewed-by:
Eric Engestrom <eric@engestrom.ch>
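The commit message does not show the fix itself, so the following Python sketch is only an assumption about the kind of logic involved: choosing an install directory that works both when DESTDIR is empty and when it is set. The function `resolve_install_dir` and its parameters are hypothetical and are not taken from bin/install_megadrivers.py.

```python
import posixpath


def resolve_install_dir(libdir, destdir, destdir_prefix):
    """Pick the directory to install into.

    libdir: configured library dir, absolute or relative to the prefix
    destdir: value of DESTDIR ('' or None when unset)
    destdir_prefix: DESTDIR joined with the prefix (hypothetical input)
    """
    if posixpath.isabs(libdir):
        if destdir:
            # join() discards earlier components when a later one is
            # absolute, so strip the leading '/' before joining.
            return posixpath.join(destdir, libdir.lstrip('/'))
        return libdir
    return posixpath.join(destdir_prefix, libdir)
```

The pitfall this guards against is that joining a non-empty DESTDIR with an absolute libdir silently drops DESTDIR, while an empty DESTDIR must leave the absolute path unchanged.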
-