Commits · continue_opt · Timothy Arceri / mesa

Mar 26, 2019

nir: remove restrictions on opt_if_loop_last_continue() · 38c04e1f

Timothy Arceri authored 5 years ago

When I implemented opt_if_loop_last_continue() I had restricted
this pass from moving other if-statements inside the branch opposite
the continue. At the time it was causing extra regisiter pressure
in some shaders, however that no longer seems to be an issue.

Samuel Pitoiset noticed that making this pass more aggressive
significantly improved the performance of Doom on RADV. Below are
the statistics he gathered.

28717 shaders in 14931 tests
Totals:
SGPRS: 1267317 -> 1267549 (0.02 %)
VGPRS: 896876 -> 895920 (-0.11 %)
Spilled SGPRs: 24701 -> 26367 (6.74 %)
Code Size: 48379452 -> 48507880 (0.27 %) bytes
Max Waves: 241159 -> 241190 (0.01 %)

Totals from affected shaders:
SGPRS: 23584 -> 23816 (0.98 %)
VGPRS: 25908 -> 24952 (-3.69 %)
Spilled SGPRs: 503 -> 2169 (331.21 %)
Code Size: 2471392 -> 2599820 (5.20 %) bytes
Max Waves: 586 -> 617 (5.29 %)

The codesize increases is related to Wolfenstein II.

This gives +10% FPS with Doom on my Vega56.

Rhys Perry also benchmarked Doom on his VEGA64:

Before: 72.53 FPS
After:  80.77 FPS

38c04e1f

softpipe: fix clears to only clear specified color buffers. · e77013fb
Dave Airlie authored 5 years ago
```
This fixes piglit clearbuffer-mixed-format

Reviewed-by: Brian Paul <brianp@vmware.com>
```
e77013fb

draw/vs: partly fix basevertex/vertex id · 7f7c9425

Dave Airlie authored 5 years ago


This gets the basevertex from the draw depending on whether
it's an indexed or non-indexed draw.

We still fail a transform feedback test for vertex id, as
the vertex id actually an index id, and isn't getting translated
properly to a vertex id, suggestions on how/where to fix that welcome.

Reviewed-by: Brian Paul <brianp@vmware.com>

7f7c9425

amd/surface: provide firstMipIdInTail for metadata surface calculations · e16ac33f

Nicolai Hähnle authored 6 years ago


This field was added in a recent addrlib update, and while there
currently seems to be no issue with skipping it, we will have to
set it correctly in the future.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

e16ac33f

ac/nir: Return frag_coord as integer. · 82075e3c

Bas Nieuwenhuizen authored 5 years ago


To preserve the invariant that nir ssa defs are integers or pointers
in LLVM.

CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>

82075e3c

freedreno/ir3: Fix operand order for DSX/DSY · c7c43273

Kristian Høgsberg authored 5 years ago


Most cat5 instructions are constructed using ir3_SAM, which uses
regs[1] for the (sampler, tex) src. Not DSX/DSY though, so we look up
src1 and src2 differently for those two.

Fixes: 1dffb089 ("freedreno/ir3: fix sam.s2en encoding")
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>

c7c43273

freedreno/ir3: Track whether shader needs derivatives · a752422b

Kristian Høgsberg authored 5 years ago

In 1088b788 ("freedreno/ir3: find # of samplers from uniform vars") we
started counting number of samplers based on the uniform vars instead
of number of cat5 instructions. We used the number of samplers to
determine whether to enable derivatives, but when we only use
derivatives and no samplers, that now breaks. Track whether we need
derivatives explicitly and use that to enable the state.

Fixes: 1088b788 ("freedreno/ir3: find # of samplers from uniform vars")
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>

a752422b

Mar 25, 2019

st/nine: enable csmt per default on iris · 12f11e6f

Andre Heider authored 5 years ago


iris is thread safe, enable csmt for a ~5% performace boost.

Signed-off-by: Andre Heider <a.heider@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Axel Davy <davyaxel0@gmail.com>

12f11e6f

spirv: Handle the NonUniformEXT decoration · 8ed583fe
Faith Ekstrand authored 5 years ago

8ed583fe

nir: Add access flags to deref and SSBO atomics · e50ab2c0

Faith Ekstrand authored 5 years ago


We will need them for a new ACCESS_NON_UNIFORM flag that's about to be
added in the next commit.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

e50ab2c0

nir: Add texture sources and intrinsics for bindless · 40074ebf

Faith Ekstrand authored 6 years ago

On Intel, we have both bindless and bindful and we'd like to use them at
the same time if we can so we need to be able to distinguish at the NIR
level between the two. This also fixes nir_lower_tex to properly handle
bindless in its tex_texture_size and get_texture_lod helpers.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

40074ebf

intel/fs: Make alpha test work with MRT and sample mask · e0db0c74

Danylo Piliaiev authored 5 years ago and

Francisco Jerez committed 5 years ago


Fix the order of src0_alpha and sample mask in fb payload.
From SKL PRM Volume 7, "Data Payload Register Order
for Render Target Write Messages":
 Type   S0A  oM  sZ  oS  M2     M3       M4
 SIMD8   1   1   0   0   s0A    oM       R
 SIMD16  1   1   0   0   1/0s0A 3/2s0A   oM

It also fixes working of alpha to coverage with sample mask
on GEN6 since now they are in correct order.

Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>

e0db0c74

i965,iris,anv: Make alpha to coverage work with sample mask · c8abe03f

Danylo Piliaiev authored 6 years ago and

Francisco Jerez committed 5 years ago

From "Alpha Coverage" section of SKL PRM Volume 7:
 "If Pixel Shader outputs oMask, AlphaToCoverage is disabled in
  hardware, regardless of the state setting for this feature."

From OpenGL spec 4.6, "15.2 Shader Execution":
 "The built-in integer array gl_SampleMask can be used to change
 the sample coverage for a fragment from within the shader."

From OpenGL spec 4.6, "17.3.1 Alpha To Coverage":
 "If SAMPLE_ALPHA_TO_COVERAGE is enabled, a temporary coverage value
  is generated where each bit is determined by the alpha value at the
  corresponding sample location. The temporary coverage value is then
  ANDed with the fragment coverage value to generate a new fragment
  coverage value."

Similar wording could be found in Vulkan spec 1.1.100
"25.6. Multisample Coverage"

Thus we need to compute alpha to coverage dithering manually in shader
and replace sample mask store with the bitwise-AND of sample mask and
alpha to coverage dithering.

The following formula is used to compute final sample mask:
  m = int(16.0 * clamp(src0_alpha, 0.0, 1.0))
  dither_mask = 0x1111 * ((0xfea80 >> (m & ~3)) & 0xf) |
     0x0808 * (m & 2) | 0x0100 * (m & 1)
  sample_mask = sample_mask & dither_mask
Credits to Francisco Jerez <currojerez@riseup.net> for creating it.

It gives a number of ones proportional to the alpha for 2, 4, 8 or 16
least significant bits of the result.

GEN6 hardware does not have issue with simultaneous usage of sample mask
and alpha to coverage however due to the wrong sending order of oMask
and src0_alpha it is still affected by it.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109743



Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>

c8abe03f

nir: Add a lowering pass for non-uniform resource access · 3bd54576
Faith Ekstrand authored 5 years ago
```
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
```
3bd54576
nir/lower_io: Add a bounds-checked 64-bit global address format · 39da1deb
Faith Ekstrand authored 6 years ago
```
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
```
39da1deb

draw/gs: fix point size outputs from geometry shader. · 551950ca

Dave Airlie authored 5 years ago


If the geom shader emits a point size we failed to find it here,
use the correct API to look it up.

Fixes:
tests/spec/glsl-1.50/execution/geometry/point-size-out.shader_test

Reviewed-by: Brian Paul <brianp@vmware.com>

551950ca

draw: bail instead of assert on instance count (v2) · d3836510

Dave Airlie authored 5 years ago


With indirect rendering it's fine to set the instance count
parameter to 0, and expect the rendering to be ignored.

Fixes assert in KHR-GLES31.core.compute_shader.pipeline-gen-draw-commands
on softpipe

v2: return earlier before changing fpstate

Reviewed-by: Brian Paul <brianp@vmware.com>

d3836510

vl/dri3: remove the wait before getting back buffer · 382401aa

Leo Liu authored 5 years ago


The wait here is unnecessary since we got a pool of back buffers,
and the wait for swap buffer will happen before the present pixmap,
at the same time the previous back buffer will be put back to pool
for reuse after the check for PresentIdleNotify event

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>

382401aa

compiler/nir: add lowering for 16-bit ldexp · 763c8aab

Iago Toral authored 6 years ago and

Juan A. Suárez committed 5 years ago


v2 (Topi):
 - Make bit-size handling order be 16-bit, 32-bit, 64-bit
 - Clamp lower exponent range at -28 instead of -30.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

763c8aab

compiler/nir: add lowering for 16-bit flrp · 37663349

Iago Toral authored 6 years ago and

Juan A. Suárez committed 5 years ago


And enable it on Intel.

v2:
 - Squash the change to enable it on Intel (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

37663349

compiler/nir: add lowering option for 16-bit fmod · ca31df6f

Iago Toral authored 6 years ago and

Juan A. Suárez committed 5 years ago


And enable it on Intel.

v2:
 - Squash the change to enable this lowering on Intel (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

ca31df6f

st/mesa: fix texture deletion context mix-up issues (v2) · 08d97aad

Brian Paul authored 5 years ago


When we destroy a context, we need to temporarily make that context
the current one for the thread.

That's because during context tear-down we make many calls to
_mesa_reference_texobj(&texObj, NULL).  Note there's no context
parameter.  If the texture's refcount goes to zero and we need to
delete it, we use the thread's current context.  But if that context
isn't the context we're tearing down, we get into trouble when
deallocating sampler views.  See patch 593e36f9 ("st/mesa:
implement "zombie" sampler views (v2)") for background information.

Also, we need to release any sampler views attached to the fallback
textures.

Fixes a crash on exit with a glretrace of the Nobel Clinician
application.

v2: at end of st_destroy_context(), check if save_ctx == ctx and
unbind the context if so.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>

08d97aad

nir: fix a few signed/unsigned comparison warnings · d13167cd
Brian Paul authored 5 years ago
```
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
```
d13167cd

android: static link with libexpat with Android O+ · e1d80571

Kishore409 authored 5 years ago and

Tapani Pälli committed 5 years ago


In Android O, MESA needs to statically link libexpat so that
it's in same VNDK namespace.

v2: apply change also to anv driver (Tapani)
v3: use += in anv change (Eric Engestrom)

Change-Id: I82b0be5c817c21e734dfdf5bfb6a9aa1d414ab33
Signed-off-by: Kishore Kadiyala <kishore.kadiyala@intel.com>
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

e1d80571

radv: write availability status vkGetQueryPoolResults() when the data is not available · 01cf3900

Samuel Iglesias Gonsálvez authored 5 years ago


If VK_QUERY_RESULT_WITH_AVAILABILY_BIT is set and
VK_QUERY_RESULT_WAIT_BIT and VK_QUERY_RESULT_PARTIAL_BIT are both not
set, we need return to VK_NOT_READY only and set the availability
status field for each query.

From Vulkan spec:

"If VK_QUERY_RESULT_WAIT_BIT and VK_QUERY_RESULT_PARTIAL_BIT are both
not set then no result values are written to pData for queries that are
in the unavailable state at the time of the call, and
vkGetQueryPoolResults returns VK_NOT_READY. However, availability state
is still written to pData for those queries if
VK_QUERY_RESULT_WITH_AVAILABILITY_BIT is set."

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

01cf3900

radv: don't overwrite results in VkGetQueryPoolResults() when queries are not available · cb3ea50e

Samuel Iglesias Gonsálvez authored 5 years ago


If the query is not available and VK_QUERY_RESULT_WAIT_BIT and
VK_QUERY_RESULT_PARTIAL_BIT are both not set, the spec doesn't
allow to modify its result.

From Vulkan spec:

"If VK_QUERY_RESULT_WAIT_BIT and VK_QUERY_RESULT_PARTIAL_BIT are
both not set then no result values are written to pData for queries
that are in the unavailable state at the time of the call, and
vkGetQueryPoolResults returns VK_NOT_READY. However, availability state
is still written to pData for those queries
if VK_QUERY_RESULT_WITH_AVAILABILITY_BIT is set."

v2:
- Move VK_NOT_READY change to next patch (Samuel Pitoiset)

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

cb3ea50e

st/mesa: fix warnings about implicit conversion on enumeration type · 2c240a52

Tapani Pälli authored 5 years ago


These enums match but compiler warns about implicit conversion.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>

2c240a52

st/mesa: fix compilation warning on storage_flags_to_buffer_flags · ec123164

Tapani Pälli authored 5 years ago


(warning: 'const' type qualifier on return type has no effect)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>

ec123164

nir/split_vars: fixup some more explicit_stride related issues. · 9417793f

Dave Airlie authored 5 years ago


With vkpipelinedb Samuel discovered a regression since we stopped
stripping types at the spir-v level.

This adds a check to the var splitting for the case where it
asserts the type hasn't changed, when it has just created a bare
type, and it's different than the original type which has an explicit
stride.

This also removes a pointless assert that also triggers.

Fixes: 3b3653c4 (nir/spirv: don't use bare types, remove assert in split vars for testing)

Acked-by: Jason Ekstrand <jason@jlekstrand.net>

9417793f

Mar 23, 2019

spirv: Use interface type for block and buffer block · 9d0ae777

Caio Oliveira authored 5 years ago

Also handle GLSL_TYPE_INTERFACE the same way we do GLSL_TYPE_STRUCT in
various places. Motivated by ARB_gl_spirv work, that will take
advantage of the interface types when handling NIR coming from SPIR-V.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

9d0ae777

intel/compiler: handle GLSL_TYPE_INTERFACE as GLSL_TYPE_STRUCT · fb024f5e
Caio Oliveira authored 5 years ago
```
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
```
fb024f5e

spirv: Add an execution environment to the options · 15012077

Caio Oliveira authored 5 years ago


Also updates gl_spirv to pick the right one.  At the moment nothing
uses it, but upcoming functionality part of ARB_gl_spirv will use it,
and we also later can be more assertful when handling certain features
for each of the execution environments.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Acked-by: Karol Herbst <kherbst@redhat.com>

15012077

Mar 22, 2019

egl: Add a 565 pbuffer-only EGL config under X11. · dacb11a5

Emma Anholt authored 6 years ago


The CTS requires a 565-no-depth-no-stencil (meaning d/s not-required, not
not-present) config for ES 3.0, but at depth 24 of X11 we wouldn't do so.
We can satisfy that bad requirement using a pbuffer-only visual with
whatever other buffers the driver happens to have given us.

I've tried to raise this as an absurd requirement with Khronos and made no
progress.

v2: Make sure it's single sample, no depth, no stencil.  Comment typo fix

Reviewed-by: Adam Jackson <ajax@redhat.com>

dacb11a5

nir: Handle array-deref-of-vector case in loop analysis · e5830e11

Caio Oliveira authored 5 years ago


SPIR-V can produce those for SSBO and UBO access.  Found when testing
the ARB_gl_spirv series.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

e5830e11

docs: update freedreno status · cdd90a75
Rob Clark authored 5 years ago
```
Signed-off-by: Rob Clark <robdclark@gmail.com>
```
cdd90a75

freedreno: add ESSL cap · 6fd5a7ff

Rob Clark authored 5 years ago


Report 320 for a6xx, which isn't *quite* true (no geom/tess, in
particular), but other caps keep the reported GL and GLSL versions
correct (3.1 / 3.10 es).  But reporting 320 will switch on
EXT_gpu_shader5, which is the goal.

Signed-off-by: Rob Clark <robdclark@gmail.com>

6fd5a7ff

mesa/st: use ESSL cap top enable gpu_shader5 · 6cd98760

Rob Clark authored 5 years ago


For GLES2+ contexts, enable EXT_gpu_shader5 if the driver exposes a
sufficiently high ESSL feature level, even if the GLSL feature level
isn't high enough.

This allows drivers to support EXT_gpu_shader5 in GLES contexts before
they support all the additional features of ARB_gpu_shader5 in GL
contexts.

Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>

6cd98760

gallium: add PIPE_CAP_ESSL_FEATURE_LEVEL · de481947

Rob Clark authored 5 years ago


Adds a new cap to allow drivers to expose higher shading language
versions in GLES contexts, to avoid having to report an artificially
low version for the benefit of GL contexts.

The motivation is to expose EXT_gpu_shader5 even though a driver may
not support all the features needed for the corresponding GL extension
(ARB_gpu_shader5).

Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>

de481947

swr: Fix build with llvm-9.0. · 93c81ca3

Vinson Lee authored 5 years ago


Fix build error after llvm-9.0svn r352827 ("[opaque pointer types] Add a
FunctionCallee wrapper type, and use it.").

In file included from ./rasterizer/jitter/builder.h:158:0,
                 from swr_shader.cpp:35:
./rasterizer/jitter/gen_builder_meta.hpp: In member function ‘llvm::Value* SwrJit::Builder::VGATHERPD(llvm::Value*, llvm::Value*, llvm::Value*, llvm::Value*, llvm::Value*, const llvm:
:Twine&)’:
./rasterizer/jitter/gen_builder_meta.hpp:51:117: error: no matching function for call to ‘cast(llvm::FunctionCallee)’
     Function* pFunc = cast<Function>(JM()->mpCurrentModule->getOrInsertFunction("meta.intrinsic.VGATHERPD", pFuncTy));
                                                                                                                     ^

Suggested-by: Philip Meulengracht <the_meulengracht@hotmail.com>
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Alok Hota <alok.hota@intel.com>

93c81ca3

bin/install_megadrivers.py: Fix regression for set DESTDIR · ed96038e

Dylan Baker authored 5 years ago

The previous patch tried to address a bug when DESTDIR is '', however,
it introduces a bug when DESTDIR is not '', and fakeroot is used. This
patch does fix that, and has been tested with the arch pkg-build to
ensure it isn't regressed.

Fixes: 093a1ade
       ("bin/install_megadrivers.py: Correctly handle DESTDIR=''")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110221


Reviewed-by: Eric Engestrom <eric@engestrom.ch>

ed96038e

Admin message

Admin message