Commits · wip/VK_EXT_vertex_attribute_divisor · Faith Ekstrand / mesa

Jul 03, 2018
- anv: Implement VK_EXT_vertex_attribute_divisor · 33fd46b2
  Faith Ekstrand authored Jul 02, 2018
```
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
```
  33fd46b2
- anv/pipeline: Add a per-VB instance divisor · 4a851498
  Faith Ekstrand authored Jul 02, 2018
```
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
```
  4a851498
- anv/pipeline: Use a per-VB struct instead of separate arrays · 9a38fcc5
  Faith Ekstrand authored Jul 02, 2018
```
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
```
  9a38fcc5
Jul 02, 2018

anv: Add support for the on-disk shader cache · afa8f589

Faith Ekstrand authored Jun 29, 2018

The Vulkan API provides a mechanism for applications to cache their own
shaders and manage on-disk pipeline caching themselves. Generally, this
is what I would recommend to application developers and I've resisted
implementing driver-side transparent caching in the Vulkan driver for a
long time. However, not all applications do this and, for some
use-cases, it's just not practical.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

afa8f589

anv/pipeline_cache: Add a _locked suffix to a function · e0f7a3aa
Faith Ekstrand authored Jun 29, 2018
```
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
```
e0f7a3aa
anv: Add device-level helpers for searching for and uploading kernels · f5c38f4a
Faith Ekstrand authored Jun 29, 2018
```
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
```
f5c38f4a

anv/pipeline: Stop optimizing for not having a cache · eae192bf

Faith Ekstrand authored Jun 29, 2018

Before, we were only hashing the shader if we had a shader cache to
cache things in. This means that if we ever get it wrong, we could end
up trying to cache a shader with an undefined hash. Since not having a
shader cache is an extremely uncommon case, let's optimize for code
clarity and obvious correctness over avoiding a hash operation.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

eae192bf

anv: Use a default pipeline cache if none is specified · 76fdc8a8

Faith Ekstrand authored Jun 29, 2018



If a client is dumb enough to not specify a pipeline cache, give it a
default.  We have to create one anyway for blorp so we may as well let
the client cache shaders in it.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

76fdc8a8

anv: Be more careful about hashing pipeline layouts · d1c778b3

Faith Ekstrand authored Jun 29, 2018



Previously, we just hashed the entire descriptor set layout verbatim.
This meant that a bunch of extra stuff such as pointers and reference
counts made its way into the cache.  It also meant that we weren't
properly hashing in the Y'CbCr conversion information from
bound immutable samplers.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

d1c778b3
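
As a rough illustration of the idea in d1c778b3, the sketch below hashes only the fields that describe a layout's content and skips bookkeeping such as pointers and reference counts. The struct, its field names, and the helper names are hypothetical and are not taken from the driver. FNV-1a is used only to keep the example self-contained; the hash function the driver actually uses is not shown in this log.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout object: mixes hashable content with bookkeeping. */
struct demo_set_layout {
   uint32_t ref_cnt;          /* bookkeeping, must not affect the hash */
   void *owner;               /* pointer, differs between runs */
   uint32_t binding_count;    /* content */
   uint32_t descriptor_type;  /* content */
   uint32_t ycbcr_format;     /* content taken from an immutable sampler */
};

/* FNV-1a over a byte range, chained through `hash`. */
static uint32_t
fnv1a(uint32_t hash, const void *data, size_t size)
{
   const uint8_t *bytes = data;
   for (size_t i = 0; i < size; i++) {
      hash ^= bytes[i];
      hash *= 16777619u;
   }
   return hash;
}

/* Hash only the fields that describe the layout's content. */
static uint32_t
hash_set_layout(const struct demo_set_layout *layout)
{
   uint32_t hash = 2166136261u;
   hash = fnv1a(hash, &layout->binding_count, sizeof(layout->binding_count));
   hash = fnv1a(hash, &layout->descriptor_type, sizeof(layout->descriptor_type));
   hash = fnv1a(hash, &layout->ycbcr_format, sizeof(layout->ycbcr_format));
   return hash;
}
```

Two layouts that differ only in `ref_cnt` or `owner` hash identically, while a change to `ycbcr_format` changes the hash.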

anv,intel: Enable nir_opt_large_constants for Vulkan · 06412bfc

Faith Ekstrand authored Jun 28, 2018

According to RenderDoc, this shaves 99.6% of the run time off of the
ambient occlusion pass in Skyrim Special Edition when running under DXVK
and shaves 92% off the runtime for a reasonably representative frame.
When running the actual game, Skyrim goes from being a slide-show to a
very stable and playable framerate on my SKL GT4e machine.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

06412bfc

anv: Add state setup support for shader constants · 70ce8804

Faith Ekstrand authored Jun 28, 2018



Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

70ce8804

anv: Add support for shader constant data to the pipeline cache · 3a5ed18c

Faith Ekstrand authored Jun 28, 2018



Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

3a5ed18c

nir: Add a large constants optimization pass · 12358505

Faith Ekstrand authored Jun 28, 2018



This pass searches for reasonably large local variables which can be
statically proven to be constant and moves them into shader constant
data.  This is especially useful when large tables are baked into the
shader source code because they can be moved into a UBO by the driver to
reduce register pressure and make indirect access cheaper.

v2 (Jason Ekstrand):
 - Use a size/align function to ensure we get the right alignments
 - Use the newly added deref offset helpers

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

12358505

nir: Add a concept of constant data associated with a shader · c90f221e

Faith Ekstrand authored Jun 28, 2018



This commit adds a concept to NIR of having a blob of constant data
associated with a shader.  Instead of being a UBO or uniform that can be
manipulated by the client, this constant data is considered part of the
shader and remains constant across all invocations of the given shader
until the end of time.  To access this constant data from the shader, we
add a new load_constant intrinsic.  The intention is that drivers will
eventually lower load_constant intrinsics to load_ubo, load_uniform, or
something similar.  Constant data will be used by the optimization pass
in the next commit but this concept may also be useful for OpenCL.

v2 (Jason Ekstrand):
 - Rename num_constants to constant_data_size (anholt)

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

c90f221e
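
The concept described in c90f221e can be pictured with a small stand-alone sketch: a shader object owns an immutable blob, and a load reads from it at a byte offset. The struct and the function `demo_load_constant` are hypothetical simplifications, not the NIR data structures or the real intrinsic. Only the name `constant_data_size` comes from the v2 note above.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified shader object; not the NIR data structure. */
struct demo_shader {
   const void *constant_data;    /* blob that is part of the shader */
   unsigned constant_data_size;  /* size of the blob in bytes */
};

/* Stand-in for a load_constant-style access: read 32 bits at a byte offset.
 * Returns 0 for out-of-range reads (a choice made for this sketch only). */
static uint32_t
demo_load_constant(const struct demo_shader *shader, unsigned offset)
{
   uint32_t value = 0;
   if (offset <= shader->constant_data_size &&
       sizeof(value) <= shader->constant_data_size - offset)
      memcpy(&value, (const uint8_t *)shader->constant_data + offset,
             sizeof(value));
   return value;
}
```

A driver would presumably replace such a load with whatever mechanism it prefers, which is the lowering step the message mentions.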

nir/deref: Add helpers for getting offsets · e8e159e9

Faith Ekstrand authored Jun 29, 2018



These are very similar to the related function in nir_lower_io except
that they don't handle per-vertex or packed things (that could be added,
in theory) and they take a more detailed size/align function pointer.
One day, we should consider switching nir_lower_io over to using the
more detailed size/align functions and then we could make it use these
helpers instead of having its own.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

e8e159e9

nir/types: Add a natural size and alignment helper · 2bf8be99

Faith Ekstrand authored Jun 29, 2018



The size and alignment are "natural" in the sense that everything is
aligned to a scalar.  This is a bit tighter than std430 where vec3s are
required to be aligned to a vec4.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

2bf8be99
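
To make the difference mentioned in 2bf8be99 concrete, here is a minimal sketch for scalars and vectors only. The function names are hypothetical and the real helper's signature is not shown in this log. Under the "natural" rule a vector is aligned to one scalar, whereas std430 gives a three-component vector the alignment of a four-component one.

```c
/* Natural layout: a vector of n components is aligned to one scalar. */
static void
natural_size_align(unsigned n, unsigned scalar_bytes,
                   unsigned *size, unsigned *align)
{
   *size = n * scalar_bytes;
   *align = scalar_bytes;
}

/* std430 base alignment for scalars and vectors: vec3 aligns like vec4. */
static unsigned
std430_vector_align(unsigned n, unsigned scalar_bytes)
{
   return (n == 3 ? 4 : n) * scalar_bytes;
}
```

For a vec3 of 32-bit floats this gives a natural alignment of 4 bytes against 16 bytes under std430, which is why the natural layout packs more tightly.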

nir: Add a deref_instr_has_indirect helper · 893fc2d0

Faith Ekstrand authored Jun 28, 2018



Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

893fc2d0

util/macros: Import ALIGN_POT from ralloc.c · 70b16963

Faith Ekstrand authored Jun 29, 2018



v2 (Jason Ekstrand):
 - Rename y to pot_align (Brian)
 - Also use ALIGN_POT in build_id.c and slab.c (Brian)

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

70b16963
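
For readers unfamiliar with this kind of macro, a power-of-two alignment helper can be sketched as below. The definition is written for illustration and the exact text in util/macros.h may differ. `align_offset` is a hypothetical wrapper added for this note, and the parameter name `pot_align` follows the v2 rename above.

```c
#include <stdint.h>

/* Round x up to the next multiple of pot_align.
 * Only valid when pot_align is a power of two (assumption of this sketch). */
#define ALIGN_POT(x, pot_align) (((x) + (pot_align) - 1) & ~((pot_align) - 1))

/* Hypothetical wrapper: returns the aligned value of a byte offset. */
static inline uint32_t
align_offset(uint32_t offset, uint32_t pot_align)
{
   return ALIGN_POT(offset, pot_align);
}
```

The mask trick works because subtracting one from a power of two sets all lower bits, so the AND clears the remainder after the value has been bumped past the next boundary.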

v3d: Claim PIPE_CAP_TGSI_CAN_READ_OUTPUTS. · 4819da23

Emma Anholt authored Jul 02, 2018

Fixes warning at screen creation. We store our outputs in normal temps
and just emit them to shader I/O at the end, due to our I/O ordering
requirements, so reading "outputs" in NIR is fine.

4819da23

ac: move all LLVM module initialization into ac_create_module · 32e413ca
Marek Olšák authored Jun 30, 2018
```
This removes some ugly code around module initialization.

Reviewed-by: Dave Airlie <airlied@redhat.com>
```
32e413ca

v3d: Emit a TF flush after each draw using TF. · 49f7631c

Emma Anholt authored Jun 25, 2018

This fixes GPU hangs on 7278 in transform feedback tests such as
GTF-GLES3.gtf.GL3Tests.transform_feedback2.transform_feedback2_basic

49f7631c

nv50/ir: handle clipvertex for geom and tess shaders as well · c7726fbf

Karol Herbst authored Jun 30, 2018



this will be needed for compatibility profiles

v2: handle tess shaders

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Karol Herbst <kherbst@redhat.com>

c7726fbf

gallium/u_vbuf: drop min/max-scanning for empty indirect draws · 4c877057

Erik Faye-Lund authored Jun 25, 2018



When building with asserts enabled, we'll end up triggering an assert
in pipe_buffer_map_range down this code-path, due to trying to map
an empty range. Even if we avoid that, we'll trigger another assert
a bit later, because u_vbuf_get_minmax_index returns a min-index of
-1 here, which gets promoted to an unsigned value, and gives us an
out-of-bounds buffer-mapping offset.

Since we can't really have a well-defined min/max range here when
the range is empty anyway, we should just drop this dance in the
first place. After all, no rendering is going to be produced.

This fixes a crash in dEQP-GLES31.functional.draw_indirect.random.0
on VirGL for me.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

4c877057
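
The second failure described in 4c877057 comes down to a signed-to-unsigned conversion. The sketch below reproduces only that arithmetic. Both function names are hypothetical stand-ins and do not reflect the actual u_vbuf code.

```c
#include <limits.h>

/* Hypothetical stand-in for a min-index query on an empty range:
 * with no indices to scan, the sketch reports -1 as the minimum. */
static int
min_index_of_empty_range(void)
{
   return -1;
}

/* Byte offset as an unsigned calculation would produce it: the signed
 * index is converted to unsigned before the multiply and wraps around. */
static unsigned
mapping_offset(int min_index, unsigned stride)
{
   return (unsigned)min_index * stride;
}
```

With a stride of 16, an index of -1 yields an offset just below `UINT_MAX`, far outside any real buffer, which is consistent with the out-of-bounds mapping the message describes.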

radv: reset the image's predicate after a color decompression pass · 02db2363

Samuel Pitoiset authored Apr 18, 2018



After performing a fast-clear eliminate, a FMASK decompress,
or a DCC decompress, we can reset the predicate to FALSE.

With that, the GPU should be able to skip unnecessary color
decompression passes.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>

02db2363

radv: enable/disable predication for the DCC decompression pass · ff7daadc

Samuel Pitoiset authored Apr 18, 2018



Performing a DCC decompression pass is currently pretty rare,
but using predication allows the GPU to skip unnecessary passes.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>

ff7daadc

radv: add padding for the UMR disassembler · 939e5a38

Samuel Pitoiset authored Jun 27, 2018



Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

939e5a38

virgl: Add support for glGetMultisample · 91f48cdf

Gert Wollny authored Jun 29, 2018 and Gert Wollny committed Jul 02, 2018



Use caps to obtain the multisample sample positions for up to 16
positions and implement the according Gallium interface.

This implementation (plus its counterpart in virglrenderer) assumes that
the fixed sample positions are always the same for a given number of samples
over the whole lifetime of a qemu session. It also assumes that sample
series are only given for 2, 4, 8, and 16 samples, and for intermediate
numbers N of samples the next higher supported set from above list is picked
and the sample positions for the first N samples are returned accordingly.

Fixes (when run on GL host):
    dEQP-GLES31.functional.texture.multisample.samples_1.sample_position
    dEQP-GLES31.functional.texture.multisample.samples_2.sample_position
    dEQP-GLES31.functional.texture.multisample.samples_3.sample_position
    dEQP-GLES31.functional.texture.multisample.samples_4.sample_position
    dEQP-GLES31.functional.texture.multisample.samples_8.sample_position
    dEQP-GLES31.functional.texture.multisample.samples_10.sample_position
    dEQP-GLES31.functional.texture.multisample.samples_12.sample_position
    dEQP-GLES31.functional.texture.multisample.samples_13.sample_position
    dEQP-GLES31.functional.texture.multisample.samples_16.sample_position

v2: remove unrelated chunk (thanks Ilia Mirkin)
v3: - also return positions for intermediate sample counts
    - fix unused variable warning
    - update description
v4: explain better what this patch assumes and how it handles sample numbers
    that are not directly advertised (thanks go to Erik Faye-Lund for making
    me aware that this should be documented)

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>

91f48cdf
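
The selection rule stated in 91f48cdf (use the next higher supported set for an intermediate sample count) can be sketched as follows. The function and array names are hypothetical. How a single-sample request is treated is not spelled out in the message, so the sketch simply applies the same rule to it.

```c
#include <stddef.h>

/* Sample counts for which a position series is assumed to exist
 * (2, 4, 8 and 16, per the commit message). */
static const unsigned supported_sets[] = { 2, 4, 8, 16 };

/* Return the smallest supported set that holds at least n samples,
 * or 0 if n is larger than every supported set. */
static unsigned
pick_sample_set(unsigned n)
{
   size_t count = sizeof(supported_sets) / sizeof(supported_sets[0]);
   for (size_t i = 0; i < count; i++) {
      if (supported_sets[i] >= n)
         return supported_sets[i];
   }
   return 0;
}
```

A caller would then return the first N positions of the chosen set, for example the first 10 positions of the 16-sample series for a 10-sample texture.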

st/mesa: Also check for PIPE_FORMAT_A8R8G8B8_SRGB for texture_sRGB · ba78e78c

Tomeu Vizoso authored Jun 22, 2018 and Gert Wollny committed Jul 02, 2018



and PIPE_FORMAT_R8G8B8A8_SRGB, as well.

The reason for this is that when Virgl runs with GLES on the host, it
cannot directly upload textures in BGRA.

So to avoid a conversion step, consider the RGB sRGB formats as well for
this extension.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

ba78e78c

st/mesa: Fall back to R8G8B8A8_SRGB for ETC2 · 71867a0a

Tomeu Vizoso authored Jun 22, 2018 and Gert Wollny committed Jul 02, 2018



If the driver doesn't support PIPE_FORMAT_B8G8R8A8_SRGB, fall back to
PIPE_FORMAT_R8G8B8A8_SRGB.

Drivers such as Virgl will have a hard time supporting
PIPE_FORMAT_B8G8R8A8_SRGB when the host runs GLES, as GL_BGRA isn't as
well supported there.

So go with PIPE_FORMAT_R8G8B8A8_SRGB so these drivers can avoid a
conversion copy.

v2: Fix typo in commit message

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

71867a0a

st/mesa/i965: Allow decompressing ETC2 to GL_RGBA · e5604ef7

Tomeu Vizoso authored Jun 22, 2018 and Gert Wollny committed Jul 02, 2018



When Mesa itself implements ETC2 decompression, it currently
decompresses to formats in the GL_BGRA component order.

That can be problematic for drivers which cannot upload the texture data
as GL_BGRA, such as Virgl when it's backed by GLES on the host.

So this commit adds a flag to _mesa_unpack_etc2_format so callers can
specify the optimal component order.

In Gallium's case, the GL_RGBA order will be requested if the format isn't
PIPE_FORMAT_B8G8R8A8_SRGB.

For i965, it will remain GL_BGRA, as before.

v2: * Remove unnecessary include (Emil Velikov)

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

e5604ef7
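
What a caller-selected component order means for the decompressed data can be shown with a small sketch. The function `store_texel` and its `bgra` flag are hypothetical. The actual parameter added to _mesa_unpack_etc2_format is not shown in this log.

```c
#include <stdbool.h>
#include <stdint.h>

/* Store one decoded 8-bit-per-channel texel.
 * `bgra` selects the component order written to memory. */
static void
store_texel(uint8_t *dst, uint8_t r, uint8_t g, uint8_t b, uint8_t a,
            bool bgra)
{
   dst[0] = bgra ? b : r;
   dst[1] = g;
   dst[2] = bgra ? r : b;
   dst[3] = a;
}
```

Writing the requested order during decompression avoids a separate swizzle or conversion copy later, which is the motivation given in the message.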

anv/cmd_buffer: make descriptors dirty when emitting base state address · 1b548246

Iago Toral authored Jun 28, 2018



Every time we emit a new state base address we will need to re-emit our
binding tables, since they might have been emitted with a different base
state address.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
CC: <mesa-stable@lists.freedesktop.org>

1b548246

anv/cmd_buffer: clean dirty push constants flag after emitting push constants · 6a1d8350

Iago Toral authored Jun 28, 2018



Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
CC: <mesa-stable@lists.freedesktop.org>

6a1d8350

anv/cmd_buffer: never shrink the push constant buffer size · 198a7222

Iago Toral authored Jun 28, 2018



If we have to re-emit push constant data, we need to re-emit all
of it.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
CC: <mesa-stable@lists.freedesktop.org>

198a7222

Jul 01, 2018

gallium/llvmpipe: Enable support bptc format. · 2854c0f7

Denis Pauk authored Jun 26, 2018



v2: none
v3: none

Signed-off-by: Denis Pauk <pauk.denis@gmail.com>
CC: Marek Olšák <maraeo@gmail.com>
CC: Rhys Perry <pendingchaos02@gmail.com>
CC: Matt Turner <mattst88@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>

2854c0f7

gallium/softpipe: Enable support bptc format. · 530130e7

Denis Pauk authored Jun 26, 2018



v2: none
v3: none

Signed-off-by: Denis Pauk <pauk.denis@gmail.com>
CC: Marek Olšák <maraeo@gmail.com>
CC: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>

530130e7

gallium/auxiliary: Add helper support for bptc format compress/decompress · f69bc797

Denis Pauk authored Jun 26, 2018



Reuse code shared with mesa/main/texcompress_bptc.

v2: Use block decompress function
v3: Include static bptc code from texcompress_bptc_tmp.h
Suggested-by: Marek Olšák <maraeo@gmail.com>

Signed-off-by: Denis Pauk <pauk.denis@gmail.com>
CC: Nicolai Hähnle <nicolai.haehnle@amd.com>
CC: Marek Olšák <maraeo@gmail.com>
CC: Gert Wollny <gw.fossdev@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>

f69bc797

mesa: add header for share bptc decompress functions · bf4871f9

Denis Pauk authored Jun 26, 2018



Move shared bptc functions to texcompress_bptc_tmp.h:
* fetch_rgba_unorm_from_block
* fetch_rgb_float_from_block
* compress_rgba_unorm
* compress_rgb_float

Create decompress functions:
* decompress_rgba_unorm
* decompress_rgb_float

Functions will be reused in gallium/auxiliary code.

v2: Add block decompress function
v3: Move all shared code to header
Suggested-by: Marek Olšák <maraeo@gmail.com>

Signed-off-by: Denis Pauk <pauk.denis@gmail.com>
CC: Marek Olšák <maraeo@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>

bf4871f9

Jun 30, 2018

glsl/cache: save and restore ExternalSamplersUsed · 99c6cae2

Marek Olšák authored Jun 30, 2018



Shaders that need special code for external samplers were broken if
they were loaded from the cache.

Cc: 18.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

99c6cae2

nir: fix selection of loop terminator when two or more have the same limit · 463f8490

Timothy Arceri authored Jun 04, 2018



We need to add loop terminators to the list in the order we come
across them otherwise if two or more have the same exit condition
we will select the last one rather than the first one even though
it's unreachable.

This fix is for simple unrolls where we only have a single exit
point. When unrolling this type of loop, the unreachable
terminators and their unreachable branch are removed prior to
unrolling. Because of the logic change we also switch some
list access in the complex unrolling logic to avoid breakage.

Fixes: 6772a17a ("nir: Add a loop analysis pass")

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

463f8490
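
Why list order matters in 463f8490 can be illustrated with a simplified model. The struct and selection function below are hypothetical and far simpler than the NIR loop analysis. They only show that a tie between equal limits is resolved by position in the list.

```c
#include <stddef.h>

/* Hypothetical terminator record: position in the loop body and trip limit. */
struct demo_terminator {
   int program_order;
   int limit;
};

/* Pick the terminator with the smallest limit; on a tie keep the earlier
 * list entry. The result therefore depends on how the list was built. */
static const struct demo_terminator *
pick_limiting_terminator(const struct demo_terminator *list, size_t count)
{
   const struct demo_terminator *best = NULL;
   for (size_t i = 0; i < count; i++) {
      if (best == NULL || list[i].limit < best->limit)
         best = &list[i];
   }
   return best;
}
```

If the list is built in the order the terminators are encountered, the tie goes to the first (reachable) terminator. If it is built in reverse, the same selection picks the later one.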

Jun 29, 2018
- radeonsi: enable OpenGL 4.4 compat profile · 18293be6
  Timothy Arceri authored Jun 25, 2018
```
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
```
  18293be6
