Commits · ext_amd_depth_clamp_sep · Sagar Ghuge / mesa

Aug 28, 2018

i965: enable AMD_depth_clamp_separate · 70810349

Sagar Ghuge authored Aug 21, 2018



Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

70810349

i965: add functional changes for AMD_depth_clamp_separate · c3c19e56

Sagar Ghuge authored Aug 21, 2018



Gen >= 9 has the ability to control clamping of depth values separately at
the near and far planes.

z_w is clamped to the range [min(n,f), 0] if clamping at the near plane is
enabled, [0, max(n,f)] if clamping at the far plane is enabled, and
[min(n,f), max(n,f)] if clamping at both planes is enabled.

v2: 1) Use better coding style (Ian Romanick)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

c3c19e56

mesa: add EXTRA_EXT for AMD_depth_clamp_separate · a29bb879

Sagar Ghuge authored Jul 27, 2018



Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

a29bb879

mesa: add support for GL_AMD_depth_clamp_separate tokens · 6793765b

Sagar Ghuge authored Aug 21, 2018



_mesa_set_enable() and _mesa_IsEnabled() are extended to accept two new
tokens, GL_DEPTH_CLAMP_NEAR_AMD and GL_DEPTH_CLAMP_FAR_AMD.

v2: Remove unnecessary parentheses (Marek Olsak)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

6793765b

mesa: Add support for AMD_depth_clamp_separate · 2772a867

Sagar Ghuge authored Jul 27, 2018



Enable _mesa_PushAttrib() and _mesa_PopAttrib() to handle
GL_DEPTH_CLAMP_NEAR_AMD and GL_DEPTH_CLAMP_FAR_AMD tokens.

Remove DepthClamp, because DepthClampNear + DepthClampFar replaces it,
as suggested by Marek Olsak.

A driver that enables AMD_depth_clamp_separate will only ever look at
DepthClampNear and DepthClampFar, as suggested by Ian Romanick.

v2: 1) Remove unnecessary parentheses (Marek Olsak)
    2) if AMD_depth_clamp_separate is unsupported, TEST_AND_UPDATE
       GL_DEPTH_CLAMP only (Marek Olsak)
    3) Clamp against near and far plane separately (Marek Olsak)
    4) Clip point separately for near and far Z clipping plane (Marek
       Olsak)

v3: Clamp raster position zw to the range [min(n,f), 0] for near plane
    and [0, max(n,f)] for far plane (Marek Olsak)

v4: Use MIN2 and MAX2 instead of CLAMP (Marek Olsak)
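
The separate near/far clamping described in the notes above can be sketched roughly as follows. This is only an illustration: the helper name and signature are hypothetical, the MIN2/MAX2 macros are local stand-ins rather than Mesa's definitions, and it assumes the near clamp acts as a lower bound at min(n,f) and the far clamp as an upper bound at max(n,f).

```c
#include <stdbool.h>

/* Local stand-ins for min/max helpers (not copied from Mesa). */
#define MIN2(a, b) ((a) < (b) ? (a) : (b))
#define MAX2(a, b) ((a) > (b) ? (a) : (b))

/* Hypothetical helper: clamp a window-space depth z_w against the depth
 * range (n, f), with the near and far clamps enabled independently.
 * Assumption: near clamp = lower bound, far clamp = upper bound. */
static float
clamp_depth_separate(float z_w, float n, float f,
                     bool clamp_near, bool clamp_far)
{
   if (clamp_near)
      z_w = MAX2(z_w, MIN2(n, f));   /* apply the lower bound only */
   if (clamp_far)
      z_w = MIN2(z_w, MAX2(n, f));   /* apply the upper bound only */
   return z_w;
}
```

With both flags set this reduces to an ordinary clamp to [min(n,f), max(n,f)]; with neither set the value passes through unchanged.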

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

2772a867

mesa: Add types for AMD_depth_clamp_separate. · d8b4890a

Sagar Ghuge authored Jul 26, 2018



Add some basic types and storage for the AMD_depth_clamp_separate
extension.

v2: 1) Drop unnecessary definition (Marek Olsak)
    2) Expose extension in compatibility profile (Marek Olsak)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

d8b4890a

glapi: define AMD_depth_clamp_separate · 6fad0c87

Sagar Ghuge authored Jul 27, 2018



Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

6fad0c87

meson: Actually load translation files · 7c00db95

Dylan Baker authored Aug 24, 2018



Currently we run the script but don't actually load any files, even in a
tarball where they exist.

Fixes: 3218056e ("meson: Build i965 and dri stack")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

7c00db95

nir: Remove outdated comment · f172a77d

Caio Oliveira authored Aug 27, 2018

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

f172a77d

i965: Add INTEL_fragment_shader_ordering support. · 03ecec9e

Kevin Rogovin authored Aug 27, 2018 and

Plamena Manolova committed Aug 28, 2018



Adds support for INTEL_fragment_shader_ordering. We achieve
the fragment ordering by using the same instruction as for
beginInvocationInterlockARB(), that is, by issuing a memory
fence via sendc.

Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>

03ecec9e

mesa: Add GL/GLSL plumbing for INTEL_fragment_shader_ordering · 119435c8

Kevin Rogovin authored Aug 27, 2018 and

Plamena Manolova committed Aug 28, 2018



This extension provides a new GLSL built-in function,
beginFragmentShaderOrderingINTEL(), that guarantees
(taking the wording of the GL_INTEL_fragment_shader_ordering
extension) that any memory transactions issued by
shader invocations from previous primitives mapped to the
same xy window coordinates (and the same sample when
per-sample shading is active) complete and are visible
to the shader invocation that called
beginFragmentShaderOrderingINTEL().

One advantage of INTEL_fragment_shader_ordering over
ARB_fragment_shader_interlock is that it provides a
function that operates as a memory barrier (instead
of defining a critical section) that can be called
under arbitrary control flow from any function (in
contrast, the begin/end of ARB_fragment_shader_interlock
may only be called once, from main(), under no control
flow).

Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>

119435c8

i965/gen6/xfb: handle case where transform feedback is not active · 1b0df8a4

Andrii Simiklit authored Aug 15, 2018

When SVBI Payload Enable is false, I guess the register R1.4, which
contains the Maximum Streamed Vertex Buffer Index, is filled with zero,
so the GS stops writing transform feedback when transform feedback
is not active.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107579


Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>

1b0df8a4

docs: add forgotten features to 18.2.0 release notes · 743e11c1

Rhys Perry authored Aug 21, 2018



Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: 18.2: <mesa-stable@lists.freedesktop.org>

743e11c1

virgl: add debug-switch to output TGSI · a4e60ccb

Erik Faye-Lund authored Aug 20, 2018



This is quite useful for debugging shader-transpiling issues in
virglrenderer.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>

a4e60ccb

virgl: introduce $VIRGL_DEBUG=verbose · 4ab06cc5

Erik Faye-Lund authored Aug 20, 2018

This adds an environment variable that can be used for driver-specific
flags, as well as a flag for it to enable verbose output.
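
As a rough illustration of a flag-style debug variable, the sketch below parses a comma-separated value into a bitmask. Everything here is an assumption made for the example: the function names, the flag bit values, and the "tgsi" flag name are hypothetical, and the driver's real parsing helper is not shown in this log.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical flag bits, chosen only for this example. */
#define DBG_VERBOSE (1u << 0)
#define DBG_TGSI    (1u << 1)

/* Parse a comma-separated list such as "verbose,tgsi" into a bitmask.
 * Unknown names are ignored; a NULL value yields no flags. */
static unsigned
parse_debug_flags(const char *value)
{
   unsigned flags = 0;

   if (!value)
      return 0;

   while (*value) {
      size_t len = strcspn(value, ",");   /* length of the current token */

      if (len == 7 && strncmp(value, "verbose", 7) == 0)
         flags |= DBG_VERBOSE;
      else if (len == 4 && strncmp(value, "tgsi", 4) == 0)
         flags |= DBG_TGSI;

      value += len;
      if (*value == ',')
         value++;                          /* skip the separator */
   }
   return flags;
}

/* Read the flags from the environment variable named in the commit. */
static unsigned
debug_flags_from_env(void)
{
   return parse_debug_flags(getenv("VIRGL_DEBUG"));
}
```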

While we're at it, quiet some overly chatty debug-output by default.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>

4ab06cc5

virgl: replace fprintf-call with debug_printf · 1b2444df

Erik Faye-Lund authored Aug 20, 2018

This is the only direct call-site for fprintf in virgl; all other
call-sites call debug_printf instead. So let's follow in style here.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>

1b2444df

virgl: delete commented out fprintf-call · 2ebfa90a

Erik Faye-Lund authored Aug 20, 2018



This is just debug-cruft left over. Let's just get rid of it.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>

2ebfa90a

Aug 27, 2018

meson: Don't enable any vulkan drivers on arm, aarch64 · 9de34b4d

Guido Günther authored Aug 26, 2018 and

Dylan Baker committed Aug 27, 2018



There's no Vulkan support for arm atm.

Signed-off-by: Guido Günther <guido.gunther@puri.sm>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>

9de34b4d

meson: Be a bit more helpful when arch or OS is unknown · 05e2fc68

Guido Günther authored Aug 26, 2018 and

Dylan Baker committed Aug 27, 2018

V2: Add one missing @0@

Signed-off-by: Guido Günther <guido.gunther@puri.sm>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>

05e2fc68

intel/eu: print bytes instead of 32 bit hex value · a1e3305f

Sagar Ghuge authored Aug 27, 2018 and

Matt Turner committed Aug 27, 2018



INTEL_DEBUG=hex prints a 32-bit hex value and, due to the endianness of
the CPU, the byte order is reversed. In order to disassemble binary files,
print each byte instead of a 32-bit hex value.
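
The difference between the two output styles can be sketched as follows. The helper below is hypothetical and not the disassembler's code; it simply formats a buffer byte by byte in memory order, which is independent of CPU endianness, whereas printing the same four bytes as one 32-bit value on a little-endian CPU would show them reversed.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format `size` bytes in memory order as two-digit
 * hex values separated by spaces. Output is truncated if `out` is small. */
static void
format_bytes(const void *data, size_t size, char *out, size_t out_size)
{
   const uint8_t *bytes = data;
   size_t pos = 0;

   if (out_size == 0)
      return;
   out[0] = '\0';

   /* Each step needs room for at most " xx" plus the terminating NUL. */
   for (size_t i = 0; i < size && pos + 4 <= out_size; i++) {
      pos += (size_t)snprintf(out + pos, out_size - pos,
                              i ? " %02x" : "%02x", bytes[i]);
   }
}
```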

v2: Print blank spaces in order to vertically align output of compacted
    instructions hex value with uncompacted instructions hex value.
    (Matt Turner)

v3: Fix line wrap at correct length

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>

a1e3305f

intel: decoder: handle 0 sized structs · 440a988b

Lionel Landwerlin authored Aug 25, 2018



Gen7.5 has a BLEND_STATE of size 0 which includes a variable length
group. We did not deal with that very well, leading to an endless
loop.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107544


Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

440a988b

nv50/ir,nvc0: use constant buffers for compute when possible on Kepler+ · e56e600b

Rhys Perry authored Aug 03, 2018



Gives a +7.79% increase in FPS with Hitman on lowest quality settings on
my GTX 1060.

total instructions in shared programs : 5787979 -> 5748677 (-0.68%)
total gprs used in shared programs    : 669901 -> 669373 (-0.08%)
total shared used in shared programs  : 548832 -> 548832 (0.00%)
total local used in shared programs   : 21068 -> 21064 (-0.02%)

                local     shared        gpr       inst      bytes
    helped          1          0        152        274        274
      hurt          0          0          0          0          0

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>

e56e600b

nv50/ir: optimize multiplication by 16-bit immediates into two xmads · d27c7918

Rhys Perry authored Aug 18, 2018

Rather than the usual three that would be created.
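
The arithmetic behind the two-instruction form can be sketched as below. The `xmad16` function is a simplified, hypothetical model of a 16x16-bit multiply with a 32-bit add; the real instruction has additional modes that are not modeled here, and the final shift and add are kept as separate C operations for clarity.

```c
#include <stdint.h>

/* Simplified model (an assumption): multiply the low 16 bits of a and b
 * and add a 32-bit value c, wrapping modulo 2^32. */
static uint32_t
xmad16(uint32_t a, uint32_t b, uint32_t c)
{
   return (a & 0xffff) * (b & 0xffff) + c;
}

/* a * imm (mod 2^32) for a 16-bit immediate using two 16x16 multiplies:
 *   a * imm = a.lo * imm + ((a.hi * imm) << 16)
 * The a.lo * imm.hi term of a full 32x32 multiply vanishes because
 * imm.hi is zero, which is why one multiply can be dropped. */
static uint32_t
mul_by_imm16(uint32_t a, uint16_t imm)
{
   uint32_t lo = xmad16(a, imm, 0);         /* a.lo * imm */
   uint32_t hi = xmad16(a >> 16, imm, 0);   /* a.hi * imm */
   return lo + (hi << 16);
}
```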

total instructions in shared programs : 5796385 -> 5786560 (-0.17%)
total gprs used in shared programs    : 670103 -> 669968 (-0.02%)
total shared used in shared programs  : 548832 -> 548832 (0.00%)
total local used in shared programs   : 21164 -> 21068 (-0.45%)

                local     shared        gpr       inst      bytes
    helped          1          0         64       1040       1040
      hurt          0          0         27          0          0

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>

d27c7918

nv50/ir: optimize near power-of-twos into shladd · 400a4eb9

Rhys Perry authored Aug 18, 2018

total instructions in shared programs : 5819319 -> 5796385 (-0.39%)
total gprs used in shared programs    : 670571 -> 670103 (-0.07%)
total shared used in shared programs  : 548832 -> 548832 (0.00%)
total local used in shared programs   : 21164 -> 21164 (0.00%)

                local     shared        gpr       inst      bytes
    helped          0          0        318       1758       1758
      hurt          0          0         63          0          0

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>

400a4eb9

nv50/ir: move a * b -> a << log2(b) code into createMul() · 2f52925f

Rhys Perry authored Jun 13, 2018

With this commit, OP_MAD is handled on nv50 too. This commit is also
useful for later commits.

Also, instead of creating a shladd, it relies on LateAlgebraicOpt to
create one. This simplifies the code and helps shader-db slightly overall.
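
The identities involved can be illustrated with a small sketch. The function names are hypothetical, the `shladd` semantics of `(a << shift) + c` are an assumption, and this shows only the arithmetic, not how the optimization passes are structured.

```c
#include <stdint.h>

/* Assumed semantics of a shift-left-and-add operation. */
static uint32_t
shladd(uint32_t a, unsigned shift, uint32_t c)
{
   return (a << shift) + c;
}

/* Returns log2(b) when b is a power of two, otherwise -1. */
static int
exact_log2(uint32_t b)
{
   int n = 0;

   if (b == 0 || (b & (b - 1)) != 0)
      return -1;
   while ((b >>= 1) != 0)
      n++;
   return n;
}

/* a * b with the multiply replaced where a cheaper form exists:
 *   b == 2^k      ->  a << k
 *   b == 2^k + 1  ->  shladd(a, k, a)
 * Anything else falls back to the plain multiply. */
static uint32_t
mul_strength_reduced(uint32_t a, uint32_t b)
{
   int k = exact_log2(b);

   if (k >= 0)
      return a << k;

   k = exact_log2(b - 1);
   if (k >= 0)
      return shladd(a, (unsigned)k, a);

   return a * b;
}
```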

total instructions in shared programs : 5820882 -> 5819319 (-0.03%)
total gprs used in shared programs    : 670595 -> 670571 (-0.00%)
total shared used in shared programs  : 548832 -> 548832 (0.00%)
total local used in shared programs   : 21164 -> 21164 (0.00%)

                local     shared        gpr       inst      bytes
    helped          0          0         18        230        230
      hurt          0          0          8        263        263

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>

2f52925f

nv50/ir: optimize imul/imad to xmads · b60bc7a4

Rhys Perry authored Jun 13, 2018

This hits the shader-db numbers a good bit, though a few xmads is way
faster than an imul or imad and the cost is mitigated by the next commit,
which optimizes many multiplications by immediates into shorter and less
register heavy instructions than the xmads.

total instructions in shared programs : 5768871 -> 5820882 (0.90%)
total gprs used in shared programs    : 669919 -> 670595 (0.10%)
total shared used in shared programs  : 548832 -> 548832 (0.00%)
total local used in shared programs   : 21068 -> 21164 (0.46%)

                local     shared        gpr       inst      bytes
    helped          0          0         38          0          0
      hurt          1          0        365       3076       3076

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>

b60bc7a4

gm107/ir: add support for OP_XMAD on GM107+ · bcbcdf84

Rhys Perry authored Jun 13, 2018



v4: make the immediate field 16 bits
v5: don't ever emit h1 flags for immediates

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>

bcbcdf84

nv50/ir: add preliminary support for OP_XMAD · 5d6952d2

Rhys Perry authored Jun 13, 2018



v4: remove uint16_t(...)
v4: don't allow immediates outside [0,65535] in insnCanLoad()

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>

5d6952d2

glsl/linker: Allow unused in blocks which are not declated on previous stage · 4a8444d5

Vadym Shovkoplias authored Aug 23, 2018 and

Alejandro Piñeiro committed Aug 27, 2018

From Section 4.3.4 (Inputs) of the GLSL 1.50 spec:

    "Only the input variables that are actually read need to be written
     by the previous stage; it is allowed to have superfluous
     declarations of input variables."

Fixes:
    * interstage-multiple-shader-objects.shader_test

v2:
  Update comment in ir.h since the usage of "used" field
  has been extended.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101247


Signed-off-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

4a8444d5

nir: Pull block_ends_in_jump into nir.h · 07a227f5

Faith Ekstrand authored Aug 24, 2018



We had two different implementations in different files.  May as well
have one and put it in nir.h.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

07a227f5

anv: Add support for protected memory properties on anv_GetPhysicalDeviceProperties2() · 59a8e0db

Samuel Iglesias Gonsálvez authored Aug 24, 2018



VkPhysicalDeviceProtectedMemoryProperties structure is new on Vulkan 1.1.

Fixes Vulkan CTS CL#2849.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

59a8e0db

Aug 25, 2018

intel/tools: Add 0x in front of a couple of hex values · aad501f1

Faith Ekstrand authored Aug 25, 2018

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

aad501f1

anv: Fill holes in the VF VUE to zero · 76b0e4d8

Faith Ekstrand authored Aug 25, 2018

This fixes a GPU hang in DOOM 2016 running under wine.

Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104809


Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

76b0e4d8

intel: tools: Fix aubinator_error's fprintf call (format-security) · b2313ef4

Kai Wasserbäch authored Aug 25, 2018 and

Lionel Landwerlin committed Aug 25, 2018



The recent commit 4616639b introduced
the new function aubinator_error() which is a trivial wrapper around
fprintf() to STDERR. The call to fprintf() however is passed the message
msg directly:
  fprintf(stderr, msg);

This is a format-security violation and leads to an FTBFS with
-Werror=format-security (GCC 8):
  ../../../src/intel/tools/aubinator.c: In function 'aubinator_error':
  ../../../src/intel/tools/aubinator.c:74:4: error: format not a string literal and no format arguments [-Werror=format-security]
      fprintf(stderr, msg);
      ^~~~~~~

This patch fixes this trivially by introducing a catch-all "%s" format
argument.
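
A minimal sketch of the pattern follows. The wrapper name and signature are hypothetical stand-ins for the function described above; the point is only that the message is passed as data for a literal "%s" format rather than as the format string itself.

```c
#include <stdio.h>

/* Hypothetical stand-in for an error-reporting wrapper. Because msg is
 * an argument to "%s", any '%' characters in it are printed literally
 * instead of being interpreted as conversion specifiers. */
static int
report_error(FILE *stream, const char *msg)
{
   return fprintf(stream, "%s", msg);
}
```

Calling `report_error(stderr, "100% done\n")` prints the text unchanged, whereas `fprintf(stderr, msg)` with the same string would treat "% d" as a conversion with no matching argument.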

Fixes: 4616639b ("intel: tools: split aub parsing from aubinator")
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

b2313ef4

intel/batch_decoder: Print blend states properly · 70de31d0

Faith Ekstrand authored Aug 24, 2018

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

70de31d0

intel/batch_decoder: Fix dynamic state printing · cbd4bc13

Faith Ekstrand authored Aug 24, 2018

Instead of printing addresses like everyone else, we were accidentally
printing the offset from state base address. Also, state_map is a void
pointer so we were incrementing in bytes instead of dwords and every
state other than the first was wrong.
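
The pointer-arithmetic pitfall can be sketched like this. The helper is hypothetical and not the decoder's code: it converts the untyped mapping to a dword pointer before indexing, so each step advances by 4 bytes rather than 1.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return a pointer to the i-th state in a mapping
 * where every state occupies `state_dwords` 32-bit dwords. Indexing
 * through a uint32_t pointer advances in dwords; stepping an untyped
 * (byte-granular) pointer by the same count would land 4x too early. */
static const uint32_t *
state_at(const void *state_map, size_t state_dwords, size_t i)
{
   const uint32_t *dwords = state_map;

   return dwords + i * state_dwords;
}
```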

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

cbd4bc13

intel/decoder: Print ISL formats for vertex elements · d1971be6

Faith Ekstrand authored Aug 24, 2018

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

d1971be6

intel/decoder: Clean up field iteration and fix sub-dword fields · 2abd7ae1

Faith Ekstrand authored Aug 24, 2018

First of all, setting iter->name in advance_field is unnecessary because
it gets set by gen_decode_field which gets called immediately after
advance_field in the one call-site. Second, we weren't properly
initializing start_bit and end_bit in the initial condition of
gen_field_iterator_next so the first field of a struct would get printed
wrong if it doesn't start on the first bit. This is fixed by adding an
iter_start_field helper which sets the field and also sets up the other
bits we need. This fixes decoding of 3DSTATE_SBE_SWIZ.
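
To show why the start and end bits matter for a field that does not begin at bit 0, here is a small sketch of extracting a sub-dword field. The function is hypothetical and assumes an inclusive bit range within a single 32-bit dword; the decoder's own helpers are not shown in this log.

```c
#include <stdint.h>

/* Hypothetical helper: extract bits [start_bit, end_bit] (inclusive,
 * 0 <= start_bit <= end_bit <= 31) from a dword. If start_bit were left
 * at a stale value, the shift below would select the wrong bits. */
static uint32_t
field_value(uint32_t dword, unsigned start_bit, unsigned end_bit)
{
   unsigned width = end_bit - start_bit + 1;
   uint32_t mask = width >= 32 ? 0xffffffffu : ((1u << width) - 1);

   return (dword >> start_bit) & mask;
}
```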

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

2abd7ae1

gallium: Split out PIPE_CAP_TEXTURE_MIRROR_CLAMP_TO_EDGE. · 12816088

Kenneth Graunke authored Jun 23, 2018

Some hardware can do PIPE_TEX_WRAP_MIRROR_REPEAT but not
PIPE_TEX_WRAP_MIRROR_CLAMP and PIPE_TEX_WRAP_MIRROR_CLAMP_TO_BORDER.

Drivers for such hardware would like to advertise support for
ARB_texture_mirror_clamp_to_edge but not EXT_texture_mirror_clamp.

This commit adds a new PIPE_CAP_TEXTURE_MIRROR_CLAMP_TO_EDGE bit,
changes the extension enable to be based on that, and enables it
in all upstream drivers which supported PIPE_CAP_TEXTURE_MIRROR_CLAMP
(so they continue supporting this mode).

12816088

Aug 24, 2018

intel: decoder: unify MI_BB_START field naming · f430a37f

Lionel Landwerlin authored Aug 14, 2018

The batch decoder looks for a field with a particular name to decide
whether an MI_BB_START leads into a second batch buffer level. Because
the names are different between Gen7.5/8 and the newer generation we
fail that test and keep on reading (invalid) instructions.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107544

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

f430a37f
