Commits · nir-bitsize-validator-rewrite · Connor Abbott / mesa

Nov 29, 2018

nir/algebraic: Add unit tests for bitsize validation · db7f5623

Connor Abbott authored 6 years ago

The non-failure path can be tested by just compiling mesa and then
testing it, but the failure paths won't be hit unless you make a mistake,
so it's best to test them with some unit tests.

nir/algebraic: Rewrite bit-size inference · 7c4523fd

Connor Abbott authored 6 years ago

Before this commit, there were two copies of the algorithm: one in C,
that we would use to figure out what bit-size to give the replacement
expression, and one in Python, that emulated the C one and tried to
prove that the C algorithm would never fail to correctly assign
bit-sizes. That seemed pretty fragile, and likely to fall over if we
make any changes. Furthermore, the C code was really just recomputing
more-or-less the same thing as the Python code every time. Instead, we
can just store the results of the Python algorithm in the C
data structure, and consult it to compute the bit-size of each value,
moving the "brains" entirely into Python. Since the Python algorithm no
longer has to match C, it's also a lot easier to change it to something
more closely approximating an actual type-inference algorithm. The
algorithm used is based on Hindley-Milner, although deliberately
weakened a little. It's a few more lines than the old one, judging by
the diffstat, but I think it's easier to verify that it's correct while
being as general as possible.

We could split this up into two changes, first making the C code use the
results of the Python code and then rewriting the Python algorithm, but
since the old algorithm never tracked which variable each equivalence
class corresponded to, it would mean we'd have to add some non-trivial
code which would
then get thrown away. I think it's better to see the final state all at
once, although I could also try splitting it up.
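
The message describes the new inference only in prose. As an illustration of the unification idea behind Hindley-Milner-style inference (not the actual nir_algebraic code, which is Python, and without whatever weakening the commit applies), here is a minimal C sketch. It assumes each value in a search or replacement expression is a node in a union-find structure whose root carries either a known bit-size or 0 for "unknown". All names (`bitsize_classes`, `classes_unify`, `classes_set_size`) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch: union-find over bit-size "type variables".
 * Each node stands for one value in an expression. */
#define MAX_NODES 64

struct bitsize_classes {
   int parent[MAX_NODES];
   unsigned bit_size[MAX_NODES]; /* 0 = not yet known; only meaningful at roots */
};

static void classes_init(struct bitsize_classes *c, int n)
{
   assert(n <= MAX_NODES);
   for (int i = 0; i < n; i++) {
      c->parent[i] = i;
      c->bit_size[i] = 0;
   }
}

/* Find the representative of a node's class (with path halving). */
static int classes_find(struct bitsize_classes *c, int x)
{
   while (c->parent[x] != x) {
      c->parent[x] = c->parent[c->parent[x]];
      x = c->parent[x];
   }
   return x;
}

/* Pin a node's class to a concrete bit-size; fails on a conflict. */
static bool classes_set_size(struct bitsize_classes *c, int x, unsigned size)
{
   int r = classes_find(c, x);
   if (c->bit_size[r] != 0 && c->bit_size[r] != size)
      return false;
   c->bit_size[r] = size;
   return true;
}

/* Require two nodes to share a bit-size (unification). Fails only if
 * both classes are already pinned to different sizes. */
static bool classes_unify(struct bitsize_classes *c, int a, int b)
{
   int ra = classes_find(c, a), rb = classes_find(c, b);
   if (ra == rb)
      return true;
   if (c->bit_size[ra] != 0 && c->bit_size[rb] != 0 &&
       c->bit_size[ra] != c->bit_size[rb])
      return false;
   if (c->bit_size[ra] == 0)
      c->bit_size[ra] = c->bit_size[rb]; /* keep whichever size is known */
   c->parent[rb] = ra;
   return true;
}
```

In this sketch, an opcode whose sources and destination must match would unify those nodes, an opcode with a fixed size such as `b2i32` would pin its destination to 32, and a `false` return is roughly the kind of failure path the unit tests above are meant to cover.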

Nov 20, 2018

ac/nir: fix intrinsic name string size in visit_image_atomic() · f4563d8f

Samuel Pitoiset authored 6 years ago


Fixes an assertion in SoTTR.

Fixes: dd0172e8 ("radv: Use structured intrinsics instead of indexing workaround for GFX9.")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

Nov 19, 2018

radv: Use structured intrinsics instead of indexing workaround for GFX9. · dd0172e8

Bas Nieuwenhuizen authored 6 years ago


These force the index to be used in the instruction so we don't need the
workaround.

Totals:
SGPRS: 1321642 -> 1321802 (0.01 %)
VGPRS: 943664 -> 943788 (0.01 %)
Spilled SGPRs: 28468 -> 28480 (0.04 %)
Spilled VGPRs: 88 -> 89 (1.14 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 80 -> 80 (0.00 %) dwords per thread
Code Size: 52415292 -> 52338932 (-0.15 %) bytes
LDS: 400 -> 400 (0.00 %) blocks
Max Waves: 233903 -> 233803 (-0.04 %)
Wait states: 0 -> 0 (0.00 %)

Totals from affected shaders:
SGPRS: 238344 -> 238504 (0.07 %)
VGPRS: 232732 -> 232856 (0.05 %)
Spilled SGPRs: 13125 -> 13137 (0.09 %)
Spilled VGPRs: 88 -> 89 (1.14 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 80 -> 80 (0.00 %) dwords per thread
Code Size: 15752712 -> 15676352 (-0.48 %) bytes
LDS: 139 -> 139 (0.00 %) blocks
Max Waves: 31680 -> 31580 (-0.32 %)
Wait states: 0 -> 0 (0.00 %)

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

i965: Allow only one slot of clip distances to be set on Gen4-5. · 09901686

Kenneth Graunke authored 6 years ago

The existing backend code assumed that if VARYING_SLOT_CLIP_DIST0
was written, then VARYING_SLOT_CLIP_DIST1 would be as well. That's
true with the current lowering, but not necessarily the case if there are 4 or
fewer clip distances. Separate out the checks to allow this.

The new NIR-based lowering will trigger this case, which would have
caused backend validation errors (src is null) without this patch.

Reviewed-by: Eric Anholt <eric@anholt.net>
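
A small C sketch of the slot arithmetic behind this change, assuming the usual packing of four clip distances per vec4 varying slot (CLIP_DIST0 for distances 0-3, CLIP_DIST1 for distances 4-7). The helper names are hypothetical and not taken from the i965 backend.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed packing: four clip distances per vec4 varying slot. */
static unsigned clip_dist_slots_needed(unsigned num_clip_distances)
{
   assert(num_clip_distances <= 8);
   return (num_clip_distances + 3) / 4;
}

/* Check each slot on its own instead of assuming CLIP_DIST1 is
 * written whenever CLIP_DIST0 is. */
static bool clip_dist_slot_written(unsigned num_clip_distances, unsigned slot)
{
   return slot < clip_dist_slots_needed(num_clip_distances);
}
```

With 4 or fewer distances only slot 0 is needed, which is exactly the case the old backend code did not expect.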

nir: Make nir_lower_clip_vs optionally work with variables. · 5b682143

Kenneth Graunke authored 7 years ago


The way nir_lower_clip_vs() works with store_output intrinsics makes a
ton of assumptions about the driver_location field.

In i965 and iris, I'd rather do this lowering early and work with
variables.  v3d may want to switch to that as well, and ir3 could too,
but I'm not sure exactly what would need updating.  For now, handle
both methods.

Reviewed-by: Eric Anholt <eric@anholt.net>

nir: Save nir_variable pointers in nir_lower_clip_vs rather than locs. · d0f746b6

Kenneth Graunke authored 7 years ago

I'll want the variables in the next patch.

Reviewed-by: Eric Anholt <eric@anholt.net>

nir: Inline lower_clip_vs() into nir_lower_clip_vs(). · 63c86968

Kenneth Graunke authored 7 years ago


It's now called exactly once, and there's not really any distinction.

Reviewed-by: Eric Anholt <eric@anholt.net>

nir: Use nir_shader_get_entrypoint in nir_lower_clip_vs(). · bfa789ac

Kenneth Graunke authored 7 years ago

Reviewed-by: Eric Anholt <eric@anholt.net>

nir: handle shared pointers in lowering indirect derefs. · c8a35285

Dave Airlie authored 6 years ago


Check if the base ends up with no variable, and continue
if we see that case outside the loop.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

nir: move getting deref from var after we check deref type. · 760859ca

Dave Airlie authored 6 years ago


I posted a load of hacks before to do this; Jason suggested this instead:
just check the deref mode, not the variable mode, and delay getting
the variable until we know the type.

This avoids crashes when dereferencing shared memory pointers.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

spirv/vtn: handle variable pointers without offset lowering · 2f4f5a50

Dave Airlie authored 6 years ago

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

intel/fs,vec4: Fix a compiler warning · dca35c59

Faith Ekstrand authored 6 years ago


../src/intel/compiler/brw_fs_nir.cpp:3534:46: warning: comparison of integer expressions of different signedness: ‘unsigned int’ and ‘int’ [-Wsign-compare]
       assert(nir_intrinsic_write_mask(instr) ==
              ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
              (1 << instr->num_components) - 1);
              ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This was caused by 6339aba7, which added these completely valid
checks.  However, GCC likes to complain about signedness mismatches.

Fixes: 6339aba7 "intel/compiler: Lower SSBO and shared..."
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
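
The diff itself isn't shown here. One common way to silence -Wsign-compare in a check like this (an assumption, not necessarily what the commit did) is to build the mask from an unsigned literal so both operands of `==` are unsigned. `write_mask_is_full` is a hypothetical stand-in for the assert.

```c
#include <assert.h>
#include <stdbool.h>

/* write_mask is unsigned, but (1 << n) - 1 is a signed int, which is
 * what triggers -Wsign-compare. 1u keeps the whole expression unsigned. */
static bool write_mask_is_full(unsigned write_mask, unsigned num_components)
{
   assert(num_components >= 1 && num_components <= 4);
   return write_mask == (1u << num_components) - 1;
}
```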

intel,nir: Move gl_LocalInvocationID lowering to nir_lower_system_values · 060817b2

Faith Ekstrand authored 6 years ago

It's not at all Intel-specific; the formula is dictated by OpenGL and
Vulkan. The only Intel-specific thing is that we need the lowering. As
a nice side-effect, the new version is variable-group-size ready.

Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
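
The GL spec defines gl_LocalInvocationIndex as z * size.x * size.y + y * size.x + x, so deriving the ID from the index means inverting that. Below is a plain C sketch of the inversion (not the NIR builder code), assuming the lowering works from the index, with the work-group size passed in at run time as a variable group size would require. `uvec3` and `local_id_from_index` are hypothetical names.

```c
#include <assert.h>

struct uvec3 { unsigned x, y, z; };

/* Invert index = id.z * size.x * size.y + id.y * size.x + id.x.
 * The group size is an argument, not a compile-time constant. */
static struct uvec3 local_id_from_index(unsigned index, struct uvec3 size)
{
   assert(size.x > 0 && size.y > 0 && size.z > 0);
   assert(index < size.x * size.y * size.z);
   struct uvec3 id;
   id.x = index % size.x;
   id.y = (index / size.x) % size.y;
   id.z = index / (size.x * size.y);
   return id;
}
```

For a 4x2x3 group, index 23 maps back to (3, 1, 2), since 2 * 8 + 1 * 4 + 3 = 23.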

gbm: add missing comma between strings · 486091bc

Eric Engestrom authored and committed 6 years ago


Fixes: d971a423 "loader: Factor out the common driver opening logic from each loader."
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Eric Anholt <eric@anholt.net>
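
Why a missing comma compiles at all: C concatenates adjacent string literals, so the list silently loses an entry instead of failing to build. A minimal illustration with hypothetical names (not the actual gbm list):

```c
#include <assert.h>
#include <string.h>

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Missing comma: the first two literals merge into one element. */
static const char *const broken[] = {
   "first_name"
   "second_name",
   "third_name",
};

/* Intended list with three separate entries. */
static const char *const fixed[] = {
   "first_name",
   "second_name",
   "third_name",
};
```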

radv: implement fast HTILE clears for depth or stencil only on GFX9 · 72410755

Samuel Pitoiset authored 6 years ago


This allows fast clearing of the depth part (or the stencil part)
of a depth+stencil surface when HTILE is enabled. I didn't test
on GFX8, so it's currently disabled there.

This gives a very nice boost, for example when clearing the depth
aspect of a 4096x4096 D32_SFLOAT_S8_UINT image (18x faster).

BEFORE: 235 us
AFTER: 13 us

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

radv: rewrite the condition that checks allowed depth/stencil values · 7dcddbe5

Samuel Pitoiset authored 6 years ago


Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

radv: check allowed fast HTILE clears a bit earlier · 9133bbf1

Samuel Pitoiset authored 6 years ago


Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

radv: add radv_is_fast_clear_{depth,stencil}_allowed() helpers · 193ad474

Samuel Pitoiset authored 6 years ago


Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

radv: add radv_get_htile_fast_clear_value() helper · c7e142ed

Samuel Pitoiset authored 6 years ago


Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

radv: remove unnecessary goto in the fast clear paths · 6f3fbcc0

Samuel Pitoiset authored 6 years ago


Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

radv/winsys: remove the max IBs per submit limit for the sysmem path · 36006e3c

Samuel Pitoiset authored 6 years ago


This path will eventually be improved, but as it's only
used on SI (or with RADV_DEBUG=noibs), I'm not sure that
matters much.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

radv/winsys: remove the max IBs per submit limit for the fallback path · 4d30f2c6

Samuel Pitoiset authored 6 years ago


The chained submission is the fastest path and it should now
be used more often than before. This removes some EOP events.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

etnaviv: use dummy RT buffer when rendering without color buffer · 8ca8a6a7

Lucas Stach authored 6 years ago


At least GC2000 seems to push some dirt from the PE color cache into
the last bound render target when drawing depth only. Newer cores
seem to behave properly and don't do this, but I have found no way
to fix it on GC2000. Flushes and stalls don't seem to make any
difference.

In order to stop the core from pushing the dirt into a precious real
render target, plug in a dummy buffer when rendering without a color
buffer.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>

virgl: fix vtest regression since fencing changes. · 87062040

Dave Airlie authored 6 years ago


The in_fence_fd needs to be initialised to -1.

Fixes: d1a1c21e (virgl: native fence fd support)

Reviewed-by: Robert Foss <robert.foss@collabora.com>
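
The message doesn't spell out why -1 matters; presumably (our assumption) it is because 0 is a valid file descriptor, so a zero-initialized field looks like a real fence to wait on. A hypothetical sketch with made-up struct and function names:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical submit arguments carrying an optional fence fd. */
struct submit_args {
   int in_fence_fd;
   unsigned flags;
};

static void submit_args_init(struct submit_args *args)
{
   memset(args, 0, sizeof(*args));
   args->in_fence_fd = -1; /* -1 means "no fence"; 0 is a valid fd */
}

static bool submit_has_in_fence(const struct submit_args *args)
{
   return args->in_fence_fd >= 0;
}
```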

radv: always clear the FCE predicate after DCC/FMASK/CMASK decompressions · 55c75d2b

Samuel Pitoiset authored 6 years ago


DCC and FMASK also imply a fast-clear eliminate, so it should be
safe to reset the predicate unconditionally. We still only skip
FMASK or CMASK decompressions for now.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

radv: tidy up radv_set_dcc_need_cmask_elim_pred() · 483a28bf

Samuel Pitoiset authored 6 years ago


This is just a small cleanup.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

radeonsi: fix an out-of-bounds read reported by ASAN · 46a59ce0

Nicolai Hähnle authored 6 years ago


We read 4 values out of sample_locs_8x, so make sure the array is
big enough.

Fixes: ac76aeef ("radeonsi: switch back to standard DX sample positions")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

r600: Only set context streamout strides info from the shader that has outputs · d174cbcc

Gert Wollny authored and committed 6 years ago

With 5d517a, streamout info is only attached to the shader for which
transform feedback is actually recorded, but the driver set the context info
with each state submitted, thereby always using the info data that was
attached to the vertex shader.

Pass the streamout stride info to the context only from the shader that
actually has outputs. (Thanks to Marek Olšák for pointing me in the right
direction)

Fixes regression with: dEQP-GLES31.functional.tessellation.invariance.*
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108734


Fixes: 5d517a59 ("st/mesa: Don't record garbage streamout information in the non-SSO case.")

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>

i965: use FRAMEBUFFER_UNSUPPORTED instead of FRAMEBUFFER_INCOMPLETE_DIMENSIONS · 18a8e11a

Gert Wollny authored and committed 6 years ago


FRAMEBUFFER_INCOMPLETE_DIMENSIONS is not supported for GLES 3.0 and later and
not defined for Desktop OpenGL. Instead use FRAMEBUFFER_UNSUPPORTED like it
was done before.

Thanks to Iago Toral and Andrey Simiklit for pointing out the problem and the
details.

Fixes: ebcde345 ("i965: be more specific about FBO completeness errors")
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>

virgl: Use file descriptor instead of un-allocated object · 40eca7d3

Gert Wollny authored 6 years ago


The structure qdws is not allocated at this point, nor is the
file descriptor set to its member. Use the fd directly instead.

Fixes: d1a1c21e ("virgl: native fence fd support")

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>

i965: Add support for and expose EXT_texture_sRGB_R8 · 78fdc507

Gert Wollny authored and committed 6 years ago


Emulate MESA_FORMAT_R_SRGB8 by using L8_UNORM_SRGB. This is possible
because component swizzling is handled based on the Mesa format and,
hence, an r001 swizzle can be used to correct the components.

Enables these tests and makes them pass (tested on Kabylake):

  dEQP-GLES31.functional.srgb_texture_decode.skip_decode.sr8.*
  dEQP-GLES31.functional.texture.filtering.cube_array.formats.sr8*

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
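
A sketch of the swizzle trick, assuming an L8 texel samples as (L, L, L, 1): applying an r001 swizzle yields (L, 0, 0, 1), which is what a single-channel red format should return. The enum is only loosely modeled on Mesa's SWIZZLE_* values; all names here are hypothetical.

```c
#include <assert.h>

/* Hypothetical swizzle selectors: 0-3 pick a component, then constants. */
enum swz { SWZ_X, SWZ_Y, SWZ_Z, SWZ_W, SWZ_ZERO, SWZ_ONE };

/* Apply a four-channel swizzle to a sampled texel. */
static void apply_swizzle(const float in[4], const enum swz s[4], float out[4])
{
   for (int i = 0; i < 4; i++) {
      switch (s[i]) {
      case SWZ_ZERO: out[i] = 0.0f; break;
      case SWZ_ONE:  out[i] = 1.0f; break;
      default:       out[i] = in[s[i]]; break; /* copy a source component */
      }
   }
}
```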

i965: Force zero swizzles for unused components in GL_RED and GL_RG · c5363869

Gert Wollny authored and committed 6 years ago


This makes it possible to use a hardware luminance format as RED format.

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

i965: be more specific about FBO completeness errors · ebcde345

Gert Wollny authored and committed 6 years ago


The driver was returning GL_FRAMEBUFFER_UNSUPPORTED for all cases of an
incomplete FBO; be a bit more specific about this, following the description
of glCheckFramebufferStatus.

This helps keep dEQP happy when adding EXT_texture_sRGB_R8 support.

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

i965: Correct L8_UNORM_SRGB table entry · 24a02157

Gert Wollny authored and committed 6 years ago


As the name says, the format is an sRGB format.

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

Nov 18, 2018

virgl: Clean up fences commit · 70692adf

Robert Foss authored 6 years ago


Remove a dead variable, an int->bool conversion and some
whitespace changes.

Signed-off-by: Robert Foss <robert.foss@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

Nov 17, 2018

i915: Delete swizzling detection logic. · c2e3d0f1

Kenneth Graunke authored 6 years ago


This is all leftover from the i965 split.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

nv50/ir/ra: enforce max register requirement, and change spill order · beb66d37

Ilia Mirkin authored 6 years ago


On nv50, certain operations must happen on regs below 64, due to
encoding requirements. First of all, we add infrastructure to enforce
this. Secondly we change the spill order to first spill RIG nodes that
are unconstrained, followed by ones that are.

This makes the gamecube logo shadertoy compile properly. Curiously, if
we adjust the spill order so that we first spill the constrained RIG
nodes instead, the RA also succeeds. However it seems more logical to
first spill the unconstrained ones.

While we're at it, drop the nv50 max register to reserve r127 as the
zero register of last resort (r63 is preferred).

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Karol Herbst <kherbst@redhat.com>
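
The spill-order change can be pictured as a two-pass candidate choice. This is a hypothetical C sketch, not the nv50 RA code: `constrained` marks nodes that must land below r64, and the per-node spill cost is our assumption, since the message only describes the order between the two groups.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical interference-graph (RIG) node. */
struct rig_node {
   bool constrained;   /* must be allocated to a register below 64 */
   float spill_cost;   /* lower is cheaper to spill (assumed heuristic) */
   bool spilled;
};

/* Pass 0 considers only unconstrained nodes; pass 1 falls back to
 * constrained ones. Returns the cheapest candidate, or -1 if none. */
static int pick_spill_candidate(const struct rig_node *nodes, int n)
{
   for (int pass = 0; pass < 2; pass++) {
      bool want_constrained = (pass == 1);
      int best = -1;
      for (int i = 0; i < n; i++) {
         if (nodes[i].spilled || nodes[i].constrained != want_constrained)
            continue;
         if (best < 0 || nodes[i].spill_cost < nodes[best].spill_cost)
            best = i;
      }
      if (best >= 0)
         return best;
   }
   return -1;
}
```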

nv50/ir/ra: improve condition for short regs, unify with cond for 16-bit · 799e0218

Ilia Mirkin authored 6 years ago


Instead of the size restriction existing in two places, and potentially
being applied twice, we combine it into one place. Ops with 16-bit register
addresses can only take a short reg, and ops with immediates can only
take a short reg.

Of course we leave the immediate 0 in place since we know that it will
be replaced by r63/r127 down the line, so don't treat zeroes as an
immediate.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
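
A hypothetical predicate capturing the unified condition described above: a 16-bit register-address form, or a non-zero immediate, restricts the instruction to short registers (below r64), while an immediate 0 is exempt because it is later replaced by the zero register. The struct and its fields are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of an instruction's encoding needs. */
struct insn_info {
   bool has_16bit_reg_addresses; /* encoding with narrow register fields */
   bool has_immediate;           /* has an immediate source operand */
   unsigned immediate_value;     /* only meaningful if has_immediate */
};

static bool needs_short_reg(const struct insn_info *insn)
{
   if (insn->has_16bit_reg_addresses)
      return true;
   /* An immediate 0 becomes r63/r127 later, so it is not treated as
    * an immediate here. */
   return insn->has_immediate && insn->immediate_value != 0;
}

static bool reg_allowed(const struct insn_info *insn, unsigned reg)
{
   return !needs_short_reg(insn) || reg < 64;
}
```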

nv50/ir: delete MINMAX instruction that is no longer in the BB · 955d943c

Ilia Mirkin authored 6 years ago


We removed the op from the BB, but it was still listed in its sources'
uses. This could trip up some logic down the line which analyzes all the
uses of an l-value, e.g. spilling.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
