- Nov 29, 2018
-
-
Connor Abbott authored
The non-failure path can be tested by just compiling Mesa and then testing it, but the failure paths won't be hit unless you make a mistake, so it's best to test them with some unit tests.
-
Connor Abbott authored
Before this commit, there were two copies of the algorithm: one in C, which we used to figure out what bit-size to give the replacement expression, and one in Python, which emulated the C one and tried to prove that the C algorithm would never fail to correctly assign bit-sizes. That seemed pretty fragile, and likely to fall over if we made any changes. Furthermore, the C code was really just recomputing more-or-less the same thing as the Python code every time. Instead, we can just store the results of the Python algorithm in the C data structure, and consult it to compute the bit-size of each value, moving the "brains" entirely into Python. Since the Python algorithm no longer has to match C, it's also a lot easier to change it to something more closely approximating an actual type-inference algorithm. The algorithm used is based on Hindley-Milner, although deliberately weakened a little. It's a few more lines than the old one, judging by the diffstat, but I think it's easier to verify that it's correct while being as general as possible. We could split this up into two changes, first making the C code use the results of the Python code and then rewriting the Python algorithm, but since the old algorithm never tracked which variable belongs to each equivalence class, it would mean we'd have to add some non-trivial code which would then get thrown away. I think it's better to see the final state all at once, although I could also try splitting it up.
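The idea of assigning bit-sizes through equivalence classes can be illustrated with a small union-find sketch. This is not the code from the commit: the class name, method names, and the example values are all hypothetical, and it only shows the general unification step that Hindley-Milner style inference relies on (values that must share a bit-size are merged into one class, and a class may later be pinned to a concrete size).

```python
class BitSizeError(Exception):
    """Raised when two incompatible bit-sizes are forced into one class."""


class BitSizeClasses:
    """Hypothetical union-find over values that must share a bit-size."""

    def __init__(self):
        self.parent = {}  # value -> parent value in the union-find forest
        self.size = {}    # class representative -> concrete bit-size, if known

    def find(self, v):
        """Return the representative of v's equivalence class."""
        self.parent.setdefault(v, v)
        root = v
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path at the root.
        while self.parent[v] != root:
            nxt = self.parent[v]
            self.parent[v] = root
            v = nxt
        return root

    def unify(self, a, b):
        """Require a and b to have the same bit-size."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        sa, sb = self.size.get(ra), self.size.get(rb)
        if sa is not None and sb is not None and sa != sb:
            raise BitSizeError(f"{a} is {sa}-bit but {b} is {sb}-bit")
        self.parent[rb] = ra
        self.size.pop(rb, None)
        merged = sa if sa is not None else sb
        if merged is not None:
            self.size[ra] = merged

    def fix(self, v, bits):
        """Pin v's class to a concrete bit-size."""
        root = self.find(v)
        known = self.size.get(root)
        if known is not None and known != bits:
            raise BitSizeError(f"{v} is already {known}-bit, not {bits}-bit")
        self.size[root] = bits

    def bit_size(self, v):
        """Return the inferred bit-size of v, or None if still unconstrained."""
        return self.size.get(self.find(v))


# Example: an add whose result and both sources share one bit-size.
classes = BitSizeClasses()
for source in ("a", "b"):
    classes.unify("iadd_result", source)
classes.fix("a", 32)
inferred = classes.bit_size("b")  # "b" inherits the size pinned on "a"
```

Because every value maps to a class representative, a table of such classes computed ahead of time is the kind of result that could be stored and consulted later instead of being recomputed.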
-
- Nov 20, 2018
-
-
Samuel Pitoiset authored
Fixes an assertion in SoTTR. Fixes: dd0172e8 ("radv: Use structured intrinsics instead of indexing workaround for GFX9.") Signed-off-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by:
Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
-
- Nov 19, 2018
-
-
Bas Nieuwenhuizen authored
These force the index to be used in the instruction so we don't need the workaround.
Totals:
SGPRS: 1321642 -> 1321802 (0.01 %)
VGPRS: 943664 -> 943788 (0.01 %)
Spilled SGPRs: 28468 -> 28480 (0.04 %)
Spilled VGPRs: 88 -> 89 (1.14 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 80 -> 80 (0.00 %) dwords per thread
Code Size: 52415292 -> 52338932 (-0.15 %) bytes
LDS: 400 -> 400 (0.00 %) blocks
Max Waves: 233903 -> 233803 (-0.04 %)
Wait states: 0 -> 0 (0.00 %)
Totals from affected shaders:
SGPRS: 238344 -> 238504 (0.07 %)
VGPRS: 232732 -> 232856 (0.05 %)
Spilled SGPRs: 13125 -> 13137 (0.09 %)
Spilled VGPRs: 88 -> 89 (1.14 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 80 -> 80 (0.00 %) dwords per thread
Code Size: 15752712 -> 15676352 (-0.48 %) bytes
LDS: 139 -> 139 (0.00 %) blocks
Max Waves: 31680 -> 31580 (-0.32 %)
Wait states: 0 -> 0 (0.00 %)
Reviewed-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com>
-
Kenneth Graunke authored
The existing backend code assumed that if VARYING_SLOT_CLIP_DIST0 was written, then VARYING_SLOT_CLIP_DIST1 would be as well. That's true with the current lowering, but not necessary if there are 4 or fewer clip distances. Separate out the checks to allow this. The new NIR-based lowering will trigger this case, which would have caused backend validation errors (src is null) without this patch. Reviewed-by:
Eric Anholt <eric@anholt.net>
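As a rough illustration of why the two checks need to be separate, the sketch below models which clip-distance slots a shader writes. It assumes each VARYING_SLOT_CLIP_DISTn slot holds four distances and that a lowering writes only the slots it needs; the function names are hypothetical and this is not the backend code.

```python
CLIP_DISTS_PER_SLOT = 4  # assumption: each clip-distance slot is a vec4


def clip_dist_slots_written(num_clip_distances):
    """Return the slots a lowering would write if it only wrote what it needs."""
    slots = []
    if num_clip_distances > 0:
        slots.append("VARYING_SLOT_CLIP_DIST0")
    if num_clip_distances > CLIP_DISTS_PER_SLOT:
        slots.append("VARYING_SLOT_CLIP_DIST1")
    return slots


def emit_clip_distances(written_slots):
    """Check each slot on its own instead of assuming DIST1 follows DIST0."""
    emitted = []
    for slot in ("VARYING_SLOT_CLIP_DIST0", "VARYING_SLOT_CLIP_DIST1"):
        if slot in written_slots:
            emitted.append(slot)
    return emitted


# With 4 or fewer clip distances only the first slot has a source to emit.
three_dists = emit_clip_distances(clip_dist_slots_written(3))
six_dists = emit_clip_distances(clip_dist_slots_written(6))
```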
-
Kenneth Graunke authored
The way nir_lower_clip_vs() works with store_output intrinsics makes a ton of assumptions about the driver_location field. In i965 and iris, I'd rather do this lowering early and work with variables. v3d may want to switch to that as well, and ir3 could too, but I'm not sure exactly what would need updating. For now, handle both methods. Reviewed-by:
Eric Anholt <eric@anholt.net>
-
Kenneth Graunke authored
I'll want the variables in the next patch. Reviewed-by:
Eric Anholt <eric@anholt.net>
-
Kenneth Graunke authored
It's now called exactly once, and there's not really any distinction. Reviewed-by:
Eric Anholt <eric@anholt.net>
-
Kenneth Graunke authored
Reviewed-by:
Eric Anholt <eric@anholt.net>
-
Dave Airlie authored
Check if the base ends up with no variable, and continue if we see that case outside the loop. Reviewed-by:
Jason Ekstrand <jason@jlekstrand.net>
-
Dave Airlie authored
I posted a load of hacks to do this before; Jason suggested this approach instead: just check the deref mode, not the variable mode, and delay getting the variable until we know the type. This avoids crashes when dereferencing shared memory pointers. Reviewed-by:
Jason Ekstrand <jason@jlekstrand.net>
-
Dave Airlie authored
Reviewed-by:
Jason Ekstrand <jason@jlekstrand.net>
-
Faith Ekstrand authored
../src/intel/compiler/brw_fs_nir.cpp:3534:46: warning: comparison of integer expressions of different signedness: ‘unsigned int’ and ‘int’ [-Wsign-compare]
assert(nir_intrinsic_write_mask(instr) ==
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
(1 << instr->num_components) - 1);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This was caused by 6339aba7, which added these completely valid checks. However, clang likes to complain about signedness mismatches. Fixes: 6339aba7 "intel/compiler: Lower SSBO and shared..." Reviewed-by:
Alejandro Piñeiro <apinheiro@igalia.com>
-
Faith Ekstrand authored
It's not at all intel-specific; the formula is dictated by OpenGL and Vulkan. The only intel-specific thing is that we need the lowering. As a nice side-effect, the new version is variable-group-size ready. Reviewed-by:
Plamena Manolova <plamena.manolova@intel.com>
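For reference, the relation that the GLSL and Vulkan specifications define between a compute invocation's 3D local ID and its flat local index can be sketched as below. The commit title is not preserved in this log, so which direction the lowering computes is an assumption; the sketch shows both. The function names are hypothetical, and the group size is passed as an ordinary argument to mirror the variable-group-size case.

```python
def local_invocation_index(local_id, group_size):
    """Flatten a 3D local invocation ID into the linear index (x varies fastest)."""
    x, y, z = local_id
    size_x, size_y, _ = group_size
    return z * size_x * size_y + y * size_x + x


def local_invocation_id(index, group_size):
    """Inverse mapping: recover the 3D local ID from the linear index."""
    size_x, size_y, size_z = group_size
    return (
        index % size_x,
        (index // size_x) % size_y,
        (index // (size_x * size_y)) % size_z,
    )


# Example with a hypothetical 4x3x2 workgroup.
group = (4, 3, 2)
index = local_invocation_index((1, 2, 1), group)
round_trip = local_invocation_id(index, group)
```

Nothing in either function depends on the group size being a compile-time constant, which is what makes this formulation usable with variable group sizes.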
-
Fixes: d971a423 "loader: Factor out the common driver opening logic from each loader." Signed-off-by:
Eric Engestrom <eric@engestrom.ch> Reviewed-by:
Eric Anholt <eric@anholt.net>
-
Samuel Pitoiset authored
This allows fast clearing of the depth part (or the stencil part) of a depth+stencil surface when HTILE is enabled. I didn't test on GFX8, so it's currently disabled. This gives a very nice boost, for example when clearing the depth aspect of a 4096x4096 D32_SFLOAT_S8_UINT image (18x faster). BEFORE: 235 us AFTER: 13 us Signed-off-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by:
Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
-
Samuel Pitoiset authored
Signed-off-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by:
Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
-
Samuel Pitoiset authored
Signed-off-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by:
Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
-
Samuel Pitoiset authored
Signed-off-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by:
Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
-
Samuel Pitoiset authored
Signed-off-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by:
Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
-
Samuel Pitoiset authored
Signed-off-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by:
Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
-
Samuel Pitoiset authored
This path will eventually be improved, but as it's only used on SI (or with RADV_DEBUG=noibs), I'm not sure if that matters much. Signed-off-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by:
Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
-
Samuel Pitoiset authored
The chained submission is the fastest path and it should now be used more often than before. This removes some EOP events. Signed-off-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by:
Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
-
Lucas Stach authored
At least GC2000 seems to push some dirt from the PE color cache into the last bound render target when drawing depth only. Newer cores seem to behave properly and don't do this, but I have found no way to fix it on GC2000. Flushes and stalls don't seem to make any difference. In order to stop the core from pushing the dirt into a precious real render target, plug in a dummy buffer when rendering without a color buffer. Signed-off-by:
Lucas Stach <l.stach@pengutronix.de> Reviewed-by:
Philipp Zabel <p.zabel@pengutronix.de>
-
Dave Airlie authored
The in_fence_fd needs to be initialised to -1. Fixes: d1a1c21e (virgl: native fence fd support) Reviewed-by:
Robert Foss <robert.foss@collabora.com>
-
Samuel Pitoiset authored
DCC and FMASK also imply a fast-clear eliminate, so it should be safe to reset the predicate unconditionally. We still only skip FMASK or CMASK decompressions for now. Signed-off-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by:
Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
-
Samuel Pitoiset authored
This is just a small cleanup. Signed-off-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by:
Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
-
Nicolai Hähnle authored
We read 4 values out of sample_locs_8x, so make sure the array is big enough. Fixes: ac76aeef ("radeonsi: switch back to standard DX sample positions") Reviewed-by:
Marek Olšák <marek.olsak@amd.com>
-
With 5d517a, streamout info is only attached to the shader for which the transform feedback is actually recorded, but the driver set the context info with each state submitted, thereby always using the info data that was attached to the vertex shader. Pass the streamout stride info to the context only from the shader that actually has outputs. (Thanks to Marek Olšák for pointing me in the right direction.) Fixes regression with: dEQP-GLES31.functional.tessellation.invariance.* Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108734 Fixes: 5d517a59 st/mesa: Don't record garbage streamout information in the non-SSO case. Signed-off-by:
Gert Wollny <gert.wollny@collabora.com> Reviewed-by:
Dave Airlie <airlied@redhat.com>
-
FRAMEBUFFER_INCOMPLETE_DIMENSIONS is not supported for GLES 3.0 and later and not defined for Desktop OpenGL. Instead use FRAMEBUFFER_UNSUPPORTED like it was done before. Thanks to Iago Toral and Andrey Simiklit for pointing out the problem and the details. Fixes: ebcde345 i965: be more specific about FBO completeness errors Signed-off-by:
Gert Wollny <gert.wollny@collabora.com> Reviewed-by:
Iago Toral Quiroga <itoral@igalia.com>
-
Gert Wollny authored
The structure qdws is not allocated at this point, nor has the file descriptor been assigned to its member. Use the fd directly instead. Fixes: d1a1c21e virgl: native fence fd support Signed-off-by:
Gert Wollny <gert.wollny@collabora.com>
-
Emulate MESA_FORMAT_R_SRGB8 by using L8_UNORM_SRGB. This is possible because component swizzling is handled based on the mesa format and, hence, the r001 swizzle can be used to correct the components. Enables and makes pass (tested on Kabylake) dEQP-GLES31.functional.srgb_texture_decode.skip_decode.sr8.* dEQP-GLES31.functional.texture.filtering.cube_array.formats.sr8* Signed-off-by:
Gert Wollny <gert.wollny@collabora.com> Acked-by:
Eric Engestrom <eric.engestrom@intel.com>
-
This makes it possible to use a hardware luminance format as RED format. Signed-off-by:
Gert Wollny <gert.wollny@collabora.com> Reviewed-by:
Eric Engestrom <eric.engestrom@intel.com>
-
The driver was returning GL_FRAMEBUFFER_UNSUPPORTED for all cases of an incomplete FBO; be a bit more specific about this, following the description of glCheckFramebufferStatus. This helps keep dEQP happy when adding EXT_texture_sRGB_R8 support. Signed-off-by:
Gert Wollny <gert.wollny@collabora.com> Reviewed-by:
Eric Engestrom <eric.engestrom@intel.com>
-
As the name says, the format is an sRGB format. Signed-off-by:
Gert Wollny <gert.wollny@collabora.com> Reviewed-by:
Eric Engestrom <eric.engestrom@intel.com>
-
- Nov 18, 2018
-
-
Robert Foss authored
Remove a dead variable and an int->bool conversion, and make some whitespace changes. Signed-off-by:
Robert Foss <robert.foss@collabora.com> Reviewed-by:
Emil Velikov <emil.velikov@collabora.com>
-
- Nov 17, 2018
-
-
Kenneth Graunke authored
This is all leftover from the i965 split. Reviewed-by:
Jason Ekstrand <jason@jlekstrand.net> Reviewed-by:
Lionel Landwerlin <lionel.g.landwerlin@intel.com>
-
Ilia Mirkin authored
On nv50, certain operations must happen on regs below 64, due to encoding requirements. First of all, we add infrastructure to enforce this. Secondly we change the spill order to first spill RIG nodes that are unconstrained, followed by ones that are. This makes the gamecube logo shadertoy compile properly. Curiously, if we adjust the spill order so that we first spill the constrained RIG nodes instead, the RA also succeeds. However it seems more logical to first spill the unconstrained ones. While we're at it, drop the nv50 max register to reserve r127 as the zero register of last resort (r63 is preferred). Signed-off-by:
Ilia Mirkin <imirkin@alum.mit.edu> Acked-by:
Karol Herbst <kherbst@redhat.com>
-
Ilia Mirkin authored
Instead of the size restriction existing in two places, and potentially being applied twice, we move this together. Ops with 16-bit register addresses can only take a short reg, and ops with immediates can only take a short reg. Of course we leave the immediate 0 in place since we know that it will be replaced by r63/r127 down the line, so don't treat zeroes as an immediate. Signed-off-by:
Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by:
Karol Herbst <kherbst@redhat.com>
-
Ilia Mirkin authored
We removed the op from the BB, but it was still listed in its sources' uses. This could trip up some logic down the line which analyzes all the uses of an l-value, e.g. spilling. Signed-off-by:
Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by:
Karol Herbst <kherbst@redhat.com>
-