- 25 Jun, 2019 6 commits
-
-
Marek Olšák authored
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
-
Marek Olšák authored
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
-
Marek Olšák authored
otherwise the behavior is undefined
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
-
Nicolai Hähnle authored
We'll have to extend this at some point, and using a bitfield union in this way makes it easier to get the right index without excessive branching.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
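As a rough illustration of the pattern (a minimal sketch with invented names, not taken from this commit): a bitfield union packs several small selector fields into one integer, so the combined value can be read back as a table index instead of being rebuilt with branches.

    #include <stdio.h>

    /* Hypothetical example; field names and layout are made up. */
    union variant_key {
        struct {
            unsigned stage  : 2;   /* small selector field */
            unsigned wave32 : 1;   /* another selector bit */
            unsigned unused : 29;  /* explicit padding so every bit is named */
        };
        unsigned index;            /* packed value, usable as an array index */
    };

    int main(void)
    {
        union variant_key key = {0};
        key.stage = 2;
        key.wave32 = 1;
        /* With the common LSB-first bitfield layout this prints 6 (0b110). */
        printf("table index = %u\n", key.index);
        return 0;
    }

Note that bitfield layout is implementation-defined, which is why this kind of packing normally stays private to one compiler/ABI.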
-
Nicolai Hähnle authored
The initial prototype used a processor-specific symbol type, but feedback suggests that an approach using a processor-specific section name that encodes the alignment, analogous to SHN_COMMON symbols, is preferred. This patch keeps both variants around for now to reduce problems with LLVM compatibility as we switch branches around. This also cleans up the error reporting in this function.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
-
Marek Olšák authored
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
-
- 24 Jun, 2019 34 commits
-
-
Dylan Baker authored
-
Dylan Baker authored
-
Dylan Baker authored
-
Ian Romanick authored
Incrementing the iteration count was intended to fix an off-by-one error when the first terminator was superseded by a later terminator. If there is no first terminator or later terminator, there is no off-by-one error. Incrementing the loop count creates one. This can be seen in loops like:

    do {
        if (something) {
            // No breaks or continues here.
        }
    } while (false);

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Tested-by: Abel Briggs <abelbriggs1@hotmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110953
Fixes: 646621c6 ("glsl: make loop unrolling more like the nir unrolling path")
-
Eric Anholt authored
We were pessimistically uploading all of it in case of indirection, but we can just bump that when we encounter indirection.
total constlen in shared programs: 2529623 -> 2485933 (-1.73%)
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
-
Eric Anholt authored
ir3_nir_analyze_ubo_ranges() has already told us how much of cb0 we need to upload (all of it, since it will lower indirect UBO 0 accesses from load_ubo back to indirection on the constant buffer).
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
-
Rob Clark authored
If the NIR-level analysis decided to move UBO loads to the constant file, but the backend decided not to load those constants, we could upload past the end of constlen. This is particularly relevant for pre-a6xx, where we emit a different constlen between bin and render variants. (Fix by Rob, commit message by anholt)
Reviewed-by: Eric Anholt <eric@anholt.net>
-
Alyssa Rosenzweig authored
This is the hardware max, as far as I can tell.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
-
Alyssa Rosenzweig authored
Just a little spring cleanup, extending UBOs to vertex shaders in the process.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
-
Alyssa Rosenzweig authored
UBOs and uniforms now use a common code path with an explicit `index` argument passed, enabling UBO reads.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
-
Alyssa Rosenzweig authored
Prevents an assert(0) later in this (not-so-edge) case. We still need to have a dummy there.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
-
Alyssa Rosenzweig authored
We've known about this for a while, but it was never formally in the machine header files / decoder, so let's add them in.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
-
Alyssa Rosenzweig authored
Now that all the counting is sorted, it's a matter of passing along a GPU address and going.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
-
Alyssa Rosenzweig authored
We already uploaded UBOs, but only a fixed number (1) for uniforms; let's upload as many as we compute we need.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
-
Alyssa Rosenzweig authored
We look at the highest set bit in the UBO enable mask to work out the maximum indexable UBO, i.e. the UBO count we need to report to the hardware.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
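A minimal sketch of that computation (the helper name is invented; Mesa's util_last_bit() helper computes essentially the same thing): the count is the position of the highest set bit, plus one.

    #include <stdint.h>

    /* Number of UBOs the hardware must be able to index, given a bitmask of
     * enabled UBOs. Returns 0 when no UBO is enabled. */
    static unsigned ubo_count_from_mask(uint32_t enabled_mask)
    {
        if (enabled_mask == 0)
            return 0;
        /* 32 - clz(mask) == index of the highest set bit, plus one. */
        return 32 - __builtin_clz(enabled_mask);
    }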
-
Alyssa Rosenzweig authored
We refactor panfrost_constant_buffer to mirror v3d's constant buffer handling, to enable UBOs as well as a single set of uniforms.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
-
Alyssa Rosenzweig authored
This doesn't handle Y-flipping, but it's good enough to render the stars in Neverball.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
-
Alyssa Rosenzweig authored
In preparation for lowering point sprites, track them like we track alpha testing state.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
-
Caio Marcelo de Oliveira Filho authored
These passes either depend on information filled in by the NIR linking steps or are restricted by them:

- gl_nir_lower_samplers: depends on UniformStorage being set by the linker.
- brw_nir_lower_image_load_store: after 6981069f "i965: Ignore uniform storage for samplers or images, use binding info" we want this pass to happen after gl_nir_lower_samplers.
- gl_nir_lower_buffers: depends on UniformBlocks and SharedStorageBlocks being set by the linker.

For the regular GLSL code path, those data structures are filled earlier. For the NIR linking code path we need to generate the nir_shader first and then process it -- and currently the processing works with all shaders together. So move the passes out of brw_create_nir into their own function, called by brwProgramStringNotify and brw_link_shader().

This patch prepares the ground for ARB_gl_spirv, which will make use of the NIR linker.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
-
Caio Marcelo de Oliveira Filho authored
The iterator `i` already walks the right amount now that it is incremented by `dmul`, so there is no need to multiply by 2. Fixes invalid memory access in upcoming ARB_gl_spirv tests. Failure bisected by Arcady Goldmints-Orlov.
Fixes: b019fe8a "glsl/nir: Fix handling of 64-bit values in uniform storage"
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
-
Caio Marcelo de Oliveira Filho authored
For n_columns == 1, we have a vector, which is handled by the else case. Fixes invalid memory access in upcoming ARB_gl_spirv tests. Failure bisected by Arcady Goldmints-Orlov.
Fixes: 81e51b41 "nir: Make nir_constant a vector rather than a matrix"
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
-
Daniel Schürmann authored
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
-
Daniel Schürmann authored
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
-
Daniel Schürmann authored
bitfield_select is defined as:

    bitfield_select(mask, base, insert) = (mask & base) | (~mask & insert)

matching the behavior of AMD's BFI instruction.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
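As a quick illustration (a minimal C sketch, not the NIR implementation), the operation takes each bit from `base` where `mask` is set and from `insert` where it is clear:

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    static uint32_t bitfield_select(uint32_t mask, uint32_t base, uint32_t insert)
    {
        return (mask & base) | (~mask & insert);
    }

    int main(void)
    {
        /* Replace the low byte of 0xAABBCCDD with 0x11: prints 0xAABBCC11. */
        printf("0x%08" PRIX32 "\n",
               bitfield_select(0x000000FFu, 0x00000011u, 0xAABBCCDDu));
        return 0;
    }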
-
Daniel Schürmann authored
This lets us use the optimization pattern (('ult', 31, ('iand', b, 31)), False) to remove the bcsel instruction for code originating in D3D shaders.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
-
Daniel Schürmann authored
The [iu]bfe and bfm instructions are defined to only use the five least significant bits. This optimizes a common pattern from D3D -> SPIR-V translation.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
-
Daniel Schürmann authored
That is: the five least significant bits provide the values of 'bits' and 'offset', which is the case for all hardware currently supported by NIR and using the bfm/bfe instructions. This patch also changes the lowering of bitfield_insert/extract using shifts to not use bfm and removes the flag 'lower_bfm'.
Tested-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
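A hedged sketch of the semantics described here (the helper name is illustrative, not the NIR opcode implementation), with 'offset' and 'bits' reduced to their five least significant bits before use:

    #include <stdint.h>

    /* Unsigned bitfield extract with the masking behaviour described above.
     * Assumes offset + bits stays within 32 once both are masked to 5 bits. */
    static uint32_t ubfe_masked(uint32_t value, uint32_t offset, uint32_t bits)
    {
        offset &= 31;  /* only the low 5 bits are honoured */
        bits   &= 31;
        if (bits == 0)
            return 0;
        return (value >> offset) & ((1u << bits) - 1u);
    }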
-
Daniel Schürmann authored
These optimizations are based on the fact that 'and(a,b) <= umin(a,b)'. For AMD, this series moves the optimization from LLVM to NIR, so currently no vkpipeline-db changes here.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
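The underlying fact is easy to check: a & b clears bits of a (and of b) but never sets new ones, so as an unsigned value it cannot exceed either operand. A small exhaustive check over 8-bit values, just as an illustration:

    #include <assert.h>
    #include <stdint.h>

    int main(void)
    {
        for (uint32_t a = 0; a < 256; a++) {
            for (uint32_t b = 0; b < 256; b++) {
                uint32_t umin = a < b ? a : b;
                assert((a & b) <= umin);  /* and(a,b) <= umin(a,b) */
            }
        }
        return 0;
    }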
-
Andreas Baierl authored
Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
-
Andreas Baierl authored
Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
-
Andreas Baierl authored
Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
-
Eric Engestrom authored
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
-
Andreas Baierl authored
Since we cannot handle ffma in ppir, lower it at the NIR level already.
Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
-
Samuel Pitoiset authored
This simple extension might be useful for debugging purposes. GAPID has support for it.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
-