- Aug 12, 2019
-
-
Andreas Baierl authored
Lower fddx and fddy and set the right bits in codegen. Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de> Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com> Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
-
Bas Nieuwenhuizen authored
Reviewed-by: Dave Airlie <airlied@redhat.com>
-
Bas Nieuwenhuizen authored
Reviewed-by: Dave Airlie <airlied@redhat.com>
-
Bas Nieuwenhuizen authored
Reviewed-by: Dave Airlie <airlied@redhat.com>
-
Bas Nieuwenhuizen authored
Reviewed-by: Dave Airlie <airlied@redhat.com>
-
Bas Nieuwenhuizen authored
This allows keeping the shader info on a per-shader basis. Also disables the cache on a per-shader basis. Reviewed-by: Dave Airlie <airlied@redhat.com>
-
Bas Nieuwenhuizen authored
So we can add the functions. Reviewed-by: Dave Airlie <airlied@redhat.com>
-
Bas Nieuwenhuizen authored
Allows us to easily dump all NIR shaders for combined variants on Vega and simplifies ownership. Reviewed-by: Dave Airlie <airlied@redhat.com>
-
Bas Nieuwenhuizen authored
Reviewed-by: Dave Airlie <airlied@redhat.com>
-
Bas Nieuwenhuizen authored
Not AC because a lot of it is data extraction out of radv structs. Reviewed-by: Dave Airlie <airlied@redhat.com>
-
Francisco Jerez authored
See "i965/gen9: Optimize slice and subslice load balancing behavior." for the rationale. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
-
Francisco Jerez authored
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
-
Francisco Jerez authored
The default pixel hashing mode settings used for slice and subslice load balancing are far from optimal under certain conditions (see the comments below for the gory details). The top-of-the-line GT4 parts currently suffer from a particularly severe performance problem due to a subslice load balancing issue. Fixing this seems to improve graphics performance across the board for most of the benchmarks in my test set, up to ~20% in some cases, e.g. from SKL GT4:

unigine/valley: 3.44% ±0.11%
gfxbench/gl_manhattan31: 3.99% ±0.13%
gputest/pixmark_piano: 7.95% ±0.33%
synmark/OglTexFilterAniso: 15.22% ±0.07%
synmark/OglTexMem128: 22.26% ±0.06%

Lower-end platforms are also affected by some subslice load imbalance to a lesser degree, especially during CCS resolve and fast clear operations. These are handled specially here because rasterization occurs in reduced CCS coordinates, which changes the semantics of the pixel hashing mode settings.

No regressions seen during my tests on some SKL, KBL and BXT configurations. Additional benchmark reports are welcome on any Gen9 platform (that includes anything with Skylake, Broxton, Kabylake, Geminilake, Coffeelake, Whiskey Lake, Comet Lake or Amber Lake in your renderer string).

P.S.: A similar problem is likely to be present on other non-Gen9 platforms, especially for CCS resolve and fast clear operations. Will follow up with additional patches fixing the hashing mode for those once I have enough performance data to justify it.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
-
Alyssa Rosenzweig authored
This is a bit of a hack, but it'll hold us over until we have 64-bit support wired through. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
-
Alyssa Rosenzweig authored
This helps RA be slightly more reasonable. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
-
Alyssa Rosenzweig authored
Fixes RA failures with multiple indirect SSBO writes. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
-
Alyssa Rosenzweig authored
This used a delicate hack to try to find indirect inputs and skip them as candidates for pairing. Let's use a better criterion -- no sources -- and pair based on that. We could do better, but that would require more complex data flow analysis than we're interested in doing here. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
-
Alyssa Rosenzweig authored
Just a sysval to route through. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
-
Alyssa Rosenzweig authored
We implement gl_WorkGroupID and gl_LocalInvocationID, which map to ld_compute_id with special sources. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
-
Alyssa Rosenzweig authored
It's used for more general loads within a compute shader. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
-
Alyssa Rosenzweig authored
This allows liveness analysis within a loop to be more fine-grained, fixing RA failures with partially spilled movs within a loop, as well as enabling a slight reduction of register pressure more generally:

total registers in shared programs: 350 -> 347 (-0.86%)
registers in affected programs: 12 -> 9 (-25.00%)
helped: 3
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 25.00% max: 25.00% x̄: 25.00% x̃: 25.00%

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
-
Alyssa Rosenzweig authored
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
-
Alyssa Rosenzweig authored
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
-
Alyssa Rosenzweig authored
They're not "sources" but they follow the same conventions. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
-
Alyssa Rosenzweig authored
Since we are seeing some use of MIR post-scheduling, let's get this printed right. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
-
Alyssa Rosenzweig authored
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
-
Alyssa Rosenzweig authored
Hint for the RA to avoid infinite spilling loops. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
-
Alyssa Rosenzweig authored
This now works for load/store and texture instructions as well as ALU. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
-
Alyssa Rosenzweig authored
Just laying the groundwork. Reads and writes should be supported (both direct and indirect, either int or float, vec1/2/3/4), but no bounds checking is done at the moment. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
-
Alyssa Rosenzweig authored
This is a corner case that happens a lot with SSBOs. Basically, if we only read a few components of a uniform, we need to only spill a few components or otherwise we try to spill what we spilled and RA hangs. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
-
Alyssa Rosenzweig authored
We don't want to load a 128-bit sysval when 64 bits will do. Fixes RA failures with SSBO indirect writes. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
-
Alyssa Rosenzweig authored
We want to edit it after emission in some cases. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
-
Alyssa Rosenzweig authored
Sometimes a sysval is used to facilitate an instruction but is not the instruction itself. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
-
Alyssa Rosenzweig authored
This is of course suboptimal for performance, forcing each glDispatchCompute call to be submitted separately to the kernel and finish to completion. However, for the initial bring-up of compute jobs, this simplifies quite a bit. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
-
Alyssa Rosenzweig authored
For each SSBO index we get from Gallium/NIR, we need two pieces of information in the shader:

1. The address of the SSBO in GPU memory. Within the shader, we'll be accessing it with raw memory load/store, so we need the actual address, not just an index.
2. The size of the SSBO. This is not strictly necessary, but at some point, we may like to do bounds checking on SSBO accesses.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
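As a rough illustration of the two pieces of information described above, the sketch below packs a 64-bit GPU address and a 32-bit size into a four-word slot and shows the kind of bounds check the size would make possible. The struct, the function names and the word layout are assumptions made for this example; they are not taken from the Panfrost sources.

```c
#include <stdint.h>

/* Hypothetical layout: one 4 x 32-bit slot per SSBO.
 * words[0..1] hold the 64-bit GPU address (low word first),
 * words[2] holds the size in bytes, words[3] is unused. */
struct sketch_ssbo_sysval {
   uint32_t words[4];
};

/* Pack the address and size for one SSBO into the assumed layout. */
static struct sketch_ssbo_sysval
sketch_pack_ssbo(uint64_t gpu_address, uint32_t size_bytes)
{
   struct sketch_ssbo_sysval sv;

   sv.words[0] = (uint32_t)(gpu_address & 0xFFFFFFFFu);
   sv.words[1] = (uint32_t)(gpu_address >> 32);
   sv.words[2] = size_bytes;
   sv.words[3] = 0;

   return sv;
}

/* Optional bounds check on an access of access_size bytes at offset.
 * Written to avoid overflow in offset + access_size. */
static int
sketch_ssbo_in_bounds(const struct sketch_ssbo_sysval *sv,
                      uint32_t offset, uint32_t access_size)
{
   uint32_t size = sv->words[2];

   return offset <= size && access_size <= size - offset;
}
```

Under this assumed layout, a shader that only needs the address reads the first two words (64 bits) rather than the whole 128-bit slot.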
-
Alyssa Rosenzweig authored
This u_prim.h helper determines the number of outputs for stream output, given a particular primitive type and a vertex count. This is useful for statically calculating sizes of stream output buffers (i.e. when there is no geometry/tessellation shader in use). This helper will be used in Panfrost's transform feedback implementation, as you can probably guess since why else would I be submitting it.... See also dEQP's getTransformFeedbackOutputCount routine. v2: Simplify definition using new helpers, which also extends to non-ES2 primitive types (Eric). Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Eric Anholt <eric@anholt.net>
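To make the calculation concrete, here is a minimal standalone sketch of counting stream outputs from a primitive type and a vertex count, limited to the ES2 primitive types. The enum and function names are hypothetical and the code is not the u_prim.h helper itself. It assumes each decomposed primitive records all of its vertices (1 per point, 2 per line, 3 per triangle) and that leftover vertices contribute nothing.

```c
/* Hypothetical primitive enum for this sketch (ES2 types only). */
enum sketch_prim {
   SKETCH_POINTS,
   SKETCH_LINES,
   SKETCH_LINE_STRIP,
   SKETCH_LINE_LOOP,
   SKETCH_TRIANGLES,
   SKETCH_TRIANGLE_STRIP,
   SKETCH_TRIANGLE_FAN,
};

/* Number of vertices written to the stream output buffer when nr
 * vertices are drawn with the given primitive type. */
static unsigned
sketch_stream_outputs(enum sketch_prim prim, unsigned nr)
{
   switch (prim) {
   case SKETCH_POINTS:
      return nr;
   case SKETCH_LINES:
      /* Drop a trailing odd vertex */
      return (nr / 2) * 2;
   case SKETCH_LINE_STRIP:
      /* nr - 1 lines, two vertices each */
      return nr >= 2 ? (nr - 1) * 2 : 0;
   case SKETCH_LINE_LOOP:
      /* One extra line closes the loop */
      return nr >= 2 ? nr * 2 : 0;
   case SKETCH_TRIANGLES:
      /* Drop up to two trailing vertices */
      return (nr / 3) * 3;
   case SKETCH_TRIANGLE_STRIP:
   case SKETCH_TRIANGLE_FAN:
      /* nr - 2 triangles, three vertices each */
      return nr >= 3 ? (nr - 2) * 3 : 0;
   }

   return 0;
}
```

For example, a 5-vertex triangle strip decomposes into 3 triangles and so records 9 vertices, which is the figure a driver would use to size the buffer statically.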
-
Marek Olšák authored
tgsi_to_nir is no longer optional if NIR is enabled.
-
Marek Olšák authored
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
-
Marek Olšák authored
TCS system values for the internal passthrough TCS, needed for radeonsi NIR support. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
-
Marek Olšák authored
for radeonsi NIR support.
-