- 13 May, 2020 40 commits
-
-
Axel Davy authored
For now this parameter doesn't do anything. It means the implementation is allowed to use a cache on disk.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <!4993>
-
Eric Anholt authored
After fixing the power-of-two sizing, pitches worked, but 1-pixel-high miplevels and miplevels with unaligned heights were off.
Part-of: <!4931>
-
Eric Anholt authored
The HW requires the log2 width/height of the level 0 meta_* size in the descriptors, which makes it clear that UBWC mipmapping is entirely power-of-two sized. Fixes a number of failures in the upcoming UBWC layout unit tests.
Part-of: <!4931>
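A minimal sketch of the power-of-two rule, assuming the level 0 meta size is counted in UBWC blocks, rounded up to a power of two, and halved (clamped to 1) for each further miplevel. The helper names are hypothetical and are not taken from freedreno's layout code.

```c
#include <assert.h>
#include <stdint.h>

/* Round v up to the next power of two (requires 1 <= v <= 2^31). */
static uint32_t next_pow2(uint32_t v)
{
   assert(v >= 1 && v <= (1u << 31));
   uint32_t p = 1;
   while (p < v)
      p <<= 1;
   return p;
}

/* Hypothetical helper: UBWC meta blocks along one axis for a miplevel.
 * Level 0 is rounded up to a power of two, then each level halves it,
 * clamped to one block (covers 1-pixel-high levels). */
static uint32_t ubwc_meta_blocks(uint32_t level0_pixels, uint32_t block_pixels,
                                 unsigned level)
{
   assert(block_pixels > 0);
   uint32_t blocks0 = (level0_pixels + block_pixels - 1) / block_pixels;
   uint32_t pot = next_pow2(blocks0);
   uint32_t blocks = pot >> level;
   return blocks ? blocks : 1;
}

/* log2 of the level 0 meta size, i.e. the value a descriptor would carry. */
static unsigned ubwc_meta_log2(uint32_t level0_pixels, uint32_t block_pixels)
{
   uint32_t pot = ubwc_meta_blocks(level0_pixels, block_pixels, 0);
   unsigned lg = 0;
   while ((1u << lg) < pot)
      lg++;
   return lg;
}
```

For a 1000-pixel-wide level 0 with 16-pixel blocks, 63 blocks round up to 64, so the descriptor would carry log2 6, and level 7 clamps to a single block.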
-
Eric Anholt authored
Using texturator on a P3A at 1024x1024, RG8 has a log2 w/h of 6x7 instead of R16I/UI's 6x8. I verified the block width/height for the other cpp values as well, except cpp=1 (R8/R8I/R8UI didn't use UBWC) and cpp=32 (which would need a bigger type).
Part-of: <!4931>
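The observed log2 values can be turned back into block sizes, assuming the meta size along each axis is the surface size divided by the UBWC block size, with both being powers of two. Under that assumption, log2 6x7 on a 1024x1024 surface implies 16x8 blocks for RG8, while 6x8 implies 16x4 blocks for R16I/UI. `block_from_log2` is a hypothetical helper for this arithmetic only.

```c
#include <assert.h>
#include <stdint.h>

/* Recover the block size along one axis from the log2 meta size,
 * assuming meta_size = surface_size / block_size. */
static uint32_t block_from_log2(uint32_t surface_pixels, unsigned meta_log2)
{
   assert(meta_log2 < 32);
   uint32_t meta = 1u << meta_log2;
   assert(surface_pixels % meta == 0);
   return surface_pixels / meta;
}
```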
-
Eric Anholt authored
The UBWC alignment for the r8g8 case will change in the next commit, so fdl6_get_ubwc_blockwidth needs to start paying attention to r8g8 too.
Part-of: <!4931>
-
Eric Anholt authored
These offsets were hand-computed by referencing msm_media_info.h and match our driver's current behavior.
Part-of: <!4931>
-
Eric Anholt authored
Part-of: <!4931>
-
Eric Anholt authored
Noticed while poking around with texture layouts: my big texture layout from the blob buffer overflowed. Values come from http://vulkan.gpuinfo.org for Adreno 418, 512, and 630.
Part-of: <!4931>
-
Daniel Schürmann authored
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <!4062>
-
Daniel Schürmann authored
This patch adds some control flow information to the state to keep track of whether a loop contains divergent continue or break statements, so that this property does not have to be recalculated for every phi.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <!4062>
-
Daniel Schürmann authored
This patch splits the visit_phi() function into three different ones according to the kind of phi (merge-node, loop-header, or loop-exit) and calls them when visiting the cf_nodes. This allows loops to be revisited only if the loop header's phis have changed.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <!4062>
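Revisiting a loop only when its header phis change amounts to a fixed-point iteration over the loop body. The toy model below uses hypothetical types, not NIR's: each loop-header phi takes its back-edge value from another header phi, and the body is visited again only while some header phi's divergence changed, so a loop with stable header phis is visited exactly once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PHIS 8

/* Toy loop (hypothetical): header phi i takes its back-edge value from
 * header phi dep[i], or from a uniform value when dep[i] < 0. */
struct toy_loop {
   size_t num_phis;
   int dep[MAX_PHIS];
   bool divergent[MAX_PHIS];
};

/* Visit the body once; returns true if any header phi became divergent. */
static bool visit_loop_body(struct toy_loop *l)
{
   assert(l->num_phis <= MAX_PHIS);
   bool changed = false;
   for (size_t i = 0; i < l->num_phis; i++) {
      int d = l->dep[i];
      assert(d < (int)l->num_phis);
      if (d >= 0 && l->divergent[d] && !l->divergent[i]) {
         l->divergent[i] = true;
         changed = true;
      }
   }
   return changed;
}

/* Revisit only while header phis keep changing; returns the visit count. */
static unsigned analyze_loop(struct toy_loop *l)
{
   unsigned visits = 1;
   while (visit_loop_body(l))
      visits++;
   return visits;
}
```

With three phis chained as 0 <- 1 <- 2 and only phi 2 initially divergent, the body is visited three times: two visits propagate divergence and the third confirms that nothing changed.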
-
Daniel Schürmann authored
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <!4062>
-
Daniel Schürmann authored
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <!4062>
-
Jason Ekstrand authored
v2: fix usage in ACO (by Daniel Schürmann)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <!4062>
-
Marek Olšák authored
Trivial.
Part-of: <!4902>
-
Marek Olšák authored
Acked-by: Eric Anholt <eric@anholt.net>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <!4902>
-
Marek Olšák authored
Acked-by: Eric Anholt <eric@anholt.net>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <!4902>
-
Marek Olšák authored
Acked-by: Eric Anholt <eric@anholt.net>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <!4902>
-
Connor Abbott authored
Also, rewrite the format decision code so that we correctly decide when the linear fallback is needed, even if UBWC is disabled. As part of that, I also moved around some of the code that handles compressed formats, to make sure that copying compressed formats with a linear staging blit works (this is now possible since we started allowing tiled compressed textures).
Part-of: <!5007>
-
Connor Abbott authored
Part-of: <!5007>
-
Connor Abbott authored
This is simpler than a full-blown memory reuse mechanism, but it is good enough to make sure that repeatedly doing a copy that requires the linear staging buffer workaround won't use excessive memory or be slowed down by repeated allocations.
Part-of: <!5007>
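A minimal sketch of such a single-slot reuse scheme, assuming plain host memory: the last staging buffer is kept and handed out again when it is large enough, and replaced only when a bigger one is needed. `struct staging_cache` and its functions are hypothetical; a real driver would allocate GPU buffer objects and must also ensure the GPU is done with the old buffer before reusing or freeing it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical single-slot cache for a linear staging buffer. */
struct staging_cache {
   void *buf;
   size_t size;
   unsigned allocations; /* number of real allocations performed */
};

/* Return a buffer of at least `needed` bytes, reusing the cached one
 * when it is big enough and growing it otherwise. */
static void *staging_get(struct staging_cache *c, size_t needed)
{
   assert(needed > 0);
   if (c->buf && c->size >= needed)
      return c->buf;            /* reuse: no new allocation */
   free(c->buf);
   c->buf = malloc(needed);
   c->size = c->buf ? needed : 0;
   if (c->buf)
      c->allocations++;
   return c->buf;
}

static void staging_fini(struct staging_cache *c)
{
   free(c->buf);
   c->buf = NULL;
   c->size = 0;
}
```

Repeated copies of the same or smaller size then cost a single allocation, which is the property the commit is after.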
-
Rhys Perry authored
Totals from 5860 (4.59% of 127638) affected shaders:
VGPRs: 460212 -> 460216 (+0.00%)
CodeSize: 65554356 -> 65464816 (-0.14%)
Instrs: 12655972 -> 12633578 (-0.18%)
Copies: 1309994 -> 1292163 (-1.36%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <!4990>
-
Rhys Perry authored
Totals from 8487 (6.65% of 127638) affected shaders:
CodeSize: 62061988 -> 62058020 (-0.01%); split: -0.01%, +0.01%
Instrs: 11910757 -> 11885409 (-0.21%); split: -0.21%, +0.00%
Copies: 1065244 -> 1040945 (-2.28%); split: -2.30%, +0.02%
Branches: 349665 -> 348914 (-0.21%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <!4990>
-
Rhys Perry authored
Totals from 14340 (11.23% of 127638) affected shaders:
SGPRs: 1251648 -> 1251512 (-0.01%)
VGPRs: 994556 -> 994104 (-0.05%); split: -0.06%, +0.01%
CodeSize: 122894528 -> 121099604 (-1.46%); split: -1.49%, +0.03%
MaxWaves: 106039 -> 106103 (+0.06%); split: +0.06%, -0.00%
Instrs: 23860066 -> 23414317 (-1.87%); split: -1.90%, +0.03%
Copies: 2448228 -> 2049305 (-16.29%); split: -16.37%, +0.07%
Branches: 789381 -> 757921 (-3.99%); split: -4.62%, +0.64%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <!4990>
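The percentages in these totals match the usual relative-change formula, (after - before) / before * 100, and the affected-shader share is affected / total * 100. The `split:` pair appears to separate the contribution of shaders that improved from those that regressed. A quick check against the figures above, using hypothetical helper names:

```c
#include <assert.h>

/* Relative change in percent, as in "Copies: A -> B (-x.xx%)". */
static double pct_change(double before, double after)
{
   assert(before != 0.0);
   return (after - before) / before * 100.0;
}

/* Share in percent, as in "14340 (11.23% of 127638)". */
static double pct_of(double part, double total)
{
   assert(total != 0.0);
   return part / total * 100.0;
}
```

For example, Copies 2448228 -> 2049305 gives about -16.29%, and 14340 of 127638 shaders is about 11.23%.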
-
Rhys Perry authored
If one VMEM instruction uses a sampler and the other doesn't, we can't do this optimization.
Totals from 47 (0.04% of 127638) affected shaders:
CodeSize: 271744 -> 271656 (-0.03%); split: -0.04%, +0.01%
Instrs: 52783 -> 52761 (-0.04%); split: -0.05%, +0.01%
Cycles: 5547040 -> 5546952 (-0.00%); split: -0.00%, +0.00%
VMEM: 10022 -> 9887 (-1.35%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <!4949>
-
Rhys Perry authored
This was unnecessary and interfered with the statistics.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <!4949>
-
Andres Gomez authored
[dump_trace_images] Info: Dumping trace /tmp/tracie.test.ap5pshYcsg/traces-db/trace1/magenta.testtrace... ERROR
[dump_trace_images] Debug: === Failure log start ===
invalid literal for int() with base 16: 'in'
[dump_trace_images] Debug: === Failure log end ===
[check_image] Trace /tmp/tracie.test.ap5pshYcsg/traces-db/trace1/magenta.testtrace couldn't be replayed. See above logs for more information.
Traceback (most recent call last):
  File "/tmp/tracie.test.ap5pshYcsg/tracie.py", line 176, in <module>
    main()
  File "/tmp/tracie.test.ap5pshYcsg/tracie.py", line 164, in main
    ok, result = gitlab_check_trace(project_url, commit_id, args.device_name, trace, expectation)
TypeError: cannot unpack non-iterable bool object
Fixes: efbbf8bb ("tracie: Print results in a machine readable format")
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Rohan Garg <rohan.garg@collabora.com>
Part-of: <!4839>
-
Andres Gomez authored
Otherwise, we will fail when the traces description file doesn't contain any checksum for the specified device.
Fixes: efbbf8bb ("tracie: Print results in a machine readable format")
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Rohan Garg <rohan.garg@collabora.com>
Part-of: <!4839>
-
Samuel Pitoiset authored
When the LLVM version is too old or missing, SotTR applies shader workarounds, which reduce performance by 2-5% with ACO. SotTR applies these workarounds with LLVM 8 and older, so reporting LLVM 9.0.1 should be fine.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <!4984>
-
Samuel Pitoiset authored
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <!4987>
-
Samuel Pitoiset authored
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <!4987>
-
Samuel Pitoiset authored
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <!4987>
-
Samuel Pitoiset authored
ANV and RADV have similar Python code for generating extensions and dispatch tables.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <!4987>
-
Samuel Pitoiset authored
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <!4886>
-
Samuel Pitoiset authored
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <!4886>
-
Samuel Pitoiset authored
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <!4886>
-
Marek Vasut authored
The GC880 on iMX6DL indicates in its minorFeatures2 register that it supports SEAMLESS_CUBE_MAP. However, when the TE.SAMPLER_CONFIG1 VIVS_TE_SAMPLER_CONFIG1_SEAMLESS_CUBE_MAP bit is set on the GC880 on iMX6DL, the result is a corrupted image. In particular, the following ~112 dEQPs are affected and fail:
dEQP-GLES2.functional.texture.filtering.cube.*
This only happens on the MX6DL GC880. The MX6Q GC2000 and the STM32MP1 GC400 (GCnano) do not report the minorFeatures2 SEAMLESS_CUBE_MAP bit and ignore the TE_SAMPLER_CONFIG1 VIVS_TE_SAMPLER_CONFIG1_SEAMLESS_CUBE_MAP bit (note that ss->seamless_cube_map is at times set unconditionally by mesa, even when PIPE_CAP_SEAMLESS_CUBE_MAP_PER_TEXTURE returns 0), so there is no visible problem and there are no failing dEQP tests on the GC2000 and GCnano. This might imply that the minorFeatures2 SEAMLESS_CUBE_MAP bit has a different meaning on the GC880, or that SEAMLESS_CUBE_MAP behaves differently on the GC880.
This patch does not set the SEAMLESS_CUBE_MAP bit on hardware that does not indicate support for seamless cube maps, nor on the GC880. This reduces the number of failed dEQPs from 635 to 186 on GC880 and from 274 to 270 on GC2000, with no change on GC400 (GCnano).
Fixes: 8dd26fa2 ("etnaviv: support GL_ARB_seamless_cubemap_per_texture")
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: Marek Vasut <marex@denx.de>
Part-of: <!4865>
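The gating rule described above boils down to a small predicate. This sketch assumes the caller already knows whether minorFeatures2 advertises SEAMLESS_CUBE_MAP and which GPU model it runs on; the function name and the model encoding (`MODEL_GC880`) are hypothetical, not etnaviv's actual identifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_GC880 0x880u /* hypothetical encoding of the GC880 model */

/* Decide whether to set the sampler's SEAMLESS_CUBE_MAP bit. */
static bool use_seamless_cube_map_bit(bool feature_seamless_cube_map,
                                      uint32_t gpu_model,
                                      bool state_requests_seamless)
{
   if (!state_requests_seamless)
      return false;
   if (!feature_seamless_cube_map)
      return false; /* HW does not advertise support */
   if (gpu_model == MODEL_GC880)
      return false; /* advertises it, but cube sampling comes out corrupted */
   return true;
}
```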
-
Rob Clark authored
On a6xx we need a 0,0-based scissor in the binning pass, but we can use the blit scissor to avoid restoring/resolving untouched pixels, and use conditional execution of the IB to skip bins with no geometry (due to the scissor).
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <!5021>
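Conceptually, a bin can be skipped when it lies entirely outside the scissor, since no geometry survives there. On the hardware this decision comes from the binning pass via conditional execution; the CPU-side sketch below only illustrates the geometry, using a hypothetical half-open rectangle type and fixed-size bins.

```c
#include <assert.h>
#include <stdbool.h>

/* Half-open rectangle: [x0, x1) x [y0, y1). Hypothetical type. */
struct rect { int x0, y0, x1, y1; };

/* Does bin (bx, by) of size bin_w x bin_h overlap the scissor? */
static bool bin_overlaps_scissor(int bx, int by, int bin_w, int bin_h,
                                 struct rect s)
{
   assert(bin_w > 0 && bin_h > 0);
   int x0 = bx * bin_w, y0 = by * bin_h;
   int x1 = x0 + bin_w, y1 = y0 + bin_h;
   return x0 < s.x1 && s.x0 < x1 && y0 < s.y1 && s.y0 < y1;
}

/* Count the bins that still need to be rendered for this scissor. */
static int count_bins_to_render(int nbins_x, int nbins_y, int bin_w, int bin_h,
                                struct rect s)
{
   int n = 0;
   for (int by = 0; by < nbins_y; by++)
      for (int bx = 0; bx < nbins_x; bx++)
         if (bin_overlaps_scissor(bx, by, bin_w, bin_h, s))
            n++;
   return n;
}
```

With a 1024x1024 target split into 4x4 bins of 256x256, a 300x300 scissor at the origin touches 4 bins, and the other 12 can be skipped.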
-
Rob Clark authored
Similar to what we do in postsched. It is useful for the pre-RA scheduler to be somewhat aware of things that would cause syncs, in particular for tex fetches, since the vecN src/dst tends to limit postsched's ability to reorder them.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <!4923>
-
Rob Clark authored
If an instruction's only use is as an output, and it increases register pressure, then try to avoid scheduling it until there are no other options. A semi-common pattern is `fragcolN.a = 1.0`; this pushes all of these immediate loads to the end of the shader.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <!4923>
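A sketch of that heuristic as a candidate-selection function. The candidate fields and the tie-breaking rule (smallest pressure increase wins) are assumptions for illustration, not ir3's actual scheduler code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ready-list entry for a pre-RA scheduler. */
struct candidate {
   const char *name;
   bool only_used_by_output; /* sole use is a shader output */
   int pressure_delta;       /* change in live registers if scheduled now */
};

/* Defer output-only candidates that raise pressure while anything else
 * is ready; otherwise prefer the smallest pressure increase.
 * Returns an index into c, or -1 if n == 0. */
static int pick_next(const struct candidate *c, size_t n)
{
   int best = -1;
   bool best_deferred = true;
   for (size_t i = 0; i < n; i++) {
      bool deferred = c[i].only_used_by_output && c[i].pressure_delta > 0;
      if (best < 0 ||
          (best_deferred && !deferred) ||
          (deferred == best_deferred &&
           c[i].pressure_delta < c[best].pressure_delta)) {
         best = (int)i;
         best_deferred = deferred;
      }
   }
   return best;
}
```

Here an immediate load that only feeds `fragcolN.a` is picked only once nothing else is ready.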
-