 15 Aug, 2019 1 commit


Danylo Piliaiev authored
CID: 1452261
Fixes: 04a99515 "intel/compiler: add ability to override shader's assembly"
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>

 14 Aug, 2019 39 commits


Alyssa Rosenzweig authored
We started honouring the normalized_coords flag in the texture descriptor, but a bisection revealed that this broke RECT textures, since we were *also* lowering them in the shader. So just remove the shader-based lowering, use native RECT textures, and enjoy the nominal reduction in complexity and performance boost.
Fixes: 3e47a118 ("panfrost: Add MALI_SAMP_NORM_COORDS flag")
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

Alyssa Rosenzweig authored
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

Alyssa Rosenzweig authored
It's a bit of a special case but that's fine.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

Alyssa Rosenzweig authored
We only know how to promote aligned accesses, although theoretically we should be able to promote unaligned to swizzles in the future. Check this.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

Alyssa Rosenzweig authored
Different UBO reads have different shift requirements.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

Alyssa Rosenzweig authored
We'll want to be smarter about unaligned reads, so let's get this code all in one place.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

Alyssa Rosenzweig authored
Helps the disassembly be clearer and maybe regalloc be smarter.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

Alyssa Rosenzweig authored
It's the same thing, just shifted.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

Alyssa Rosenzweig authored
Our hardware supports independent (per-RT) blending, but we need to route those settings through from Gallium.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

Alyssa Rosenzweig authored
We'll need multiple branches for MRT, so we can't defer. Also, we need to track dependencies to ensure r0 is set to the correct value for each store_output.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

Alyssa Rosenzweig authored
We need to treat fragment writes specially.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

Alyssa Rosenzweig authored
Fixes DATA_INVALID_FAULTs with multiple render targets. We do always allocate space for 4 cbufs just to keep things sane. This may not be strictly necessary.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

Alyssa Rosenzweig authored
Turns out the rt count is stuffed in here..
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

Alyssa Rosenzweig authored
We don't have a good way to confirm this, but it parallels the kernel definitions for MMU faults nicely.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

Alyssa Rosenzweig authored
This was supposed to read heap_start. It's the same value, but still, better to get this right.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

Alyssa Rosenzweig authored
It's a chicken bit, as far as I can tell. Buck buck.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

Alyssa Rosenzweig authored
We enable the standalone compiler, build the new files, and let it blast.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

Alyssa Rosenzweig authored
What it says on the tin.
Signed-off-by: Ryan Houdek <Sonicadvance1@gmail.com>
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

Alyssa Rosenzweig authored
IR printers.
Signed-off-by: Ryan Houdek <Sonicadvance1@gmail.com>
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

Alyssa Rosenzweig authored
$ astyle *.c *.h --style=linux -s8
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

Alyssa Rosenzweig authored
We don't actually have a standalone compiler in-tree yet, but let's get prepared for when we do.
Signed-off-by: Ryan Houdek <Sonicadvance1@gmail.com>
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

Alyssa Rosenzweig authored
The disassembler was updated to move code shared with the compiler into a common header. Additionally, some new ops and control registers relating to rounding were added.
Signed-off-by: Ryan Houdek <Sonicadvance1@gmail.com>
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

Alyssa Rosenzweig authored
Now that panwrap has gained the ability to trace directly without dumping to the filesystem, there's no need to lug around this tool. I can assure you nobody will miss it.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

Alyssa Rosenzweig authored
Fixes: 863bdd1f ("pan/midgard: Break, not return, in disassembler")
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

Alyssa Rosenzweig authored
Otherwise we'll get memory junk.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

Alyssa Rosenzweig authored
I don't think the hardware cares, but this adds a lot of noise to traces that we would rather not need to look at.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

Alyssa Rosenzweig authored
Some memory corruption / etc issues led to an accidental "fuzzing" of the disassembler ;) This uncovered some issues leading to a disassembler hang, so let's fix that.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

Alyssa Rosenzweig authored
We want a defined ABI for tracing; this set of functions should be as small as strictly necessary to minimize panwrap shenanigans.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

Alyssa Rosenzweig authored
This removes an unwanted dependency on panfrost-job.h
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

Alyssa Rosenzweig authored
Multiple spill moves share a single spill slot. Issue found in Krita.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

Alyssa Rosenzweig authored
This allows us to have multiple spill moves, whereas otherwise for N spill moves, the first N-1 would be clobbered. Issue found in Krita.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

Alyssa Rosenzweig authored
It doesn't... make a ton of sense to need to assert and this routine is hotter than you might expect. Doesn't matter for release builds, of course.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

Marek Olšák authored
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

Eric Engestrom authored
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>

Eric Engestrom authored
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>

Christian Gmeiner authored
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>

Christian Gmeiner authored
Fixes: 797a2e4f ("etnaviv: update logic to determine uniform limits")
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>

Ian Romanick authored
v2: After some review discussion with Alyssa, the replacements now correctly account for cases where (b+c) >= bitsize.
v3: Use a temporary to simplify the Python code quite a bit. Suggested by Jason.

Haswell and all Gen8+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 16251155 -> 16249576 (<.01%)
instructions in affected programs: 232627 -> 231048 (0.68%)
helped: 547
HURT: 1
helped stats (abs) min: 1 max: 15 x̄: 2.89 x̃: 3
helped stats (rel) min: 0.04% max: 7.84% x̄: 1.14% x̃: 1.06%
HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel) min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12%
95% mean confidence interval for instructions value: 3.12 2.65
95% mean confidence interval for instructions %change: 1.20% 1.06%
Instructions are helped.

total cycles in shared programs: 365924392 -> 365372103 (0.15%)
cycles in affected programs: 59207053 -> 58654764 (0.93%)
helped: 497
HURT: 34
helped stats (abs) min: 1 max: 29300 x̄: 1118.16 x̃: 16
helped stats (rel) min: <.01% max: 10.59% x̄: 1.82% x̃: 1.82%
HURT stats (abs) min: 2 max: 424 x̄: 101.03 x̃: 63
HURT stats (rel) min: 0.07% max: 46.17% x̄: 4.72% x̃: 2.06%
95% mean confidence interval for cycles value: 1426.41 653.77
95% mean confidence interval for cycles %change: 1.66% 1.15%
Cycles are helped.

total spills in shared programs: 8870 -> 8871 (0.01%)
spills in affected programs: 104 -> 105 (0.96%)
helped: 0
HURT: 1

Ivy Bridge and all pre-Gen7 platforms had similar results. (Ivy Bridge shown)
total instructions in shared programs: 11956236 -> 11955635 (<.01%)
instructions in affected programs: 94110 -> 93509 (0.64%)
helped: 106
HURT: 0
helped stats (abs) min: 1 max: 14 x̄: 5.67 x̃: 4
helped stats (rel) min: 0.12% max: 4.71% x̄: 1.96% x̃: 0.76%
95% mean confidence interval for instructions value: 6.62 4.72
95% mean confidence interval for instructions %change: 2.27% 1.64%
Instructions are helped.

total cycles in shared programs: 179296340 -> 178788044 (0.28%)
cycles in affected programs: 51009603 -> 50501307 (1.00%)
helped: 82
HURT: 7
helped stats (abs) min: 5 max: 27820 x̄: 6199.00 x̃: 16
helped stats (rel) min: 0.30% max: 8.16% x̄: 2.58% x̃: 3.11%
HURT stats (abs) min: 2 max: 8 x̄: 3.14 x̃: 2
HURT stats (rel) min: 0.02% max: 1.40% x̄: 0.34% x̃: 0.10%
95% mean confidence interval for cycles value: 7649.38 3773.00
95% mean confidence interval for cycles %change: 2.71% 1.99%
Cycles are helped.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> [v2]
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
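
The v2 note about (b+c) >= bitsize can be illustrated with a small sketch. It assumes, since the commit title is not shown here, that the replacement in question folds two constant shifts, (a << b) << c, into a single shift by b+c. It also assumes a 32-bit shift that uses only the low five bits of its count. The helper names below are hypothetical and are not taken from the Mesa sources.

```python
BITSIZE = 32
MASK = (1 << BITSIZE) - 1

def ishl(a, count):
    # Assumed semantics: only the low log2(BITSIZE) bits of the count are used.
    return (a << (count & (BITSIZE - 1))) & MASK

def double_shift(a, b, c):
    # The original expression: two shifts applied one after the other.
    return ishl(ishl(a, b), c)

def naive_combined(a, b, c):
    # Folding the counts without a range check: wrong once b + c >= BITSIZE,
    # because the combined count wraps around.
    return ishl(a, b + c)

def guarded_combined(a, b, c):
    # Assumes b and c are each already in [0, BITSIZE).
    # Every bit has been shifted out once b + c reaches BITSIZE.
    if b + c >= BITSIZE:
        return 0
    return ishl(a, b + c)

print(double_shift(1, 20, 20), naive_combined(1, 20, 20), guarded_combined(1, 20, 20))
```

For a = 1 and b = c = 20, the two-step shift and the guarded version both give 0, while the unguarded fold gives 1 << 8 because the count 40 wraps to 8.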

Ian Romanick authored
A common thing in many shaders:

    uniform vs { vec4 bones[...]; };
    ...
    x = some_calculation(bones[i + 0]);
    y = some_calculation(bones[i + 1]);
    z = some_calculation(bones[i + 2]);

This turns into stuff like

    vec1 32 ssa_12 = iadd ssa_11, ssa_0
    vec1 32 ssa_13 = ishl ssa_12, ssa_3
    vec1 32 ssa_14 = intrinsic load_ssbo (ssa_7, ssa_13) (16, 4, 0)
    vec1 32 ssa_15 = iadd ssa_11, ssa_1
    vec1 32 ssa_16 = ishl ssa_15, ssa_3
    vec1 32 ssa_17 = intrinsic load_ssbo (ssa_7, ssa_16) (16, 4, 0)
    vec1 32 ssa_18 = iadd ssa_11, ssa_2
    vec1 32 ssa_19 = ishl ssa_18, ssa_3
    vec1 32 ssa_20 = intrinsic load_ssbo (ssa_7, ssa_19) (16, 4, 0)

By reassociating the shift and the add, we can reduce this to

    vec1 32 ssa_12 = ishl ssa_11, ssa_3
    vec1 32 ssa_13 = iadd ssa_12, ssa_0
    vec1 32 ssa_14 = intrinsic load_ssbo (ssa_7, ssa_13) (16, 4, 0)
    vec1 32 ssa_16 = iadd ssa_12, ssa_1
    vec1 32 ssa_17 = intrinsic load_ssbo (ssa_7, ssa_16) (16, 4, 0)
    vec1 32 ssa_19 = iadd ssa_12, ssa_2
    vec1 32 ssa_20 = intrinsic load_ssbo (ssa_7, ssa_19) (16, 4, 0)

v2: Add some commentary from Rhys Perry's nearly identical patch.

All Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 16277758 -> 16250704 (0.17%)
instructions in affected programs: 1440284 -> 1413230 (1.88%)
helped: 4920
HURT: 6
helped stats (abs) min: 1 max: 69 x̄: 5.50 x̃: 4
helped stats (rel) min: 0.10% max: 18.33% x̄: 2.21% x̃: 1.79%
HURT stats (abs) min: 1 max: 12 x̄: 4.50 x̃: 3
HURT stats (rel) min: 0.18% max: 3.23% x̄: 1.91% x̃: 2.55%
95% mean confidence interval for instructions value: 5.67 5.31
95% mean confidence interval for instructions %change: 2.26% 2.16%
Instructions are helped.

total cycles in shared programs: 367118526 -> 365895358 (0.33%)
cycles in affected programs: 93504145 -> 92280977 (1.31%)
helped: 2754
HURT: 1269
helped stats (abs) min: 1 max: 47039 x̄: 460.66 x̃: 16
helped stats (rel) min: <.01% max: 34.93% x̄: 3.77% x̃: 1.12%
HURT stats (abs) min: 1 max: 1500 x̄: 35.85 x̃: 9
HURT stats (rel) min: 0.01% max: 17.35% x̄: 2.18% x̃: 0.75%
95% mean confidence interval for cycles value: 387.31 220.78
95% mean confidence interval for cycles %change: 2.11% 1.68%
Cycles are helped.

LOST: 1
GAINED: 1

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
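
The reassociation described in this message relies on the identity (i + k) << s == (i << s) + (k << s) under wrapping integer arithmetic. The sketch below checks that identity for 32-bit values. The function names, and the choice of a shift of 4 (a 16-byte vec4 stride), are illustrative assumptions rather than anything taken from the patch.

```python
MASK32 = 0xFFFFFFFF

def add_then_shift(i, k, s):
    # Original form: one add and one shift per array access.
    return (((i + k) & MASK32) << s) & MASK32

def shift_then_add(i, k, s):
    # Reassociated form: (i << s) is shared by every access, and (k << s)
    # is a constant that can be folded at compile time.
    base = (i << s) & MASK32
    return (base + ((k << s) & MASK32)) & MASK32

# Byte offsets for bones[i + 0], bones[i + 1], bones[i + 2] with i = 5.
i, s = 5, 4
before = [add_then_shift(i, k, s) for k in range(3)]
after = [shift_then_add(i, k, s) for k in range(3)]
print(before, after)
```

With three accesses, the original form needs three adds and three shifts, while the reassociated form needs one shared shift plus three adds of folded constants, which matches the drop from nine to seven instructions in the NIR listing.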
