CID: 1452261
Fixes: 04a99515 "intel/compiler: add ability to override shader's assembly"
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>

We started honouring the normalized_coords flag in the texture descriptor, but a bisection revealed that this broke RECT textures, since we were *also* lowering them in the shader. So just remove the shader-based lowering, use native RECT textures, and enjoy the nominal reduction in complexity and the performance boost.
Fixes: 3e47a118 ("panfrost: Add MALI_SAMP_NORM_COORDS flag")
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
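A minimal model of the double conversion, assuming the descriptor flag simply selects whether the sampler scales the incoming coordinate by the texture size, and that the old shader lowering divided RECT coordinates by that size. The names sampler_model, hw_texel_x and lower_rect_coord are hypothetical, not panfrost code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sampler model: normalized_coords selects whether the
 * modelled hardware scales the incoming coordinate by the texture width. */
typedef struct {
    int width;
    bool normalized_coords;
} sampler_model;

/* Texel-space x coordinate the modelled hardware would sample. */
static float hw_texel_x(const sampler_model *smp, float s)
{
    assert(smp->width > 0);
    return smp->normalized_coords ? s * (float)smp->width : s;
}

/* Hypothetical shader-side RECT lowering: texel coordinates -> [0, 1]. */
static float lower_rect_coord(float s, int width)
{
    assert(width > 0);
    return s / (float)width;
}
```

With width 64 and s = 10, the native RECT path (flag cleared, no lowering) samples texel 10, as does the old path (flag set, lowered coordinate). Combining the cleared flag with the lowered coordinate samples at 0.15625 instead, which is the kind of breakage a bisection would surface.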

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

It's a bit of a special case, but that's fine.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

We only know how to promote aligned accesses, although theoretically we should be able to promote unaligned accesses to swizzles in the future. Check this.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
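A sketch of such an alignment check, assuming the read offset is a compile-time constant in bytes and that promoted uniforms live in 16-byte (vec4) slots. SLOT_BYTES and the helper names are hypothetical, not the Midgard compiler's own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: each promotion target is one 16-byte (vec4) uniform slot. */
#define SLOT_BYTES 16u

/* Only reads that start exactly on a slot boundary are promotable. */
static bool ubo_read_is_promotable(uint32_t byte_offset)
{
    return (byte_offset % SLOT_BYTES) == 0;
}

/* Slot index for an aligned read. */
static uint32_t ubo_promoted_slot(uint32_t byte_offset)
{
    assert(ubo_read_is_promotable(byte_offset));
    return byte_offset / SLOT_BYTES;
}

/* First 32-bit component an unaligned read starts at; a future swizzle
 * could select it instead of rejecting the read. */
static uint32_t ubo_read_component(uint32_t byte_offset)
{
    return (byte_offset % SLOT_BYTES) / 4u;
}
```

Under these assumptions an offset of 48 maps to slot 3, while an offset of 36 is rejected: it falls in slot 2 starting at component 1, the case a swizzle could eventually cover.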

Different UBO reads have different shift requirements.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

We'll want to be smarter about unaligned reads, so let's get this code all in one place.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

Helps the disassembly be clearer and maybe regalloc be smarter.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

It's the same thing, just shifted.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

Our hardware supports independent (per-RT) blending, but we need to route those settings through from Gallium.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

We'll need multiple branches for MRT, so we can't defer. Also, we need to track dependencies to ensure r0 is set to the correct value for each store_output.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

We need to treat fragment writes specially.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

Fixes DATA_INVALID_FAULTs with multiple render targets. We do always allocate space for 4 cbufs just to keep things sane. This may not be strictly necessary.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

Turns out the RT count is stuffed in here.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

We don't have a good way to confirm this, but it parallels the kernel definitions for MMU faults nicely.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

This was supposed to read heap_start. It's the same value but still, better get this right.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

It's a chicken bit, as far as I can tell. Buck buck.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

We enable the standalone compiler, build the new files, and let it blast.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

What it says on the tin.
Signed-off-by: Ryan Houdek <Sonicadvance1@gmail.com>
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

IR printers.
Signed-off-by: Ryan Houdek <Sonicadvance1@gmail.com>
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

$ astyle *.c *.h --style=linux -s8
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

We don't actually have a standalone compiler in-tree yet, but let's get prepared for when we do.
Signed-off-by: Ryan Houdek <Sonicadvance1@gmail.com>
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

The disassembler was updated to move code it shares with the compiler into a common header. Additionally, some new ops and control registers relating to rounding were added.
Signed-off-by: Ryan Houdek <Sonicadvance1@gmail.com>
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

Now that panwrap has gained the ability to trace directly without dumping to the filesystem, there's no need to lug around this tool. I can assure you nobody will miss it.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

Fixes: 863bdd1f ("pan/midgard: Break, not return, in disassembler")
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

Otherwise we'll get memory junk.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

I don't think the hardware cares, but this adds a lot of noise to traces that we would rather not need to look at.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

Some memory corruption (and similar) issues led to an accidental "fuzzing" of the disassembler ;) This uncovered some bugs that could hang the disassembler, so let's fix them.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

We want a defined ABI for tracing; this set of functions should be as small as strictly necessary to minimize panwrap shenanigans.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

This removes an unwanted dependency on panfrost-job.h.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

Multiple spill moves share a single spill slot. Issue found in Krita.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

This allows us to have multiple spill moves, whereas otherwise, for N spill moves, the first N-1 would be clobbered. Issue found in Krita.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
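The difference between one shared slot and one slot per spill move can be modelled with a trivial slot allocator. spill_area, alloc_spill_slot, spill and fill are hypothetical names; the real compiler spills registers to scratch memory rather than to a plain int array.

```c
#include <assert.h>

#define MAX_SPILL_SLOTS 8

/* Hypothetical spill storage: one int per slot. */
typedef struct {
    int slots[MAX_SPILL_SLOTS];
    int next_slot; /* index of the next unused slot */
} spill_area;

/* Give every spill move its own slot instead of reusing a single one. */
static int alloc_spill_slot(spill_area *area)
{
    assert(area->next_slot < MAX_SPILL_SLOTS);
    return area->next_slot++;
}

/* Store a value into a slot (the "spill" half of a spill move). */
static void spill(spill_area *area, int slot, int value)
{
    assert(slot >= 0 && slot < MAX_SPILL_SLOTS);
    area->slots[slot] = value;
}

/* Read a value back from a slot (the "fill" half). */
static int fill(const spill_area *area, int slot)
{
    assert(slot >= 0 && slot < MAX_SPILL_SLOTS);
    return area->slots[slot];
}
```

With distinct slots, two spilled values both survive until they are filled. If both moves wrote the same slot, the second store would overwrite the first, which is the clobbering described above.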

It doesn't... make a ton of sense to need to assert here, and this routine is hotter than you might expect. Doesn't matter for release builds, of course.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>

Fixes: 797a2e4f ("etnaviv: update logic to determine uniform limits")
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>

v2: After some review discussion with Alyssa, the replacements now correctly account for cases where (b+c) >= bitsize.

v3: Use a temporary to simplify the Python code quite a bit. Suggested by Jason.

Haswell and all Gen8+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 16251155 -> 16249576 (<.01%)
instructions in affected programs: 232627 -> 231048 (-0.68%)
helped: 547
HURT: 1
helped stats (abs) min: 1 max: 15 x̄: 2.89 x̃: 3
helped stats (rel) min: 0.04% max: 7.84% x̄: 1.14% x̃: 1.06%
HURT stats (abs)   min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel)   min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12%
95% mean confidence interval for instructions value: -3.12 -2.65
95% mean confidence interval for instructions %-change: -1.20% -1.06%
Instructions are helped.

total cycles in shared programs: 365924392 -> 365372103 (-0.15%)
cycles in affected programs: 59207053 -> 58654764 (-0.93%)
helped: 497
HURT: 34
helped stats (abs) min: 1 max: 29300 x̄: 1118.16 x̃: 16
helped stats (rel) min: <.01% max: 10.59% x̄: 1.82% x̃: 1.82%
HURT stats (abs)   min: 2 max: 424 x̄: 101.03 x̃: 63
HURT stats (rel)   min: 0.07% max: 46.17% x̄: 4.72% x̃: 2.06%
95% mean confidence interval for cycles value: -1426.41 -653.77
95% mean confidence interval for cycles %-change: -1.66% -1.15%
Cycles are helped.

total spills in shared programs: 8870 -> 8871 (0.01%)
spills in affected programs: 104 -> 105 (0.96%)
helped: 0
HURT: 1

Ivy Bridge and all pre-Gen7 platforms had similar results. (Ivy Bridge shown)
total instructions in shared programs: 11956236 -> 11955635 (<.01%)
instructions in affected programs: 94110 -> 93509 (-0.64%)
helped: 106
HURT: 0
helped stats (abs) min: 1 max: 14 x̄: 5.67 x̃: 4
helped stats (rel) min: 0.12% max: 4.71% x̄: 1.96% x̃: 0.76%
95% mean confidence interval for instructions value: -6.62 -4.72
95% mean confidence interval for instructions %-change: -2.27% -1.64%
Instructions are helped.

total cycles in shared programs: 179296340 -> 178788044 (-0.28%)
cycles in affected programs: 51009603 -> 50501307 (-1.00%)
helped: 82
HURT: 7
helped stats (abs) min: 5 max: 27820 x̄: 6199.00 x̃: 16
helped stats (rel) min: 0.30% max: 8.16% x̄: 2.58% x̃: 3.11%
HURT stats (abs)   min: 2 max: 8 x̄: 3.14 x̃: 2
HURT stats (rel)   min: 0.02% max: 1.40% x̄: 0.34% x̃: 0.10%
95% mean confidence interval for cycles value: -7649.38 -3773.00
95% mean confidence interval for cycles %-change: -2.71% -1.99%
Cycles are helped.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> [v2]
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
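The v2 note concerns the case where a combined shift count reaches the bit size. Below is a sketch for 32-bit left shifts under two assumptions the text above does not spell out: that the replacements merge two constant shift counts b and c into one, and that shift counts are masked to the bit size (our understanding of NIR's shift opcodes). ishl32 and fold_shl_shl are hypothetical helpers, not the actual nir_opt_algebraic patterns.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed shift semantics: only the low 5 bits of a 32-bit shift count
 * are used, so a count of 32 behaves like a count of 0. */
static uint32_t ishl32(uint32_t a, uint32_t count)
{
    return a << (count & 31u);
}

/* Fold (a << b) << c into a single operation for constant b, c in [0, 31]. */
static uint32_t fold_shl_shl(uint32_t a, uint32_t b, uint32_t c)
{
    assert(b < 32u && c < 32u);
    /* Once b + c reaches the bit size every bit has been shifted out;
     * emitting ishl32(a, b + c) here would wrap the count instead. */
    return (b + c >= 32u) ? 0u : ishl32(a, b + c);
}
```

For b = 20 and c = 12, the two-step shift yields 0, whereas a naive single shift by 32 would wrap to a shift by 0 and return the operand unchanged.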

A common thing in many shaders:

    uniform vs { vec4 bones[...]; };
    ...
    x = some_calculation(bones[i + 0]);
    y = some_calculation(bones[i + 1]);
    z = some_calculation(bones[i + 2]);

This turns into stuff like

    vec1 32 ssa_12 = iadd ssa_11, ssa_0
    vec1 32 ssa_13 = ishl ssa_12, ssa_3
    vec1 32 ssa_14 = intrinsic load_ssbo (ssa_7, ssa_13) (16, 4, 0)
    vec1 32 ssa_15 = iadd ssa_11, ssa_1
    vec1 32 ssa_16 = ishl ssa_15, ssa_3
    vec1 32 ssa_17 = intrinsic load_ssbo (ssa_7, ssa_16) (16, 4, 0)
    vec1 32 ssa_18 = iadd ssa_11, ssa_2
    vec1 32 ssa_19 = ishl ssa_18, ssa_3
    vec1 32 ssa_20 = intrinsic load_ssbo (ssa_7, ssa_19) (16, 4, 0)

By reassociating the shift and the add, we can reduce this to

    vec1 32 ssa_12 = ishl ssa_11, ssa_3
    vec1 32 ssa_13 = iadd ssa_12, ssa_0
    vec1 32 ssa_14 = intrinsic load_ssbo (ssa_7, ssa_13) (16, 4, 0)
    vec1 32 ssa_16 = iadd ssa_12, ssa_1
    vec1 32 ssa_17 = intrinsic load_ssbo (ssa_7, ssa_16) (16, 4, 0)
    vec1 32 ssa_19 = iadd ssa_12, ssa_2
    vec1 32 ssa_20 = intrinsic load_ssbo (ssa_7, ssa_19) (16, 4, 0)

v2: Add some commentary from Rhys Perry's nearly identical patch.

All Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 16277758 -> 16250704 (-0.17%)
instructions in affected programs: 1440284 -> 1413230 (-1.88%)
helped: 4920
HURT: 6
helped stats (abs) min: 1 max: 69 x̄: 5.50 x̃: 4
helped stats (rel) min: 0.10% max: 18.33% x̄: 2.21% x̃: 1.79%
HURT stats (abs)   min: 1 max: 12 x̄: 4.50 x̃: 3
HURT stats (rel)   min: 0.18% max: 3.23% x̄: 1.91% x̃: 2.55%
95% mean confidence interval for instructions value: -5.67 -5.31
95% mean confidence interval for instructions %-change: -2.26% -2.16%
Instructions are helped.

total cycles in shared programs: 367118526 -> 365895358 (-0.33%)
cycles in affected programs: 93504145 -> 92280977 (-1.31%)
helped: 2754
HURT: 1269
helped stats (abs) min: 1 max: 47039 x̄: 460.66 x̃: 16
helped stats (rel) min: <.01% max: 34.93% x̄: 3.77% x̃: 1.12%
HURT stats (abs)   min: 1 max: 1500 x̄: 35.85 x̃: 9
HURT stats (rel)   min: 0.01% max: 17.35% x̄: 2.18% x̃: 0.75%
95% mean confidence interval for cycles value: -387.31 -220.78
95% mean confidence interval for cycles %-change: -2.11% -1.68%
Cycles are helped.

LOST: 1
GAINED: 1

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
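The rewrite relies on the identity (i + k) << s == (i << s) + (k << s), which holds exactly for wrapping 32-bit integers, so it stays valid even if i + k overflows. A sketch with hypothetical helpers (iadd32, ishl32, offset_before, offset_after) standing in for the NIR opcodes:

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit wrapping add and shift, standing in for NIR's iadd and ishl. */
static uint32_t iadd32(uint32_t a, uint32_t b) { return a + b; }
static uint32_t ishl32(uint32_t a, uint32_t s) { return a << (s & 31u); }

/* Before: every element recomputes (i + k) << s. */
static uint32_t offset_before(uint32_t i, uint32_t k, uint32_t s)
{
    return ishl32(iadd32(i, k), s);
}

/* After: base = i << s is computed once and shared by every element;
 * k << s is itself a constant whenever k and s are constants. */
static uint32_t offset_after(uint32_t base, uint32_t k, uint32_t s)
{
    assert(s < 32u); /* shift counts in this sketch are in range */
    return iadd32(base, ishl32(k, s));
}
```

For example, offset_before(7, 2, 4) and offset_after(7 << 4, 2, 4) both give 144. Only the single base shift is shared across the three loads, which is where the instruction savings come from.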
