 07 May, 2019 35 commits


Sagar Ghuge authored
v1: Pass executable object from meson to test (Dylan Baker) v2: Ignore generated output files from git status (Matt Turner) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>

Mika Kuoppala authored
If we leave offset uninitialized, the access to store depends on whatever value happens to be on the stack and can segfault. Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>

Mika Kuoppala authored
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>

Sagar Ghuge authored
The tool is inspired by igt's assembler tool. Thanks to Matt Turner, who mentored me throughout this project. v2: Fix memory leaks and naming convention (Caio) v3: Fix meson changes (Dylan Baker) v4: Fix usage options (Matt Turner) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Closes: mesa/mesa!141

Kenneth Graunke authored

Mike Blumenkrantz authored
This adds support for imports where the image data begins at an offset from the start of the buffer, as used in h/x264. Fixes kwg/mesa#47 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

Roland Scheidegger authored
Brian noticed there was an uninitialized var for the 8-wide case and 128-bit blocks, which made it always crash. Likewise, the 64-bit block case had another crash bug due to a type mismatch. Color decode (used for all s3tc formats) also had a bogus shuffle for this case, leading to decode artifacts. Fix these all up, which makes the code actually work 8-wide. Note that it's still not used: I've verified it works, and the generated assembly does look quite a bit simpler actually (20-30% fewer instructions for the s3tc decode part with avx2), however in practice it still seems to be slightly slower for some unknown reason (tested with openarena) on my haswell box, so for now continue to split things into 4-wide vectors before decoding. Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>

Vasily Khoruzhick authored
GP doesn't support sin/cos natively, so we have to lower them. Reviewed-by: Qiang Yu <yuq825@gmail.com> Tested-by: Qiang Yu <yuq825@gmail.com> Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>

Vasily Khoruzhick authored
Lower sin and cos using Nick's fast sin/cos approximation (https://web.archive.org/web/20180105155939/http://forum.devmaster.net/t/fastandaccuratesinecosine/9648). It's suitable for GLES2, but it throws warnings in dEQP GLES3 precision tests. Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Qiang Yu <yuq825@gmail.com> Tested-by: Qiang Yu <yuq825@gmail.com> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com> Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
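As a rough illustration of the kind of approximation referred to above, the sketch below implements the commonly cited two-step parabolic form of that fast sine (a parabola through 0, pi/2 and pi, then a blend with weight 0.225). The function names, the range wrapping, and the use of Python are assumptions made for illustration; the actual lowering in the driver may differ in detail.

```python
import math

def fast_sin(x):
    """Parabolic sine approximation (illustrative sketch, not the driver code)."""
    # Wrap the argument into [-pi, pi), where the parabola is defined.
    x = (x + math.pi) % (2.0 * math.pi) - math.pi
    # First step: parabola through (0, 0), (pi/2, 1) and (pi, 0), mirrored for x < 0.
    y = (4.0 / math.pi) * x - (4.0 / math.pi ** 2) * x * abs(x)
    # Second step: blend y with y*|y| to reduce the remaining error.
    return 0.225 * (y * abs(y) - y) + y

def fast_cos(x):
    """Cosine obtained from the same approximation via a quarter-period shift."""
    return fast_sin(x + math.pi / 2.0)

if __name__ == "__main__":
    for angle in (0.0, math.pi / 6, math.pi / 4, math.pi / 2, 2.0):
        print(angle, fast_sin(angle), math.sin(angle))
```

The error of this form stays around one part in a thousand, which is consistent with it being adequate for GLES2 while tripping stricter precision checks.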

Rob Clark authored
For a6xx, we construct/emit a single VS const state used for both binning pass and draw pass. So far we were mostly getting lucky that there were not (obvious) mismatches in the const_state (like different lowered immediates) between the binning and draw pass VS ir3_shader_variant. And I guess this situation will come up more as GS and tess are added into the equation. Since really everything about the const state is not specific to the variant, move this. The main exception is lowered immediates, but these are the last to appear in the layout, and it doesn't hurt for each new shader variant to just append any immediates it lowers to the end of the immediate state. Signed-off-by: Rob Clark <robdclark@chromium.org>

Rob Clark authored
Next patch moves const_state to ir3_shader, before the compile context is created. So move the code around in prep to call it earlier. Signed-off-by: Rob Clark <robdclark@chromium.org>

Rob Clark authored
They are really part of the constant state, and combining them will simplify moving things from ir3_shader_variant to ir3_shader. Signed-off-by: Rob Clark <robdclark@chromium.org>

Rob Clark authored
Combine the offsets of different parts of the constant space with (what was formerly known as) ir3_driver_const_layout. Bunch of churn, but no functional change. Signed-off-by: Rob Clark <robdclark@chromium.org>

Rob Clark authored
Move to ir3_compiler so it doesn't depend on the compile context. Prep work for moving constant state from variant (where we have compile context) to shader (where we do not). Signed-off-by: Rob Clark <robdclark@chromium.org>

Lionel Landwerlin authored
Not quite sure what version of GCC/Clang produces errors (8.3.0 locally was fine). v2: also fix an integer literal issue (Karol) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> (v1) Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

Samuel Iglesias Gonsálvez authored
There are tests in CTS for alpha to coverage without a color attachment that are failing. This happens because we remove the shader color outputs when we don't have a valid color attachment for them, but when alpha to coverage is enabled we still want to preserve the output at location 0 since we need the alpha component. In that case we will also need to create a null render target for RT 0.
v2:
- We already create a null rt when we don't have any, so reuse that for this case (Jason)
- Simplify the code a bit (Iago)
v3:
- Take alpha to coverage from the key and don't tie this to depth-only rendering only, we want the same behavior if we have multiple render targets but the one at location 0 is not used. (Jason).
- Rewrite commit message (Iago)
v4:
- Make sure we take into account the array length of the shader outputs, which we were not handling correctly either, and make sure we also create null render targets for any invalid array entries too.
v5:
- Simplify removal of unused outputs by using rt_used[] so we don't have to special case alpha to coverage there too.
Fixes the following CTS tests: dEQP-VK.pipeline.multisample.alpha_to_coverage_no_color_attachment.*
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

Ian Romanick authored
No changes on any other Intel platforms.
Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8164367 -> 8135551 (0.35%)
instructions in affected programs: 3271235 -> 3242419 (0.88%)
helped: 13636 HURT: 90
helped stats (abs) min: 1 max: 30 x̄: 2.13 x̃: 1
helped stats (rel) min: 0.04% max: 10.77% x̄: 1.16% x̃: 0.97%
HURT stats (abs) min: 1 max: 4 x̄: 1.80 x̃: 2
HURT stats (rel) min: 0.26% max: 11.11% x̄: 1.76% x̃: 0.78%
95% mean confidence interval for instructions value: 2.13 2.07
95% mean confidence interval for instructions %change: 1.16% 1.13%
Instructions are helped.
total cycles in shared programs: 188719974 -> 188586222 (0.07%)
cycles in affected programs: 70415766 -> 70282014 (0.19%)
helped: 12563 HURT: 515
helped stats (abs) min: 2 max: 600 x̄: 10.90 x̃: 6
helped stats (rel) min: <.01% max: 5.48% x̄: 0.48% x̃: 0.27%
HURT stats (abs) min: 2 max: 54 x̄: 6.07 x̃: 4
HURT stats (rel) min: 0.01% max: 4.48% x̄: 0.24% x̃: 0.08%
95% mean confidence interval for cycles value: 10.56 9.90
95% mean confidence interval for cycles %change: 0.47% 0.45%
Cycles are helped.
LOST: 0 GAINED: 13
Reviewed-by: Matt Turner <mattst88@gmail.com>

Ian Romanick authored
In a previous version of this patch, Jason commented, "Re-associating based on whether or not something has a constant value of 1.0 seems a bit sneaky. I think it's well within the rules but it seems like something that could bite you." That is possibly true. The reassociation will generate different results if fabs(b) >= 2**24 and fabs(c) < 0.5. The delta increases as fabs(c) approaches 0.
However, i965 has done this same reassociation indirectly for years. We would previously allow nir_op_flrp on all pre-Gen11 hardware even though Gen4 and Gen5 do not have a LRP instruction. Optimizations in nir_opt_algebraic would convert expressions like a+c(b-a) into flrp(a, b, c). On Gen7+, the hardware performs the same arithmetic as a(1-c)+bc. Gen6 seems to implement LRP as a+c(b-a). On Gen4 and Gen5, we would lower LRP to a sequence of instructions that implement a(1-c)+bc.
The lowering happens after all constant folding, so we would literally generate a 1+(-1) instruction sequence in this scenario: one instruction to load either 1 or -1 in a register, and another instruction to add either -1 or 1 to it.
This patch just cuts out the middle man. Do the reassociation that we've always done, but do it explicitly at a time when we can benefit from other optimizations.
A few cases that were hurt by "nir: Lower flrp(±1, b, c) and flrp(a, ±1, c) differently" are restored by this patch. This includes a few shaders in ET:QW.
I tried a similar thing for open-coded flrp(1, b, c), and it hurt instructions on 35 shaders for ILK without helping any. The helped / hurt cycles was about even.
No changes on any other Intel platforms.
Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8172020 -> 8164367 (0.09%)
instructions in affected programs: 1089851 -> 1082198 (0.70%)
helped: 3285 HURT: 64
helped stats (abs) min: 1 max: 6 x̄: 2.35 x̃: 2
helped stats (rel) min: 0.13% max: 12.00% x̄: 1.15% x̃: 0.83%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 0.24% max: 0.64% x̄: 0.39% x̃: 0.38%
95% mean confidence interval for instructions value: 2.32 2.25
95% mean confidence interval for instructions %change: 1.16% 1.09%
Instructions are helped.
total cycles in shared programs: 188758338 -> 188719974 (0.02%)
cycles in affected programs: 20004922 -> 19966558 (0.19%)
helped: 3012 HURT: 477
helped stats (abs) min: 2 max: 142 x̄: 13.41 x̃: 12
helped stats (rel) min: 0.01% max: 6.37% x̄: 0.52% x̃: 0.24%
HURT stats (abs) min: 2 max: 328 x̄: 4.27 x̃: 4
HURT stats (rel) min: <.01% max: 1.55% x̄: 0.14% x̃: 0.11%
95% mean confidence interval for cycles value: 11.38 10.62
95% mean confidence interval for cycles %change: 0.46% 0.41%
Cycles are helped.
Reviewed-by: Matt Turner <mattst88@gmail.com>

Ian Romanick authored
This doesn't help on Intel GPUs now because we always take the "always_precise" path first. It may help on other GPUs, and it does prevent a bunch of regressions in "intel/compiler: Don't always require precise lowering of flrp". Reviewed-by: Matt Turner <mattst88@gmail.com>

Ian Romanick authored
There is little effect on Intel GPUs now because we almost always take the "always_precise" path first. It may help on other GPUs, and it does prevent a bunch of regressions in "intel/compiler: Don't always require precise lowering of flrp".
No changes on any other Intel platforms.
GM45 and Iron Lake had similar results. (Iron Lake shown)
total cycles in shared programs: 188852500 -> 188852484 (<.01%)
cycles in affected programs: 14612 -> 14596 (0.11%)
helped: 4 HURT: 0
helped stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4
helped stats (rel) min: 0.09% max: 0.13% x̄: 0.11% x̃: 0.11%
95% mean confidence interval for cycles value: 4.00 4.00
95% mean confidence interval for cycles %change: 0.13% 0.09%
Cycles are helped.
Reviewed-by: Matt Turner <mattst88@gmail.com>

Ian Romanick authored
This doesn't help on Intel GPUs now because we always take the "always_precise" path first. It may help on other GPUs, and it does prevent a bunch of regressions in "intel/compiler: Don't always require precise lowering of flrp". No changes on any Intel platform. Before a number of large rebases this helped cycles in a couple shaders on Iron Lake and GM45. Reviewed-by: Matt Turner <mattst88@gmail.com>

Ian Romanick authored
No changes on any other Intel platforms.
v2: Rebase on 424372e5dd5 ("nir: Use the flrp lowering pass instead of nir_opt_algebraic")
Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8189888 -> 8153912 (0.44%)
instructions in affected programs: 1199037 -> 1163061 (3.00%)
helped: 4124 HURT: 10
helped stats (abs) min: 1 max: 40 x̄: 8.73 x̃: 9
helped stats (rel) min: 0.20% max: 86.96% x̄: 4.96% x̃: 3.02%
HURT stats (abs) min: 1 max: 2 x̄: 1.20 x̃: 1
HURT stats (rel) min: 1.06% max: 3.92% x̄: 1.62% x̃: 1.06%
95% mean confidence interval for instructions value: 8.84 8.56
95% mean confidence interval for instructions %change: 5.12% 4.77%
Instructions are helped.
total cycles in shared programs: 188606710 -> 188426964 (0.10%)
cycles in affected programs: 27505596 -> 27325850 (0.65%)
helped: 4026 HURT: 77
helped stats (abs) min: 2 max: 646 x̄: 44.99 x̃: 46
helped stats (rel) min: <.01% max: 94.58% x̄: 2.35% x̃: 0.85%
HURT stats (abs) min: 2 max: 376 x̄: 17.79 x̃: 6
HURT stats (rel) min: <.01% max: 2.60% x̄: 0.22% x̃: 0.04%
95% mean confidence interval for cycles value: 44.75 42.87
95% mean confidence interval for cycles %change: 2.44% 2.17%
Cycles are helped.
LOST: 3 GAINED: 35
Reviewed-by: Matt Turner <mattst88@gmail.com>

Ian Romanick authored
If the magnitudes of #a and #b are such that (b-a) won't lose too much precision, lower as a+c(b-a).
No changes on any other Intel platforms.
v2: Rebase on 424372e5dd5 ("nir: Use the flrp lowering pass instead of nir_opt_algebraic")
Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8192503 -> 8192383 (<.01%)
instructions in affected programs: 18417 -> 18297 (0.65%)
helped: 68 HURT: 0
helped stats (abs) min: 1 max: 18 x̄: 1.76 x̃: 1
helped stats (rel) min: 0.19% max: 7.89% x̄: 1.10% x̃: 0.43%
95% mean confidence interval for instructions value: 2.48 1.05
95% mean confidence interval for instructions %change: 1.56% 0.63%
Instructions are helped.
total cycles in shared programs: 188662536 -> 188661956 (<.01%)
cycles in affected programs: 744476 -> 743896 (0.08%)
helped: 62 HURT: 0
helped stats (abs) min: 4 max: 60 x̄: 9.35 x̃: 6
helped stats (rel) min: 0.02% max: 4.84% x̄: 0.27% x̃: 0.06%
95% mean confidence interval for cycles value: 12.37 6.34
95% mean confidence interval for cycles %change: 0.48% 0.06%
Cycles are helped.
Reviewed-by: Matt Turner <mattst88@gmail.com>
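To make the precision concern behind this lowering choice concrete, the sketch below compares the two flrp forms with every intermediate result rounded to 32-bit float. The helper names (`f32`, `flrp_fast`, `flrp_precise`) and the chosen operands are illustrative assumptions, not code from the pass, and no fused multiply-add is modeled.

```python
import struct

def f32(x):
    """Round a Python float to the nearest IEEE-754 binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def flrp_fast(a, b, c):
    # a + c*(b - a), rounding after each operation as 32-bit hardware would.
    return f32(a + f32(c * f32(b - a)))

def flrp_precise(a, b, c):
    # a*(1 - c) + b*c, again rounding after each operation.
    return f32(f32(a * f32(1.0 - c)) + f32(b * c))

if __name__ == "__main__":
    # Operands of similar magnitude: both forms agree.
    print(flrp_fast(1.0, 2.0, 0.5), flrp_precise(1.0, 2.0, 0.5))
    # Very different magnitudes: (b - a) rounds back to -a, so b is lost entirely.
    a, b = f32(1.0e8), f32(1.0)
    print(flrp_fast(a, b, 1.0))     # 0.0, although the exact answer is b
    print(flrp_precise(a, b, 1.0))  # 1.0
```

When a and b are close in magnitude the cheaper a+c(b-a) form is safe, which is the case this commit targets; when they are far apart, the subtraction discards the smaller operand and the a(1-c)+bc form is the one that still returns b at c = 1.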

Ian Romanick authored
Previously lower_flrp32 was only set for vertex shaders. Fragment shaders performed a(1-c)+bc lowering during code generation. The shaders with loops hurt are SIMD8 and SIMD16 shaders for a text-identical fragment shader.
v2: Rebase on 26391cce ("intel/compiler: Lower ffma on Gen4 and Gen5").
v3: Rebase on a004e95d ("radeonsi/nir: create si_nir_opts() helper")
Iron Lake
total instructions in shared programs: 8211385 -> 8185974 (0.31%)
instructions in affected programs: 2503898 -> 2478487 (1.01%)
helped: 9936 HURT: 921
helped stats (abs) min: 1 max: 155 x̄: 2.86 x̃: 2
helped stats (rel) min: 0.10% max: 35.48% x̄: 1.67% x̃: 1.11%
HURT stats (abs) min: 1 max: 12 x̄: 3.24 x̃: 2
HURT stats (rel) min: 0.21% max: 13.64% x̄: 1.86% x̃: 0.89%
95% mean confidence interval for instructions value: 2.43 2.25
95% mean confidence interval for instructions %change: 1.41% 1.33%
Instructions are helped.
total cycles in shared programs: 188523186 -> 188401198 (0.06%)
cycles in affected programs: 71541604 -> 71419616 (0.17%)
helped: 11649 HURT: 1871
helped stats (abs) min: 2 max: 930 x̄: 12.62 x̃: 6
helped stats (rel) min: <.01% max: 44.61% x̄: 0.68% x̃: 0.25%
HURT stats (abs) min: 2 max: 138 x̄: 13.38 x̃: 8
HURT stats (rel) min: <.01% max: 10.99% x̄: 0.49% x̃: 0.17%
95% mean confidence interval for cycles value: 9.42 8.63
95% mean confidence interval for cycles %change: 0.54% 0.50%
Cycles are helped.
total loops in shared programs: 852 -> 856 (0.47%)
loops in affected programs: 0 -> 4
helped: 0 HURT: 4
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 0.00% max: 0.00% x̄: 0.00% x̃: 0.00%
95% mean confidence interval for loops value: 1.00 1.00
95% mean confidence interval for loops %change: 0.00% 0.00%
Loops are HURT.
LOST: 3 GAINED: 12
GM45
total instructions in shared programs: 5046407 -> 5033694 (0.25%)
instructions in affected programs: 1303584 -> 1290871 (0.98%)
helped: 5010 HURT: 464
helped stats (abs) min: 1 max: 155 x̄: 2.85 x̃: 2
helped stats (rel) min: 0.10% max: 34.38% x̄: 1.63% x̃: 1.08%
HURT stats (abs) min: 1 max: 75 x̄: 3.39 x̃: 2
HURT stats (rel) min: 0.20% max: 13.04% x̄: 1.84% x̃: 0.87%
95% mean confidence interval for instructions value: 2.45 2.20
95% mean confidence interval for instructions %change: 1.40% 1.28%
Instructions are helped.
total cycles in shared programs: 128889476 -> 128812366 (0.06%)
cycles in affected programs: 44845402 -> 44768292 (0.17%)
helped: 6079 HURT: 940
helped stats (abs) min: 2 max: 930 x̄: 15.16 x̃: 8
helped stats (rel) min: <.01% max: 41.03% x̄: 0.71% x̃: 0.25%
HURT stats (abs) min: 2 max: 138 x̄: 16.01 x̃: 8
HURT stats (rel) min: <.01% max: 10.99% x̄: 0.50% x̃: 0.17%
95% mean confidence interval for cycles value: 11.63 10.34
95% mean confidence interval for cycles %change: 0.58% 0.52%
Cycles are helped.
total loops in shared programs: 633 -> 635 (0.32%)
loops in affected programs: 0 -> 2
helped: 0 HURT: 2
total spills in shared programs: 60 -> 69 (15.00%)
spills in affected programs: 54 -> 63 (16.67%)
helped: 0 HURT: 1
total fills in shared programs: 92 -> 105 (14.13%)
fills in affected programs: 80 -> 93 (16.25%)
helped: 0 HURT: 1
LOST: 15 GAINED: 15
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> [v2]
Reviewed-by: Matt Turner <mattst88@gmail.com> [v2]

Ian Romanick authored
I tried to be very careful while updating all the various drivers, but I don't have any of that hardware for testing. :(
i965 is the only platform that sets always_precise = true, and it is only set true for fragment shaders. Gen4 and Gen5 both set lower_flrp32 only for vertex shaders. For fragment shaders, nir_op_flrp is lowered during code generation as a(1-c)+bc. On all other platforms 64-bit nir_op_flrp and on Gen11 32-bit nir_op_flrp are lowered using the old nir_opt_algebraic method.
No changes on any other Intel platforms.
v2: Add panfrost changes.
Iron Lake and GM45 had similar results. (Iron Lake shown)
total cycles in shared programs: 188647754 -> 188647748 (<.01%)
cycles in affected programs: 5096 -> 5090 (0.12%)
helped: 3 HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12%
Reviewed-by: Matt Turner <mattst88@gmail.com>

Ian Romanick authored
This pass will soon grow to include some optimizations that are difficult or impossible to implement correctly within nir_opt_algebraic. It also includes the ability to generate strictly correct code, which the current nir_opt_algebraic lowering lacks (though that could be changed). v2: Document the parameters to nir_lower_flrp. Rebase on top of 37663349 ("compiler/nir: add lowering for 16-bit flrp") Reviewed-by: Matt Turner <mattst88@gmail.com>

Ian Romanick authored
All Intel platforms had similar results. (Skylake shown)
total instructions in shared programs: 15342485 -> 15337495 (0.03%)
instructions in affected programs: 217456 -> 212466 (2.29%)
helped: 1539 HURT: 1
helped stats (abs) min: 1 max: 17 x̄: 3.24 x̃: 3
helped stats (rel) min: 0.22% max: 18.75% x̄: 3.10% x̃: 1.91%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 0.56% max: 0.56% x̄: 0.56% x̃: 0.56%
95% mean confidence interval for instructions value: 3.39 3.09
95% mean confidence interval for instructions %change: 3.24% 2.96%
Instructions are helped.
total cycles in shared programs: 355734320 -> 355728237 (<.01%)
cycles in affected programs: 1851555 -> 1845472 (0.33%)
helped: 835 HURT: 575
helped stats (abs) min: 1 max: 658 x̄: 40.62 x̃: 14
helped stats (rel) min: <.01% max: 35.69% x̄: 3.78% x̃: 1.81%
HURT stats (abs) min: 1 max: 322 x̄: 48.40 x̃: 14
HURT stats (rel) min: 0.04% max: 71.02% x̄: 8.06% x̃: 2.43%
95% mean confidence interval for cycles value: 8.50 0.13
95% mean confidence interval for cycles %change: 0.48% 1.62%
Inconclusive result (value mean confidence interval and %change mean confidence interval disagree).
Reviewed-by: Matt Turner <mattst88@gmail.com>

Ian Romanick authored
v2: Augment the late optimization patterns with a couple pre-ffma pass patterns.
All Gen7+ platforms had similar results. (Skylake shown)
total instructions in shared programs: 15342982 -> 15342485 (<.01%)
instructions in affected programs: 56304 -> 55807 (0.88%)
helped: 235 HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 2.11 x̃: 1
helped stats (rel) min: 0.11% max: 8.82% x̄: 1.27% x̃: 0.74%
95% mean confidence interval for instructions value: 2.31 1.92
95% mean confidence interval for instructions %change: 1.46% 1.09%
Instructions are helped.
total cycles in shared programs: 355734740 -> 355734320 (<.01%)
cycles in affected programs: 1028807 -> 1028387 (0.04%)
helped: 134 HURT: 104
helped stats (abs) min: 1 max: 212 x̄: 25.69 x̃: 8
helped stats (rel) min: <.01% max: 9.36% x̄: 1.33% x̃: 0.61%
HURT stats (abs) min: 1 max: 203 x̄: 29.06 x̃: 8
HURT stats (rel) min: 0.02% max: 15.76% x̄: 1.76% x̃: 0.46%
95% mean confidence interval for cycles value: 8.51 4.98
95% mean confidence interval for cycles %change: 0.35% 0.39%
Inconclusive result (value mean confidence interval includes 0).
Sandy Bridge
total instructions in shared programs: 10886815 -> 10886390 (<.01%)
instructions in affected programs: 36883 -> 36458 (1.15%)
helped: 147 HURT: 0
helped stats (abs) min: 1 max: 7 x̄: 2.89 x̃: 3
helped stats (rel) min: 0.35% max: 8.00% x̄: 1.60% x̃: 1.23%
95% mean confidence interval for instructions value: 3.12 2.67
95% mean confidence interval for instructions %change: 1.83% 1.38%
Instructions are helped.
total cycles in shared programs: 154188360 -> 154186902 (<.01%)
cycles in affected programs: 388094 -> 386636 (0.38%)
helped: 90 HURT: 58
helped stats (abs) min: 1 max: 243 x̄: 36.80 x̃: 15
helped stats (rel) min: 0.04% max: 9.23% x̄: 1.26% x̃: 0.83%
HURT stats (abs) min: 1 max: 684 x̄: 31.97 x̃: 10
HURT stats (rel) min: 0.03% max: 13.50% x̄: 1.15% x̃: 0.51%
95% mean confidence interval for cycles value: 22.62 2.92
95% mean confidence interval for cycles %change: 0.68% 0.05%
Inconclusive result (value mean confidence interval includes 0).
Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8221239 -> 8220357 (0.01%)
instructions in affected programs: 54560 -> 53678 (1.62%)
helped: 186 HURT: 0
helped stats (abs) min: 1 max: 14 x̄: 4.74 x̃: 3
helped stats (rel) min: 0.34% max: 10.77% x̄: 1.97% x̃: 1.17%
95% mean confidence interval for instructions value: 5.21 4.28
95% mean confidence interval for instructions %change: 2.23% 1.72%
Instructions are helped.
total cycles in shared programs: 188654442 -> 188650364 (<.01%)
cycles in affected programs: 1454384 -> 1450306 (0.28%)
helped: 204 HURT: 0
helped stats (abs) min: 2 max: 84 x̄: 19.99 x̃: 18
helped stats (rel) min: 0.02% max: 4.69% x̄: 0.56% x̃: 0.22%
95% mean confidence interval for cycles value: 22.38 17.60
95% mean confidence interval for cycles %change: 0.67% 0.46%
Cycles are helped.
Reviewed-by: Matt Turner <mattst88@gmail.com>

Christian Gmeiner authored
At the initial NIR level, all drivers support ints. Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

Christian Gmeiner authored
Drivers that do not support native integers should use a lowering pass to go from integers to floats. Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

Alyssa Rosenzweig authored
This commit does a fairly large cleanup of blend descriptors, although there should not be any functional changes. In particular, we split apart the Midgard and Bifrost blend descriptors, since they are radically different. From there, we can identify that the Midgard descriptor as previously written was really two render targets' descriptors stuck together. From this observation, we split the Midgard descriptor into what a single RT actually needs. This enables us to correctly dump blending configuration for MRT samples on Midgard. It also allows the Midgard and Bifrost blend code to peacefully coexist, with runtime selection rather than a #ifdef. So, as a bonus, this will help the future Bifrost effort, eliminating one major source of compile-time architectural divergence. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>

Vasily Khoruzhick authored
Reviewed-by: Qiang Yu <yuq825@gmail.com> Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>

Vasily Khoruzhick authored
Reviewed-by: Qiang Yu <yuq825@gmail.com> Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>

Vasily Khoruzhick authored
Neither GP nor PP in Mali4x0 supports integers, so use the new pass and set native_integers to true for now, until this flag is dropped. Reviewed-by: Qiang Yu <yuq825@gmail.com> Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>

Vasily Khoruzhick authored
This new pass lowers ints and bools to floats. It allows hardware that doesn't have native integers (e.g. Mali4x0) to use the same code paths as modern hardware. It uses the newly introduced pass to gather SSA types and should be used as late as possible. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com> Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
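The idea of running integer and boolean code on float-only hardware can be sketched as below. This is a toy model under stated assumptions, not the pass itself: the function names are hypothetical, booleans are assumed to become 1.0/0.0, and integer division is assumed to become a float divide followed by truncation. One fact worth keeping in mind is that a 32-bit float represents integers exactly only up to 2**24.

```python
import math

def lower_bool(value):
    """Represent a boolean as a float: 1.0 for true, 0.0 for false (assumed encoding)."""
    return 1.0 if value else 0.0

def lowered_iadd(a, b):
    # Integer add becomes a plain float add; exact while values stay small.
    return a + b

def lowered_idiv(a, b):
    # Integer divide becomes float divide followed by truncation toward zero.
    return float(math.trunc(a / b))

def lowered_ilt(a, b):
    # Integer comparison becomes a float comparison producing a float "bool".
    return lower_bool(a < b)

if __name__ == "__main__":
    print(lowered_iadd(3.0, 4.0))
    print(lowered_idiv(-7.0, 2.0))
    print(lowered_ilt(1.0, 2.0))
```

Running such a lowering as late as possible, as the commit message recommends, keeps integer semantics visible to earlier optimization passes.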

 06 May, 2019 5 commits


Timothy Arceri authored
This fixes rendering issues with gun scopes, which is rather important. Cc: "19.0" "19.1" <mesa-stable@lists.freedesktop.org> Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100239

Vasily Khoruzhick authored
If PIPE_CAP_PACKED_UNIFORMS is not set, uniforms are vec4 aligned, so lima_nir_lower_uniform_to_scalar should use the first channel of the vec4 for float uniforms. Reviewed-by: Qiang Yu <yuq825@gmail.com> Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
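The addressing rule described above can be sketched as a small index calculation. The function name and the flat scalar-slot model are assumptions for illustration only; the real pass operates on NIR intrinsics rather than plain integers.

```python
def float_uniform_scalar_index(uniform_index, packed):
    """Scalar slot that holds float uniform number `uniform_index` (hypothetical model)."""
    if packed:
        # Packed layout: one scalar slot per float uniform.
        return uniform_index
    # vec4-aligned layout: each uniform owns a vec4 slot; the float lives in channel 0.
    return uniform_index * 4

if __name__ == "__main__":
    for i in range(3):
        print(i, float_uniform_scalar_index(i, packed=True),
              float_uniform_scalar_index(i, packed=False))
```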

Erik Faye-Lund authored
We need to re-prepare the middle-end state to pick up changes to this state to react correctly to pausing/resuming streamout. So let's add a flush here. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Fixes: ec8cbd79 "draw/softpipe: EXT_transform_feedback support (v2)" Reviewed-by: Roland Scheidegger <sroland@vmware.com>

Erik Faye-Lund authored
We currently set this state in the draw-module twice on each draw, which trashes this state. So far that's not a problem, because we don't really do much from that function. But it turns out we're going to have to do more, namely flush when the state changes. This will incur a large performance penalty due to the excessive setting. Instead, let's rely on the CSO caching to make sure that llvmpipe_set_so_targets doesn't get called needlessly, and set up the state directly there. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>

Chia-I Wu authored
Inline writes skip transfer map/unmap at the cost of an extra copy of the data during execbuffer. That is generally a win for small transfers. But the heuristic to use inline writes based on buffer sizes rather than transfer sizes makes little sense. More importantly, inline writes miss optimizations that are done for buffer transfers. Let's just use transfers. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
