 15 Jan, 2022 16 commits


Alyssa Rosenzweig authored
This reverts commit 29d319c7. Now that we use nir_lower_bool_to_bitsize, we don't see 1-bit booleans anymore, so the issue this fixed doesn't apply. Actually, that issue was (in part) why I started looking into boolean handling in the first place. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>

Alyssa Rosenzweig authored
Instead of ingesting 1-bit booleans and trying to force everything to be 16-bit, except when it isn't, and creating a mess in the backend... just use the NIR pass designed to select bitsize for booleans. Yes, this means we need to handle more NIR instructions, but the handling is easier and the conversion is more obvious (except for some edge cases like 16-bit vectorized b32csel). This generates noticeably better code, and the generated code will be easier to optimize.

total instructions in shared programs: 90257 -> 88941 (-1.46%)
instructions in affected programs: 49145 -> 47829 (-2.68%)
helped: 201
HURT: 2
helped stats (abs) min: 1.0 max: 40.0 x̄: 6.57 x̃: 3
helped stats (rel) min: 0.29% max: 13.89% x̄: 2.57% x̃: 1.90%
HURT stats (abs) min: 2.0 max: 2.0 x̄: 2.00 x̃: 2
HURT stats (rel) min: 2.15% max: 2.74% x̄: 2.45% x̃: 2.45%
95% mean confidence interval for instructions value: -7.71 -5.26
95% mean confidence interval for instructions %change: -2.84% -2.20%
Instructions are helped.

total tuples in shared programs: 73740 -> 72922 (-1.11%)
tuples in affected programs: 36564 -> 35746 (-2.24%)
helped: 184
HURT: 7
helped stats (abs) min: 1.0 max: 74.0 x̄: 4.49 x̃: 2
helped stats (rel) min: 0.30% max: 16.67% x̄: 2.86% x̃: 1.89%
HURT stats (abs) min: 1.0 max: 2.0 x̄: 1.29 x̃: 1
HURT stats (rel) min: 0.12% max: 12.50% x̄: 4.26% x̃: 3.33%
95% mean confidence interval for tuples value: -5.29 -3.28
95% mean confidence interval for tuples %change: -3.06% -2.13%
Tuples are helped.

total clauses in shared programs: 15993 -> 15928 (-0.41%)
clauses in affected programs: 2464 -> 2399 (-2.64%)
helped: 35
HURT: 16
helped stats (abs) min: 1.0 max: 27.0 x̄: 2.31 x̃: 1
helped stats (rel) min: 0.49% max: 18.88% x̄: 7.63% x̃: 5.88%
HURT stats (abs) min: 1.0 max: 1.0 x̄: 1.00 x̃: 1
HURT stats (rel) min: 0.79% max: 6.25% x̄: 1.91% x̃: 1.01%
95% mean confidence interval for clauses value: -2.46 -0.09
95% mean confidence interval for clauses %change: -6.38% -2.90%
Clauses are helped.

total cycles in shared programs: 7622.13 -> 7594.75 (-0.36%)
cycles in affected programs: 1078.67 -> 1051.29 (-2.54%)
helped: 103
HURT: 4
helped stats (abs) min: 0.041665999999999315 max: 3.0833319999999986 x̄: 0.27 x̃: 0
helped stats (rel) min: 0.32% max: 21.05% x̄: 3.62% x̃: 2.44%
HURT stats (abs) min: 0.0416669999999999 max: 0.0833330000000001 x̄: 0.05 x̃: 0
HURT stats (rel) min: 0.13% max: 7.14% x̄: 2.94% x̃: 2.25%
95% mean confidence interval for cycles value: -0.33 -0.19
95% mean confidence interval for cycles %change: -4.14% -2.61%
Cycles are helped.

total arith in shared programs: 2762.46 -> 2728.08 (-1.24%)
arith in affected programs: 1550.12 -> 1515.75 (-2.22%)
helped: 197
HURT: 6
helped stats (abs) min: 0.041665999999999315 max: 3.0833319999999986 x̄: 0.18 x̃: 0
helped stats (rel) min: 0.32% max: 21.05% x̄: 2.93% x̃: 1.61%
HURT stats (abs) min: 0.0416669999999999 max: 0.0833330000000001 x̄: 0.06 x̃: 0
HURT stats (rel) min: 0.13% max: 20.00% x̄: 5.78% x̃: 3.37%
95% mean confidence interval for arith value: -0.21 -0.13
95% mean confidence interval for arith %change: -3.20% -2.15%
Arith are helped.

total quadwords in shared programs: 68155 -> 67555 (-0.88%)
quadwords in affected programs: 27944 -> 27344 (-2.15%)
helped: 151
HURT: 9
helped stats (abs) min: 1.0 max: 52.0 x̄: 4.09 x̃: 3
helped stats (rel) min: 0.23% max: 12.35% x̄: 2.87% x̃: 2.17%
HURT stats (abs) min: 1.0 max: 5.0 x̄: 1.89 x̃: 1
HURT stats (rel) min: 0.20% max: 6.76% x̄: 1.91% x̃: 1.13%
95% mean confidence interval for quadwords value: -4.67 -2.83
95% mean confidence interval for quadwords %change: -2.99% -2.21%
Quadwords are helped.

total threads in shared programs: 2232 -> 2233 (0.04%)
threads in affected programs: 1 -> 2 (100.00%)
helped: 1
HURT: 0

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>

Alyssa Rosenzweig authored
lower_bool_to_bitsize can generate i2i32 from a 32-bit source, which is trivial but needs to be handled explicitly to avoid going down the 8-bit conversion path. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>

Alyssa Rosenzweig authored
Bifrost's 16-bit support comes in the form of vectorized instructions, so when we manipulate scalars, we usually replicate to both bottom and top halves of 32-bit registers. Add an analysis pass that detects replication. Then, use that replication pass to optimize out useless swizzle instructions (by changing them to plain moves, which can be copypropped). This optimization is a slight shader-db win on its own, and allows us to transition to lower_bool_to_bitsize without regressing shader-db.

total instructions in shared programs: 90323 -> 90257 (-0.07%)
instructions in affected programs: 2513 -> 2447 (-2.63%)
helped: 20
HURT: 0
helped stats (abs) min: 1.0 max: 16.0 x̄: 3.30 x̃: 2
helped stats (rel) min: 1.25% max: 11.11% x̄: 4.80% x̃: 4.29%
95% mean confidence interval for instructions value: -5.05 -1.55
95% mean confidence interval for instructions %change: -6.06% -3.54%
Instructions are helped.

total tuples in shared programs: 73769 -> 73740 (-0.04%)
tuples in affected programs: 1611 -> 1582 (-1.80%)
helped: 17
HURT: 0
helped stats (abs) min: 1.0 max: 9.0 x̄: 1.71 x̃: 1
helped stats (rel) min: 0.58% max: 16.67% x̄: 4.80% x̃: 3.33%
95% mean confidence interval for tuples value: -2.70 -0.71
95% mean confidence interval for tuples %change: -7.06% -2.54%
Tuples are helped.

total clauses in shared programs: 15997 -> 15993 (-0.03%)
clauses in affected programs: 27 -> 23 (-14.81%)
helped: 4
HURT: 0
helped stats (abs) min: 1.0 max: 1.0 x̄: 1.00 x̃: 1
helped stats (rel) min: 7.69% max: 25.00% x̄: 18.17% x̃: 20.00%
95% mean confidence interval for clauses value: -1.00 -1.00
95% mean confidence interval for clauses %change: -29.91% -6.44%
Clauses are helped.

total cycles in shared programs: 7623.13 -> 7622.13 (-0.01%)
cycles in affected programs: 64.83 -> 63.83 (-1.54%)
helped: 13
HURT: 0
helped stats (abs) min: 0.0416660000000002 max: 0.375 x̄: 0.08 x̃: 0
helped stats (rel) min: 1.02% max: 5.56% x̄: 2.82% x̃: 2.50%
95% mean confidence interval for cycles value: -0.13 -0.02
95% mean confidence interval for cycles %change: -3.79% -1.85%
Cycles are helped.

total arith in shared programs: 2763.75 -> 2762.46 (-0.05%)
arith in affected programs: 67.17 -> 65.88 (-1.92%)
helped: 18
HURT: 0
helped stats (abs) min: 0.0416660000000002 max: 0.375 x̄: 0.07 x̃: 0
helped stats (rel) min: 1.02% max: 22.22% x̄: 5.68% x̃: 3.16%
95% mean confidence interval for arith value: -0.11 -0.03
95% mean confidence interval for arith %change: -8.56% -2.80%
Arith are helped.

total quadwords in shared programs: 68173 -> 68155 (-0.03%)
quadwords in affected programs: 1258 -> 1240 (-1.43%)
helped: 14
HURT: 0
helped stats (abs) min: 1.0 max: 3.0 x̄: 1.29 x̃: 1
helped stats (rel) min: 0.42% max: 8.70% x̄: 3.88% x̃: 3.67%
95% mean confidence interval for quadwords value: -1.64 -0.93
95% mean confidence interval for quadwords %change: -5.27% -2.49%
Quadwords are helped.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
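The replication analysis and the swizzle-to-move rewrite described in this commit can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `toy_*` names, the opcodes, and the straight-line instruction array are hypothetical and are not the Bifrost compiler's actual IR or pass. A value counts as "replicated" when its low and high 16-bit halves are provably identical, and a swizzle of such a value is equivalent to a plain move.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy IR invented for this sketch; not the real Bifrost IR. */
enum toy_op {
   TOY_LOAD32,  /* arbitrary 32-bit value, halves may differ */
   TOY_SPLAT16, /* writes one 16-bit scalar to both halves */
   TOY_VADD16,  /* lane-wise 16-bit add of src0 and src1 */
   TOY_SWZ16,   /* picks a half of src0 for each output half */
   TOY_MOV,     /* plain 32-bit copy of src0 */
};

struct toy_instr {
   enum toy_op op;
   int src0, src1;     /* indices of earlier instructions, -1 if unused */
   int sel_lo, sel_hi; /* TOY_SWZ16 only: 0 = low half, 1 = high half */
};

/* Forward pass over straight-line code: replicated[i] is set when
 * instruction i provably writes identical low and high halves. */
static void
toy_analyze_replication(const struct toy_instr *prog, size_t n,
                        bool *replicated)
{
   for (size_t i = 0; i < n; ++i) {
      const struct toy_instr *I = &prog[i];

      switch (I->op) {
      case TOY_SPLAT16:
         replicated[i] = true;
         break;
      case TOY_VADD16:
         /* Same operation on both lanes: replicated inputs stay replicated */
         replicated[i] = replicated[I->src0] && replicated[I->src1];
         break;
      case TOY_SWZ16:
         /* Either the source is replicated or both halves pick the same lane */
         replicated[i] = replicated[I->src0] || (I->sel_lo == I->sel_hi);
         break;
      case TOY_MOV:
         replicated[i] = replicated[I->src0];
         break;
      default:
         replicated[i] = false;
         break;
      }
   }
}

/* Rewrite swizzles of replicated sources into moves, which a later copy
 * propagation pass could remove. Returns the number of rewrites. */
static size_t
toy_lower_swizzles(struct toy_instr *prog, size_t n, const bool *replicated)
{
   size_t changed = 0;

   for (size_t i = 0; i < n; ++i) {
      if (prog[i].op == TOY_SWZ16 && replicated[prog[i].src0]) {
         prog[i].op = TOY_MOV;
         ++changed;
      }
   }

   return changed;
}
```

Keeping the analysis separate from the rewrite mirrors the split the commit message describes: the analysis only records facts, and the optimization consumes them.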

Alyssa Rosenzweig authored
This lets us avoid generating SWZ instructions. Those instructions could be constant folded but that complicates the replication analysis introduced in the next commit. Almost no shader-db changes.

quadwords HURT: shaders/glmark/122.shader_test MESA_SHADER_FRAGMENT: 718 -> 722 (0.56%)

total quadwords in shared programs: 68169 -> 68173 (<.01%)
quadwords in affected programs: 718 -> 722 (0.56%)
helped: 0
HURT: 1

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>

Alyssa Rosenzweig authored
We'll generate this in a moment. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>

Alyssa Rosenzweig authored
This is counterintuitive, but required for correct operation when CSEL.i32 takes a 1-bit (stored 16-bit) boolean argument. The impedance mismatch ultimately is between CSEL.b32 (nir's bcsel, nonexistent in the hardware) and the lowering CSEL.i32. However, a similar problem exists even with MUX.i32, which lacks a good way of zero/sign-extending booleans. Cherry-picked from my Valhall branch though the issue also affects Bifrost. Fixes piglit shaders@glsl-vs-if-bool on Bifrost. Unfortunately, shader-db is quite unhappy :( The proper fix is to use lower_bool_to_bitsize, but that can't be backported to mesa-stable.

total instructions in shared programs: 157539 -> 158953 (0.90%)
instructions in affected programs: 55621 -> 57035 (2.54%)
helped: 2
HURT: 259
helped stats (abs) min: 2.0 max: 2.0 x̄: 2.00 x̃: 2
helped stats (rel) min: 2.11% max: 2.67% x̄: 2.39% x̃: 2.39%
HURT stats (abs) min: 1.0 max: 40.0 x̄: 5.47 x̃: 2
HURT stats (rel) min: 0.36% max: 16.13% x̄: 2.55% x̃: 1.59%
95% mean confidence interval for instructions value: 4.44 6.40
95% mean confidence interval for instructions %change: 2.21% 2.82%
Instructions are HURT.

total tuples in shared programs: 132322 -> 132907 (0.44%)
tuples in affected programs: 31806 -> 32391 (1.84%)
helped: 5
HURT: 152
helped stats (abs) min: 1.0 max: 2.0 x̄: 1.40 x̃: 1
helped stats (rel) min: 0.39% max: 3.03% x̄: 1.70% x̃: 1.61%
HURT stats (abs) min: 1.0 max: 42.0 x̄: 3.89 x̃: 2
HURT stats (rel) min: 0.29% max: 18.18% x̄: 2.50% x̃: 1.79%
95% mean confidence interval for tuples value: 2.88 4.58
95% mean confidence interval for tuples %change: 1.87% 2.85%
Tuples are HURT.

total clauses in shared programs: 28672 -> 28698 (0.09%)
clauses in affected programs: 869 -> 895 (2.99%)
helped: 1
HURT: 24
helped stats (abs) min: 1.0 max: 1.0 x̄: 1.00 x̃: 1
helped stats (rel) min: 5.88% max: 5.88% x̄: 5.88% x̃: 5.88%
HURT stats (abs) min: 1.0 max: 2.0 x̄: 1.12 x̃: 1
HURT stats (rel) min: 0.49% max: 33.33% x̄: 8.46% x̃: 3.59%
95% mean confidence interval for clauses value: 0.82 1.26
95% mean confidence interval for clauses %change: 3.84% 11.93%
Clauses are HURT.

total cycles in shared programs: 15119.04 -> 15137.88 (0.12%)
cycles in affected programs: 922.87 -> 941.71 (2.04%)
helped: 4
HURT: 79
helped stats (abs) min: 0.0416669999999999 max: 0.0833330000000001 x̄: 0.05 x̃: 0
helped stats (rel) min: 0.40% max: 3.17% x̄: 1.57% x̃: 1.35%
HURT stats (abs) min: 0.041665999999999315 max: 1.75 x̄: 0.24 x̃: 0
HURT stats (rel) min: 0.30% max: 20.00% x̄: 2.83% x̃: 2.12%
95% mean confidence interval for cycles value: 0.17 0.29
95% mean confidence interval for cycles %change: 1.86% 3.37%
Cycles are HURT.

total arith in shared programs: 4922.71 -> 4947.71 (0.51%)
arith in affected programs: 1423.79 -> 1448.79 (1.76%)
helped: 5
HURT: 177
helped stats (abs) min: 0.0416669999999999 max: 0.0833330000000001 x̄: 0.06 x̃: 0
helped stats (rel) min: 0.40% max: 3.17% x̄: 1.82% x̃: 1.67%
HURT stats (abs) min: 0.041665999999999315 max: 1.75 x̄: 0.14 x̃: 0
HURT stats (rel) min: 0.30% max: 22.22% x̄: 2.50% x̃: 1.52%
95% mean confidence interval for arith value: 0.11 0.17
95% mean confidence interval for arith %change: 1.86% 2.90%
Arith are HURT.

total quadwords in shared programs: 120605 -> 120956 (0.29%)
quadwords in affected programs: 26535 -> 26886 (1.32%)
helped: 6
HURT: 143
helped stats (abs) min: 1.0 max: 7.0 x̄: 2.83 x̃: 1
helped stats (rel) min: 0.93% max: 6.33% x̄: 2.29% x̃: 1.71%
HURT stats (abs) min: 1.0 max: 21.0 x̄: 2.57 x̃: 2
HURT stats (rel) min: 0.34% max: 13.79% x̄: 2.02% x̃: 1.22%
95% mean confidence interval for quadwords value: 1.86 2.86
95% mean confidence interval for quadwords %change: 1.45% 2.24%
Quadwords are HURT.

total threads in shared programs: 4670 -> 4669 (-0.02%)
threads in affected programs: 2 -> 1 (-50.00%)
helped: 0
HURT: 1

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Cc: mesa-stable

Alyssa Rosenzweig authored
By software ABI, a blend shader is permitted to clobber registers R0-R15. The scheduler needs to be aware of this, to avoid moving a write to one of these registers past the BLEND itself. Otherwise the schedule is invalid. This bug affects GLES3.0, but is rare enough in practice that we had missed it. It requires a fragment shader to write to multiple render targets with attached blend shaders, and have temporaries register allocated to R0-R15 that are not read by the blend shader, but are sunk past the BLEND instruction by the scheduler. Prevents a regression when switching boolean representations on: dEQP-GLES31.functional.shaders.builtin_functions.integer.uaddcarry.uvec4_lowp_fragment Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Cc: mesa-stable
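The scheduling constraint described above can be sketched as a simple register-mask test. This is only an illustration under stated assumptions: the `toy_*` names and the one-bit-per-register 64-bit mask are hypothetical and are not how the Bifrost scheduler represents dependencies.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mask: bit n stands for register Rn, so bits 0..15 cover
 * R0-R15, the registers a blend shader may clobber by software ABI. */
#define TOY_BLEND_CLOBBER_MASK UINT64_C(0xFFFF)

/* An instruction may be sunk past a BLEND only if none of the registers
 * it writes can be clobbered by the blend shader. */
static bool
toy_can_sink_past_blend(uint64_t write_mask)
{
   return (write_mask & TOY_BLEND_CLOBBER_MASK) == 0;
}
```

In this model, a write to R3 must stay before the BLEND, while a write to R16 is free to move.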

Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <!14545>

Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <!14545>

Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <!14545>

Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <!14545>

Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <!14545>

meta was the last user. Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <!14545>

Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <!14545>

Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <!14545>

 14 Jan, 2022 24 commits


For cmdstream traces from newer devices, we need to identify the GPU based on chip ID. Signed-off-by: Rob Clark <robdclark@chromium.org> Part-of: <!14564>

Without this fix, a rule with the executable_regexp tag would match every executable, so force_glsl_extensions_warn would always be set to true, which breaks some dEQP tests. Fixes: 5740ac37 ("xmlconfig: Add static driconfig support") Reviewed-by: Rob Clark <robdclark@chromium.org> Part-of: <!14562>

The switch statement in anv_descriptor_data_for_type() shows that this field isn't used on SKL+. On XeHP, this avoids assert failures by preventing isl_surf_fill_image_param() from being called. That function doesn't expect Tile4 surfaces. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <!14546>

Reviewed-by: Rohan Garg <rohan.garg@intel.com> Acked-by: Antonio Caggiano <antonio.caggiano@collabora.com> Part-of: <!13996>

v2: Fixup gpu_id computation, use minor of /dev/dri/* % 128 since we don't know whether we get card0 or renderD128 for instance. (Lionel) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> (v1) Acked-by: Antonio Caggiano <antonio.caggiano@collabora.com> Part-of: <!13996>
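The v2 note on gpu_id can be illustrated with a short sketch. The helper names below are hypothetical and this is not the code from the merge request; it only assumes the usual Linux DRM layout in which primary nodes (card0, card1, ...) use minors starting at 0 and render nodes (renderD128, renderD129, ...) use minors starting at 128, so that reducing the minor modulo 128 gives the same index for either node type.

```c
#include <stdint.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>

/* Hypothetical helper: fold a DRM device minor into a GPU index.
 * card0 (minor 0) and renderD128 (minor 128) both map to 0. */
static uint32_t
gpu_id_from_minor(unsigned int drm_minor)
{
   return drm_minor % 128;
}

/* Hypothetical helper: look up the minor of a device node such as
 * /dev/dri/card0 or /dev/dri/renderD128. Returns -1 if the path cannot
 * be stat'ed or is not a character device. */
static int
gpu_id_from_path(const char *path)
{
   struct stat st;

   if (stat(path, &st) != 0 || !S_ISCHR(st.st_mode))
      return -1;

   return (int)gpu_id_from_minor(minor(st.st_rdev));
}
```

With this scheme a caller does not need to know which kind of node it was handed; both nodes of the same device yield the same index.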

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Acked-by: Antonio Caggiano <antonio.caggiano@collabora.com> Part-of: <!13996>

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Acked-by: Antonio Caggiano <antonio.caggiano@collabora.com> Part-of: <!13996>

v2: Increase custom stall data (Felix) Fixup build (Felix) v3: Add API enum (Rohan) Fixup old comment (Rohan) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Acked-by: Antonio Caggiano <antonio.caggiano@collabora.com> Part-of: <mesa/mesa!13996>

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Acked-by: Antonio Caggiano <antonio.caggiano@collabora.com> Part-of: <!13996>

Multi-GPU setups :) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Acked-by: Antonio Caggiano <antonio.caggiano@collabora.com> Part-of: <!13996>

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Acked-by: Antonio Caggiano <antonio.caggiano@collabora.com> Part-of: <!13996>

This could lead to confusion if the 32 bits roll over (every ~6 min or so). Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: 4ef6698a ("intel/ds: drop timestamp correlation code") Reviewed-by: Rohan Garg <rohan.garg@intel.com> Acked-by: Antonio Caggiano <antonio.caggiano@collabora.com> Part-of: <!13996>

Rather than always using the same metric set, let the user choose one when starting the producer with: INTEL_PERFETTO_METRIC_SET=RasterizerAndPixelBackend ./build/src/tool/pps/pps-producer Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Acked-by: Antonio Caggiano <antonio.caggiano@collabora.com> Part-of: <!13996>

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Acked-by: Antonio Caggiano <antonio.caggiano@collabora.com> Part-of: <!13996>

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Acked-by: Antonio Caggiano <antonio.caggiano@collabora.com> Part-of: <!13996>

Will be useful to figure out when blorp operations end. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Acked-by: Antonio Caggiano <antonio.caggiano@collabora.com> Part-of: <!13996>

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Acked-by: Antonio Caggiano <antonio.caggiano@collabora.com> Part-of: <!13996>

We'll want to copy timestamp buffers when command buffers are resubmitted multiple times. v2: Merge a couple of #if GFX_VER >= 8 (Rohan) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Acked-by: Antonio Caggiano <antonio.caggiano@collabora.com> Part-of: <!13996>

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Acked-by: Antonio Caggiano <antonio.caggiano@collabora.com> Part-of: <!13996>

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Acked-by: Antonio Caggiano <antonio.caggiano@collabora.com> Part-of: <!13996>

As indicated by VkPhysicalDeviceFragmentShadingRatePropertiesKHR::fragmentShadingRateWithShaderSampleMask, our implementation will clamp to 1x1 when reading or writing the sample mask. This fixes the vkd3d-proton tests test_sample_mask_dxbc & test_sample_mask_dxil. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: b6332fc4 ("intel/compiler: handle coarse pixel in render target writes descriptors") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <!14553>

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (anv) Reviewed-by: Emma Anholt <emma@anholt.net> Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> (v3dv) Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com> (lavapipe) Part-of: <!14544>

This can be useful with VkBindImageMemorySwapchainInfoKHR. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <!14544>

Jesse Natalie authored
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Part-of: <!14504>
