 17 Jun, 2018 1 commit


Timothy Arceri authored
ARB_texture_float references US Patent #6,650,327 [1] which has a filing date of June 16 1998. According to [2], patents filed after 1995 expire 20 years from the filing date, giving an expiration of June 17 2018.
[1] https://www.google.com/patents/US6650327
[2] https://en.wikipedia.org/wiki/Term_of_patent_in_the_United_States
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

 16 Jun, 2018 24 commits


José Casanova Crespo authored
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

José Casanova Crespo authored
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

José Casanova Crespo authored
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

José Casanova Crespo authored
The previous use of shuffle_32bit_load_result_to_64bit_data had a source/destination overlap for 64-bit data. A temporary destination is now used in the 64-bit case so that we can use shuffle_from_32bit_read, which doesn't handle src/dst overlaps. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

José Casanova Crespo authored
Previously, the shuffle function had a source/destination overlap, which must be avoided in order to use shuffle_from_32bit_read. We can use the destination of the removed MOVs as the shuffle destination. This change also avoids the internal MOVs done by the previous shuffle to deal with possible overlaps. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

José Casanova Crespo authored
shuffle_from_32bit_read manages 32-bit reads to a 32-bit destination in the same way as the previous loop, so now we just call the new function for all bit sizes, which also simplifies the 64-bit load_input. v2: Add comment about future 16-bit support (Jason Ekstrand) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

José Casanova Crespo authored
This implementation avoids two unneeded MOVs for each 64-bit component. One was done in the old shuffle to avoid cases of src/dst overlap, but that is not the case here. The other removed MOV was already being done in the shuffle. Copy propagation wasn't able to remove them because shuffle destination values are defined with partial writes, as they have stride == 2. v2: Reword commit log summary (Jason Ekstrand) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

José Casanova Crespo authored
do_untyped_vector_read is used by load_ssbo and load_shared. The previous MOVs are removed because shuffle_from_32bit_read can store the shuffle results directly in the expected destination, just by using the proper offset. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

José Casanova Crespo authored
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

José Casanova Crespo authored
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

José Casanova Crespo authored
Using shuffle_from_32bit_read instead of the 16-bit shuffle functions avoids the need to retype. At the same time, the new functions are ready for 8-bit SSBO reads. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

José Casanova Crespo authored
shuffle_from_32bit_read can manage the shuffle/unshuffle needed for the different 8/16/32/64 bit sizes at VARYING PULL CONSTANT LOAD. The first_component parameter is used to select the specific component. In the case of the previous 16-bit shuffle, the shuffle operation generated unneeded MOVs whose results were never used. This behaviour went unnoticed on SIMD16 because the dead_code_eliminate pass removed the generated instructions, but on SIMD8 they couldn't be removed because they were partial writes. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

José Casanova Crespo authored
These new shuffle functions deal with the shuffle/unshuffle operations needed for read/write operations using 32-bit components when the read/written components have a different bit size (8, 16 or 64 bits). A shuffle from 32-bit to 32-bit becomes a simple MOV.

shuffle_src_to_dst takes care of doing a shuffle when the source type is smaller than the destination type and an unshuffle when the source type is bigger than the destination type. So these new read/write functions just need to call shuffle_src_to_dst, assuming that writes use a 32-bit destination and reads use a 32-bit source.

As shuffle_for_32bit_write/from_32bit_read take components in units of the source/destination types and shuffle_src_to_dst takes units of the smallest type, we adjust the components and first_component parameters.

To enable these new functions, there must be no source/destination overlap in the case of shuffle_from_32bit_read. That never happens with shuffle_for_32bit_write, as it allocates a new destination register, just as shuffle_64bit_data_for_32bit_write did.

v2: Reword commit log and add comments to explain why first_component and components parameters are adjusted. (Jason Ekstrand)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

José Casanova Crespo authored
This new function takes care of shuffling/unshuffling components of a particular bit size into components with a different bit size.

If the source type size is smaller than the destination type size, the operation needed is a component shuffle. The opposite case is an unshuffle.

Component units are measured in terms of the smaller of the source and destination types, as we are un/shuffling the smaller components from/into a bigger one.

The operation allows skipping first_component components from the source.

Shuffle MOVs are retyped using integer types, avoiding problems with denorms and float types, if the source and destination bit sizes are different. This simplifies the uses of the shuffle functions that currently deal with these retypes individually.

There is now a new restriction: source and destination can no longer overlap when calling this shuffle function. The following patches that migrate to this new function will individually take care of avoiding source and destination overlaps.

v2: (Jason Ekstrand)
- Rewrite overlap asserts.
- Manage the type_sz(src.type) == type_sz(dst.type) case using MOVs from source to dest. This works for 64-bit to 64-bit operations on Gen7, as it doesn't support Q registers.
- Explain that component units are based on the smallest type.
v3:
- Fix unshuffle overlap assert (Jason Ekstrand)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
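The shuffle described above can be pictured with a scalar model. The following is only a sketch under assumptions, not the i965 code: `toy_shuffle_32_to_64`, `TOY_WIDTH` and the flat array layout are hypothetical, and it covers just the 32-bit to 64-bit direction. Each 64-bit destination component is built from two consecutive 32-bit source components (low half first), `first_component` and `components` are counted in units of the smaller (32-bit) type, and an assert rejects overlapping source and destination.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_WIDTH 4   /* channels per component; hypothetical SIMD width */

/* Shuffle 32-bit source components into 64-bit destination components.
 * Arrays are laid out as [component][channel], flattened.
 * first_component and components count 32-bit (smaller type) units. */
static void
toy_shuffle_32_to_64(uint64_t *dst, const uint32_t *src,
                     unsigned first_component, unsigned components)
{
   assert(components % 2 == 0);

   /* Source and destination ranges must not overlap. */
   const uintptr_t d0 = (uintptr_t)dst;
   const uintptr_t d1 = d0 + (components / 2) * TOY_WIDTH * sizeof(uint64_t);
   const uintptr_t s0 = (uintptr_t)(src + first_component * TOY_WIDTH);
   const uintptr_t s1 = s0 + components * TOY_WIDTH * sizeof(uint32_t);
   assert(d1 <= s0 || s1 <= d0);

   for (unsigned c = 0; c < components; c++) {
      const uint32_t *in = src + (first_component + c) * TOY_WIDTH;
      uint64_t *out = dst + (c / 2) * TOY_WIDTH;
      for (unsigned ch = 0; ch < TOY_WIDTH; ch++) {
         if (c % 2 == 0)
            out[ch] = in[ch];                   /* low 32 bits */
         else
            out[ch] |= (uint64_t)in[ch] << 32;  /* high 32 bits */
      }
   }
}
```

Calling it with `first_component = 1` and `components = 2` skips the first 32-bit source component and packs the next two into a single 64-bit component per channel.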

Jose Fonseca authored
https://ci.appveyor.com/project/jrfonseca/mesa/build/47 Reviewed-by: Roland Scheidegger <sroland@vmware.com>

Bas Nieuwenhuizen authored
Somehow valgrind misses that the value is initialized by the ioctl. Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

Samuel Pitoiset authored
The primitive ID is NULL and this generates an invalid select instruction which crashes because one operand is NULL. This fixes crashes in The Long Journey Home, Quantum Break and Just Cause 3 with DXVK. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106756 CC: <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

Ian Romanick authored
nir_ssa_def::parent_instr and nir_src::parent_instr have the same name, but they mean really different things. I choose to save the next person the hour+ that I just spent figuring that out. Even now that I know, I doubt I'd notice in code review that someone typed foo->parent_instr when they actually meant foo->ssa->parent_instr. v2: Minor wording tweak in nir_ssa_def::parent_instr. Suggested by Jason. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
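To illustrate the distinction, here is a toy model in C. The `toy_*` types and helpers are hypothetical and much smaller than the real NIR structs. It assumes the usual reading of the two fields: the one on the SSA definition points at the instruction that produces the value, while the one on the source points at the instruction that consumes it.

```c
#include <assert.h>
#include <stddef.h>

/* Toy types, hypothetical and far simpler than the real NIR structs. */
struct toy_instr {
   const char *name;
};

struct toy_ssa_def {
   struct toy_instr *parent_instr;   /* instruction that PRODUCES the value */
};

struct toy_src {
   struct toy_instr *parent_instr;   /* instruction that USES the value */
   struct toy_ssa_def *ssa;          /* the value being read */
};

/* The instruction in which this source appears. */
static struct toy_instr *
toy_src_user(const struct toy_src *src)
{
   assert(src != NULL);
   return src->parent_instr;
}

/* The instruction that computed the value this source reads. */
static struct toy_instr *
toy_src_producer(const struct toy_src *src)
{
   assert(src != NULL && src->ssa != NULL);
   return src->ssa->parent_instr;
}
```

Wrapping the two lookups in differently named helpers is one way to make the intent visible at the call site. That is a design choice of this sketch, not something the commit does.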

Ian Romanick authored
Skylake
total instructions in shared programs: 14399081 -> 14399010 (<.01%)
instructions in affected programs: 26961 -> 26890 (-0.26%)
helped: 57
HURT: 0
helped stats (abs) min: 1 max: 6 x̄: 1.25 x̃: 1
helped stats (rel) min: 0.16% max: 0.80% x̄: 0.30% x̃: 0.18%
95% mean confidence interval for instructions value: -1.50 -0.99
95% mean confidence interval for instructions %change: -0.35% -0.25%
Instructions are helped.

total cycles in shared programs: 532978307 -> 532976050 (<.01%)
cycles in affected programs: 468629 -> 466372 (-0.48%)
helped: 33
HURT: 20
helped stats (abs) min: 3 max: 360 x̄: 116.52 x̃: 98
helped stats (rel) min: 0.06% max: 3.63% x̄: 1.66% x̃: 1.27%
HURT stats (abs) min: 2 max: 172 x̄: 79.40 x̃: 43
HURT stats (rel) min: 0.04% max: 3.02% x̄: 1.48% x̃: 0.44%
95% mean confidence interval for cycles value: -81.29 -3.88
95% mean confidence interval for cycles %change: -1.07% 0.12%
Inconclusive result (%change mean confidence interval includes 0).

All Gen6+ platforms, except Ivy Bridge, had similar results. (Haswell shown)
total instructions in shared programs: 12973897 -> 12973838 (<.01%)
instructions in affected programs: 25970 -> 25911 (-0.23%)
helped: 55
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.07 x̃: 1
helped stats (rel) min: 0.16% max: 0.62% x̄: 0.28% x̃: 0.18%
95% mean confidence interval for instructions value: -1.14 -1.00
95% mean confidence interval for instructions %change: -0.32% -0.24%
Instructions are helped.

total cycles in shared programs: 410355841 -> 410352067 (<.01%)
cycles in affected programs: 578454 -> 574680 (-0.65%)
helped: 47
HURT: 5
helped stats (abs) min: 3 max: 360 x̄: 85.74 x̃: 18
helped stats (rel) min: 0.05% max: 3.68% x̄: 1.18% x̃: 0.38%
HURT stats (abs) min: 2 max: 242 x̄: 51.20 x̃: 4
HURT stats (rel) min: <.01% max: 0.45% x̄: 0.15% x̃: 0.11%
95% mean confidence interval for cycles value: -104.89 -40.27
95% mean confidence interval for cycles %change: -1.45% -0.66%
Cycles are helped.

Ivy Bridge
total instructions in shared programs: 11679351 -> 11679301 (<.01%)
instructions in affected programs: 28208 -> 28158 (-0.18%)
helped: 50
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.12% max: 0.54% x̄: 0.23% x̃: 0.16%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %change: -0.27% -0.19%
Instructions are helped.

total cycles in shared programs: 257445362 -> 257444662 (<.01%)
cycles in affected programs: 419338 -> 418638 (-0.17%)
helped: 40
HURT: 3
helped stats (abs) min: 1 max: 170 x̄: 65.05 x̃: 24
helped stats (rel) min: 0.02% max: 3.51% x̄: 1.26% x̃: 0.41%
HURT stats (abs) min: 2 max: 1588 x̄: 634.00 x̃: 312
HURT stats (rel) min: 0.05% max: 2.97% x̄: 1.21% x̃: 0.62%
95% mean confidence interval for cycles value: -97.96 65.41
95% mean confidence interval for cycles %change: -1.56% -0.62%
Inconclusive result (value mean confidence interval includes 0).

No changes on Iron Lake or GM45.

v2: Move 'if (cond != BRW_CONDITIONAL_Z && cond != BRW_CONDITIONAL_NZ)' check outside the loop. Suggested by Iago.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>

Ian Romanick authored
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>

Ian Romanick authored
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>

Ian Romanick authored
All of the affected shaders are geometry shaders... the same ones from the similar fs changes.

The "No changes on any other platforms" comment below is not quite right. Without the previous change to register coalescing, this optimization caused quite a few regressions in tests that either used gl_ClipVertex or used different interpolation modes. I observed that with both patches applied, glsl-1.10/execution/interpolation/interpolation-none-gl_BackSecondaryColor-smooth-vertex.shader_test was one instruction shorter. I suspect other shaders would be similarly affected. Since this is all based on NOS, shader-db does not reflect it.

Haswell
total instructions in shared programs: 12954955 -> 12954918 (<.01%)
instructions in affected programs: 3603 -> 3566 (-1.03%)
helped: 37
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.21% max: 2.50% x̄: 1.99% x̃: 2.50%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %change: -2.30% -1.69%
Instructions are helped.

total cycles in shared programs: 410012108 -> 410012098 (<.01%)
cycles in affected programs: 3540 -> 3530 (-0.28%)
helped: 5
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.28% max: 0.28% x̄: 0.28% x̃: 0.28%
95% mean confidence interval for cycles value: -2.00 -2.00
95% mean confidence interval for cycles %change: -0.28% -0.28%
Cycles are helped.

Ivy Bridge
total instructions in shared programs: 11679387 -> 11679351 (<.01%)
instructions in affected programs: 3292 -> 3256 (-1.09%)
helped: 36
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.21% max: 2.50% x̄: 2.04% x̃: 2.50%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %change: -2.34% -1.74%
Instructions are helped.

No changes on any other platforms.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>

Ian Romanick authored
This prevents regressions in a bunch of clipping and interpolation tests caused by the next patch (i965/vec4: Optimize OR with 0 into a MOV). Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>

Ian Romanick authored
fs_visitor::set_gs_stream_control_data_bits generates some code like "control_data_bits |= stream_id << ((2 * (vertex_count - 1)) % 32)" as part of EmitVertex. The first time this (dynamically) occurs in the shader, control_data_bits is zero. Many times we can determine this statically and various optimizations will collaborate to make one of the OR operands literal zero.

Converting the OR to a MOV usually allows it to be copy-propagated away. However, this does not happen in at least some shaders (in the assembly output of shaders/closed/UnrealEngine4/EffectsCaveDemo/301.shader_test, search for shl).

All of the affected shaders are geometry shaders.

Broadwell and Skylake had similar results. (Skylake shown)
total instructions in shared programs: 14375452 -> 14375413 (<.01%)
instructions in affected programs: 6422 -> 6383 (-0.61%)
helped: 39
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.14% max: 2.56% x̄: 1.91% x̃: 2.56%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %change: -2.26% -1.57%
Instructions are helped.

total cycles in shared programs: 531981179 -> 531980555 (<.01%)
cycles in affected programs: 27493 -> 26869 (-2.27%)
helped: 39
HURT: 0
helped stats (abs) min: 16 max: 16 x̄: 16.00 x̃: 16
helped stats (rel) min: 0.60% max: 7.92% x̄: 5.94% x̃: 7.92%
95% mean confidence interval for cycles value: -16.00 -16.00
95% mean confidence interval for cycles %change: -6.98% -4.90%
Cycles are helped.

No changes on earlier platforms.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
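The two ideas in this message can be sketched in plain C. This is only an illustration under assumptions, not the i965 implementation: every `toy_*` name is hypothetical, and the 2-bit-per-vertex layout is read off the quoted expression. The first function evaluates the quoted update once; the second rewrites an OR that has a literal-zero operand into a MOV of the other operand, relying on `0 | x == x`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One step of the quoted update. vertex_count is assumed to be the
 * 1-based count of emitted vertices, each owning a 2-bit field. */
static uint32_t
toy_accumulate_stream_id(uint32_t control_data_bits, uint32_t stream_id,
                         uint32_t vertex_count)
{
   assert(vertex_count >= 1 && stream_id < 4);
   return control_data_bits | (stream_id << ((2 * (vertex_count - 1)) % 32));
}

/* Toy instruction, hypothetical and far simpler than the real IR. */
enum toy_opcode { TOY_OR, TOY_MOV };

struct toy_src {
   bool is_imm;     /* true: use imm, false: use reg */
   uint32_t imm;
   int reg;
};

struct toy_inst {
   enum toy_opcode op;
   struct toy_src src[2];
};

/* Turn "OR dst, x, 0" or "OR dst, 0, x" into "MOV dst, x".
 * Returns true if the instruction was changed. */
static bool
toy_opt_or_zero(struct toy_inst *inst)
{
   if (inst->op != TOY_OR)
      return false;

   for (int i = 0; i < 2; i++) {
      if (inst->src[i].is_imm && inst->src[i].imm == 0) {
         inst->src[0] = inst->src[1 - i];   /* keep the non-zero operand */
         inst->op = TOY_MOV;
         return true;
      }
   }
   return false;
}
```

Once the instruction is a plain MOV, a later copy-propagation pass has a chance to remove it, which is the effect the message describes.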

 15 Jun, 2018 15 commits


Eric Anholt authored
The min/maxes ended up producing a negative clip width/height for dEQP-GLES3.functional.fragment_ops.scissor.outside_render_line. Just make sure they stay at 0 (or v3d 3.x's workaround) if that happens.
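A minimal sketch of the kind of clamp described, assuming the clip size is computed as the overlap of two ranges. The helper and macro names are hypothetical and this is not the v3d code (it also ignores the v3d 3.x workaround mentioned above).

```c
#include <assert.h>

#define TOY_MAX(a, b) ((a) > (b) ? (a) : (b))
#define TOY_MIN(a, b) ((a) < (b) ? (a) : (b))

/* Size of the overlap of [a_min, a_max) and [b_min, b_max).
 * Disjoint ranges would give hi < lo, so the result is clamped to 0
 * instead of going negative. */
static int
toy_clip_size(int a_min, int a_max, int b_min, int b_max)
{
   assert(a_min <= a_max && b_min <= b_max);
   const int lo = TOY_MAX(a_min, b_min);
   const int hi = TOY_MIN(a_max, b_max);
   return TOY_MAX(hi - lo, 0);
}
```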

Eric Anholt authored
Fixes failing tests in dEQP-GLES3.functional.texture.shadow

Eric Anholt authored
Fixes simulator assertion failures in dEQP-GLES3.functional.shaders.texture_functions.texture.samplercubeshadow_bias_fragment and similar complicated cases.

Eric Anholt authored
The docs called this field "uses both center W and centroid W", but actually it's "do you need center W even if varyings don't obviously call for it?" Fixes dEQP-GLES3.functional.shaders.builtin_variable.fragcoord_w

Dylan Baker authored

Dylan Baker authored

Rafael Antognolli authored
The getopt_long flag parameter is an int pointer, so if we use bool to store those values, when getopt_long writes to one of them, it might end up overwriting the next one. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
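A short sketch of the safe pattern, assuming a tool with two hypothetical long options (`--color` and `--verbose`; neither name comes from the commit). The variables handed to `getopt_long` through `struct option::flag` are declared as `int`, matching the `int *` the function writes through, and are converted to `bool` only after parsing.

```c
#include <assert.h>
#include <getopt.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical options, for illustration only. */
struct tool_options {
   bool color;
   bool verbose;
};

/* Returns 0 on success, -1 if an unknown option is seen. */
static int
parse_flags(int argc, char **argv, struct tool_options *out)
{
   assert(out != NULL);

   /* getopt_long() stores `val` through `flag`, which is an `int *`,
    * so the flag variables are ints rather than bools. */
   int color = 0, verbose = 0;
   const struct option longopts[] = {
      { "color",   no_argument, &color,   1 },
      { "verbose", no_argument, &verbose, 1 },
      { NULL, 0, NULL, 0 },
   };

   optind = 1;   /* restart scanning so the function can be called again */
   int c;
   while ((c = getopt_long(argc, argv, "", longopts, NULL)) != -1) {
      if (c != 0)
         return -1;   /* '?' for an unrecognized option */
   }

   out->color = color != 0;
   out->verbose = verbose != 0;
   return 0;
}
```

Keeping the `int` temporaries local and copying them into the `bool` fields afterwards lets the rest of the program keep using `bool` without ever passing a `bool *` where an `int *` is expected.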

Samuel Pitoiset authored
This fixes a rendering regression with RoTR. This reverts commit 4bdad9fa. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

Samuel Pitoiset authored
We don't enable CMASK for linear surfaces and addrlib only enables DCC for tiled surfaces. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

Samuel Pitoiset authored
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

Samuel Pitoiset authored
This allows running the LLVM verifier pass. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

Samuel Pitoiset authored
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

Samuel Pitoiset authored
And replace _regs with _metadata because it makes more sense. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

Samuel Pitoiset authored
I don't think it matters much if we emit both values, and it makes the code a bit simpler. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

Samuel Pitoiset authored
It's unnecessary to update the fast depth/stencil clear values if the fast cleared depth/stencil image isn't currently bound. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
