 16 Feb, 2019 3 commits


Timothy Arceri authored
Commit 8d822246 caused substantially more URB messages in geometry and tessellation shaders (due to enabling nir_lower_io_to_scalar_early). This combines io again to avoid this regression while still allowing link-time optimisation of components.
Shader-db results (SKL):
total instructions in shared programs: 13109035 -> 13107191 (-0.01%)
instructions in affected programs: 66278 -> 64434 (-2.78%)
helped: 242
HURT: 13
total cycles in shared programs: 332090418 -> 332094364 (<.01%)
cycles in affected programs: 285477 -> 289423 (1.38%)
helped: 39
HURT: 215
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107510

Timothy Arceri authored
Once linking opts are done, this pass recombines varying components. This patch is loosely based on Connor's vectorize alu pass.
V2: skip fragment shaders
V3:
- don't accidentally vectorise local vars
- pass correct component to create_new_store()
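The recombination idea can be pictured with a toy model: scalar stores that target the same varying slot are folded into a single vec4 store guarded by a writemask. This is a minimal sketch under stated assumptions, not the pass itself; the `scalar_store`/`vec4_store` types and `combine_stores()` are hypothetical and are not Mesa's NIR API, and the sketch ignores control flow and any ordering constraints a real pass has to respect.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of one scalar output store: the varying slot it
 * writes, the component within that slot (0..3 = x..w) and the value. */
typedef struct {
    int location;
    int component;
    float value;
} scalar_store;

/* Hypothetical model of a combined vec4 store guarded by a writemask. */
typedef struct {
    int location;
    unsigned writemask; /* bit i set => component i is written */
    float value[4];
} vec4_store;

/* Group scalar stores by location into one vec4_store per location, in
 * first-seen order. A later store to the same component overwrites the
 * earlier one, mirroring program order. `out` must have room for `n`
 * entries. Returns the number of combined stores. */
static int combine_stores(const scalar_store *in, int n, vec4_store *out)
{
    int count = 0;
    for (int i = 0; i < n; i++) {
        assert(in[i].component >= 0 && in[i].component < 4);
        int j = 0;
        while (j < count && out[j].location != in[i].location)
            j++;
        if (j == count) { /* first store seen for this location */
            memset(&out[count], 0, sizeof out[count]);
            out[count].location = in[i].location;
            count++;
        }
        out[j].value[in[i].component] = in[i].value;
        out[j].writemask |= 1u << in[i].component;
    }
    return count;
}
```

Four scalar stores (three to one slot, one to another) collapse into two vector stores, which is the kind of reduction in output messages the pass is after.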

Timothy Arceri authored
This creates a new glsl_type with the specified number of components. We will use this in the following patch when vectorising io.

 15 Feb, 2019 37 commits


Alok Hota authored
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>

Alok Hota authored
Reduce stack space used by the clipper, which had led to crashes in some versions of MSVC.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>

Alok Hota authored
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>

Alok Hota authored
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>

Alok Hota authored
- Ensure all threads have optimal floating-point control state
- Disable auto-generation of fused FP ops for VERTEX shader stage
- Disable "fast" FP ops for VERTEX shader stage
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>

Alok Hota authored
Reduces the amount of compile churn when testing different default values.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>

Alok Hota authored
The intrinsic returns the number of leading zeros, not the bit number of the first non-zero bit, so just flip it based on the mask size.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
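The flip is plain arithmetic: for a `width`-bit mask, the index of the highest set bit is `(width - 1) - lzcnt(mask)`. The sketch below assumes that "first non-zero" means the most significant set bit and uses a portable loop in place of a hardware intrinsic such as `_lzcnt_u32`; both helper names are hypothetical and are not SWR's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Count leading zeros of `mask` viewed as a `width`-bit value (1..32).
 * Portable stand-in for an lzcnt-style intrinsic; returns width for 0. */
static unsigned count_leading_zeros(uint32_t mask, unsigned width)
{
    assert(width >= 1 && width <= 32);
    assert(width == 32 || (mask >> width) == 0); /* mask fits in width */
    unsigned n = 0;
    for (int bit = (int)width - 1; bit >= 0; bit--) {
        if (mask & (1u << bit))
            break;
        n++;
    }
    return n;
}

/* Bit number of the highest set bit: flip the leading-zero count
 * against the mask size. Undefined for an all-zero mask. */
static unsigned first_set_bit_from_top(uint32_t mask, unsigned width)
{
    assert(mask != 0);
    return (width - 1u) - count_leading_zeros(mask, width);
}
```

For an 8-bit mask `0x28` (binary `00101000`) the leading-zero count is 2, so the flipped result is bit 5; returning the raw count would have pointed at bit 2 instead.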

Alok Hota authored
Fixes crashes on some compute shaders when running on AVX512.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>

Alok Hota authored
- Was not useful to inline in release builds
- FORCEINLINE can be used if absolutely necessary
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>

Alok Hota authored
Fulfills an unused internal interface.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>

Bas Nieuwenhuizen authored
Normalized and scaled formats also return floats.
Fixes: 4b3549c0 ("radv: reduce the number of loaded channels for vertex input fetches")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

Ian Romanick authored
All of the affected shaders are Unreal4 demos.
All Gen6+ platforms had similar results. (Skylake shown)
total instructions in shared programs: 15437170 -> 15437001 (<.01%)
instructions in affected programs: 21536 -> 21367 (-0.78%)
helped: 43
HURT: 0
helped stats (abs) min: 1 max: 4 x̄: 3.93 x̃: 4
helped stats (rel) min: 0.68% max: 1.01% x̄: 0.80% x̃: 0.80%
95% mean confidence interval for instructions value: -4.07 -3.79
95% mean confidence interval for instructions %-change: -0.83% -0.77%
Instructions are helped.
total cycles in shared programs: 383007896 -> 383007378 (<.01%)
cycles in affected programs: 158640 -> 158122 (-0.33%)
helped: 38
HURT: 4
helped stats (abs) min: 1 max: 48 x̄: 13.89 x̃: 6
helped stats (rel) min: 0.03% max: 1.01% x̄: 0.33% x̃: 0.19%
HURT stats (abs) min: 2 max: 3 x̄: 2.50 x̃: 2
HURT stats (rel) min: 0.06% max: 0.09% x̄: 0.08% x̃: 0.08%
95% mean confidence interval for cycles value: -16.90 -7.77
95% mean confidence interval for cycles %-change: -0.39% -0.19%
Cycles are helped.
Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8213746 -> 8213745 (<.01%)
instructions in affected programs: 127 -> 126 (-0.79%)
helped: 1
HURT: 0
total cycles in shared programs: 187734146 -> 187734144 (<.01%)
cycles in affected programs: 2132 -> 2130 (-0.09%)
helped: 1
HURT: 0
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

Ian Romanick authored
Section 5.4.1 (Conversion and Scalar Constructors) of the GLSL 4.60 spec says:
    "It is undefined to convert a negative floating-point value to an uint."
Assuming that (uint)some_float behaves like (uint)(int)some_float allows some optimizations in the i965 backend to proceed.
This basically undoes the small amount of damage done by "intel/compiler: Avoid propagating inequality cmods if types are different".
v2: Replicate part of the commit message as a comment in the code. Suggested by Jason.
shader-db results comparing *before* "intel/compiler: Avoid propagating inequality cmods if types are different" and after this commit:
Skylake
total cycles in shared programs: 383007996 -> 383007896 (<.01%)
cycles in affected programs: 85208 -> 85108 (-0.12%)
helped: 13
HURT: 8
helped stats (abs) min: 2 max: 26 x̄: 10.77 x̃: 6
helped stats (rel) min: 0.09% max: 0.65% x̄: 0.28% x̃: 0.14%
HURT stats (abs) min: 2 max: 12 x̄: 5.00 x̃: 3
HURT stats (rel) min: 0.04% max: 0.32% x̄: 0.12% x̃: 0.07%
95% mean confidence interval for cycles value: -9.31 -0.21
95% mean confidence interval for cycles %-change: -0.24% <.01%
Cycles are helped.
Broadwell
total cycles in shared programs: 415251194 -> 415251370 (<.01%)
cycles in affected programs: 83750 -> 83926 (0.21%)
helped: 7
HURT: 13
helped stats (abs) min: 10 max: 12 x̄: 11.43 x̃: 12
helped stats (rel) min: 0.30% max: 0.30% x̄: 0.30% x̃: 0.30%
HURT stats (abs) min: 2 max: 36 x̄: 19.69 x̃: 22
HURT stats (rel) min: 0.05% max: 0.89% x̄: 0.44% x̃: 0.47%
95% mean confidence interval for cycles value: 0.76 16.84
95% mean confidence interval for cycles %-change: <.01% 0.37%
Inconclusive result (%-change mean confidence interval includes 0).
Haswell
total instructions in shared programs: 13823885 -> 13823886 (<.01%)
instructions in affected programs: 2249 -> 2250 (0.04%)
helped: 0
HURT: 1
total cycles in shared programs: 390094243 -> 390094001 (<.01%)
cycles in affected programs: 85640 -> 85398 (-0.28%)
helped: 15
HURT: 6
helped stats (abs) min: 4 max: 26 x̄: 18.53 x̃: 18
helped stats (rel) min: 0.09% max: 0.66% x̄: 0.47% x̃: 0.42%
HURT stats (abs) min: 2 max: 14 x̄: 6.00 x̃: 2
HURT stats (rel) min: 0.04% max: 0.37% x̄: 0.15% x̃: 0.04%
95% mean confidence interval for cycles value: -17.36 -5.69
95% mean confidence interval for cycles %-change: -0.44% -0.14%
Cycles are helped.
Ivy Bridge
total cycles in shared programs: 180986448 -> 180986552 (<.01%)
cycles in affected programs: 34835 -> 34939 (0.30%)
helped: 0
HURT: 10
HURT stats (abs) min: 2 max: 18 x̄: 10.40 x̃: 10
HURT stats (rel) min: 0.06% max: 0.36% x̄: 0.28% x̃: 0.30%
95% mean confidence interval for cycles value: 4.67 16.13
95% mean confidence interval for cycles %-change: 0.20% 0.35%
Cycles are HURT.
Sandy Bridge
total cycles in shared programs: 154603969 -> 154603970 (<.01%)
cycles in affected programs: 171514 -> 171515 (<.01%)
helped: 25
HURT: 14
helped stats (abs) min: 1 max: 4 x̄: 1.80 x̃: 1
helped stats (rel) min: 0.02% max: 0.10% x̄: 0.04% x̃: 0.04%
HURT stats (abs) min: 1 max: 8 x̄: 3.29 x̃: 3
HURT stats (rel) min: 0.03% max: 0.28% x̄: 0.10% x̃: 0.11%
95% mean confidence interval for cycles value: -0.91 0.96
95% mean confidence interval for cycles %-change: -0.02% 0.04%
Inconclusive result (value mean confidence interval includes 0).
No changes on Iron Lake or GM45.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
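The assumption can be modelled in C, where converting a negative float directly to an unsigned type is likewise undefined. This is a minimal sketch, not the i965 backend code; the helper names are hypothetical, and the model only holds while the truncated float fits in a 32-bit int.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the assumption: uint(f) is evaluated as uint(int(f)).
 * Converting a negative float straight to an unsigned type is undefined
 * in C too, so the conversion goes through int32_t explicitly. Only valid
 * while trunc(f) fits in int32_t. */
static uint32_t uint_from_float_via_int(float f)
{
    assert(f >= -2147483648.0f && f < 2147483648.0f);
    return (uint32_t)(int32_t)f; /* int -> uint wraps modulo 2^32 */
}

/* Reinterpret a uint32_t as int32_t without relying on the
 * implementation-defined out-of-range unsigned-to-signed conversion. */
static int32_t int_from_uint(uint32_t u)
{
    return u <= (uint32_t)INT32_MAX ? (int32_t)u
                                    : -(int32_t)(UINT32_MAX - u) - 1;
}
```

Under this model `int(uint(f))` round-trips to `int(f)` (for example, -3.5 gives -3), so a test such as `int(uint(f)) < 0` can be reasoned about as `int(f) < 0`. Without the assumption, the result for negative `f` is undefined and the compiler has nothing to reason about.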

Matt Turner authored
v2 (idr): Move adding the test to after adding the fix. Reordering the two commits prevents possible headaches for git-bisect with scripts that always do 'ninja check'.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109404
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

Matt Turner authored
v2: Fix silly bug in logic. s/||/&&/
All but one of the affected shaders is in an Unreal4 demo. The other is in Tomb Raider. All of the cases that Ian investigated appear to be sequences like the following:
    if (int(uint(some_float)) < 0)   /* other relations too */
        ...
At least in Tomb Raider, it's not obvious that this sequence came from the original shader. In some of the Unreal demos, the shader contains code like
    if (int(uint(textureLod(...))) > 0)
        ...
which explicitly generates the offending sequence.
All Gen6+ platforms had similar results (Skylake shown):
total instructions in shared programs: 15437170 -> 15437187 (<.01%)
instructions in affected programs: 4492 -> 4509 (0.38%)
helped: 0
HURT: 17
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 0.05% max: 0.73% x̄: 0.66% x̃: 0.73%
95% mean confidence interval for instructions value: 1.00 1.00
95% mean confidence interval for instructions %-change: 0.57% 0.75%
Instructions are HURT.
total cycles in shared programs: 383007996 -> 383007992 (<.01%)
cycles in affected programs: 20542 -> 20538 (-0.02%)
helped: 6
HURT: 7
helped stats (abs) min: 2 max: 6 x̄: 5.33 x̃: 6
helped stats (rel) min: 0.11% max: 0.36% x̄: 0.32% x̃: 0.36%
HURT stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4
HURT stats (rel) min: 0.27% max: 0.27% x̄: 0.27% x̃: 0.27%
95% mean confidence interval for cycles value: -3.30 2.69
95% mean confidence interval for cycles %-change: -0.19% 0.19%
Inconclusive result (value mean confidence interval includes 0).
No changes on Iron Lake or GM45.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109404
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: nagrigoriadis@gmail.com
Tested-by: Danylo Piliaiev <danylo.piliaiev@gmail.com>

Matt Turner authored
We emit an FBL instruction, which only exists on Gen7+. This prevents the test from segfaulting when run with TEST_DEBUG=1.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

James Zhu authored
Add compute shader initialization, assignment and cleanup in the vl_compositor API. Set video compositor compute shader rendering as the default when the pipe supports it.
Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>

James Zhu authored
Add compute shader to support video compositor render.
Signed-off-by: James Zhu <James.Zhu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>

James Zhu authored
Rename csc_matrix to shader_params, and increase the shader_params size to store more constants for the compute shader.
Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>

James Zhu authored
Split vl_compositor graphic shaders from the vl_compositor API in order to share the vl_compositor API with the vl_compositor compute shader later.
Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>

James Zhu authored
Move dirty define to header file to share with compute shader.
Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>

Juan Suárez Romero authored
In the opt_peel_initial_if optimization, when moving the continue list to the end of the continue block, before the jump, it could happen that the continue list itself also ends with a jump. This would mean that we would have two jump instructions in a row: the first one from the continue list and the second one from the continue block. As inserting an instruction after a jump is not allowed (and it does not make sense, as it will not be executed), remove the jump from the continue block and keep the one from the continue list, as it will be executed first.
CC: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

Juan Suárez Romero authored
opt_split_alu_of_phi moves ALU instructions to the end of the continue block. But if the continue block ends with a jump instruction (an explicit "continue" instruction), then the ALU must be inserted before the jump, as it is illegal to add instructions after the jump.
CC: Ian Romanick <ian.d.romanick@intel.com>
Fixes: 0881e90c ("nir: Split ALU instructions in loops that read phis")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
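The rule behind both fixes above (nothing may follow a jump at the end of a block) can be shown on a toy block. This is a sketch only; the `block` type, `op_kind` enum and `insert_at_block_end()` are hypothetical and do not mirror NIR's cursor or instruction-list API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy IR: a block is a short array of instruction kinds. */
typedef enum { OP_ALU, OP_JUMP } op_kind;

typedef struct {
    op_kind kind[16];
    size_t len;
} block;

static bool block_ends_in_jump(const block *b)
{
    return b->len > 0 && b->kind[b->len - 1] == OP_JUMP;
}

/* Append an instruction at the end of the block, but before a trailing
 * jump, since nothing may follow a jump. */
static void insert_at_block_end(block *b, op_kind k)
{
    assert(b->len < sizeof b->kind / sizeof b->kind[0]);
    /* Two jumps in a row would be the opt_peel_initial_if situation;
     * there, one of the jumps must be dropped instead. */
    assert(!(k == OP_JUMP && block_ends_in_jump(b)));
    if (block_ends_in_jump(b)) {
        b->kind[b->len] = b->kind[b->len - 1]; /* shift the jump down */
        b->kind[b->len - 1] = k;               /* new op goes before it */
    } else {
        b->kind[b->len] = k;
    }
    b->len++;
}
```

Appending an ALU op to a block that already ends in a jump yields `ALU ... ALU JUMP`, so the jump stays the block's terminator.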

Andres Gomez authored
Instead of generating a GL_INVALID_ENUM error when the type or format is incorrect while using glClear{Named}Buffer{Sub}Data, generate GL_INVALID_VALUE.
From page 72 (page 94 of the PDF) of the OpenGL 4.6 spec:
    "An INVALID_VALUE error is generated if type is not one of the types in table 8.2.
    An INVALID_VALUE error is generated if format is not one of the formats in table 8.3."
Fixes the following test:
KHR-GL45.direct_state_access.buffers_errors
v2: correct the doxygen documentation.
Cc: Pi Tabred <servuswiegehtz@yahoo.de>
Cc: Brian Paul <brianp@vmware.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>

Gurchetan Singh authored
We've noticed the Team Fortress 2 engine seems to do many small calls to glBufferSubData(..). Let's pick our heuristic based on the resource base width, not the size of a particular upload. This will cause transfers to be batched together in the transfer queue.
Relevant glbench microbenchmark:
Before: buffer_upload_dynamic_element_array_131072 = 131.17 mbytes_sec
After: buffer_upload_dynamic_element_array_131072 = 6828.24 mbytes_sec
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>

Gurchetan Singh authored
This improves the Unigine Valley benchmark by 3 to 10 fps (depending on the scene). It also improves the Team Fortress 2 benchmark from 6 fps to 13 fps (host: 20 fps).
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>

Gurchetan Singh authored
Transfers will be placed here at unmap time instead of incurring a VM exit. There's an attempt to deduplicate intersecting 1D transfers, which are surprisingly common.
This can also help with mipmapped texture upload and smaller textures, where the majority of the time is spent in the guest kernel / QEMU, not virglrenderer. This is shown by the GLbench texture upload benchmark:
Before: texture_upload_rgba_teximage2d_32 = 64.23 mtexel_sec
After: texture_upload_rgba_teximage2d_32 = 367.44 mtexel_sec
v2: Split up list iteration functions (@gerddie)
v3: Support for optimizing glBufferSubData
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
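Deduplicating intersecting 1D transfers amounts to interval merging: if a new byte range overlaps a queued one for the same resource, the queued entry is widened to cover both instead of queuing a second transfer. This is a minimal sketch assuming the data for a merged range is read from the resource's backing store when the queue is submitted; `transfer_1d` and `queue_transfer()` are hypothetical, not virgl's structures. It also does not re-merge when a widened entry starts overlapping another queued one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical 1D transfer: bytes [offset, offset + size) of a resource. */
typedef struct {
    int resource;
    size_t offset;
    size_t size;
} transfer_1d;

static bool ranges_intersect(const transfer_1d *a, const transfer_1d *b)
{
    return a->resource == b->resource &&
           a->offset < b->offset + b->size &&
           b->offset < a->offset + a->size;
}

/* Queue a transfer; if it intersects one already queued for the same
 * resource, widen that entry to the union instead of adding a new one.
 * Returns the new queue length. */
static size_t queue_transfer(transfer_1d *queue, size_t len, size_t cap,
                             transfer_1d t)
{
    for (size_t i = 0; i < len; i++) {
        if (ranges_intersect(&queue[i], &t)) {
            size_t start = queue[i].offset < t.offset ? queue[i].offset : t.offset;
            size_t end_a = queue[i].offset + queue[i].size;
            size_t end_b = t.offset + t.size;
            size_t end = end_a > end_b ? end_a : end_b;
            queue[i].offset = start;
            queue[i].size = end - start;
            return len;
        }
    }
    assert(len < cap);
    queue[len] = t;
    return len + 1;
}
```

Queuing bytes 0..16 and then 8..24 of the same buffer leaves a single 24-byte transfer, while a disjoint range or a different resource still gets its own entry.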

Gurchetan Singh authored
Let's encode the new protocol with new helper functions.
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>

Gurchetan Singh authored
The idea is to have two command buffers:
1) One for transfers
2) One for commands, which can include transfers
At flush time, (2) will be filled. Otherwise, (1) will be used to submit transfers if there are enough of them.
v2: Pass size directly to cmd_buf_create (@gerddie)
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>

Gurchetan Singh authored
This is motivated by the following scenario:
    glBufferSubData(GL_ARRAY_BUFFER, ...)
    glFlush(..)
    glBufferSubData(GL_ARRAY_BUFFER, ...)
    glBufferSubData(GL_ARRAY_BUFFER, ...)
    glBufferSubData(GL_ARRAY_BUFFER, ...)
This increases @davidriley's Team Fortress 2 apitrace from 1 fps to 6 fps and helps with the Chromium glbench microbenchmarks:
Before:
texture_update_rgba_texsubimage2d_2048 = 554.96 mtexel_sec
buffer_upload_dynamic_array_12 = 0.02 mbytes_sec
buffer_upload_dynamic_array_576 = 1.07 mbytes_sec
After:
texture_update_rgba_texsubimage2d_2048 = 612.29 mtexel_sec
buffer_upload_dynamic_array_12 = 2.22 mbytes_sec
buffer_upload_dynamic_array_576 = 164.89 mbytes_sec
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>

Gurchetan Singh authored
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>

Gurchetan Singh authored
It's good to keep track of these things.
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>

Gurchetan Singh authored
Much of our logic is based around the idea that the upper 16 bits of a command dword can encode the length of the command. Now that the command buffer size can be >= 2^16 - 1, we should check for this.
v2: alignment, and only check VIRGL_ENCODE_MAX_DWORDS
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
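Packing the length into the upper 16 bits of a 32-bit header caps a single command at 0xffff dwords, which is why a buffer that can grow past that size needs an explicit check. The sketch below assumes nothing about the low 16 bits beyond treating them as an opaque command id; `encode_cmd_header()` and friends are hypothetical helpers, not virgl's macros, and `CMD_LEN_MAX` is not the value of `VIRGL_ENCODE_MAX_DWORDS`, which this commit does not state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Largest length the 16-bit field in the header can represent. */
#define CMD_LEN_MAX 0xffffu

static bool cmd_len_fits(uint32_t len_dwords)
{
    return len_dwords <= CMD_LEN_MAX;
}

/* Low 16 bits: opaque command id (an assumption of this sketch);
 * upper 16 bits: command length in dwords. */
static uint32_t encode_cmd_header(uint16_t cmd, uint32_t len_dwords)
{
    assert(cmd_len_fits(len_dwords)); /* longer lengths would be truncated */
    return (uint32_t)cmd | (len_dwords << 16);
}

static uint32_t decode_cmd_len(uint32_t header)
{
    return header >> 16;
}

static uint16_t decode_cmd_id(uint32_t header)
{
    return (uint16_t)(header & 0xffffu);
}
```

Without the check, a length of 0x10000 would shift its top bit out of the 32-bit header and decode as 0, silently corrupting the command stream.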

Gurchetan Singh authored
Let's define a helper function and use it. This commit also allows resources to be emitted into different command buffers.
Like the ioctls, send 0 for layer_stride and stride. If we actually send the real values, there are various assumptions in virglrenderer for non-1D buffers that may need to be modified.
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>

Gurchetan Singh authored
Mostly similar to VIRGL_CCMD_RESOURCE_INLINE_WRITE. However, this uses the resource's already attached iovecs rather than the command buffer to transfer the data.
v2: Used (1 << 16) not (1 << 15) [@gerddie]
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>

Gurchetan Singh authored
This will allow us to destroy transfers without having a pointer to the context.
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>

Gurchetan Singh authored
This should save some memory when allocating and freeing transfers.
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
