- Mar 13, 2015
-
-
Emil Velikov authored
Signed-off-by:
Emil Velikov <emil.l.velikov@gmail.com>
-
Emil Velikov authored
Signed-off-by:
Emil Velikov <emil.l.velikov@gmail.com>
-
- Mar 12, 2015
-
-
Turns out there are scenarios where we need to insert mov's in "front" of an input. Triggered by shaders like:
  VERT
  DCL IN[0]
  DCL IN[1]
  DCL OUT[0], POSITION
  DCL OUT[1], GENERIC[9]
  DCL SAMP[0]
  DCL TEMP[0], LOCAL
    0: MOV TEMP[0].xy, IN[1].xyyy
    1: MOV TEMP[0].w, IN[1].wwww
    2: TXF TEMP[0], TEMP[0], SAMP[0], 1D_ARRAY
    3: MOV OUT[1], TEMP[0]
    4: MOV OUT[0], IN[0]
    5: END
Signed-off-by:
Rob Clark <robclark@freedesktop.org> (cherry picked from commit 27648efa)
-
We may not need this for later a4xx patchlevels, but we do at least need this for patchlevel 0. Bypass bary.f for fetching varyings when flat shading is needed (rather than configure via cmdstream). This requires a special dummy bary.f w/ (ei) flag to signal to scheduler when all varyings are consumed. And requires shader variants based on rasterizer flatshade state to handle TGSI_INTERPOLATE_COLOR. Signed-off-by:
Rob Clark <robclark@freedesktop.org> (cherry picked from commit e9f2abe3)
-
Scheduled basically the same as texture (cat5) instructions, using (sy) flag for synchronization. Signed-off-by:
Rob Clark <robclark@freedesktop.org> (cherry picked from commit 9d732d31)
-
I think there is at least one more sub-encoding, but these two should be enough to cover the common load/store instructions. Signed-off-by:
Rob Clark <robclark@freedesktop.org> (cherry picked from commit 20b50a07)
-
Signed-off-by:
Rob Clark <robclark@freedesktop.org> (cherry picked from commit dd70e786)
-
Signed-off-by:
Rob Clark <robclark@freedesktop.org> (cherry picked from commit c70097ae)
-
Fixes xonotic, some webgl stuff, and really pretty much anything with more than 4 varyings. Signed-off-by:
Rob Clark <robclark@freedesktop.org> (cherry picked from commit 51e33574)
-
Signed-off-by:
Rob Clark <robclark@freedesktop.org> (cherry picked from commit fb1301e4) Conflicts: src/gallium/drivers/freedreno/a3xx/a3xx.xml.h
-
Signed-off-by:
Rob Clark <robclark@freedesktop.org> (cherry picked from commit bdf02348)
-
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88883 Signed-off-by:
Rob Clark <robclark@freedesktop.org> (cherry picked from commit 68552266)
-
- Mar 11, 2015
-
-
The piglit test glsl-fs-uniform-array-loop-unroll.shader_test was designed to do an out-of-bounds access into a uniform array to make sure that we handle that situation gracefully inside the driver. However, as Ken describes in bug 79202, Valgrind reports that this leads to an out-of-bounds access in fs_visitor::demote_pull_constants(). Before accessing the pull_constant_loc array we should make sure that the uniform we are trying to access is valid. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79202 Reviewed-by:
Matt Turner <mattst88@gmail.com> (cherry picked from commit 6ac1bc90) Nominated-by:
Matt Turner <mattst88@gmail.com>
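The validity check described above can be sketched roughly as follows. This is not the driver code: `lookup_pull_constant_loc`, its parameters, and the `-1` "not found" convention are assumptions made for illustration, and only the idea of validating the uniform index before reading `pull_constant_loc` comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the pull-constant location for a uniform,
 * or -1 when the index does not name a valid uniform. */
static int
lookup_pull_constant_loc(const int *pull_constant_loc,
                         size_t num_uniforms,
                         size_t uniform)
{
    assert(pull_constant_loc != NULL);

    /* Validate the index before touching the array, so an out-of-bounds
     * shader access never becomes an out-of-bounds read in the compiler. */
    if (uniform >= num_uniforms)
        return -1;

    return pull_constant_loc[uniform];
}
```

Callers in such a sketch would treat `-1` as "leave this access alone" rather than demoting it to a pull constant.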
-
We used to loop over all color attachments, and emit FB writes for each one, even if the shader didn't write to a corresponding output variable. Those color attachments would be filled with garbage (undefined values). Football Manager binds a framebuffer with 4 color attachments, but draws to it using a shader that only writes to gl_FragData[0..2]. This meant that color attachment 3 would be filled with garbage, resulting in rendering artifacts. Now we skip writing to it, fixing rendering. Writes to gl_FragColor initialize outputs[0..nr_color_regions-1] to GRFs, while writes to gl_FragData[i] initialize outputs[i]. Thanks to Jason Ekstrand for tracking this down. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86747 Signed-off-by:
Kenneth Graunke <kenneth@whitecape.org> Reviewed-by:
Jason Ekstrand <jason.ekstrand@intel.com> Cc: mesa-stable@lists.freedesktop.org (cherry picked from commit e95969cd) Conflicts: src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
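As a rough illustration of the new behaviour, the loop below only selects color regions whose output was actually written by the shader. The function name, the `output_written` array, and `MAX_DRAW_BUFFERS` are hypothetical stand-ins, not the i965 data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DRAW_BUFFERS 8 /* illustrative limit, assumed for this sketch */

/* Hypothetical helper: collects the indices of color regions that should
 * receive an FB write and returns how many there are. */
static unsigned
collect_fb_write_targets(const bool *output_written,
                         unsigned nr_color_regions,
                         unsigned *targets)
{
    unsigned count = 0;

    assert(nr_color_regions <= MAX_DRAW_BUFFERS);

    for (unsigned i = 0; i < nr_color_regions; i++) {
        /* Skip attachments the shader never wrote, instead of filling
         * them with undefined values. */
        if (!output_written[i])
            continue;
        targets[count++] = i;
    }
    return count;
}
```

With four attachments and a shader that writes only outputs 0..2, this selects three targets and leaves attachment 3 untouched.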
-
Previously, we emitted the shader-time epilogue from emit_fb_writes(), during the middle of looping through color regions (or emit_urb_writes for the VS). This is duplicated several times and rather awkward. I need to fix a bug in our FB write handling, and it will be a lot easier if we move emit_shader_time_end() out of there. Now, we simply emit FB writes/URB writes, and subsequently have emit_shader_time_end() insert instructions before the final SEND with EOT. Not only is this simpler, it's actually a slight improvement: we now include the MOVs to set up the final FB write payload in our shader-time measurements. Note that INTEL_DEBUG=shader_time only exists on Gen7+, and uses send-from-GRF. (In the past, we might have hit trouble where both attempt to use MRFs for messages; that's not a problem now.) v2: Rebase on v3 of the previous patch and other shader_time fixes. Signed-off-by:
Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> [v1] Acked-by:
Matt Turner <mattst88@gmail.com> Cc: mesa-stable@lists.freedesktop.org (cherry picked from commit 4ebeb715) Conflicts: src/mesa/drivers/dri/i965/brw_fs.cpp
-
This makes another part of the INTEL_DEBUG=shader_time code emittable at arbitrary locations, rather than just at the end of the instruction stream. v2: Don't lose smear! Caught by Topi Pohjolainen. v3: Don't set smear on the destination of the MOV. Thanks Topi! Signed-off-by:
Kenneth Graunke <kenneth@whitecape.org> Reviewed-by:
Matt Turner <mattst88@gmail.com> Cc: mesa-stable@lists.freedesktop.org (cherry picked from commit e43af8d0)
-
Instead of emit_shader_time_write, we now do emit(SHADER_TIME_ADD(...)). The advantage is that we can also insert a shader time write at an arbitrary location in the instruction stream, rather than being restricted to emitting at the end. Signed-off-by:
Kenneth Graunke <kenneth@whitecape.org> Reviewed-by:
Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by:
Matt Turner <mattst88@gmail.com> Cc: mesa-stable@lists.freedesktop.org (cherry picked from commit bea854c7)
-
The ADD(diff, diff, fs_reg(-2u)) instruction reads diff, which is a width 1 register. We need to read it as <0,1,0> with a subreg of 0, which is what smear accomplishes. Fixes assertion: brw_eu_emit.c:285: validate_reg: Assertion `hstride == 0' failed. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86974 Signed-off-by:
Kenneth Graunke <kenneth@whitecape.org> Reviewed-by:
Matt Turner <mattst88@gmail.com> Cc: mesa-stable@lists.freedesktop.org (cherry picked from commit f1adc45d) Conflicts: src/mesa/drivers/dri/i965/brw_fs.cpp
-
These computations don't have anything to do with the currently executing channels, so they should use force_writemask_all. This fixes assert failures. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86974 Signed-off-by:
Kenneth Graunke <kenneth@whitecape.org> Reviewed-by:
Matt Turner <mattst88@gmail.com> Cc: mesa-stable@lists.freedesktop.org (cherry picked from commit ef9cc7d0) Conflicts: src/mesa/drivers/dri/i965/brw_fs.cpp
-
Cc: 10.5 10.4 <mesa-stable@lists.freedesktop.org> (cherry picked from commit c939231e)
-
Cc: 10.5 10.4 <mesa-stable@lists.freedesktop.org> (cherry picked from commit 9953586a)
-
Cc: 10.5 10.4 <mesa-stable@lists.freedesktop.org> (cherry picked from commit 74a757f9)
-
This fixes the GL_COMPRESSED_RED_RGTC1 part of piglit's rgtc-teximage-01 test as well as the precision part of Wine's 3dc format test (fd.o bug 89156). The Z component seems to contain a lower precision version of the result, probably a temporary value from the decompression computation. The Y and W components contain different data that depends on the input values as well, but I could not make sense of them (not that I tried very hard). GL_COMPRESSED_SIGNED_RED_RGTC1 still seems to have precision problems in piglit, and both formats are affected by a compiler bug if they're sampled by the shader with a swizzle other than .xyzw. Wine uses .xxxx, which returns random garbage. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89156 Signed-off-by:
Marek Olšák <marek.olsak@amd.com> Cc: 10.5 10.4 <mesa-stable@lists.freedesktop.org> (cherry picked from commit f710b990)
-
This was resulting in the gl_PointSize write being optimized out, causing particle-system-type shaders to hang if hw binning was enabled. Fixes neverball, OGLES2ParticleSystem, etc. Signed-off-by:
Rob Clark <robclark@freedesktop.org> (cherry picked from commit 60096ed9)
-
This fixes ARB_texture_query_levels to actually return the desired value. Signed-off-by:
Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by:
Rob Clark <robclark@freedesktop.org> Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org> (cherry picked from commit cb3eb43a)
-
Signed-off-by:
Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by:
Rob Clark <robclark@freedesktop.org> Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org> (cherry picked from commit 8ac957a5)
-
Fixes: 1f3ca56b ("freedreno: use util_copy_framebuffer_state()") Signed-off-by:
Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by:
Rob Clark <robclark@freedesktop.org> Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org> (cherry picked from commit f3dfe651)
-
Piglit's spec/glsl-1.20/compiler/structure-and-array-operations/array-selection.vert test contains the following code:
  gl_Position = (pick_from_a_or_b ? a : b)[i];
where "a" and "b" are uniform vec4[2] variables. ast_to_hir creates a temporary vec4[2] variable, conditional_tmp, and generates an if-block to copy one or the other:
  (declare (temporary) (array vec4 2) conditional_tmp)
  (if (var_ref pick_from_a_or_b)
      ((assign () (var_ref conditional_tmp) (var_ref a)))
      ((assign () (var_ref conditional_tmp) (var_ref b))))
However, we failed to update max_array_access for "a" and "b", so it remained 0 - here, the whole array is being accessed. At link time, update_array_sizes() used this bogus information to change the types of "a" and "b" to vec4[1]. We then had assignments from a vec4[1] to a vec4[2], which is highly illegal. This tripped assertions in nir_split_var_copies with scalar VS. Signed-off-by:
Kenneth Graunke <kenneth@whitecape.org> Reviewed-by:
Jason Ekstrand <jason.ekstrand@intel.com> Cc: mesa-stable@lists.freedesktop.org (cherry picked from commit 9f1e250e)
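A minimal model of the bookkeeping involved, assuming a simplified variable record: `struct array_var` and the three helpers are hypothetical and only mirror the roles of max_array_access and update_array_sizes() named in the message.

```c
#include <assert.h>

/* Hypothetical, simplified record for an array variable. */
struct array_var {
    unsigned length;           /* declared number of elements */
    unsigned max_array_access; /* highest element index known to be used */
};

/* Records an access to a single element. */
static void
mark_element_access(struct array_var *v, unsigned index)
{
    if (index > v->max_array_access)
        v->max_array_access = index;
}

/* Records a whole-array use, such as copying the array into a temporary:
 * every element, including the last one, counts as accessed. */
static void
mark_whole_array_access(struct array_var *v)
{
    assert(v->length > 0);
    mark_element_access(v, v->length - 1);
}

/* Size a link-time pass would shrink the array to in this model. */
static unsigned
shrunk_length(const struct array_var *v)
{
    return v->max_array_access + 1;
}
```

Without the whole-array marking, a vec4[2] that is only ever copied wholesale would be shrunk to one element in this model, which is the mismatch the commit describes.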
-
The yoffset needs to be interpreted as a slice offset for 1D array textures. This patch implements that by moving the yoffset into zoffset similar to how it moves the height into depth. Reviewed-by:
Jason Ekstrand <jason.ekstrand@intel.com> Cc: "10.5" <mesa-stable@lists.freedesktop.org> (cherry picked from commit 7286a689)
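The remapping described above amounts to the following, sketched with a hypothetical `struct subimage_region` rather than the actual meta code:

```c
#include <assert.h>

/* Hypothetical description of a texture sub-image region. */
struct subimage_region {
    int yoffset, zoffset;
    int height, depth;
};

/* For 1D array textures the "y" axis selects a slice, so both the offset
 * and the extent along y are moved to the z axis. */
static void
remap_1d_array_region(struct subimage_region *r)
{
    assert(r->zoffset == 0 && r->depth == 1); /* expected for 1D arrays */

    r->zoffset = r->yoffset;
    r->yoffset = 0;
    r->depth = r->height;
    r->height = 1;
}
```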
-
Now that a layered source PBO is interpreted as a single tall 2D image it's quite easy to accept the image height packing option by just creating an image that is tall enough to include the image padding. I'm not sure whether the image height property should affect 1D_ARRAY textures. My intuition and interpretation of the GL spec (which is a bit vague) would be that it shouldn't. However the software fallback path in Mesa uses the property for packing but not for unpacking. The binary NVidia driver uses it for both. This patch doesn't use it for either case so it is different from the software fallback. There is some discussion about this here: http://lists.freedesktop.org/archives/mesa-dev/2015-February/077925.html This is tested by the texsubimage Piglit test with the array and pbo arguments. Previously this test was skipping this code path because it always sets the image height. I've also tested it by modifying the getteximage-targets test. It wasn't using this code path before because it was using the default texture object so this code couldn't successfully create a frame buffer. I also modified it to add some image padding with the image height in the PBO. Reviewed-by:
Jason Ekstrand <jason.ekstrand@intel.com> Cc: "10.5" <mesa-stable@lists.freedesktop.org> (cherry picked from commit a08bff1e)
-
This reverts commit 546aba14. I think the changes to the calls to glBlitFramebuffer from this patch are no different to what it was doing previously because it used to set height to 1 before doing the blits. However it was introducing some problems with the blit for layer 0 because this was no longer special cased. It didn't fix problems with the yoffset which needs to be interpreted as a slice offset. I think a better solution would be to modify the original if statement to cope with the yoffset. Conflicts: src/mesa/drivers/common/meta_tex_subimage.c Cc: "10.5" <mesa-stable@lists.freedesktop.org> Reviewed-by:
Jason Ekstrand <jason.ekstrand@intel.com> (cherry picked from commit 7d10d2fe)
-
A layered PBO image is now interpreted as a single tall 2D image so the z argument in _mesa_meta_bind_fbo_image is ignored. Therefore this was just redundantly rebinding the same image repeatedly. Reviewed-by:
Jason Ekstrand <jason.ekstrand@intel.com> (cherry picked from commit a44606eb)
-
- Mar 07, 2015
-
-
For some given GLSL IR like (+ (neg x) (* 1.2 x)), the try_emit_mad function would see that one of the +'s sources was a negate expression and set mul_negate = true without confirming that it was actually a multiply. Cc: 10.5 <mesa-stable@lists.freedesktop.org> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89315 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89095 Reviewed-by:
Ian Romanick <ian.d.romanick@intel.com> (cherry picked from commit d528907f) [Emil Velikov: drop the changes in brw_vec4_visitor.cpp] Signed-off-by:
Emil Velikov <emil.l.velikov@gmail.com> Conflicts: src/mesa/drivers/dri/i965/brw_fs_visitor.cpp src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
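The missing check can be illustrated on a toy expression tree. The `struct expr` type and `find_mul_operand` are hypothetical and are not the i965 visitor code. The point is only that a negate should be folded into the MAD when it wraps a multiply, and not otherwise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum expr_op { OP_VAR, OP_CONST, OP_NEG, OP_MUL, OP_ADD };

/* Hypothetical, minimal expression node. */
struct expr {
    enum expr_op op;
    const struct expr *src[2];
};

/* Given one operand of an add, returns the multiply that could be fused
 * into a MAD, looking through at most one negate. Returns NULL when the
 * operand is not a (possibly negated) multiply. */
static const struct expr *
find_mul_operand(const struct expr *operand, bool *mul_negate)
{
    assert(operand != NULL && mul_negate != NULL);
    *mul_negate = false;

    if (operand->op == OP_NEG) {
        /* Confirm the negated expression really is a multiply before
         * recording the negation. */
        if (operand->src[0]->op != OP_MUL)
            return NULL;
        *mul_negate = true;
        return operand->src[0];
    }

    return operand->op == OP_MUL ? operand : NULL;
}
```

For (+ (neg x) (* 1.2 x)), the first operand yields NULL with `mul_negate` left false, and the second yields the multiply unnegated.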
-
This fixes a dEQP test failure. In the test, glCopyTexSubImage2D was called with target = 0 and failed to generate GL_INVALID_ENUM. This failure was caused by _mesa_get_current_tex_object(ctx, target) being called before the target checking. To remedy this, target checking was separated from the main error-checking function and called prior to _mesa_get_current_tex_object. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89312 Reviewed-by:
Anuj Phogat <anuj.phogat@gmail.com> (cherry picked from commit ca65764d)
-
This fixes a dEQP test failure. In the test, glCompressedTexSubImage2D was called with target = 0 and failed to generate GL_INVALID_ENUM. This failure was caused by _mesa_get_current_tex_object(ctx, target) being called before the target checking. To remedy this, target checking was made into its own function and called prior to _mesa_get_current_tex_object. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89311 Reviewed-by:
Anuj Phogat <anuj.phogat@gmail.com> (cherry picked from commit 549078cb)
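The ordering issue shared by the two dEQP fixes above can be sketched as follows. All names here are hypothetical stand-ins. Only the numeric values 0x0DE1 and 0x0500 are taken from the GL headers, where they correspond to GL_TEXTURE_2D and GL_INVALID_ENUM.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_TEXTURE_2D   0x0DE1u /* same value as GL_TEXTURE_2D */
#define SKETCH_NO_ERROR     0u
#define SKETCH_INVALID_ENUM 0x0500u /* same value as GL_INVALID_ENUM */

struct tex_object { int id; };
static struct tex_object bound_2d = { 1 };

/* Hypothetical target check, split out so it can run first. */
static bool
legal_target(unsigned target)
{
    return target == SKETCH_TEXTURE_2D;
}

/* Stand-in for the object lookup, which is only meaningful for a
 * valid target. */
static struct tex_object *
get_current_tex_object(unsigned target)
{
    assert(legal_target(target));
    return &bound_2d;
}

/* Returns the error the call would record in this model. */
static unsigned
tex_sub_image_sketch(unsigned target)
{
    /* Check the target before the lookup, so a bad enum is reported
     * instead of reaching the lookup at all. */
    if (!legal_target(target))
        return SKETCH_INVALID_ENUM;

    struct tex_object *obj = get_current_tex_object(target);
    (void)obj; /* remaining error checks and the update would go here */
    return SKETCH_NO_ERROR;
}
```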
-
Correctly set _BaseFormat field when creating a gl_renderbuffer with EGLImage storage. Change-Id: I8c9f7302d18b617f54fa68304d8ffee087ed8a77 Signed-off-by:
Frank Henigman <fjhenigman@google.com> Reviewed-by:
Stéphane Marchesin <marcheu@chromium.org> Reviewed-by:
Chad Versace <chad.versace@intel.com> (cherry picked from commit e4372994) Nominated-by:
Chad Versace <chad.versace@intel.com>
-
The shader-cache isn't finished, so the configure checks are a bit premature and will only stand to confuse users of Mesa 10.5.0. This is a squash of the following four reverts: Revert "Rename sha1.c and sha1.h to mesa-sha1.c and mesa-sha1.h" Revert "configure: Add machinery for --enable-shader-cache (and --disable-shader-cache)" Revert "sha1: Fix gcry_md_hd_t typo." Revert "mesa: Add mesa SHA-1 functions" Reviewed-by:
Carl Worth <cworth@cworth.org>
-
Cc: 10.4, 10.5 <mesa-stable@lists.freedesktop.org> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89224 Reviewed-by:
Matt Turner <mattst88@gmail.com> Reviewed-by:
Ian Romanick <ian.d.romanick@intel.com> (cherry picked from commit 0dfec59a)
-
A while back I switched intel_blit_framebuffer to prefer Meta over the BLT. This meant that Gen8 platforms would start using the 3D engine for blits, just like we do on Gen6-7.5. However, I hadn't considered Gen4-5 when making that change. The BLT engine appears to be substantially faster on 965GM than using Meta to drive the 3D engine. This isn't too surprising: original Gen4 doesn't support tile offsets (that came on G45), and the level/layer fields don't work for cubemap rendering, so for inconvenient miplevel alignments, we end up blitting or copying data to/from temporaries in order to render to it. We may as well just use the blitter. I chose to use the BLT on Gen4-5 because they use the same ring for both 3D and BLT; Gen6+ splits it out. Fixes regressions on 965GM due to botched tile offset code (we should fix those properly as well, but they're longstanding bugs - for now, put things back to the status quo). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89430 Signed-off-by:
Kenneth Graunke <kenneth@whitecape.org> Reviewed-by:
Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by:
Jordan Justen <jordan.l.justen@intel.com> Cc: "10.5" <mesa-stable@lists.freedesktop.org> (cherry picked from commit aa0705c0)
-
The SSSE3 swizzling code was written for fast uploads to the GPU and assumed the destination was always 16-byte aligned. When we began using this code for fast downloads as well we didn't do anything to account for the fact that the destination pointer given by glReadPixels() or glGetTexImage() is not guaranteed to be suitably aligned. With SSSE3 enabled (at compile-time), some applications would crash when an SSE aligned-store instruction tried to store to an unaligned destination (or an assertion that the destination is aligned would trigger). To remedy this, tell intel_get_memcpy() whether we're uploading or downloading so that it can select whether to assume the destination or source is aligned, respectively. Cc: 10.5 <mesa-stable@lists.freedesktop.org> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89416 Tested-by:
Uriy Zhuravlev <stalkerg@gmail.com> Reviewed-by:
Jason Ekstrand <jason.ekstrand@intel.com> (cherry picked from commit 2e4c95df)
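A small sketch of the direction-dependent alignment assumption, in portable C rather than SSSE3 intrinsics. The enum and function names are hypothetical. The real change passes the direction to intel_get_memcpy(), whose signature is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical direction tag: uploads write to GPU-side memory, downloads
 * write to an application pointer of arbitrary alignment. */
enum copy_dir { COPY_UPLOAD, COPY_DOWNLOAD };

static bool
is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Reports whether the pointer that the fast path treats as 16-byte aligned
 * really is: the destination for uploads, the source for downloads. */
static bool
assumed_alignment_holds(const void *dst, const void *src, enum copy_dir dir)
{
    return dir == COPY_UPLOAD ? is_aligned16(dst) : is_aligned16(src);
}
```

In this model a download into an unaligned application buffer still passes the check, because only the source side is required to be aligned.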
-