May 04, 2019
-
Rob Clark authored
Signed-off-by: Rob Clark <robdclark@chromium.org>
-
Alyssa Rosenzweig authored
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
-
Alyssa Rosenzweig authored
Basically, when the conditions of a csel diverge, we scalarize to avoid going down weird code paths during emit. We could be doing better, but this case can't occur organically from GLSL as far as I can tell, though it does fix lowered atan2.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
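A hedged NIR-level sketch of the idea (the pass itself lives in the Midgard backend; the builder and variable names here are illustrative):

    /* Scalarize a vector bcsel whose condition diverges per component:
     * select each channel individually, then reassemble the vector. */
    nir_ssa_def *comps[4];
    for (unsigned i = 0; i < num_components; i++) {
        comps[i] = nir_bcsel(&b, nir_channel(&b, cond, i),
                                 nir_channel(&b, if_true, i),
                                 nir_channel(&b, if_false, i));
    }
    nir_ssa_def *result = nir_vec(&b, comps, num_components);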
-
Alyssa Rosenzweig authored
A previous commit by Tomeu aborted RA early, which solves the memory corruption issue, but then generates an incorrect compile. This fixes that.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
-
Alyssa Rosenzweig authored
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
-
Alyssa Rosenzweig authored
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
-
Alyssa Rosenzweig authored
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
-
Alyssa Rosenzweig authored
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
-
Alyssa Rosenzweig authored
This handles the usual case. 8-bit register access parallels 16-bit access, but with one major caveat: in 8-bit mode, only half of the register file is actually (directly) accessible as sources. In particular, for each 16-bit integer register (hrN), we can only index a *single* 8-bit integer (qrN), corresponding to the lower 8 bits. To get the upper 8 bits, an explicit shift is required. For example, to add the bytes of a 16-bit integer hr0.x and get the result as an 8-bit qr0, you'd need to do something like:

    ilsr hr1.x, hr0.x, #8
    iadd qr0.x, qr0.x, qr1.x

This scheme diverges from 32-bit registers, in that both the upper and lower halves of a 32-bit register are individually accessible as a pair of half registers. For contrast, to add the lower and upper 16 bits of a 32-bit integer r0.x, you can just:

    iadd hr0.x, hr0.x, hr1.x

since hr1.x is the upper 16 bits of r0.x.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
-
Alyssa Rosenzweig authored
Meanwhile, we're forced to disable dest_override, since it's not yet clear how this interacts with other bitnesses (it'll likely need to be overhauled in any case).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
-
Alyssa Rosenzweig authored
Per OpenCL.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
-
Alyssa Rosenzweig authored
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
-
Alyssa Rosenzweig authored
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
-
Alyssa Rosenzweig authored
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
-
Alyssa Rosenzweig authored
We silently ignored certain bits of the mask, which causes issues when disassembling 8/64-bit ops.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
-
Alyssa Rosenzweig authored
In preparation for 8-bit and 64-bit operands, let's not reinforce the 32-bit-centric biases in the ISA.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
-
Rob Clark authored
Signed-off-by: Rob Clark <robdclark@chromium.org>
-
Rob Clark authored
Since it is dependent on the tile mode (ie. disabled for smaller mipmap levels), we should handle it in a similar way to fd_resource_level_linear(). The code previously mostly did the right thing because the old helper took the tile mode.
Signed-off-by: Rob Clark <robdclark@chromium.org>
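A hedged sketch of the shape such a helper might take (the fd_resource field names here are assumptions, not the actual freedreno code):

    /* UBWC, like tiling, only applies to levels that are still tiled;
     * smaller mipmap levels fall back to linear and stay uncompressed. */
    static inline bool
    fd_resource_ubwc_enabled(struct fd_resource *rsc, int level)
    {
        return rsc->ubwc_size && rsc->tile_mode &&
               !fd_resource_level_linear(&rsc->base, level);
    }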
-
Rob Clark authored
Best to keep it encapsulated in the helper which returns layer/level offset (and actually use that helper everywhere) rather than spreading the logic around the code. Also add a helper to find UBWC offset, to complete the encapsulation.
Signed-off-by: Rob Clark <robdclark@chromium.org>
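A rough illustration of the encapsulation (the struct layout and field names here are assumptions; the real freedreno helpers may differ):

    /* Offset of (level, layer) within the resource's backing buffer. */
    static inline uint32_t
    fd_resource_offset(struct fd_resource *rsc, unsigned level, unsigned layer)
    {
        const struct fd_resource_slice *slice = &rsc->slices[level];
        return slice->offset + layer * slice->size0;
    }

    /* Companion helper for the UBWC meta data, completing the
     * encapsulation described above. */
    static inline uint32_t
    fd_resource_ubwc_offset(struct fd_resource *rsc, unsigned level, unsigned layer)
    {
        const struct fd_resource_slice *slice = &rsc->ubwc_slices[level];
        return slice->offset + layer * slice->size0;
    }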
-
Rob Clark authored
Small cleanup. They are just an array of data and only ever linear/uncompressed.
Signed-off-by: Rob Clark <robdclark@chromium.org>
-
Rob Clark authored
If someone is importing a buffer, we can't really know the state of its contents, so assume it is valid.
Signed-off-by: Rob Clark <robdclark@chromium.org>
-
Rob Clark authored
There are still some fallbacks we'll need to handle before we can enable UBWC by default. I think we may need to fall back to uncompressed if image atomic operations are used. And we still need to sort out how to handle image and sampler views of compressed resources if the image/sampler view is using a format that does not support compression. (I think the latter should hopefully be uncommon outside of deqp/piglit.) But at least this gets us to the point where supertuxkart works properly with UBWC enabled ;-)
Signed-off-by: Rob Clark <robdclark@chromium.org>
-
Rob Clark authored
A few fixes that get UBWC working for the games/benchmarks where I noticed problems before (in particular manhattan, and stk (modulo image support for UBWC when compute shaders are used for post-process effects)):

 + fix the size of the UBWC meta buffer (ie, the offset to color pixel data) that is returned by ->fill_ubwc_buffer_sizes()
 + correct size/layout for 8 and 16 byte per pixel formats
 + limit the supported formats.. note that not all formats that can be tiled can be compressed

Signed-off-by: Rob Clark <robdclark@chromium.org>
-
Rob Clark authored
Corrects tex state ubwc pitch/size.
Signed-off-by: Rob Clark <robdclark@chromium.org>
-
Rob Clark authored
Signed-off-by: Rob Clark <robdclark@chromium.org>
-
Rob Clark authored
Fixes dEQP-GLES31.functional.ubo.random.all_per_block_buffers.13 and .20. Commit ca3eb5db went from silently truncating the constant state, which was also the wrong thing to do, to an assert, which then showed up in a couple of dEQPs. Actually there is nothing wrong with a larger constant file, so just drop the assert.
Signed-off-by: Rob Clark <robdclark@chromium.org>
-
Karol Herbst authored
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
-
Karol Herbst authored
With that we can simplify code where NIR vectors are created.

v2: merge both lines in nir_vec

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
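A hedged sketch of the call-site simplification this enables (assuming the nir_vec(build, comps, num_components) helper introduced here; the surrounding variables are illustrative):

    nir_ssa_def *comps[4] = { x, y, z, w };
    nir_ssa_def *vec;

    /* Before: pick a builder per component count. */
    switch (num_components) {
    case 2: vec = nir_vec2(&b, comps[0], comps[1]); break;
    case 3: vec = nir_vec3(&b, comps[0], comps[1], comps[2]); break;
    case 4: vec = nir_vec4(&b, comps[0], comps[1], comps[2], comps[3]); break;
    }

    /* After: one call regardless of component count. */
    vec = nir_vec(&b, comps, num_components);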
-
Karol Herbst authored
v2: rename to nir_build_alu_src_arr

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
-
Karol Herbst authored
v2: use vtn_push_ssa and vtn_ssa_value

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
-
Mathias Fröhlich authored
Now that dlist compilation again knows whether it is inside glBegin/glEnd, we can leave the decision whether aliasing should occur to the vertex attribute setter functions instead of making it at glArrayElement time.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
-
Mathias Fröhlich authored
We have to use _mesa_inside_dlist_begin_end instead of _mesa_inside_begin_end to see if we are inside a glBegin/glEnd block in the case of display lists. So split the is_vertex_position function used in vertex attribute processing into an imm and a dlist variant, and use the appropriate _mesa_inside_begin_end variant in each.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
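A hedged sketch of the split (the actual Mesa predicates are more involved; this only shows the shape):

    /* Immediate-mode variant: consult the context's begin/end state. */
    static inline bool
    is_vertex_position_imm(const struct gl_context *ctx, GLuint index)
    {
        return index == 0 && _mesa_inside_begin_end(ctx);
    }

    /* Display-list variant: consult the dlist compiler's begin/end state. */
    static inline bool
    is_vertex_position_dlist(const struct gl_context *ctx, GLuint index)
    {
        return index == 0 && _mesa_inside_dlist_begin_end(ctx);
    }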
-
Mathias Fröhlich authored
That seems to have been lost somewhere. It is needed for correct outside-begin/end detection in display list compilation, and for the correct aliasing in dlists reestablished in the next changes.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
-
Mathias Fröhlich authored
The value is now unused.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
-
Mathias Fröhlich authored
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
-
Mathias Fröhlich authored
It is no longer used, so we have fewer occasions where NewState is non-zero.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
-
Mathias Fröhlich authored
Now this part of gl_context state is unused and can be removed.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
-
Mathias Fröhlich authored
In glArrayElement, use the bitmask trick to walk just the enabled VAO arrays. This should be about equivalent in execution time to walking the prepared aelt_context list. Finally, this will allow us to reduce the _mesa_update_state calls in a few patches.

v2: Add comments.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
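A hedged sketch of the bitmask walk (the VAO field name is an assumption):

    GLbitfield mask = vao->Enabled;  /* assumed: bitmask of enabled arrays */
    while (mask) {
        /* u_bit_scan pops the lowest set bit and returns its index. */
        const int attr = u_bit_scan(&mask);
        /* ... look up and emit vertex array `attr` for this element ... */
    }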
-
Mathias Fröhlich authored
In the glArrayElement implementation, use glVertexAttrib*NV-type functions for fixed-function attributes. We do the same in display list execution when the list is replayed using immediate-mode attribute functions. Using a single set of function pointers enables us to use a unified loop to walk the vertex array attributes.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
-
Mathias Fröhlich authored
For access to the glArrayElement methods, factor out a function that returns the table lookup index for normalized/integer/double access. The function will be used at least twice in the next patch.

v2: Use vertex_format_to_index instead of NORM_IDX.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
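A hedged sketch of such a lookup-index function (the row encoding here is an assumption, not the actual Mesa values):

    /* Map a vertex format onto a row of the attribute-function table:
     * float/normalized, integer, or double entry points. */
    static inline unsigned
    vertex_format_to_index(const struct gl_vertex_format *vformat)
    {
        if (vformat->Doubles)
            return 2;
        else if (vformat->Integer)
            return 1;
        else
            return 0;
    }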
-