- Oct 30, 2018
-
-
Kenneth Graunke authored
Kabylake hardware appears to hang when BLEND_STATE references SRC1 blend factors, but the shader issues regular render target write messages rather than dual source RT writes. Presumably, the hardware needs it to provide both colors. This can happen in broken applications that leave blending enabled with SRC1 factors, but don't properly call glBindFragDataLocationIndexed or set index = 1 in their shader. The results are undefined in this case, but we'd like to avoid crashing the GPU. To fix this, we introduce NOS in the shader key - if dual source blend factors are enabled, we force the use of dual source RT write messages. (In fact, we already had a bit for Unigine shader workarounds, so we reuse and rename that.) If no secondary color is defined (index > 0), we look for location = 1 for maximum compatibility with broken apps (as other drivers appear to do this), and finally just reuse color 0 for both outputs. We guess the key based on the existence of a shader variable marked with index = 1. This may cause recompiles if the index is specified via glBindFragDataLocationIndexed rather than a layout qualifier in the shader.
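The fallback order described in this message can be sketched as follows. This is only an illustration under stated assumptions: `struct fs_output` and `pick_src1_output` are hypothetical names invented here, not the driver's actual data structures or code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one fragment shader color output. */
struct fs_output {
   int location;  /* layout(location = N) */
   int index;     /* layout(index = N), 0 or 1 */
};

/* Pick the output that should feed the secondary (SRC1) color.
 * Returns the array slot of the chosen output, or -1 if nothing fits. */
int pick_src1_output(const struct fs_output *outs, size_t count)
{
   assert(outs != NULL || count == 0);

   /* 1. A properly declared secondary color (index = 1). */
   for (size_t i = 0; i < count; i++)
      if (outs[i].index == 1)
         return (int)i;

   /* 2. Compatibility with broken apps: try location = 1. */
   for (size_t i = 0; i < count; i++)
      if (outs[i].location == 1)
         return (int)i;

   /* 3. Last resort: reuse color 0 for both outputs. */
   for (size_t i = 0; i < count; i++)
      if (outs[i].location == 0 && outs[i].index == 0)
         return (int)i;

   return -1;
}
```

The point of the ordering is that a well-formed shader is never affected by the compatibility paths; they only apply when no output is marked with index = 1.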
-
- Oct 29, 2018
-
-
Kenneth Graunke authored
Apparently, we're supposed to look at the texture object's built-in sampler object's sRGB decode setting in order to decide whether to decode/downsample/re-encode, or simply downsample as-is. Previously, I had always done the decoding/encoding. Fixes SKQP's Skia_Unit_Tests.SRGBMipMaps test.
-
- Oct 28, 2018
-
-
Rob Clark authored
In the 'inorder' case (ie. FD_MESA_DEBUG=inorder, or old kernel), if the u_blitter clear path is used (a3xx, a4xx, and some fallback cases on newer gens), util_blitter_restore_fb_state() will set_framebuffer_state() to something that is identical to the current fb state, which triggers an unnecessary flush, and then eventually an assert:

  (gdb) bt
  #0 0x0000007fbf24a078 in kill () from /lib64/libc.so.6
  #1 0x0000007fbe061278 in _debug_assert_fail (expr=0x7fbe93a820 "!batch->flushed", file=0x7fbe93a628 "../src/gallium/drivers/freedreno/freedreno_batch.c", line=491, function=0x7fbe93a990 <__func__.17380> "fd_batch_check_size") at ../src/gallium/auxiliary/util/u_debug.c:322
  #2 0x0000007fbe1ccb8c in fd_batch_check_size (batch=0x55556d5a70) at ../src/gallium/drivers/freedreno/freedreno_batch.c:491
  #3 0x0000007fbe1d0e08 in fd_clear (pctx=0x55555c61e0, buffers=5, color=0x55556e388c, depth=1, stencil=0) at ../src/gallium/drivers/freedreno/freedreno_draw.c:463
  #4 0x0000007fbe57afa4 in st_Clear (ctx=0x55556e17b0, mask=18) at ../src/mesa/state_tracker/st_cb_clear.c:452

The assert was introduced in 4b847b38, so from a functionality standpoint this patch fixes that commit. But it should also avoid an unnecessary flush in the 'inorder' case, fixing a performance bug.

Fixes: 4b847b38 freedreno: make fd_batch a one-shot thing
Signed-off-by: Rob Clark <robdclark@gmail.com>
-
Rob Clark authored
ZSA state can change whether depth or stencil is enabled. This plus the previous patch fix stk, and various things with FD_MESA_DEBUG=inorder. Fixes: ec717fc6 freedreno: reduce resource dependency tracking overhead Signed-off-by: Rob Clark <robdclark@gmail.com>
-
Rob Clark authored
The problem isn't directly with ec717fc6 but rather that commit exposes the problem. When we switch batch we cannot assume previous state is clean so we should mark all state dirty. Fixes: ec717fc6 freedreno: reduce resource dependency tracking overhead Signed-off-by: Rob Clark <robdclark@gmail.com>
-
- Oct 27, 2018
-
-
Faith Ekstrand authored
We were previously using relative timeouts and decrementing the user-provided timeout as we waited. Instead, this commit refactors things to use absolute timeouts throughout. This should fix a subtle bug in the waitAll case where we aren't decrementing the timeout after a successful GPU wait. Since pthread_cond_timedwait already takes an absolute timeout, it's also significantly simpler. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
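A minimal sketch of the absolute-timeout idea, assuming nanosecond timestamps held in `uint64_t`. The helper names (`absolute_timeout`, `remaining_timeout`, `deadline_to_timespec`) are hypothetical and not taken from the driver. The deadline is computed once up front; any wait that needs a relative value derives it from the deadline, so nothing has to be decremented after each wait.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000ull

/* Turn a user-provided relative timeout into an absolute deadline,
 * saturating instead of overflowing for "wait forever" style values. */
uint64_t absolute_timeout(uint64_t now_ns, uint64_t rel_ns)
{
   if (rel_ns > UINT64_MAX - now_ns)
      return UINT64_MAX;
   return now_ns + rel_ns;
}

/* Time left until the deadline, for interfaces that want a relative value. */
uint64_t remaining_timeout(uint64_t now_ns, uint64_t abs_ns)
{
   return abs_ns > now_ns ? abs_ns - now_ns : 0;
}

/* pthread_cond_timedwait() takes an absolute struct timespec, so the
 * deadline converts directly. The deadline must be expressed on the same
 * clock the condition variable is configured to use. */
struct timespec deadline_to_timespec(uint64_t abs_ns)
{
   struct timespec ts;
   ts.tv_sec = (time_t)(abs_ns / NSEC_PER_SEC);
   ts.tv_nsec = (long)(abs_ns % NSEC_PER_SEC);
   assert(ts.tv_nsec >= 0 && ts.tv_nsec < 1000000000L);
   return ts;
}
```

With this shape, a loop that waits on several objects in turn re-reads the clock and calls `remaining_timeout()` before each wait, which avoids the class of bug where one successful wait forgets to shrink the budget for the next.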
-
Faith Ekstrand authored
It probably doesn't actually break anything but it does cause some assertions in debug builds. Fixes: 7a89a0d9 "anv: Use separate MOCS settings for external BOs" Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
-
Faith Ekstrand authored
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
-
- Oct 26, 2018
-
-
Rob Clark authored
Now that it is just called once per draw (instead of once for binning and once for draw), let's just inline it. If nothing else, it makes perf-annotate easier to look at. Signed-off-by: Rob Clark <robdclark@gmail.com>
-
Rob Clark authored
Signed-off-by: Rob Clark <robdclark@gmail.com>
-
Rob Clark authored
Historically this wasn't in fdN_emit_state(), because prior to addition of blitter in a5xx, fdN_emit_state() was also used in the clear path. These days that is only true for a2xx (a3xx and a4xx use u_blitter). So the reason for it not to be in fd6_emit_state() no longer exists. Signed-off-by: Rob Clark <robdclark@gmail.com>
-
Rob Clark authored
Noticed that with webgl (in chromium, at least) we end up generating a lot of no-op submits just to get a fence. Tracking the last fence and returning that if there is no rendering since last flush avoids this. Signed-off-by: Rob Clark <robdclark@gmail.com>
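The last-fence tracking can be illustrated with a toy model. Everything here (`struct ctx`, `ctx_draw`, `ctx_flush`, integer fence ids) is hypothetical and only stands in for the real context and fence objects.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy context: a fence is modeled as a nonzero integer id. */
struct ctx {
   bool needs_flush;     /* rendering recorded since the last flush */
   uint32_t last_fence;  /* fence from the last real submit, 0 = none */
   uint32_t next_fence;  /* counter used to mint new fence ids */
   unsigned submits;     /* number of submits actually issued */
};

void ctx_draw(struct ctx *c)
{
   assert(c != NULL);
   c->needs_flush = true;
}

/* Flush and return a fence. If nothing was rendered since the previous
 * flush, hand back the previous fence instead of issuing a no-op submit. */
uint32_t ctx_flush(struct ctx *c)
{
   assert(c != NULL);
   if (!c->needs_flush && c->last_fence != 0)
      return c->last_fence;

   c->submits++;
   c->last_fence = ++c->next_fence;
   c->needs_flush = false;
   return c->last_fence;
}
```

Two back-to-back flushes with no draw in between then cost a single submit and return the same fence.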
-
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
-
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
-
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
-
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
-
The scissor maxx/maxy are non-inclusive, so don't subtract one from framebuffer width and height. Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
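As a small illustration of the exclusive-maximum convention, assuming a hypothetical `struct scissor` with the same field names: a scissor covering the whole framebuffer uses the width and height directly as its maxima.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scissor rectangle: min is inclusive, max is exclusive. */
struct scissor {
   unsigned minx, miny;
   unsigned maxx, maxy;
};

/* Scissor covering the whole framebuffer. With an exclusive max there is
 * no "- 1": the last covered pixel column is maxx - 1. */
struct scissor full_fb_scissor(unsigned width, unsigned height)
{
   struct scissor s = { 0, 0, width, height };
   return s;
}

unsigned scissor_width(const struct scissor *s)
{
   assert(s != NULL && s->maxx >= s->minx);
   return s->maxx - s->minx;
}
```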
-
We get a warning here for assigning a const char * pointer to char *swizzle in struct ir2_src_register. The constructor strdups a 4 byte string here, so just memcpy to that instead. Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
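The shape of that fix can be sketched like this. The struct and function names are hypothetical stand-ins for `struct ir2_src_register` and its constructor, and the allocation uses `malloc` plus `memcpy` in place of `strdup` to stay within ISO C.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a source register with an owned swizzle. */
struct src_reg {
   char *swizzle;  /* heap-allocated, always 4 characters plus NUL */
};

struct src_reg *src_reg_create(void)
{
   struct src_reg *r = malloc(sizeof(*r));
   if (!r)
      return NULL;
   r->swizzle = malloc(5);  /* stands in for strdup("xyzw") */
   if (!r->swizzle) {
      free(r);
      return NULL;
   }
   memcpy(r->swizzle, "xyzw", 5);
   return r;
}

/* Copy into the existing buffer rather than assigning the const pointer,
 * which would discard the qualifier and leak the old allocation. */
void src_reg_set_swizzle(struct src_reg *r, const char *swz)
{
   assert(r != NULL && strlen(swz) == 4);
   memcpy(r->swizzle, swz, 4);
}

void src_reg_destroy(struct src_reg *r)
{
   if (r) {
      free(r->swizzle);
      free(r);
   }
}
```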
-
Move it to a header and use it where possible to avoid vfunc call. Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
-
Rob Clark authored
In the pursuit of lowering driver overhead, it became clear that some amount of redesign of how libdrm_freedreno constructs the submit ioctl would be needed. In particular, as the gallium driver is starting to make heavier use of CP_SET_DRAW_STATE state groups/objects, the overhead of tracking cmd buffers and relocs becomes too much. And for "streaming" state, which isn't ever reused (like uniform uploads), the overhead of allocating/freeing ringbuffer[1] objects is too high.

This redesign makes two main changes:

1) Introduces a fd_submit object for tracking bos and cmds table for the submit ioctl, making ringbuffer objects more lightweight. This was previously done in the ringbuffer. But we have many ringbuffer instances involved in a submit (gmem + draw + potentially 1000's of state-group rbs), and only need a single bos and cmds table. (Reloc table is still per-rb.) The submit is also a convenient place for a slab allocator for ringbuffer objects. Other options would have required locking because, while we can guarantee allocations will only happen on a single thread, frees could happen either on the application thread or the flush_queue thread. With the slab allocator in the submit object, any frees that happen on the flush_queue thread happen after we know that the application thread is done with the submit.

2) Introduces a new "softpin" msm_ringbuffer_sp implementation that does not use relocs and only has cmds table entries for IB1 (ie. the cmdstream buffers that kernel needs to CP_INDIRECT_BUFFER to from the RB). To do this properly will require some updates on the kernel side, so whether you get the softpin or legacy submit/ringbuffer implementation at runtime depends on your kernel version.

To make all these changes in libdrm would basically require adding a libdrm_freedreno2, so this is a good point to just pull the libdrm code into mesa. Plus it allows for using mesa's hashtable, slab allocator, etc. And it lets us have asserts enabled for debug mesa builds but omitted for release builds. And it makes life easier if further API changes become necessary.

At this point I haven't tried to pull in the kgsl backend. Although I left the level of vfunc indirection which would make it possible to have other backends. (And this was convenient to keep to allow for the "softpin" ringbuffer to coexist.)

NOTE: if bisecting a build error takes you here, try a clean build. There are a bunch of ways things can go wrong if you still have libdrm_freedreno cflags.

[1] "ringbuffer" is probably a bad name, the only level of cmdstream buffer that is actually a ring is RB managed by kernel. Userspace cmdstream is all IB1/IB2 and state-groups.

Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Signed-off-by: Rob Clark <robdclark@gmail.com>
-
Faith Ekstrand authored
This reverts commit 0fa9e6d7. The real issue appears to have been that HiZ ops don't like having WM thread dispatch force-enabled. The previous commit fixes that problem so we can go back to using the ForceThreadDispatchEnable bit even on SKL+. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
-
Faith Ekstrand authored
Cc: mesa-stable@lists.freedesktop.org Suggested-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
-
Axel Davy authored
Usually when a window is resized, the app calls d3d to resize the back buffer to the window size. In some cases this is not done, and the app expects the output to be resized to the window size even if the back buffer size is unchanged. This patch introduces that behaviour when a presentation buffer is used. ID3DPresent_GetWindowInfo is a function available with D3DPresent v1.0, and thus we don't need to check if the function is available. The function had been introduced to implement this very feature. Signed-off-by: Axel Davy <davyaxel0@gmail.com>
-
Axel Davy authored
GetWindowInfo used to be GetWindowSize before gallium nine was merged. A left-over remained... Signed-off-by: Axel Davy <davyaxel0@gmail.com>
-
Axel Davy authored
Windows drivers don't set this flag (which affects ff) to more than 8. Do the same in case some games check for 8. v2: Remove any dependence on MaxSimultaneousTextures. For non-ff the number of textures is 16 when the device is capable of vs/ps3. Add this requirement of 16 textures to the driver requirements. Signed-off-by: Axel Davy <davyaxel0@gmail.com>
-
Axel Davy authored
We didn't implement shadow textures for ps 1.X, assuming the case couldn't happen... Well it does. Fixes: https://github.com/iXit/Mesa-3D/issues/261 Signed-off-by: Axel Davy <davyaxel0@gmail.com>
-
Axel Davy authored
A lot of these states are used only for the context, and are unused for stateblocks (which just use the changed.* fields instead for a lot of them). Signed-off-by: Axel Davy <davyaxel0@gmail.com>
-
Axel Davy authored
If NINE_STATE_FF_MATERIAL is set, the stateblock will upload its recorded materials matrix. If NINE_STATE_FF_LIGHTING is set, the lighting set is uploaded. These flags could be set by a NineDevice9_SetTransform call or by setting some states related to ff, but that shouldn't trigger these stateblock behaviours. We don't need to follow the context states dirtied by render states. NINE_STATE_FF_VSTRANSF is exactly the state controlling stateblock updates of transformation matrices; NINE_STATE_FF is too broad. These two changes avoid setting the two mentioned states when we shouldn't. Fixes: https://github.com/iXit/Mesa-3D/issues/320 Signed-off-by: Axel Davy <davyaxel0@gmail.com>
-
Axel Davy authored
The device state changed.* fields are never used. These fields are used only for stateblocks. Avoid setting them at all for clarity. Signed-off-by: Axel Davy <davyaxel0@gmail.com>
-
Axel Davy authored
We avoid allocating space for never-used matrices. However, we must behave as if we had captured them. Thus when a D3DSBT_ALL stateblock being applied has fewer matrices than the device state, allocate the default matrices for the stateblock before applying. Signed-off-by: Axel Davy <davyaxel0@gmail.com>
-
Axel Davy authored
D3DSBT_ALL stateblocks capture the transform matrices. Fixes some d3d test programs not displaying properly. Signed-off-by: Axel Davy <davyaxel0@gmail.com>
-
Axel Davy authored
While we have to accurately track all 256 world matrices for the application (including in stateblocks), hw vertex processing lets the driver advertise in its caps a limit on the number of world matrices the hardware can access, which is 8 for nine. Thus don't bother in the stateblock code to send the updated values for the unreachable matrices. Signed-off-by: Axel Davy <davyaxel0@gmail.com>
-
Axel Davy authored
NINE_STATE_MATERIAL was used incorrectly at one location. Replace it with the correct state. Signed-off-by: Axel Davy <davyaxel0@gmail.com>
-
Axel Davy authored
At some point the plan was to adapt the commented version to csmt. The csmt rework made it possible to fix some state aliasing issues between stateblocks and internal state updates. The commented version would need a lot of work to function with that. Just drop it. Signed-off-by: Axel Davy <davyaxel0@gmail.com>
-
Brian Paul authored
Empty initializer is not standard C. This fixes MSVC build. Trivial.
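For illustration, the portable spelling looks like this. `struct foo` is a made-up type; the commit's actual struct is not shown in this log.

```c
#include <assert.h>
#include <stddef.h>

/* Made-up aggregate used only for this example. */
struct foo {
   int a;
   float b;
   void *p;
};

/* `struct foo f = {};` is accepted by GCC and Clang as an extension (and
 * was only standardized in C23), but MSVC rejects it when compiling C.
 * `{0}` zero-initializes every member and is valid in C89 and later. */
struct foo foo_default(void)
{
   struct foo f = {0};
   return f;
}

/* Debug helper: checks that every member really is zero/NULL. */
void foo_assert_zeroed(const struct foo *f)
{
   assert(f->a == 0 && f->b == 0.0f && f->p == NULL);
}
```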
-
Faith Ekstrand authored
This lets us get rid of a bunch of duplicated error messages. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
-
Faith Ekstrand authored
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
-
Brian Paul authored
This reverts commit a5fd54f8. The whole point was to add a way to pass -DVMX86_STATS to the build, but we can do that with a command line argument when we invoke scons. Reviewed-by: José Fonseca <jfonseca@vmware.com>
-
Nanley Chery authored
Follow the restriction of making sure the clear value is between the min and max values defined in CC_VIEWPORT. Avoids a simulator warning for some piglit tests, one of them being: ./bin/depthstencil-render-miplevels 146 d=z32f_s8 Jason found this to fix incorrect clearing on SKL. Fixes: 09948151 ("intel/blorp: Add the BDW+ optimized HZ_OP sequence to BLORP") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Tested-by: Jason Ekstrand <jason@jlekstrand.net>
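The restriction amounts to a clamp. A minimal sketch, assuming the viewport depth range is available as two floats; the function name and parameters are hypothetical and do not reflect BLORP's actual interface.

```c
#include <assert.h>

/* Clamp a depth clear value into the [min_depth, max_depth] range
 * programmed in the viewport state. */
float clamp_clear_depth(float value, float min_depth, float max_depth)
{
   assert(min_depth <= max_depth);
   if (value < min_depth)
      return min_depth;
   if (value > max_depth)
      return max_depth;
   return value;
}
```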
-
Eric Engestrom authored
MESA_GIT_SHA1 resolves to either an empty "" string if not built from git, or " (git-DEADBEEF)" if it is. No need to wrap it in additional "()". Fixes: 9d40ec2c "radv: Add support for VK_KHR_driver_properties." Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
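A sketch of why the extra parentheses are redundant, using adjacent string literal concatenation. `EXAMPLE_VERSION` and `EXAMPLE_GIT_SHA1` are placeholder macros defined here for the example; they are not the real build-system definitions.

```c
#include <string.h>

/* Placeholder values for illustration only. The SHA1 macro already
 * carries its own leading space and parentheses. */
#define EXAMPLE_VERSION  "18.3.0"
#define EXAMPLE_GIT_SHA1 " (git-DEADBEEF)"

/* Adjacent string literals are concatenated at compile time, so the
 * macro can simply follow the version string. */
const char *driver_info(void)
{
   return "Mesa " EXAMPLE_VERSION EXAMPLE_GIT_SHA1;
}

/* An empty macro value (a non-git build) leaves just the version. */
int built_from_git(void)
{
   return strlen(EXAMPLE_GIT_SHA1) > 0;
}
```

Wrapping the macro as `"(" EXAMPLE_GIT_SHA1 ")"` would instead yield doubled parentheses for git builds and a stray `()` for release builds.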
-