- 11 Dec, 2019 31 commits
-
-
Jason Ekstrand authored
We've been keeping up with the spec updates. Reviewed-by:
Lionel Landwerlin <lionel.g.landwerlin@intel.com>
-
Jason Ekstrand authored
Vulkan 1.1 requires VK_KHR_external_fence which requires syncobj support to be actually usable. However, it doesn't strictly require that we support any external handle types. We should be able to advertise 1.1 even on old kernels that don't have syncobj support. Reviewed-by:
Lionel Landwerlin <lionel.g.landwerlin@intel.com>
-
Jason Ekstrand authored
When we have syncobj_wait, we can trust in WAIT_FOR_SUBMIT but when we don't, we only have BO waits and those aren't quite as nice. This commit adds a flag to _anv_queue_submit to wait for the queue to drain before returning. This gives us the behavior we need to implement DeviceWaitIdle. Fixes: 246261f0 "anv: prepare the driver for delayed submissions" Reviewed-by:
Lionel Landwerlin <lionel.g.landwerlin@intel.com>
-
Fixes: 11f736a6 "nir/tests: add serializer tests" Signed-off-by:
Karol Herbst <kherbst@redhat.com> Reviewed-by:
Eric Engestrom <eric.engestrom@intel.com>
-
Jan Zielinski authored
This is the initial commit on the way to implementing the ARB_tessellation_shader extension in OpenSWR. It introduces a tessellator implementation taken from Microsoft's GitHub (published under the MIT license):
https://github.com/microsoft/DirectX-Specs/blob/master/d3d/archive/images/d3d11/tessellator.cpp
https://github.com/microsoft/DirectX-Specs/blob/master/d3d/archive/images/d3d11/tessellator.hpp
It also adds some glue code that connects the tessellator to the internals of the SWR rasterizer. Acked-by:
Dave Airlie <airlied@redhat.com> Acked-by:
Bruce Cherniak <bruce.cherniak@intel.com> Reviewed-by:
Alok Hota <alok.hota@intel.com>
-
Samuel Pitoiset authored
This is used to validate if the driver emits correct LLVM IR. Signed-off-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by:
Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
-
Eric Engestrom authored
Signed-off-by:
Eric Engestrom <eric.engestrom@intel.com> Reviewed-by:
Lionel Landwerlin <lionel.g.landwerlin@intel.com>
-
lavacli 0.9.8 is now available in Debian Testing. Ref: https://tracker.debian.org/news/1066828/lavacli-098-1-migrated-to-testing/ Fixes: 555c0de8 ("gitlab-ci: Move LAVA-related files into top-level ci dir") Signed-off-by:
Rohan Garg <rohan.garg@collabora.com> Acked-by:
Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by:
Tomeu Vizoso <tomeu.vizoso@collabora.com>
-
Erico Nunes authored
Otherwise we may lower some fdot to fdph which is not implemented in pp. Fixes #2126 Signed-off-by:
Erico Nunes <nunes.erico@gmail.com> Reviewed-by:
Vasily Khoruzhick <anarsoul@gmail.com>
-
Karol Herbst authored
Signed-off-by:
Karol Herbst <kherbst@redhat.com> Reviewed-by:
Connor Abbott <cwabbott0@gmail.com>
-
Karol Herbst authored
The NIR serializer uses nir_ssa_alu_instr_src_components in a few places to determine how many components a src has, but that's not what this function returns. It simply returns how many channels are used, which is still fine for most of the code. This was breaking code like this:
  vec16 32 ssa_1 = intrinsic load_global
  vec1 32 ssa_2 = fmax ssa_1.a, ssa_2.b
v2: make the 16bit encoding work for identity swizzles again Signed-off-by:
Karol Herbst <kherbst@redhat.com> Reviewed-by:
Connor Abbott <cwabbott0@gmail.com>
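The distinction this commit describes, between how many channels an instruction reads from a source and how many components the source value actually has, can be illustrated with a minimal C sketch. The struct and function names are hypothetical and heavily simplified; they are not the real NIR types, and the 4-component threshold is only an assumed example of a compact encoding limit, not the serializer's actual rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified ALU source (not the real nir_alu_src). */
struct sketch_alu_src {
   uint8_t def_num_components; /* width of the value being read, e.g. 16 */
   uint8_t channels_used;      /* channels this instruction reads, e.g. 1 */
   uint8_t swizzle[16];        /* channel of the value selected by each read */
};

/* Misleading check: looks only at how many channels are read. */
static bool
sketch_small_by_channels_used(const struct sketch_alu_src *src)
{
   assert(src);
   return src->channels_used <= 4;
}

/* Check based on the source value's width: a swizzle entry may name any
 * channel of that value, so its width bounds the swizzle indices. */
static bool
sketch_small_by_def_width(const struct sketch_alu_src *src)
{
   assert(src);
   return src->def_num_components <= 4;
}
```

For a scalar operation that reads one channel of a 16-component value, the first check reports a small source while the second does not, which is the kind of mismatch the commit message points at.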
-
Bas Nieuwenhuizen authored
This is correct per the Vulkan spec format equivalence table. Fixes: f36b5274 "radv/android: Add android hardware buffer queries." Reviewed-by:
Eric Anholt <eric@anholt.net>
-
Tomeu Vizoso authored
Sometimes it's useful to get information about GPU faults in the console, so it's synchronized with other messages. This commit will cause Mesa to wait for completion and check if there are any faults raised by the GPU. Signed-off-by:
Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by:
Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
-
Kenneth Graunke authored
A lot of the brw_*_prog_key fields are for emulating features on legacy hardware that iris doesn't support. In particular, all of the texture swizzle fields take up a lot of space. These dead fields make hashing the shader keys more expensive than it ought to be. We introduce iris-specific keys with only the information we need, and translate them to brw keys when actually compiling new variants. This way, key comparisons can use the small keys. The size reductions are:
  VS: 328 bytes -> 8 bytes
  TCS: 312 bytes -> 24 bytes
  TES: 304 bytes -> 24 bytes
  GS: 284 bytes -> 8 bytes
  FS: 304 bytes -> 16 bytes
  CS: 280 bytes -> 4 bytes
Scores for the Piglit drawoverhead microbenchmark case with a shader program change improve by roughly 30%. Reviewed-by:
Eric Anholt <eric@anholt.net>
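The small-key idea can be sketched in C as follows. This is only an illustration under assumptions: both structs and all of their fields are invented for the example and do not mirror the real brw or iris key layouts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a large legacy key; fields are invented. */
struct sketch_big_key {
   uint32_t program_id;
   uint16_t legacy_tex_swizzles[128]; /* unused on newer hardware */
   uint8_t  nr_color_regions;
   bool     flat_shade;
};

/* Hypothetical compact key holding only what the driver needs. */
struct sketch_small_key {
   uint32_t program_id;
   uint8_t  nr_color_regions;
   uint8_t  flat_shade;
   uint8_t  pad[2]; /* explicit padding so memcmp only sees defined bytes */
};

/* Expand the compact key into the large one only when compiling. */
static void
sketch_expand_key(const struct sketch_small_key *small,
                  struct sketch_big_key *big)
{
   assert(small && big);
   memset(big, 0, sizeof(*big));
   big->program_id = small->program_id;
   big->nr_color_regions = small->nr_color_regions;
   big->flat_shade = small->flat_shade != 0;
}

/* Variant lookup compares (or hashes) only the few bytes of the small key. */
static bool
sketch_small_keys_equal(const struct sketch_small_key *a,
                        const struct sketch_small_key *b)
{
   return memcmp(a, b, sizeof(*a)) == 0;
}
```

The design choice is that the expensive translation happens only on a cache miss, while every lookup touches just the compact key.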
-
Fixes: a24d6fba ("meson: Add -Werror=gnu-empty-initializer to MSVC compat args") Reviewed-by:
Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Tested-by:
Vinson Lee <vlee@freedesktop.org> Signed-off-by:
Pierre Moreau <dev@pmoreau.org>
-
Vinson Lee authored
macOS does not have pthread_getcpuclockid.
src/util/u_thread.h:156:4: error: implicit declaration of function 'pthread_getcpuclockid' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
   pthread_getcpuclockid(thread, &cid);
   ^
Fixes: 4913215d ("util/u_thread: don't restrict u_thread_get_time_nano() to __linux__") Closes: mesa/mesa#2171 Signed-off-by:
Vinson Lee <vlee@freedesktop.org> Acked-by:
Eric Engestrom <eric@engestrom.ch>
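One plausible way to guard such a call is shown in the sketch below. It is an assumption about the general shape of a fix, not a copy of the actual patch; the function name sketch_thread_time_nano is hypothetical, and returning 0 on platforms without pthread_getcpuclockid is an illustrative fallback.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <time.h>

/* Returns the CPU time consumed by a thread in nanoseconds,
 * or 0 where the per-thread CPU clock is unavailable. */
static int64_t
sketch_thread_time_nano(pthread_t thread)
{
#if !defined(__APPLE__)
   clockid_t cid;
   struct timespec ts;

   if (pthread_getcpuclockid(thread, &cid) != 0)
      return 0;
   if (clock_gettime(cid, &ts) != 0)
      return 0;
   return (int64_t)ts.tv_sec * 1000000000 + ts.tv_nsec;
#else
   /* macOS does not provide pthread_getcpuclockid. */
   (void)thread;
   return 0;
#endif
}
```

Calling sketch_thread_time_nano(pthread_self()) yields a non-negative value on either path.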
-
Emma Anholt authored
This gets us shared non-UBWC layout code between gallium and turnip. Until I fix up the rest of gallium to handle UBWC mipmapping, we do the single-level UBWC setup in gallium as a fixup after layout. Reviewed-by:
Kristian H. Kristensen <hoegsberg@google.com>
-
Emma Anholt authored
Prevents regressions on argb1555 and rgb565 when making turnip use freedreno's layout. Reviewed-by:
Kristian H. Kristensen <hoegsberg@google.com>
-
Emma Anholt authored
We pass in all the parameters for setting up the layout, though freedreno still sets a few of them up early (since it uses layout helpers in making some decisions about the layout setup parameters that will be cleaned up once krh's blitter work lands).
-
Emma Anholt authored
This lets us start using some of the fdl_* helpers and have more obviously matching code between gallium and turnip. We can't yet use the fdl_* UBWC helpers, since the gallium driver doesn't do UBWC mipmaps (which I'm working on in another branch). Reviewed-by:
Kristian H. Kristensen <hoegsberg@google.com>
-
Emma Anholt authored
This is a little refactor in preparation for UBWC mipmapping support. Reviewed-by:
Kristian H. Kristensen <hoegsberg@google.com>
-
Emma Anholt authored
It's the same logic for each of these being emitted, and I was about to change the rsc->layout.* for UBWC. Reviewed-by:
Kristian H. Kristensen <hoegsberg@google.com>
-
Emma Anholt authored
We can just bake the UBWC-goes-first delta into the slices at setup time. I did have to fix up the resource shadowing swap path to swap the slice fields, as it was missing and regressed the format reinterprets otherwise. Reviewed-by:
Kristian H. Kristensen <hoegsberg@google.com>
-
Kenneth Graunke authored
i965 wants to use an offset from a base because everything is in a single buffer whose address may be relocated, and all base addresses are set to the start of that buffer. iris wants to use a full 64-bit address, because state lives in separate buffers which may be in the shader, surface, and dynamic memory zones, where addresses grow downward from the top of a 4GB zone. So it's very possible for a 32-bit offset to exist relative to multiple bases, leading to the wrong state size.
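The ambiguity can be seen in a small C sketch. The zone base addresses and the helper name are made up for illustration; only the arithmetic matters.

```c
#include <assert.h>
#include <stdint.h>

/* A 32-bit offset only identifies state together with its base. */
static uint32_t
sketch_offset_from_base(uint64_t address, uint64_t base)
{
   assert(address >= base && address - base <= UINT32_MAX);
   return (uint32_t)(address - base);
}
```

With two hypothetical zone bases 0x100000000 and 0x200000000, the distinct addresses 0x100001000 and 0x200001000 both reduce to the offset 0x1000, so a lookup keyed on the offset alone cannot tell them apart, whereas the full 64-bit addresses remain unique.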
-
Low-level implementation of the INTEL_performance_query APIs in the Intel iris driver. Most of the functions and procedures defined here are adapted from the i965 driver (brw_performance_query.c).
v2:
  - replace genX_init_performance_query with iris_init_perfquery_functions, which is gen-version agnostic
  - general code clean-up
v3: include gen_perf_gens.h, as some of the defines were moved to this new header file
v4:
  - checking for kernel 4.13+ won't be needed here, as iris won't be loaded anyway without DRM_SYNCOBJ, which is enabled after kernel 4.13
  - checking whether gen < 8 or is_cherryview won't be required either, because those cases are screened in iris_screen_create
v5: remove genX(init_performance_query)
v6:
  - remove oa_metrics_kernel_support, as iris works only with kernel 4.18 and newer
  - use perf functions defined in a separate file, iris_perf.h/c
Signed-off-by:
Dongwon Kim <dongwon.kim@intel.com> Reviewed-by:
Kenneth Graunke <kenneth@whitecape.org>
-
The configuration of the gen_perf vtable will be the same for INTEL_performance_query and AMD_performance_monitor. Initialize the table in a single routine that can be called from both implementations. Signed-off-by:
Dongwon Kim <dongwon.kim@intel.com> Reviewed-by:
Kenneth Graunke <kenneth@whitecape.org>
-
New state tracker APIs added for INTEL_performance_query. This extension is enabled if all vendor-specific functions for it exist.
v2: add st_cb_perfquery.* to the list of sources in Makefile
v3: minor code clean-up
v4:
  - add driver hooks for the INTEL_performance_query APIs
  - add PIPE-level performance counter and type enums that match the OpenGL enums
  - convert the pipe_perf_counter_type and pipe_perf_counter_data_type enums to GL defines in state_tracker
Signed-off-by:
Dongwon Kim <dongwon.kim@intel.com> Reviewed-by:
Kenneth Graunke <kenneth@whitecape.org>
-
Dylan Baker authored
Fixes: 1ae8018a ("meson: Add support for the vc4 driver.") Reviewed-by:
Eric Anholt <eric@anholt.net>
-
Kenneth Graunke authored
TCCNTLREG contains additional L3 cache write merging optimizations. The default value on my system appears to be:
  - URB Partial Write Merging (bit 0)
  - L3 Data Partial Write Merging (bit 2)
  - TC Disable (bit 3)
Windows drivers appear to set bit 1 as well to enable "Color/Z Partial Write Merging". This should solve an issue we were seeing where MRT benchmarks were using substantially more bandwidth than they ought. However, we have not observed it to cause measurable FPS gains. It is unclear whether we should be setting bit 0 or bit 3, so for now we leave those at the hardware default value. Acked-by:
Jason Ekstrand <jason@jlekstrand.net>
-
Kenneth Graunke authored
TCCNTLREG contains additional L3 cache write merging optimizations. The default value on my system appears to be:
  - URB Partial Write Merging (bit 0)
  - L3 Data Partial Write Merging (bit 2)
  - TC Disable (bit 3)
Windows drivers appear to set bit 1 as well to enable "Color/Z Partial Write Merging". This should solve an issue we were seeing where MRT benchmarks were using substantially more bandwidth than they ought. However, we have not observed it to cause measurable FPS gains. It is unclear whether we should be setting bit 0 or bit 3, so for now we leave those at the hardware default value. Improves performance in Manhattan 3.0 by 6% on ICL 8x8 at a fixed frequency, according to Felix Degrood. I didn't see any improvements at out-of-the-box power management settings, however. Acked-by:
Jason Ekstrand <jason@jlekstrand.net>
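The bit layout described in the message can be written down as a short C sketch. The bit positions come from the commit text; the macro and function names are made up, and a real driver would program the register through its own command-stream helpers rather than plain integer arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions as listed in the commit message; names are hypothetical. */
#define SKETCH_TCCNTLREG_URB_PARTIAL_WRITE_MERGING      (1u << 0)
#define SKETCH_TCCNTLREG_COLOR_Z_PARTIAL_WRITE_MERGING  (1u << 1)
#define SKETCH_TCCNTLREG_L3_DATA_PARTIAL_WRITE_MERGING  (1u << 2)
#define SKETCH_TCCNTLREG_TC_DISABLE                     (1u << 3)

/* Turn on Color/Z merging and keep every other bit at its current value. */
static uint32_t
sketch_enable_color_z_merging(uint32_t current)
{
   return current | SKETCH_TCCNTLREG_COLOR_Z_PARTIAL_WRITE_MERGING;
}
```

Starting from the default described above (bits 0, 2, and 3 set), the result has bits 0 through 3 set, with bits 0 and 3 left exactly as the hardware had them.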
-
Kenneth Graunke authored
TCCNTLREG contains additional cache programming settings. In particular, there are several write combining controls we'd like to use. Acked-by:
Jason Ekstrand <jason@jlekstrand.net>
-
- 10 Dec, 2019 9 commits
-
-
Kenneth Graunke authored
This makes simple_mtx_destroy set the counter to an invalid canary value and then makes lock/unlock assert that the value is legal. That way, calling lock/unlock after destroy will assert fail, rather than deadlocking or potentially even working. This has caught real deadlocks in dEQP multithreaded tests (in st/mesa shader variant zombie list handling), which have since been fixed. Reviewed-by:
Tapani Pälli <tapani.palli@intel.com> Reviewed-by:
Eric Engestrom <eric@engestrom.ch>
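The canary technique can be sketched with a toy spinlock in C11. This is not the real simple_mtx implementation: the type, the function names, and the canary constant are assumptions chosen for illustration, and the spin loop stands in for whatever waiting mechanism the real lock uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MTX_INVALID 0xd0d0d0d0u /* hypothetical canary value */

typedef struct {
   _Atomic uint32_t val; /* 0 = unlocked, 1 = locked */
} sketch_mtx;

static void
sketch_mtx_init(sketch_mtx *m)
{
   atomic_init(&m->val, 0);
}

/* Destroy poisons the counter so later use is detectable. */
static void
sketch_mtx_destroy(sketch_mtx *m)
{
   atomic_store(&m->val, SKETCH_MTX_INVALID);
}

static bool
sketch_mtx_is_legal(sketch_mtx *m)
{
   uint32_t v = atomic_load(&m->val);
   return v == 0 || v == 1;
}

static void
sketch_mtx_lock(sketch_mtx *m)
{
   assert(sketch_mtx_is_legal(m)); /* fires on use after destroy */
   uint32_t expected = 0;
   while (!atomic_compare_exchange_weak(&m->val, &expected, 1))
      expected = 0; /* CAS overwrote it with the observed value; retry */
}

static void
sketch_mtx_unlock(sketch_mtx *m)
{
   assert(sketch_mtx_is_legal(m));
   atomic_store(&m->val, 0);
}
```

After sketch_mtx_destroy, a call to sketch_mtx_lock or sketch_mtx_unlock trips the assertion in a debug build instead of spinning forever or silently succeeding.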
-
Now that dEQP should be happy, let's flip the switch. Signed-off-by:
Rob Clark <robdclark@chromium.org>
-
In particular, we need to invalidate the LRZ state when we cannot be confident in what the Z state would be during rendering:
  1) depth test modes not supported by LRZ
  2) stencil test, which would require full rasterization and stencil test in the binning pass (whereas LRZ normally just needs to determine the min and max z value in an 8x8 quad)
Signed-off-by:
Rob Clark <robdclark@chromium.org>
-
Signed-off-by:
Rob Clark <robdclark@chromium.org>
-
Seems to be a bit different for a6xx, so let's split this out. Signed-off-by:
Rob Clark <robdclark@chromium.org>
-
Signed-off-by:
Rob Clark <robdclark@chromium.org>
-
Marek Olšák authored
Reviewed-by:
Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
-
Marek Olšák authored
Reviewed-by:
Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
-
Marek Olšák authored
based on PAL. Reviewed-by:
Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
-