- 02 May, 2019 40 commits
-
Erik Faye-Lund authored
This roughly mirrors what we get from autotools. There are a few differences, though:
1. The "exec_prefix" output has been dropped. Meson doesn't support this, so it makes no sense here.
2. The "llvm-config" output has been dropped. Meson abstracts dependency discovery a bit more than our autotools build-system does, so it's not easy to get this information as-is.
3. HUD extra stats, SWR archs, Shared/Static libs and CFLAGS / CXXFLAGS / LDFLAGS have been dropped. These can be inspected with "meson configure".
4. How we set defines works quite differently in our Meson build-system, and the result isn't quite the same. In particular, the DEFINES output has been dropped, to avoid having to refactor the code too much.
Signed-off-by:
Erik Faye-Lund <erik.faye-lund@collabora.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109326 Reviewed-by:
Eric Engestrom <eric.engestrom@intel.com> Acked-by:
Dylan Baker <dylan@pnwbakers.com>
-
Erik Faye-Lund authored
Variables are cheap, and there's little reason for the dri and gallium drivers to work on the same variable for the driver list. So let's split these into two separate lists instead. This makes it easier to inspect these after the fact, for instance for generating a summary of build-settings. Signed-off-by:
Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by:
Eric Engestrom <eric.engestrom@intel.com> Acked-by:
Dylan Baker <dylan@pnwbakers.com>
-
Erik Faye-Lund authored
This way we can mark the dri_drivers and dri_link arrays as temporary, as all knowledge about them is contained in a single build-file with a clearly visible, limited life-span. Signed-off-by:
Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by:
Eric Engestrom <eric.engestrom@intel.com> Acked-by:
Dylan Baker <dylan@pnwbakers.com>
-
Rob Clark authored
Signed-off-by:
Rob Clark <robdclark@chromium.org>
-
Rob Clark authored
We just need to do a sequence of commands to flush the cache. Signed-off-by:
Rob Clark <robdclark@chromium.org> Reviewed-by:
Kristian H. Kristensen <hoegsberg@google.com>
-
Rob Clark authored
Wire up support to sample from the fb (and force GMEM rendering when we have fb reads). The existing GLSL IR lowering for blend_equation_advanced does the rest. Signed-off-by:
Rob Clark <robdclark@chromium.org> Reviewed-by:
Kristian H. Kristensen <hoegsberg@google.com>
-
Rob Clark authored
Lower load_output to txf_ms_fb and add support for the new texture fetch instruction. Signed-off-by:
Rob Clark <robdclark@chromium.org> Reviewed-by:
Kristian H. Kristensen <hoegsberg@google.com>
-
Rob Clark authored
Needed for sampling from tile buffer (GMEM). Signed-off-by:
Rob Clark <robdclark@chromium.org> Reviewed-by:
Kristian H. Kristensen <hoegsberg@google.com>
-
Rob Clark authored
Signed-off-by:
Rob Clark <robdclark@chromium.org> Reviewed-by:
Kristian H. Kristensen <hoegsberg@google.com>
-
Rob Clark authored
Apparently we never hit this path. Or at least haven't for a rather long time. But in either case (load_deref or load_frag_coord), we can just directly use the intrinsic's ssa dest. So stop passing the nir_variable (which would be NULL in the load_frag_coord case) around and instead just use &intr->dest.ssa. (This ofc means we need to set up the cursor to insert *after* the instruction, which seems to be another bug of the original implementation.) Signed-off-by:
Rob Clark <robdclark@chromium.org> Reviewed-by:
Kristian H. Kristensen <hoegsberg@google.com>
-
Rob Clark authored
The extra comma at the end was annoying me. Signed-off-by:
Rob Clark <robdclark@chromium.org> Reviewed-by:
Kristian H. Kristensen <hoegsberg@google.com>
-
Rob Clark authored
And a comment.. since we are mixing units of bytes/dwords/vec4, hopefully this will avoid some unit confusion. Signed-off-by:
Rob Clark <robdclark@chromium.org>
-
Rob Clark authored
It isn't quite as simple as not running the pass, since with packed varyings we get load_ubo for block==0 (ie. the "real" uniforms). So instead run the pass normally but decline to lower anything in block > 0. Signed-off-by:
Rob Clark <robdclark@chromium.org>
-
Rob Clark authored
Since we emit UBO regions INDIRECTly (ie. not copied into cmdstream but emitted via EXT_SRC_ADDR) we need to keep them 4*vec4 aligned. The code already mostly did this, except for aligning the first UBO region itself (ie. the one after block==0 which is the "real" uniforms). Fixes: 893425a6 freedreno/ir3: Push UBOs to constant file Fixes: 3c8779af freedreno/ir3: Enable PIPE_CAP_PACKED_UNIFORMS Signed-off-by:
Rob Clark <robdclark@chromium.org>
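The alignment rule in this commit message can be illustrated with a small sketch. This is not the driver code: `VEC4_BYTES`, `REGION_ALIGN`, `align` and `layout_ubo_ranges` are hypothetical names, and the 16-byte vec4 size assumes four 32-bit components.

```python
VEC4_BYTES = 16                 # assumption: one vec4 = 4 dwords = 16 bytes
REGION_ALIGN = 4 * VEC4_BYTES   # regions must start on a 4*vec4 boundary


def align(value, alignment):
    """Round value up to the next multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


def layout_ubo_ranges(block0_size, range_sizes):
    """Return byte offsets for UBO ranges placed after the block 0 uniforms.

    Hypothetical helper: every range, including the first one after
    block 0, starts at an aligned offset.
    """
    offsets = []
    offset = align(block0_size, REGION_ALIGN)  # align the first region too
    for size in range_sizes:
        offsets.append(offset)
        offset = align(offset + size, REGION_ALIGN)
    return offsets
```

Keeping the unit (bytes here) in the constant names is one way to avoid the bytes/dwords/vec4 confusion mentioned in an earlier commit.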
-
Rob Clark authored
Otherwise we zero out the state again, but all the UBO loads that we could lower are already lowered. End result is that we didn't emit the uniforms for lowered UBO access in any case where multiple shader variants are used. Fixes: 893425a6 freedreno/ir3: Push UBOs to constant file Fixes: 3c8779af freedreno/ir3: Enable PIPE_CAP_PACKED_UNIFORMS Signed-off-by:
Rob Clark <robdclark@chromium.org>
-
Lionel Landwerlin authored
Keen on having other people contribute. Signed-off-by:
Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by:
Eric Engestrom <eric.engestrom@intel.com>
-
Lionel Landwerlin authored
And fix the unused CmdDrawIndirect. Signed-off-by:
Lionel Landwerlin <lionel.g.landwerlin@intel.com>
-
Lionel Landwerlin authored
Signed-off-by:
Lionel Landwerlin <lionel.g.landwerlin@intel.com>
-
Lionel Landwerlin authored
This is useful to normalize the numbers written into the output file, as those numbers are accumulated over a period of time and a number of frames. Signed-off-by:
Lionel Landwerlin <lionel.g.landwerlin@intel.com>
-
Lionel Landwerlin authored
The output looks something like this (csv style):
fps, frame, frame_timing(us), submit, draw_indexed, pipeline_graphics, acquire_timing(us), vert_invocations, frag_invocations, gpu_timing(ns)
480.55, 242, 501512, 247, 1444, 1204, 714, 5827272, 113043296, 121424174
467.80, 234, 500214, 234, 1412, 1176, 648, 5635680, 109436188, 117743760
424.37, 213, 501923, 213, 2130, 1704, 623, 5132448, 99657292, 105474683
472.15, 237, 501962, 237, 2370, 1896, 667, 5710752, 110924644, 122226004
411.32, 206, 500826, 206, 2060, 1648, 709, 4963776, 96491764, 95333273
458.87, 230, 501228, 230, 2300, 1840, 634, 5542080, 107758204, 123112090
475.01, 238, 501044, 238, 2380, 1904, 631, 5734848, 111477480, 122087426
471.08, 236, 500972, 236, 2360, 1888, 655, 5686656, 110498496, 114816162
Signed-off-by:
Lionel Landwerlin <lionel.g.landwerlin@intel.com>
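Output in this shape can be read back with Python's standard `csv` module. The sketch below is only an illustration: it embeds the first two rows of the sample above, and `read_overlay_stats` and `per_frame` are hypothetical helper names, not part of the overlay layer.

```python
import csv
import io

SAMPLE = """\
fps, frame, frame_timing(us), submit, draw_indexed, pipeline_graphics, acquire_timing(us), vert_invocations, frag_invocations, gpu_timing(ns)
480.55, 242, 501512, 247, 1444, 1204, 714, 5827272, 113043296, 121424174
467.80, 234, 500214, 234, 1412, 1176, 648, 5635680, 109436188, 117743760
"""


def read_overlay_stats(text):
    """Parse overlay output into a list of dicts keyed by column name."""
    # skipinitialspace drops the blank that follows each comma.
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    rows = []
    for row in reader:
        # fps is fractional; every other column is an integer counter.
        rows.append({k: float(v) if k == "fps" else int(v)
                     for k, v in row.items()})
    return rows


def per_frame(row, column):
    """Normalize an accumulated counter by the number of frames in the row."""
    return row[column] / row["frame"]


rows = read_overlay_stats(SAMPLE)
```

Dividing by the `frame` column is what makes counters comparable between sampling periods that contain different numbers of frames.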
-
Lionel Landwerlin authored
Looks a bit better. Signed-off-by:
Lionel Landwerlin <lionel.g.landwerlin@intel.com>
-
Lionel Landwerlin authored
In case you're just interested in data being recorded to the output file. Signed-off-by:
Lionel Landwerlin <lionel.g.landwerlin@intel.com>
-
Lionel Landwerlin authored
v2: switch to VkBase{In,Out}Structure
v3: Add timestamps at begin/end of primary command buffers to estimate gpu time spent per submission (Lionel)
Signed-off-by:
Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Eric Engestrom <eric.engestrom@intel.com> (v2)
-
Lionel Landwerlin authored
This significantly reworks how numbers displayed are computed. We accumulate operations written into command buffers and add those to the device when submitted to a queue. These collected values are then used to compute per frame overlay data. We also accumulate the data over the sampling fps period to produce numbers for that period of time. Signed-off-by:
Lionel Landwerlin <lionel.g.landwerlin@intel.com>
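The accumulation scheme described in this commit message can be sketched roughly as follows. This is a simplified model under assumptions, not the layer's code: the `CommandBuffer` and `Device` classes and the `record`, `submit` and `present` methods are hypothetical stand-ins.

```python
from collections import Counter


class CommandBuffer:
    """Records how many operations of each kind were written into it."""

    def __init__(self):
        self.stats = Counter()

    def record(self, op):
        self.stats[op] += 1


class Device:
    """Accumulates stats of submitted command buffers per frame and period."""

    def __init__(self):
        self.frame_stats = Counter()
        self.period_stats = Counter()
        self.period_frames = 0

    def submit(self, command_buffers):
        # Operations only count once the command buffer reaches a queue,
        # and a buffer submitted twice is counted twice.
        for cmd in command_buffers:
            self.frame_stats.update(cmd.stats)

    def present(self):
        # End of frame: fold the frame into the sampling period and reset.
        frame = self.frame_stats
        self.period_stats.update(frame)
        self.period_frames += 1
        self.frame_stats = Counter()
        return frame
```

Counting at submit time rather than at record time means that command buffers which are recorded but never submitted do not affect the displayed numbers.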
-
Lionel Landwerlin authored
Signed-off-by:
Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by:
Eric Engestrom <eric.engestrom@intel.com>
-
Lionel Landwerlin authored
This will be used to copy chains of structures so that we can alter some of them.
v2: Drop vk_util.h include (Eric)
    Use VkBaseInStructure directly (Eric)
v3: Drop --platforms= param to generator script, instead produce a file with #ifdef based on what platforms are compiled.
Signed-off-by:
Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by:
Eric Engestrom <eric.engestrom@intel.com>
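The idea of copying a chain of structures so that individual entries can be altered without touching the caller's data can be sketched as below. The real helper is generated C code; this Python model is only an illustration, and `BaseStructure` and `clone_chain` are hypothetical names standing in for a struct with `sType`/`pNext` fields.

```python
import copy
from dataclasses import dataclass
from typing import Optional


@dataclass
class BaseStructure:
    """Stand-in for a Vulkan-style struct with sType and pNext fields."""
    s_type: str
    p_next: Optional["BaseStructure"] = None
    payload: int = 0


def clone_chain(head):
    """Return a copy of every struct in the chain, linked in the same order."""
    new_head = None
    tail = None
    node = head
    while node is not None:
        dup = copy.copy(node)  # shallow copy of this one struct
        dup.p_next = None      # relinked below, never shared with the source
        if tail is None:
            new_head = dup
        else:
            tail.p_next = dup
        tail = dup
        node = node.p_next
    return new_head
```

After cloning, a field in the copy can be changed while the original chain stays as the application provided it.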
-
[Alyssa: Add comment explanation] Signed-off-by:
Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by:
Alyssa Rosenzweig <alyssa@rosenzweig.io>
-
Signed-off-by:
Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by:
Alyssa Rosenzweig <alyssa@rosenzweig.io>
-
Signed-off-by:
Eric Engestrom <eric.engestrom@intel.com>
-
Signed-off-by:
Eric Engestrom <eric.engestrom@intel.com> Reviewed-by:
Tapani Pälli <tapani.palli@intel.com> Reviewed-by:
Emil Velikov <emil.velikov@collabora.com>
-
Connor Abbott authored
This was useful while debugging the previous commit. Reviewed-by:
Jason Ekstrand <jason@jlekstrand.net>
-
Connor Abbott authored
nir_opt_algebraic is currently one of the most expensive NIR passes, because of the many different patterns we've added over the years. Even though patterns are already sorted by opcode, there are still way too many patterns for common opcodes like bcsel and fadd, which means that many patterns are tried but only a few actually match. One way to fix this is to add a pre-pass over the code that scans it using an automaton constructed beforehand, similar to the automatons produced by lex and yacc for parsing source code. This automaton has to walk the SSA graph and recognize possible pattern matches.

It turns out that the theory to do this is quite mature already, having been developed for instruction selection as well as other non-compiler things. I followed the presentation in the dissertation cited in the code, "Tree algorithms: Two Taxonomies and a Toolkit," trying to keep the naming similar. To create the automaton, we have to perform something like the classical NFA to DFA subset construction used by lex, but it turns out that actually computing the transition table for all possible states would be way too expensive, with the dissertation reporting times of almost half an hour for an example of size similar to nir_opt_algebraic. Instead, we adopt one of the "filter" approaches explained in the dissertation, which trade much faster table generation and table size for a few more table lookups per instruction at runtime. I chose the filter which resulted in the fastest table generation time, with medium table size. Right now, the table generation takes around .5 seconds, despite being implemented in pure Python, which I think is good enough. Based on the numbers in the dissertation, the other choice might make table compilation time 25x slower to get 4x smaller table size, but I don't think that's worth it.

As of now, we get the following binary size before and after this patch:

    text       data     bss      dec       hex     filename
    11979455   464720   730864   13175039  c908ff  before i965_dri.so
    12037835   616244   791792   13445871  cd2aef  after i965_dri.so

There are a number of places where I've simplified the automaton by getting rid of details in the LHS patterns rather than complicate things to deal with them. For example, right now the automaton doesn't distinguish between constants with different values. This means that it isn't as precise as it could be, but the decrease in compile time is still worth it -- these are the compilation time numbers for a shader-db run with my (admittedly old) database on Intel Skylake:

    Difference at 95.0% confidence
        -42.3485 +/- 1.375
        -7.20383% +/- 0.229926%
        (Student's t, pooled s = 1.69843)

We can always experiment with making it more precise later. Reviewed-by:
Jason Ekstrand <jason@jlekstrand.net>
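To make the pre-pass idea concrete, here is a minimal bottom-up tree automaton in Python. It is a sketch under stated assumptions, not the nir_algebraic.py implementation: patterns and expressions are nested tuples `(opcode, operand, ...)`, strings stand for pattern variables or leaf values, the transition table is filled lazily on first use instead of being generated ahead of time with a filter, and the names `PATTERNS`, `subpatterns` and `Automaton` are hypothetical.

```python
PATTERNS = [
    ("fadd", ("fneg", "a"), "b"),
    ("fadd", "a", ("fmul", "b", "c")),
    ("fneg", ("fneg", "a")),
]


def subpatterns(pattern):
    """Yield a pattern and all of its non-variable sub-patterns."""
    if isinstance(pattern, tuple):
        yield pattern
        for child in pattern[1:]:
            yield from subpatterns(child)


class Automaton:
    def __init__(self, patterns):
        self.patterns = list(patterns)
        self.subterms = {s for p in self.patterns for s in subpatterns(p)}
        self.state_ids = {}  # frozenset of matched sub-patterns -> state id
        self.states = []     # state id -> frozenset of matched sub-patterns
        self.table = {}      # (opcode, child state ids) -> state id

    def _intern(self, matched):
        if matched not in self.state_ids:
            self.state_ids[matched] = len(self.states)
            self.states.append(matched)
        return self.state_ids[matched]

    def transition(self, opcode, child_states):
        key = (opcode, child_states)
        if key not in self.table:
            # A sub-pattern matches when the opcode and arity agree and each
            # operand is a variable or is matched by that child's state.
            matched = frozenset(
                s for s in self.subterms
                if s[0] == opcode and len(s) - 1 == len(child_states)
                and all(isinstance(c, str) or c in self.states[st]
                        for c, st in zip(s[1:], child_states))
            )
            self.table[key] = self._intern(matched)
        return self.table[key]

    def run(self, expr):
        """Return the state id of expr; string leaves match no sub-pattern."""
        if isinstance(expr, str):
            return self._intern(frozenset())
        children = tuple(self.run(c) for c in expr[1:])
        return self.transition(expr[0], children)

    def candidates(self, expr):
        """Top-level patterns that can match at the root of expr."""
        state = self.states[self.run(expr)]
        return [p for p in self.patterns if p in state]
```

With this in place, a rewrite pass only needs to try the patterns returned by `candidates` for an instruction instead of every pattern registered for its opcode. Like the automaton described above, the sketch is deliberately imprecise: it treats every variable as a wildcard, so a full match still has to be confirmed afterwards.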
-
Samuel Pitoiset authored
According to RadeonSI, this seems to be required by the hardware to avoid GPU hangs. I think I just forgot to set that bit when I implemented VK_EXT_transform_feedback. This fixes a GPU hang with Space Engineers and DXVK. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110291 Fixes: b4eb0290 ("radv: implement VK_EXT_transform_feedback") Signed-off-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by:
Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
-
Brian Paul authored
Trivial. Spotted by Eric Engestrom.
-
Brian Paul authored
valgrind crashes when we try to initialize host logging. This env var can be used to disable logging. v2: rebase onto "svga: move host logging to winsys". Cc: mesa-stable@lists.freedesktop.org Reviewed-by:
Neha Bhende <bhenden@vmware.com>
-
This patch adds a host_log interface to svga_winsys and moves the host logging code to the winsys layer. Cc: mesa-stable@lists.freedesktop.org Reviewed-by:
Brian Paul <brianp@vmware.com> Reviewed-by:
Neha Bhende <bhenden@vmware.com>
-
Signed-off-by:
Eric Engestrom <eric.engestrom@intel.com>
-
If the client has requested that AcquireNextImage not block at all, with a timeout of 0, then don't make any blocking calls. This will still potentially block infinitely given a non-infinite timeout, but the fix for that is much more involved. Signed-off-by:
Daniel Stone <daniels@collabora.com> Cc: mesa-stable@lists.freedesktop.org Cc: Chad Versace <chadversary@chromium.org> Cc: Jason Ekstrand <jason@jlekstrand.net> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108540 Acked-by:
Jason Ekstrand <jason@jlekstrand.net> Reviewed-by:
Chad Versace <chadversary@chromium.org> Reviewed-by:
Eric Engestrom <eric.engestrom@intel.com>
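The timeout-of-0 behaviour can be modelled with a plain queue. This is only an analogy, not the WSI code: `acquire_next_image` and the `free_images` queue are hypothetical, and unlike the commit described above, this sketch also honours finite non-zero timeouts.

```python
import queue


def acquire_next_image(free_images, timeout_ns):
    """Return a free image index, or None if none is available in time.

    free_images is a queue.Queue of image indices, a hypothetical stand-in
    for the buffers released back to the swapchain.
    """
    if timeout_ns == 0:
        # The caller asked not to block at all, so only poll.
        try:
            return free_images.get_nowait()
        except queue.Empty:
            return None
    try:
        # Blocking path: wait up to the requested time for a free image.
        return free_images.get(timeout=timeout_ns / 1e9)
    except queue.Empty:
        return None
```

The point of the separate branch is that a zero timeout never reaches a call that can wait.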
-
Erik Faye-Lund authored
All other pages have the heading as the first thing in the article. Let's clean this up for consistency. Signed-off-by:
Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by:
Eric Engestrom <eric.engestrom@intel.com>
-
Erik Faye-Lund authored
The FAQ is the only article we have that uses a centered heading, which makes it look odd compared to the other articles. Let's drop the centering for consistency. Signed-off-by:
Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by:
Eric Engestrom <eric.engestrom@intel.com>
-