- 12 Mar, 2020 26 commits
-
-
Bas Nieuwenhuizen authored
There are multiple LLVM passes that very much move the intrinsic using the descriptor outside of the loop, defeating the entire point of creating the loop. Defeat the optimizer by splitting the break into a separate if-statement and putting an optimization barrier on the bool in between. v2: Move from a callback based system to begin/end loop. This does not make it significantly less intrusive but is a bit nicer with all the extra struct and callback stubs. v3: Deal with non-divergent values in divergent path. Closes: #2160 Fixes: 028ce527 "radv: Add non-uniform indexing lowering." Reviewed-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Tested-by: Marge Bot <!4109> Part-of: <!4109>
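For readers unfamiliar with the loop being protected here, the following is a minimal sketch of a generic "waterfall" loop for non-uniform indexing, simulated per lane in Python. It is not the radv implementation: the function name, the lane model, and the `body` callback are all hypothetical, and the optimization barrier itself is not modelled. It only shows why the descriptor access has to stay inside the loop: the index is uniform only within one iteration.

```python
def waterfall_loop(indices, active, body):
    """Simulate a waterfall loop over one wave (hypothetical model).

    indices: per-lane descriptor index, possibly different in every lane
    active:  per-lane bool, True if the lane is executing
    body:    function(uniform_index, lane) -> per-lane result
    """
    results = [None] * len(indices)
    remaining = list(active)
    iterations = 0
    while any(remaining):
        iterations += 1
        # Stand-in for a "read first lane": take the index of the first
        # lane that has not finished yet and treat it as uniform.
        first = remaining.index(True)
        uniform_index = indices[first]
        for lane, pending in enumerate(remaining):
            # Lanes whose index matches run the body with the uniform index
            # and then leave the loop (the "break" the commit talks about).
            if pending and indices[lane] == uniform_index:
                results[lane] = body(uniform_index, lane)
                remaining[lane] = False
    return results, iterations


results, iterations = waterfall_loop(
    [3, 7, 3, 7], [True] * 4, lambda idx, lane: idx * 10 + lane
)
```

With two distinct indices in the wave, the loop runs twice. If a compiler hoisted the `body` call out of the loop, it would no longer see a uniform index, which is the failure the commit works around.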
-
Ian Romanick authored
We considered moving this down near the call to insert_gen4_send_dependency_workarounds. By that point it's too late for a couple of reasons. One, we're potentially increasing register pressure that may lead to another spill. Two, fixup_3src_null_dest tries to allocate a VGRF, but the post-register allocation shader uses physical registers. Closes: #2621 Fixes: ba2fa1ce ("intel/fs: Do cmod prop again after scheduling") Reviewed-by:
Matt Turner <mattst88@gmail.com> Tested-by: Marge Bot <!4155> Part-of: <!4155>
-
Timur Kristóf authored
Signed-off-by:
Timur Kristóf <timur.kristof@gmail.com> Reviewed-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Tested-by: Marge Bot <mesa/mesa!4159> Part-of: <mesa/mesa!4159>
-
Timur Kristóf authored
This will lower dynamic quad broadcasts into something that both LLVM and ACO can understand. On hardware which supports shuffles, they are lowered to shuffle; on older hardware (GFX6-7) they are lowered to constant quad broadcasts. Fixes dEQP-VK.subgroups.quad.*.subgroupquadbroadcast_nonconst_* Cc: mesa-stable@lists.freedesktop.org Signed-off-by:
Timur Kristóf <timur.kristof@gmail.com> Reviewed-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Tested-by: Marge Bot <!4147> Part-of: <!4147>
-
Timur Kristóf authored
Some hardware doesn't support subgroup shuffle, and on such hardware it makes no sense to lower quad broadcasts to shuffle. Instead, let's lower them to four const quad broadcasts, paired with bcsel instructions. Cc: mesa-stable@lists.freedesktop.org Signed-off-by:
Timur Kristóf <timur.kristof@gmail.com> Reviewed-by:
Jason Ekstrand <jason@jlekstrand.net> Reviewed-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <!4147>
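The lowering described above can be sketched as a per-lane simulation in Python. This is only an illustration of the idea (four constant quad broadcasts combined with select operations), not the NIR pass itself; the helper names are hypothetical and the subgroup is modelled as a plain list whose length is a multiple of 4.

```python
def quad_broadcast_const(values, lane_in_quad):
    """Constant quad broadcast: each lane reads a fixed lane of its own quad."""
    return [values[(i // 4) * 4 + lane_in_quad] for i in range(len(values))]


def bcsel(cond, a, b):
    """Per-lane select: a where cond is true, otherwise b."""
    return [x if c else y for c, x, y in zip(cond, a, b)]


def quad_broadcast_dynamic(values, index):
    """Lower a broadcast with a runtime index (0..3, given per lane)
    to four constant broadcasts chained with bcsel."""
    result = quad_broadcast_const(values, 0)
    for k in range(1, 4):
        candidate = quad_broadcast_const(values, k)
        result = bcsel([i == k for i in index], candidate, result)
    return result


values = [10, 11, 12, 13, 20, 21, 22, 23]
uniform = quad_broadcast_dynamic(values, [2] * 8)
per_lane = quad_broadcast_dynamic(values, [0, 1, 2, 3, 3, 2, 1, 0])
```

The sketch accepts a different index in every lane, which is more general than what the tests named above may require; that generality is an assumption of the example, not a claim about the pass.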
-
Eric Engestrom authored
gen_release_notes: resolve ambiguity by renaming `version` to `previous_version` and `next_version` to `this_version` Signed-off-by:
Eric Engestrom <eric@engestrom.ch> Reviewed-by:
Dylan Baker <dylan@pnwbakers.com> Tested-by: Marge Bot <mesa/mesa!4113> Part-of: <mesa/mesa!4113>
-
Eric Engestrom authored
Fixes: 86079447 ("scripts: Add a gen_release_notes.py script") Signed-off-by:
Eric Engestrom <eric@engestrom.ch> Reviewed-by:
Dylan Baker <dylan@pnwbakers.com> Part-of: <!4113>
-
Alyssa Rosenzweig authored
Once LCRA has run, we have a map from IR indices to byte offsets into the register file, so we need to "install" these results, rewriting the IR to use native registers and fixing up writemasks/swizzles to substitute vectorization for adjacent registers (for LCRA, we're modeling in terms of real vectors). Signed-off-by:
Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Tested-by: Marge Bot <!4158> Part-of: <!4158>
-
Alyssa Rosenzweig authored
We model the machine as vector (with restrictions) to natively handle mixed types and I/O and other goodies. We use LCRA for the heavy lifting. This commit adds only the modeling to feed into LCRA and spit LCRA solutions back; next commit will integrate it with the IR. Signed-off-by:
Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <!4158>
-
Alyssa Rosenzweig authored
We want types to be consistent throughout the IR so we don't have to make exceptions to parse things out. These cases just got missed. Signed-off-by:
Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <!4158>
-
Alyssa Rosenzweig authored
The issue was messing with liveness analysis... with Midgard we look at the writemask to decide how the instruction behaves. Here, since our ALU is scalar (except for subdivision which doesn't have proper writemasks anyway) we just look at the component count directly -- either 4 for vector instructions (essentially - for smaller loads we can replicate manually without much burden), or 1 for scalar. Signed-off-by:
Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <mesa/mesa!4158>
-
Alyssa Rosenzweig authored
Found during RA bringup. Signed-off-by:
Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <!4158>
-
Alyssa Rosenzweig authored
Ironically, this comment was mistakenly added by the commit that fixed the purported issue in the comment (1bce7fde - found by `git blame`) Signed-off-by:
Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <!4158>
-
Alyssa Rosenzweig authored
We'll want to use it for the Bifrost RA as well. Signed-off-by:
Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <!4158>
-
Rhys Perry authored
This fixes UBSan warnings when foreach_list_typed_safe() passes NULL: pointer index expression with base 0x000000000000 overflowed to 0xffffffffffffffa8 Signed-off-by:
Rhys Perry <pendingchaos02@gmail.com> Reviewed-by:
Matt Turner <mattst88@gmail.com> Tested-by: Marge Bot <!4157> Part-of: <!4157>
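The address in the warning is consistent with a member offset being subtracted from a NULL base and wrapping around in 64-bit arithmetic. The worked arithmetic below is a sketch under that assumption: the offset of 88 bytes is inferred from the warning text, not stated in the commit, and `wrapped_pointer` is a hypothetical helper used only for illustration.

```python
def wrapped_pointer(base, member_offset, bits=64):
    """Address obtained by subtracting a member offset from `base`,
    wrapped to an unsigned `bits`-wide pointer."""
    return (base - member_offset) % (1 << bits)


# Offset implied by the address in the UBSan message (an inference).
reported = 0xFFFFFFFFFFFFFFA8
implied_offset = (1 << 64) - reported

# Subtracting that offset from a NULL base reproduces the reported address.
from_null = wrapped_pointer(0, implied_offset)
```

Here `implied_offset` evaluates to 88 (0x58), and `from_null` equals the address quoted in the warning.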
-
Rhys Perry authored
Shouldn't create any incorrect waitcnts but may create suboptimal waitcnts in rare cases. Signed-off-by:
Rhys Perry <pendingchaos02@gmail.com> Reviewed-by:
Daniel Schürmann <daniel@schuermann.dev> Tested-by: Marge Bot <!4133> Part-of: <!4133>
-
Samuel Pitoiset authored
Otherwise, LLVM optimizes it but it's actually incorrect. Fixes: 0f45d4dc ("ac: add ac_build_readlane without optimization barrier") Signed-off-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by:
Marek Olšák <marek.olsak@amd.com> Tested-by: Marge Bot <!3585> Part-of: <!3585>
-
Tapani Pälli authored
This enables additional EGL configs where the depth/stencil buffer has a different number of bits per pixel than the color buffer. Some Android games require such a config in order to work. Signed-off-by:
Tapani Pälli <tapani.palli@intel.com> Reviewed-by:
Ilia Mirkin <imirkin@alum.mit.edu> Tested-by: Marge Bot <!4127> Part-of: <!4127>
-
Hyunjun Ko authored
Follow what freedreno does so that we can see the whole layout of the scratch buffer. Signed-off-by:
Hyunjun Ko <zzoon@igalia.com> Reviewed-by:
Jonathan Marek <jonathan@marek.ca> Tested-by: Marge Bot <!3942> Part-of: <!3942>
-
Hyunjun Ko authored
Signed-off-by:
Hyunjun Ko <zzoon@igalia.com> Reviewed-by:
Jonathan Marek <jonathan@marek.ca> Part-of: <!3942>
-
Hyunjun Ko authored
TODO. We should implement this since indirect draw is enabled. Signed-off-by:
Hyunjun Ko <zzoon@igalia.com> Reviewed-by:
Jonathan Marek <jonathan@marek.ca> Part-of: <!3942>
-
Hyunjun Ko authored
1. Implement vkCmdBindTransformFeedbackBuffersEXT, vkCmdBeginTransformFeedbackEXT and vkCmdEndTransformFeedbackEXT. - Not handling counter buffers yet. 2. Implement streamout emit function, mostly taken from fd6_emit.c v2. Replace emit_pkt4 funcs with emit_regs. v3. Don't copy the state of stream-output from tu_pipeline. v4. Set zero to VPC_SO_CNTL/VPC_SO_BUF_CNTL in tu6_init_hw. Signed-off-by:
Hyunjun Ko <zzoon@igalia.com> Reviewed-by:
Jonathan Marek <jonathan@marek.ca> Part-of: <!3942>
-
Hyunjun Ko authored
Mostly taken from fd6_program.c. v2. Note that this forces the use of the full VS instead of the binning pass VS if there's stream output, as the binning pass VS will have all outputs other than position/psize stripped out; this is the same as freedreno. v3. Fix indentation. v4. Use register index instead of location when setting up streamout. Signed-off-by:
Hyunjun Ko <zzoon@igalia.com> Reviewed-by:
Jonathan Marek <jonathan@marek.ca> Part-of: <!3942>
-
Hyunjun Ko authored
Define new structures for streamout buffers and state. Most members of the state struct are taken from freedreno driver. v2. Use IR3_MAX_SO_* and avoid using magic values. v3. Remove the state of stream-output in tu_cmd_state and use one in tu_pipeline and split out reset and enabled fields. Signed-off-by:
Hyunjun Ko <zzoon@igalia.com> Reviewed-by:
Jonathan Marek <jonathan@marek.ca> Part-of: <!3942>
-
Hyunjun Ko authored
- Add one member to the existing ir3_stream_output so that we can assign location information from nir_xfb_info, rather than defining a new struct. - Redefine the maximum number of SO buffers, streams and outputs, which will be used for turnip. - Also enable caps for transform feedback for spirv_to_nir. v2. Remove redefined maximums and use IR3_MAX_SO_* and add IR3_MAX_SO_STREAMS. v3. Remove the newly added location field so that we stay aligned to 32 bytes. Instead we create an array mapping between the location and a consecutive index, which is what the GL driver does. Signed-off-by:
Hyunjun Ko <zzoon@igalia.com> Reviewed-by:
Jonathan Marek <jonathan@marek.ca> Part-of: <!3942>
-
David Stevens authored
When creating an EGL surface from an ANativeWindow, the window's usage flags need to be set so that buffers are allocated properly. Signed-off-by:
David Stevens <stevensd@chromium.org> Reviewed-by:
Tapani Pälli <tapani.palli@intel.com> Reviewed-by:
Lepton Wu <lepton@chromium.org>
-
- 11 Mar, 2020 14 commits
-
-
Emma Anholt authored
This supports powering up the device (using an external tool you provide based on your particular lab), talking over serial to wait for the fastboot prompt, and then booting a fastboot image on a target device. I was previously relying on LAVA for this, but that ran afoul of corporate policies related to the AGPL. However, LAVA wasn't doing too much for us, given that gitlab already has a job scheduler and tagging and runners. We were spending a lot of engineering on making the two systems match up, when we can just have gitlab do it directly. Lightly-reviewed-by:
Kristian H. Kristensen <hoegsberg@google.com> Tested-by: Marge Bot <!4076> Part-of: <!4076>
-
Emma Anholt authored
The debian firmware package doesn't actually contain it, costing us a minute of boot time waiting for it to show up. Lightly-reviewed-by:
Kristian H. Kristensen <hoegsberg@google.com> Part-of: <!4076>
-
Emma Anholt authored
This is useful for sanity checking how the driver loads. Lightly-reviewed-by:
Kristian H. Kristensen <hoegsberg@google.com> Part-of: <!4076>
-
Yevhenii Kolesnikov authored
Knowing the following:
- CMP writes to the flag register the result of applying cmod to `src0 - src1`. After that it stores the same value to dst. Other instructions first store their result to dst, and then store cmod(dst) to the flag register.
- inst is either CMP or MOV
- inst->dst is null
- inst->src[0] overlaps with scan_inst->dst
- inst->src[1] is zero
- scan_inst wrote to a flag register

There are three possible paths:
- scan_inst is CMP: Considering that src0 is either 0x0 (false) or 0xffffffff (true), and src1 is 0x0:
  - If inst's cmod is NZ, we can always remove scan_inst: NZ is invariant for false and true. This holds even if src0 is NaN: .nz is the only cmod that returns true for NaN.
  - .g is invariant if src0 has a UD type
  - .l is invariant if src0 has a D type
- scan_inst and inst have the same cmod: If scan_inst is anything other than CMP, it already wrote the appropriate value to the flag register.
- else: We can change the cmod of scan_inst to that of inst, and remove inst. It is valid as long as we make sure that no instruction uses the flag register between scan_inst and inst.

Nine new cmod_propagation unit tests:
- cmp_cmpnz
- cmp_cmpg
- plnnz_cmpnz
- plnnz_cmpz (*)
- plnnz_sel_cmpz
- cmp_cmpg_D
- cmp_cmpg_UD (*)
- cmp_cmpl_D (*)
- cmp_cmpl_UD

(*) this would fail without changes to brw_fs_cmod_propagation.

This fixes an optimisation that used to be illegal (see issue #2154):

= Before =
    0: linterp.z.f0.0(8) vgrf0:F, g2:F, attr0<0>:F
    1: cmp.nz.f0.0(8) null:F, vgrf0:F, 0f
= After =
    0: linterp.z.f0.0(8) vgrf0:F, g2:F, attr0<0>:F

Now it is optimised as such (note the change of cmod in line 0):

= Before =
    0: linterp.z.f0.0(8) vgrf0:F, g2:F, attr0<0>:F
    1: cmp.nz.f0.0(8) null:F, vgrf0:F, 0f
= After =
    0: linterp.nz.f0.0(8) vgrf0:F, g2:F, attr0<0>:F

No shaderdb changes

Closes: #2154 Signed-off-by:
Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com> Reviewed-by:
Matt Turner <mattst88@gmail.com> Tested-by: Marge Bot <!3348> Part-of: <!3348>
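The invariance claims for the "scan_inst is CMP" path can be checked with a small Python model. This is a sketch under simplifying assumptions, not the brw_fs_cmod_propagation code: a CMP is assumed to leave either 0x0 or 0xffffffff in its destination, the second compare is always against zero, only the integer types UD and D are modelled (the NaN argument for float sources is not), and all names are hypothetical.

```python
def as_signed32(x):
    """Reinterpret a 32-bit pattern as a signed (D) value."""
    return x - (1 << 32) if x & 0x80000000 else x


# Conditional modifiers applied to `src0 - 0`, i.e. to src0 itself.
CMODS = {
    "nz": lambda a: a != 0,
    "z": lambda a: a == 0,
    "g": lambda a: a > 0,
    "l": lambda a: a < 0,
}


def second_cmp_flag(cmod, src_type, first_flag):
    """Flag a `cmp.<cmod> null, dst, 0` would write, where dst is the
    boolean result (0x0 or 0xffffffff) of an earlier CMP."""
    raw = 0xFFFFFFFF if first_flag else 0x0
    value = as_signed32(raw) if src_type == "D" else raw
    return CMODS[cmod](value)


def is_invariant(cmod, src_type):
    """True if the second compare always reproduces the first flag."""
    return all(
        second_cmp_flag(cmod, src_type, flag) == flag for flag in (False, True)
    )


table = {
    (cmod, ty): is_invariant(cmod, ty)
    for cmod in ("nz", "g", "l", "z")
    for ty in ("UD", "D")
}
```

In this model `.nz` is invariant for both types, `.g` only for UD (0xffffffff is a large positive value) and `.l` only for D (0xffffffff is -1), which matches the three bullet points for that path.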
-
Alyssa Rosenzweig authored
Off-by-one. Signed-off-by:
Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Tested-by: Marge Bot <!4150> Part-of: <!4150>
-
Alyssa Rosenzweig authored
We have native FMA which works for graphics usage (unlike Midgard where it's really reserved for compute for various reasons), let's use it. Signed-off-by:
Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <!4150>
-
Alyssa Rosenzweig authored
Now that we have liveness analysis, we can cleanup the IR considerably. Signed-off-by:
Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <!4150>
-
Alyssa Rosenzweig authored
Now that all the guts are shared with Midgard, it's just a matter of wiring it in. Signed-off-by:
Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <!4150>
-
Alyssa Rosenzweig authored
Instead of trying to reindex all the time, just be okay with consistent but sparse indices; then figuring out the max index is easy enough. Signed-off-by:
Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <!4150>
-
Alyssa Rosenzweig authored
From Midgard. These are surprisingly helpful. Signed-off-by:
Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <!4150>
-
Alyssa Rosenzweig authored
Same purpose as the Midgard version, but the implementation is *dramatically* simpler thanks to our more regular IR. Signed-off-by:
Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <mesa/mesa!4150>
-
Alyssa Rosenzweig authored
While we're at it, cleanup the Midgard one. Signed-off-by:
Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <mesa/mesa!4150>
-
Alyssa Rosenzweig authored
We can move e v e n more code to be shared and let bi_block inherit from pan_block, which will allow us to use the shared data flow analysis. Signed-off-by:
Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <!4150>
-
Alyssa Rosenzweig authored
This way we can share the code with Bifrost. Signed-off-by:
Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <!4150>
-