Commits · lima-vertex-fixes · Vasily Khoruzhick / mesa

Nov 25, 2019


Since we're using a separate per-draw BO for GP outputs, we don't
need the suballocator anymore.

Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>

8d705cd4

lima: use single BO for GP outputs · e9739bd1

Vasily Khoruzhick authored 5 years ago


Varyings, gl_Position and gl_PointSize are all GP outputs, so we
can use a single BO for all of them. This also allows us to get rid
of the suballocator.

Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>

e9739bd1

Nov 24, 2019

lima: varying GP writes to varying BO, so use proper flag · 44d2108f

Vasily Khoruzhick authored 5 years ago


Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>

44d2108f

Nov 20, 2019

lima: split draw calls on 64k vertices · ca653dbb

Erico Nunes authored 5 years ago


The Mali400 only supports draws with up to 64k vertices per command.
To handle this, break the draw_vbo call into multiple commands.
Indexed drawing is left to a separate code path.
This implementation was ported from vc4_draw_vbo.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>

ca653dbb
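A minimal sketch of the non-indexed splitting described in the commit above, assuming a hardware limit of `max_verts` vertices per command (for example 65535). All names (`prim_mode`, `split_draw`, `emit_split_draws`, `sub_draw`) are hypothetical, and this is not the vc4 or lima code: each sub-draw covers whole primitives, and strip sub-draws overlap so that the next chunk continues the strip.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical primitive modes; not the Gallium enum. */
enum prim_mode { PRIM_POINTS, PRIM_LINES, PRIM_LINE_STRIP,
                 PRIM_TRIANGLES, PRIM_TRIANGLE_STRIP };

/* On entry *count is the remaining vertex count. On exit *count is the
 * size of the next sub-draw and *step is how far the start advances.
 * Assumes max_verts >= 4 so that every step is non-zero. */
static bool split_draw(enum prim_mode mode, uint32_t max_verts,
                       uint32_t *count, uint32_t *step)
{
   if (*count <= max_verts) {
      *step = *count;
      return false;
   }

   switch (mode) {
   case PRIM_LINES:
      /* Whole lines only: round down to a multiple of 2. */
      *count = *step = max_verts - (max_verts % 2);
      break;
   case PRIM_TRIANGLES:
      /* Whole triangles only: round down to a multiple of 3. */
      *count = *step = max_verts - (max_verts % 3);
      break;
   case PRIM_LINE_STRIP:
      /* Repeat the last vertex so the next chunk continues the strip. */
      *count = max_verts;
      *step = max_verts - 1;
      break;
   case PRIM_TRIANGLE_STRIP:
      /* Overlap two vertices; keep the step even so the winding order
       * of the continued strip is unchanged. */
      *count = max_verts - (max_verts % 2);
      *step = *count - 2;
      break;
   case PRIM_POINTS:
   default:
      *count = *step = max_verts;
      break;
   }
   return true;
}

struct sub_draw { uint32_t start, count; };

/* Split [start, start + count) into sub-draws; returns how many were written. */
static size_t emit_split_draws(enum prim_mode mode, uint32_t max_verts,
                               uint32_t start, uint32_t count,
                               struct sub_draw *out, size_t max_out)
{
   size_t n = 0;
   while (count > 0 && n < max_out) {
      uint32_t this_count = count, step;
      (void)split_draw(mode, max_verts, &this_count, &step);
      out[n].start = start;
      out[n].count = this_count;
      n++;
      start += step;
      count -= step;
   }
   return n;
}
```

Indexed draws are not handled here; per the commit message they take a separate path. Keeping the triangle-strip step even is a choice made in this sketch so that every chunk starts on an even triangle.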

vc4: move the draw splitting routine to shared code · 6f90e8ff

Erico Nunes authored 5 years ago


This can also be useful for other hardware which has similar limitations
on vertex count per single draw.
The Mali400 has a similar limitation and can reuse this.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>

6f90e8ff

lima: refactor indexed draw indices upload · 56d80f00

Erico Nunes authored 5 years ago


As of this commit, this is just a refactor in preparation for enabling
support for more than 64k vertices.
To support splitting the draw_vbo call, indices shouldn't be re-uploaded
for every split draw.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>

56d80f00

Nov 12, 2019

lima: allocate separate bo to store varyings · d2677825

Erico Nunes authored 5 years ago


The current strategy of using the suballocator with a fixed size doesn't
scale and causes some programs with a large number of vertices (like some
glmark2 scenes) to crash.
Change it to dynamically allocate a separate bo to accommodate an
arbitrary number of vertices.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>

d2677825

Nov 08, 2019

ac: Handle invalid GFX10 format correctly in ac_get_tbuffer_format. · 911a8261

Timur Kristóf authored 5 years ago


Some games try to access a vertex buffer without a valid format.
This case was handled incorrectly by ac_get_tbuffer_format, which
made ACO emit an invalid instruction.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Cc: 19.3 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

911a8261

panfrost: Try to evict unused BOs from the cache · ee82f9f0

Boris Brezillon authored 5 years ago

The panfrost BO cache can only grow, since all newly allocated BOs are
returned to the cache (unless they've been exported).

With the MADVISE ioctl that's not a big issue, because the kernel can
come and reclaim this memory, but MADVISE will only be available starting
with 5.4 kernels. This means an app can currently allocate a lot of memory
without ever releasing it, leading to situations where the OOM killer kicks
in and kills the app (or, even worse, kills another process consuming
more memory than the GL app) to get some of this memory back.

Let's try to limit the number of BOs we keep in the cache by evicting
entries that have not been used for more than one second (if the app
stopped allocating BOs of this size, it's unlikely to allocate
similar BOs in the near future).

This solution is based on the VC4/V3D implementation.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

ee82f9f0
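The eviction policy in the commit above can be sketched as an LRU list ordered by the time each BO was returned to the cache. This is a hypothetical, self-contained model (`cached_bo`, `bo_cache_put`, `bo_cache_take` and `bo_cache_evict_stale` are invented names, and `free()` stands in for releasing the kernel BO), not the panfrost code. The caller is assumed to supply a monotonic timestamp in milliseconds.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical cached buffer object; a real driver would also hold the
 * GEM handle and mapping. */
struct cached_bo {
   size_t size;
   uint64_t free_time_ms;          /* when the BO was returned to the cache */
   struct cached_bo *prev, *next;  /* LRU links: head = least recently freed */
};

struct bo_cache {
   struct cached_bo *head, *tail;
   size_t count;
};

static void lru_unlink(struct bo_cache *c, struct cached_bo *bo)
{
   if (bo->prev) bo->prev->next = bo->next; else c->head = bo->next;
   if (bo->next) bo->next->prev = bo->prev; else c->tail = bo->prev;
   bo->prev = bo->next = NULL;
   c->count--;
}

/* Return a BO to the cache, stamping it with the current time. */
static void bo_cache_put(struct bo_cache *c, struct cached_bo *bo, uint64_t now_ms)
{
   bo->free_time_ms = now_ms;
   bo->next = NULL;
   bo->prev = c->tail;
   if (c->tail) c->tail->next = bo; else c->head = bo;
   c->tail = bo;
   c->count++;
}

/* Reuse a cached BO of exactly the requested size, if there is one.
 * Searching from the tail reuses the most recently freed BO first. */
static struct cached_bo *bo_cache_take(struct bo_cache *c, size_t size)
{
   for (struct cached_bo *bo = c->tail; bo; bo = bo->prev) {
      if (bo->size == size) {
         lru_unlink(c, bo);
         return bo;
      }
   }
   return NULL;
}

/* Free every BO that has been in the cache for more than one second.
 * The list is ordered by free time, so we stop at the first fresh entry. */
static size_t bo_cache_evict_stale(struct bo_cache *c, uint64_t now_ms)
{
   size_t evicted = 0;
   while (c->head && now_ms - c->head->free_time_ms > 1000) {
      struct cached_bo *bo = c->head;
      lru_unlink(c, bo);
      free(bo);   /* stands in for closing the kernel BO */
      evicted++;
   }
   return evicted;
}
```

Because entries are appended in free-time order, eviction only needs to inspect the head of the list rather than walk every cached BO.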

panfrost: Move BO cache related fields to a sub-struct · 25059cc4

Boris Brezillon authored 5 years ago

We will soon introduce an LRU list to evict BOs that have been unused
for more than 1 second. Let's first move all BO cache fields to a
sub-struct to clarify which fields are used by the BO caching logic.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

25059cc4

pan/midgard: Switch base for vertex texturing on T720 · 5f768eda

Alyssa Rosenzweig authored 5 years ago and Tomeu Vizoso committed 5 years ago

There aren't texture pipeline registers anymore; instead, space is
shared with work and ldst registers for output and input respectively.
We need to shift the base registers to represent this correctly.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

5f768eda

pan/midgard: Pass shader stage to disassembler · ac14facf

Alyssa Rosenzweig authored 5 years ago and Tomeu Vizoso committed 5 years ago


Vertex texturing behaves differently from fragment texturing on some
GPUs.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

ac14facf

pan/midgard: Disassemble half-steps correctly · 51594120

Alyssa Rosenzweig authored 5 years ago and Tomeu Vizoso committed 5 years ago


The meaning of some bits shifts; we need to account for this to print
swizzles sanely.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

51594120

pan/midgard: Fix printing of half-registers in texture ops · ec2af6bc

Alyssa Rosenzweig authored 5 years ago and Tomeu Vizoso committed 5 years ago

We were using old-style half-registers; let's update that to be
consistent, preparing us for more disassembler changes in this area.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

ec2af6bc

freedreno/ir3: Use regid() helper when setting up precolor regs · 4a4fad7f

Kristian Høgsberg authored 5 years ago


Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>

4a4fad7f

freedreno/a6xx: Turn on tessellation shaders · 3699a74a

Kristian Høgsberg authored 5 years ago


Wow. Very triangle. So shader.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>

3699a74a

freedreno/a6xx: Only use merged regs and four quads for VS+FS · 53782571

Kristian Høgsberg authored 5 years ago


When other geometry stages are present, we choose two quads and no
merged regs.

Acked-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>

53782571

freedreno/blitter: Save tessellation state · 07aedc36

Kristian Høgsberg authored 5 years ago


We have tessellation state now.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>

07aedc36

freedreno/a6xx: Only set emit.hs/ds when we're drawing patches · d2d0c818

Kristian Høgsberg authored 5 years ago


At least the gallium blitter helper will call us to draw with
tessellation shaders set but a non-patch primitive.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>

d2d0c818

freedreno: Use bypass rendering for tessellation · e5847908

Kristian Høgsberg authored 5 years ago


It seems like tiling could work in the Adreno architecture, but we've
only ever seen bypass rendering with tessellation.  For now, let's do
that too.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>

e5847908

freedreno/a6xx: Program state for tessellation stages · 47e2c195

Kristian Høgsberg authored 5 years ago


Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>

47e2c195

freedreno/a6xx: Emit constant parameters for tessellation stages · 03a30e7c

Kristian Høgsberg authored 5 years ago


Assemble the information the stages need and emit the constants.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>

03a30e7c

freedreno/a6xx: Allocate and program tessellation buffer · 5dd51d2d

Kristian Høgsberg authored 5 years ago


Tessellation needs a couple of buffers that should hold the entire
output from a full VS+TCS draw call.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>

5dd51d2d

freedreno/a6xx: Build the right draw command for tessellation · f0ef3e96

Kristian Høgsberg authored 5 years ago


We need to select the right primitive type, set a bit to turn on
tessellation, and OR in the TES output primitive type.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>

f0ef3e96

freedreno/ir3: Allocate const space for tessellation parameters · 7272e8a7

Kristian Høgsberg authored 5 years ago


The tessellation stages need the size and stride of the patch layout as
well as the locations of attributes in the patch.  The tessellation stages
also use two system memory BOs and need the iovas of those.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>

7272e8a7

freedreno/ir3: Pre-color TCS header and primitive ID inputs · 8739ea3a

Kristian Høgsberg authored 5 years ago


Similar to GS, the registers are shared and not reinitialized between
VS and TCS, so we need to make sure to allocate the same registers for
the system values in both stages.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>

8739ea3a

freedreno/ir3: Don't assume binning shader is always VS · b12ebe3e

Kristian Høgsberg authored 5 years ago


In tessellation mode, the TES is (probably) the binning shader.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>

b12ebe3e

freedreno/ir3: Setup inputs and outputs for tessellation stages · 3cedeba7

Kristian Høgsberg authored 5 years ago


Similar to GS, some inputs are reused when we chsh from VS to TCS or
from TES to GS, so we need to make sure we set up the right inputs and
make the shared system values outputs so they don't get clobbered.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>

3cedeba7

freedreno/ir3: Implement TCS synchronization intrinsics · e28fbbd8

Kristian Høgsberg authored 5 years ago


We add two new IR3-specific NIR intrinsics that map to the new condend
and endpatch instructions.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>

e28fbbd8

freedreno/ir3: Implement tess coord intrinsic · 4915231b

Kristian Høgsberg authored 5 years ago


Our lowering pass made the z component unused by replacing its uses
with 1 - x - y.  The intrinsic implementation then just needs to return
the x and y components.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>

4915231b
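Assuming the triangle domain, the lowering in the commit above relies on the tessellation coordinates being barycentric (x + y + z = 1), so z never has to be fetched. A small hypothetical helper (invented names, not ir3 code) that rebuilds the full coordinate from x and y; for quads and isolines, GLSL defines the third component of gl_TessCoord as 0.

```c
/* Hypothetical tessellation domains. */
enum tess_domain { TESS_TRIANGLES, TESS_QUADS, TESS_ISOLINES };

/* Rebuild a three-component tess coord from its x and y components. */
static void tess_coord_from_xy(enum tess_domain domain, float x, float y,
                               float out[3])
{
   out[0] = x;
   out[1] = y;
   /* Triangles: barycentric, so z = 1 - x - y. Quads/isolines: z is 0. */
   out[2] = (domain == TESS_TRIANGLES) ? 1.0f - x - y : 0.0f;
}
```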

freedreno/ir3: End TES with chsh when using GS · e16e48d0

Kristian Høgsberg authored 5 years ago


When we have both TES and GS, the TES needs to chain to the GS with
chmask and chsh, just like the VS does to either TCS or GS.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>

e16e48d0

freedreno/ir3: Add new synchronization opcodes · 581cd596

Kristian Høgsberg authored 5 years ago


There are two new opcodes in use in tessellation control shaders:
category 0, opcodes 13 and 15.  unk13 is a kill-type instruction
that terminates threads where !p0.x, and it is used to narrow down a
patch wavefront to just thread 0.  Then, once thread 0 has written the
tess levels, it issues unk15, which might signal the TE that another
patch has been fully written.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>

581cd596

freedreno/ir3: Extend geometry lowering pass to handle tessellation · 56ed835b

Kristian Høgsberg authored 5 years ago


VS and TCS pass varyings the same way as VS and GS do. TCS then
writes the entire patch to a system memory BO, and TES eventually reads
it back from the BO once the TE starts generating vertices.  TES outputs
vertices the same way as VS and GS, except when there's a GS as well,
in which case TES passes varyings to GS the same way the VS would.

In addition, the TCS needs a little bit of control flow massaging so
that it only runs for valid invocations, and it needs a couple of
unknown instructions to synchronize with the TE.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>

56ed835b

freedreno/ir3: Add tessellation field to shader key · 8621fbc3

Kristian Høgsberg authored 5 years ago


Whether we're tessellating and which primitives the TES outputs
affects the entire pipeline so let's add a field to the key to track
that.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>

8621fbc3

freedreno/ir3: Use imul24 in offset calculations · 77b96b84

Kristian Høgsberg authored 5 years ago


With the imul24 opcode in place, we can now use it for computing local
offsets (i.e. for ldlw/stlw).

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>

77b96b84
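A sketch of why imul24 is safe for these offsets, assuming NIR's definition of imul24 (the low 24 bits of each operand are sign-extended and multiplied, keeping the low 32 bits of the product). `imul24` here is a plain C emulation and `local_offset` is a hypothetical offset formula, not the ir3 code. As long as both factors fit in 24 signed bits, the result matches a full 32-bit multiply.

```c
#include <stdint.h>

/* C emulation of a 24-bit signed multiply with a 32-bit result:
 * bits 24..31 of each operand are ignored. */
static int32_t imul24(int32_t a, int32_t b)
{
   int64_t sa = (int64_t)((uint32_t)a & 0xffffffu);
   int64_t sb = (int64_t)((uint32_t)b & 0xffffffu);
   if (sa & 0x800000) sa -= 0x1000000;   /* sign-extend from bit 23 */
   if (sb & 0x800000) sb -= 0x1000000;
   return (int32_t)(uint32_t)(sa * sb);  /* keep the low 32 bits */
}

/* Hypothetical local-memory offset: vertex index times per-vertex stride
 * plus an attribute offset. Both factors are small, so the 24-bit
 * multiply gives the same answer as a full 32-bit one. */
static int32_t local_offset(int32_t vertex, int32_t stride, int32_t attr_offset)
{
   return imul24(vertex, stride) + attr_offset;
}
```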

freedreno/ir3: Add ir3 intrinsics for tessellation · 41984c84

Kristian Høgsberg authored 5 years ago


These provide the iovas for system memory buffers used for
tessellation as well as a new HW-specific system value.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>

41984c84

freedreno: Don't count primitives for patches · d6209a50

Kristian Høgsberg authored 5 years ago


The gallium helper doesn't like patches and we can't determine how
many primitives it gets tessellated into anyway.  On gens where we
have tessellation, we get the prim count from a HW counter so just
skip counting on the CPU.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>

d6209a50

freedreno/ir3: Add load and store intrinsics for global io · fe450ef4

Kristian Høgsberg authored 5 years ago


These intrinsics take an ivec2 for the 64-bit base address and an
integer offset.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>

fe450ef4

freedreno/ir3: Emit link map as byte or dwords offsets as needed · 5d67da13

Kristian Høgsberg authored 5 years ago


Stages that load inputs with ldlw (TCS, GS) need byte offsets; stages
that load with ldg (TES) need dword offsets.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>

5d67da13
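The byte/dword distinction in the commit above amounts to a scale by 4. A hypothetical helper (invented names, not the ir3 link-map code) that picks the unit based on the consuming stage, assuming the byte offsets are always dword-aligned:

```c
#include <stdint.h>

/* Hypothetical consumer stages of a link map. */
enum consumer_stage { CONSUMER_TCS, CONSUMER_GS, CONSUMER_TES };

/* TCS and GS read inputs with ldlw, which takes byte offsets;
 * TES reads with ldg, which takes dword (4-byte) offsets. */
static uint32_t link_map_offset(enum consumer_stage stage, uint32_t byte_offset)
{
   if (stage == CONSUMER_TES)
      return byte_offset / 4;   /* assumes byte_offset is dword-aligned */
   return byte_offset;
}
```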

freedreno/a6xx: Add register offset for STG/LDG · 1f3b52ce

Kristian Høgsberg authored 5 years ago


These instructions take a 64-bit iova as two consecutive registers and
an immediate offset.  This patch adds support for the offset to be a
single register, which is added to the 64-bit iova.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>

1f3b52ce
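A hypothetical model of the address computation described in the commit above, assuming the first of the two consecutive registers holds the low 32 bits of the iova and that the register offset is an unscaled, unsigned byte offset (the commit does not state either detail). `global_address` is an invented name, not an ir3 function.

```c
#include <stdint.h>

/* Combine the two 32-bit iova registers into a 64-bit address and add
 * the register offset across the full 64 bits, so a carry out of the
 * low word propagates into the high word. */
static uint64_t global_address(uint32_t iova_lo, uint32_t iova_hi, uint32_t offset)
{
   uint64_t iova = ((uint64_t)iova_hi << 32) | iova_lo;
   return iova + offset;
}
```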
