
tu: Overhaul LRZ, implement on-GPU dir tracking and LRZ fast-clear

Comment from new tu_lrz.c:

The low-resolution Z (LRZ) buffer is very similar to a depth prepass: it helps the HW avoid executing the fragment shader on fragments that would subsequently be discarded by the depth test.

The interesting part of this feature is that it allows applications to submit the vertices in any order.

In the binning pass it is possible to store the depth value of each vertex into an internal low-resolution depth buffer and quickly test primitives against it during the render pass.

There are a number of cases in which LRZ cannot be used:

  • Fragment shader side effects (writing to SSBOs, atomic operations, etc.);
  • Writing to the stencil buffer;
  • Writing depth while:
    • Changing the direction of the depth test (e.g. from OP_GREATER to OP_LESS);
    • Using OP_ALWAYS or OP_NOT_EQUAL;
  • Clearing depth with vkCmdClearAttachments;
  • (pre-a650) Not clearing the depth attachment with LOAD_OP_CLEAR;
  • (pre-a650) Using secondary command buffers;
  • Sysmem rendering (with a small caveat).

Pre-a650 (before gen3)

The direction is fully tracked on the CPU. Within a renderpass, LRZ starts with an unknown direction; the direction is set the first time a depth write occurs, and if it changes afterwards, the direction becomes invalid and LRZ is disabled for the rest of the renderpass.
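A minimal sketch of this CPU-side tracking, assuming a hypothetical enum cpu_dir and helpers lrz_begin_renderpass()/lrz_track_depth_write(); turnip's real code may structure this differently.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical direction states for pre-a650 CPU tracking. */
enum cpu_dir {
   CPU_DIR_UNKNOWN,   /* no depth write seen yet in this renderpass */
   CPU_DIR_LESS,
   CPU_DIR_GREATER,
   CPU_DIR_INVALID,   /* direction changed: LRZ off for the rest of the RP */
};

struct lrz_cpu_state {
   enum cpu_dir dir;
};

static void
lrz_begin_renderpass(struct lrz_cpu_state *s)
{
   s->dir = CPU_DIR_UNKNOWN;
}

/* Called for each draw that writes depth with the given test direction.
 * Returns whether LRZ may still be used. */
static bool
lrz_track_depth_write(struct lrz_cpu_state *s, enum cpu_dir draw_dir)
{
   assert(draw_dir == CPU_DIR_LESS || draw_dir == CPU_DIR_GREATER);
   if (s->dir == CPU_DIR_UNKNOWN)
      s->dir = draw_dir;          /* first depth write fixes the direction */
   else if (s->dir != draw_dir)
      s->dir = CPU_DIR_INVALID;   /* mismatch (or already invalid) */
   return s->dir != CPU_DIR_INVALID;
}
```

Once the state is CPU_DIR_INVALID it stays there until the next renderpass, which is also why this scheme cannot carry LRZ across renderpasses.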

Since the direction is not tracked by the GPU, it's impossible to know whether LRZ is enabled while constructing secondary command buffers.

For the same reason it's impossible to reuse LRZ between renderpasses.

A650+ (gen3+)

Now the LRZ direction can be tracked on the GPU. There are two parts:

  • Direction byte which stores current LRZ direction;
  • Parameters of the last used depth view.

The idea is the same as when LRZ is tracked on the CPU: when GRAS_LRZ_CNTL is used, its direction is compared to the previously known direction, and the direction byte is set to disabled when the directions are incompatible.

Additionally, to reuse LRZ between renderpasses, GRAS_LRZ_CNTL checks whether the current value of GRAS_LRZ_DEPTH_VIEW is equal to the value stored in the buffer; if not, LRZ is disabled. (This is necessary because a depth buffer may have several layers and mip levels, whereas the LRZ buffer represents only a single layer + mip level.)
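To make the GPU-side rules concrete, here is a CPU model of what the text says a depth-writing GRAS_LRZ_CNTL does with the direction byte and the stored depth view. The CUR_DIR_* values come from the reverse-engineered enum listed under "Reverse-engineered regs"; struct lrz_gpu_state and lrz_gpu_check() are hypothetical. Treating a depth-view mismatch exactly like a direction mismatch, and recording the view on the first use after a clear, are assumptions about the hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Direction byte values as listed in this MR description. */
enum cur_dir {
   CUR_DIR_DISABLED = 0x0,
   CUR_DIR_GE = 0x1,
   CUR_DIR_LE = 0x2,
   CUR_DIR_UNSET = 0x3,  /* value after clearing the buffer */
};

/* Hypothetical model of direction byte + stored depth view params. */
struct lrz_gpu_state {
   uint8_t dir;          /* enum cur_dir */
   uint32_t depth_view;  /* last GRAS_LRZ_DEPTH_VIEW recorded */
};

/* Models one depth-writing GRAS_LRZ_CNTL use with direction `dir` and
 * depth view `view`. Returns whether LRZ stays enabled. */
static bool
lrz_gpu_check(struct lrz_gpu_state *s, enum cur_dir dir, uint32_t view)
{
   assert(dir == CUR_DIR_GE || dir == CUR_DIR_LE);
   if (s->dir == CUR_DIR_DISABLED)
      return false;                      /* stays disabled */
   if (s->dir == CUR_DIR_UNSET) {
      s->dir = (uint8_t)dir;             /* first use after clear */
      s->depth_view = view;              /* assumption: view recorded here */
      return true;
   }
   if (s->depth_view != view || s->dir != (uint8_t)dir) {
      s->dir = CUR_DIR_DISABLED;         /* incompatible: disable LRZ */
      return false;
   }
   return true;
}
```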

Between renderpasses, the LRZ direction is set to disabled when the underlying depth buffer is changed. The following commands could change the depth image:

  • vkCmdBlitImage*
  • vkCmdCopyBufferToImage*
  • vkCmdCopyImage*

LRZ Fast-Clear

The LRZ fast-clear buffer is initialized to zeroes and is read/written when GRAS_LRZ_CNTL.FC_ENABLE (b3) is set. It appears to store one bit per block: '0' means the block still has the original depth clear value, and '1' means that the corresponding block in LRZ has been modified.
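Under this "one bit per block" interpretation, the fast-clear buffer behaves like a plain bitmap. The helpers below are hypothetical, and both the block numbering and the LSB-first bit order within a byte are assumptions rather than a confirmed hardware layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed for num_blocks bits, rounded up to whole bytes. */
static size_t
lrz_fc_size_bytes(size_t num_blocks)
{
   return (num_blocks + 7) / 8;
}

/* '1' = block modified, '0' = block still holds the depth clear value.
 * Assumption: bit (block % 8) of byte (block / 8). */
static bool
lrz_fc_block_modified(const uint8_t *fc, size_t block)
{
   assert(fc != NULL);
   return (fc[block / 8] >> (block % 8)) & 1u;
}

static void
lrz_fc_mark_modified(uint8_t *fc, size_t block)
{
   assert(fc != NULL);
   fc[block / 8] |= (uint8_t)(1u << (block % 8));
}
```

Because the buffer starts zeroed, every block initially reads as "still cleared", which is what makes the clear itself cheap.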

LRZ Caches

LRZ_FLUSH flushes and invalidates the LRZ caches. There are two caches:

  • Cache for the fast-clear buffer;
  • Cache for the direction byte + depth view params.

They can be cleared by LRZ_CLEAR. To become visible in GPU memory, the caches should be flushed with LRZ_FLUSH afterwards.

GRAS_LRZ_CNTL reads from these caches.


Other notes:

LRZ fast-clear

  • The blob hasn't used fast-clear before gen3 since at least v615 (the older blob I have does use fast-clear on a630). It uses fast-clear on a650+.
  • Fast-clear works with depth values of 0.0 and 1.0.

Unfortunately, in my tests fast-clear was not beneficial... At most I saw a 2% improvement, and in a real-world case (PUBG) I saw a 2% improvement in one renderpass and 0.4% worse performance in the second one, which is an order of magnitude longer.

However, we don't set all configuration/workaround/debug registers to the values the blob uses, which could affect the outcome.

Reverse-engineered regs:

I included changes from !7610

  • New LRZ_CNTL fields:
    <bitfield name="DIR" low="6" high="7" type="a6xx_lrz_dir_status"/>
    <doc>
        If DISABLE_ON_WRONG_DIR enabled - write new LRZ direction into
        buffer, in case of mismatched direction writes 0 (disables LRZ).
    </doc>
    <bitfield name="DIR_WRITE" pos="8" type="boolean"/>
    <doc>
        Disable LRZ based on previous direction and the current one.
        If DIR_WRITE is not enabled - there is no write to direction buffer.
    </doc>
    <bitfield name="DISABLE_ON_WRONG_DIR" pos="9" type="boolean"/>
    <enum name="a6xx_lrz_dir_status">
        <value value="0x1" name="LRZ_DIR_LE"/>
        <value value="0x2" name="LRZ_DIR_GE"/>
        <value value="0x3" name="LRZ_DIR_INVALID"/>
    </enum>
  • The LRZ direction is stored in the byte at lrz_fc_offset + 0x200 and can be represented by the following enum:
   CUR_DIR_DISABLED = 0x0,
   CUR_DIR_GE = 0x1,
   CUR_DIR_LE = 0x2,
   CUR_DIR_UNSET = 0x3, // Clearing the buffer sets that byte to CUR_DIR_UNSET
  • GRAS_UNKNOWN_810A is now GRAS_LRZ_DEPTH_VIEW; its 4-byte value is written to lrz_fc_offset + 0x201:
	<bitfield name="BASE_LAYER" low="0" high="10" type="uint"/>
	<bitfield name="LAYER_COUNT" low="16" high="26" type="uint"/>
	<bitfield name="BASE_MIP_LEVEL" low="28" high="31" type="uint"/>
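The register layouts above can be packed by hand as follows. These are hypothetical helpers, not the generated register headers Mesa uses; pack_lrz_cntl_dir_bits() only sets FC_ENABLE (b3), DIR (b6-7), DIR_WRITE (b8) and DISABLE_ON_WRONG_DIR (b9) and leaves every other LRZ_CNTL bit at zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values of enum a6xx_lrz_dir_status from the XML above. */
enum a6xx_lrz_dir_status {
   LRZ_DIR_LE = 0x1,
   LRZ_DIR_GE = 0x2,
   LRZ_DIR_INVALID = 0x3,
};

/* GRAS_LRZ_CNTL: FC_ENABLE [3], DIR [7:6], DIR_WRITE [8],
 * DISABLE_ON_WRONG_DIR [9]. Other bits are left at 0. */
static uint32_t
pack_lrz_cntl_dir_bits(bool fc_enable, enum a6xx_lrz_dir_status dir,
                       bool dir_write, bool disable_on_wrong_dir)
{
   assert((uint32_t)dir <= 0x3);
   return ((uint32_t)fc_enable << 3) |
          ((uint32_t)dir << 6) |
          ((uint32_t)dir_write << 8) |
          ((uint32_t)disable_on_wrong_dir << 9);
}

/* GRAS_LRZ_DEPTH_VIEW: BASE_LAYER [10:0], LAYER_COUNT [26:16],
 * BASE_MIP_LEVEL [31:28]. */
static uint32_t
pack_lrz_depth_view(uint32_t base_layer, uint32_t layer_count,
                    uint32_t base_mip_level)
{
   assert(base_layer <= 0x7ff && layer_count <= 0x7ff &&
          base_mip_level <= 0xf);
   return base_layer | (layer_count << 16) | (base_mip_level << 28);
}

static uint32_t lrz_depth_view_base_layer(uint32_t v)  { return v & 0x7ff; }
static uint32_t lrz_depth_view_layer_count(uint32_t v) { return (v >> 16) & 0x7ff; }
static uint32_t lrz_depth_view_base_mip(uint32_t v)    { return v >> 28; }
```

Since GRAS_LRZ_CNTL compares the whole 32-bit depth view against the value stored at lrz_fc_offset + 0x201, a difference in any one of base layer, layer count or base mip level is enough to disable LRZ.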

Observations on blob

  • The blob is much more eager to emit LRZ_FLUSH: it flushes at the start of the renderpass, after binning, and at the end of the renderpass.
  • The blob seems not to care about changes to the depth image made via vkCmdCopyImage.

Testing

Unfortunately, VK CTS seems to be useless for testing LRZ...

I mostly tested these changes with ad hoc tests, dumping the fast-clear buffer, direction buffer, and depth view at the start and the end of a renderpass.
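For this kind of ad hoc dumping, a decoder for a CPU copy of the buffer region starting at lrz_fc_offset could look like the sketch below. The struct and function names are hypothetical; the offsets come from the reverse-engineered notes, and the stored depth view is assumed to be little-endian. It is assembled byte by byte because offset 0x201 is not 4-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offsets relative to lrz_fc_offset, per the notes above. */
#define LRZ_DIR_BYTE_OFFSET    0x200
#define LRZ_DEPTH_VIEW_OFFSET  0x201

struct lrz_dump_info {
   uint8_t dir;          /* 0 disabled, 1 GE, 2 LE, 3 unset */
   uint32_t depth_view;  /* stored GRAS_LRZ_DEPTH_VIEW */
};

/* Decodes a CPU copy of `size` bytes starting at lrz_fc_offset. */
static struct lrz_dump_info
lrz_decode_dump(const uint8_t *fc, size_t size)
{
   assert(fc != NULL && size >= LRZ_DEPTH_VIEW_OFFSET + 4);
   const uint8_t *p = fc + LRZ_DEPTH_VIEW_OFFSET;
   struct lrz_dump_info info = {
      .dir = fc[LRZ_DIR_BYTE_OFFSET],
      /* Assumed little-endian; unaligned, so no direct uint32_t load. */
      .depth_view = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                    ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24),
   };
   return info;
}
```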

I'm not sure CTS would accept such tests. Maybe add tests to Crucible?

Edited by Danylo Piliaiev
