Commits · 24.1-branchpoint · Ravikiran Pallapatula / mesa

Apr 24, 2024

radeonsi: implement user_data_amd for 5, 6, and 7 components correctly · c3fc214a

Marek Olšák authored 10 months ago and Marge Bot committed 9 months ago

NIR can't handle those component counts, so we have to split it into 2
SGPR vectors where each has max 4 components.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <mesa/mesa!28725>
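
The split described in this message can be pictured with a minimal sketch. The names `vec_split` and `split_components` are hypothetical and are not taken from the radeonsi source; the only thing assumed from the commit is that each resulting vector holds at most 4 components.

```c
#include <assert.h>

/* Hypothetical helper, not the driver's code: split a component count
 * of up to 8 into two parts of at most 4 components each. */
struct vec_split {
   unsigned lo; /* components in the first vector */
   unsigned hi; /* components in the second vector (0 if not needed) */
};

static struct vec_split split_components(unsigned num_components)
{
   assert(num_components >= 1 && num_components <= 8);

   struct vec_split s;
   s.lo = num_components < 4 ? num_components : 4;
   s.hi = num_components - s.lo;
   return s;
}
```

Under these assumptions, 5, 6 and 7 components become 4+1, 4+2 and 4+3.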


radeonsi: use ip_type in debug code instead of hardcoding GFX · 882ee264
Marek Olšák authored 10 months ago and Marge Bot committed 9 months ago
```
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <mesa/mesa!28725>
```

radeonsi: always run nir_opt_16bit_tex_image · e7000c02

Marek Olšák authored 9 months ago and Marge Bot committed 9 months ago

It optimizes constants in srcs to 16 bits.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <mesa/mesa!28725>


radeonsi: only expose 8 EQAA samples due to shader limitations · 18bcdbb6
Marek Olšák authored 9 months ago and Marge Bot committed 9 months ago
```
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <mesa/mesa!28725>
```

radeonsi: don't add whether NIR is used into the shader key · 256cc77f

Marek Olšák authored 9 months ago and Marge Bot committed 9 months ago

This is from when we had TGSI and NIR was a debug option.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <mesa/mesa!28725>


radeonsi: make clear_render_target clear DCC directly instead of via pipe->clear() · e5c8f078
Marek Olšák authored 9 months ago and Marge Bot committed 9 months ago
```
This extracts the relevant parts from si_fast_clear.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <mesa/mesa!28725>
```

radeonsi: enable fast FB clears for conditional rendering · eccaba9d

Marek Olšák authored 9 months ago and Marge Bot committed 9 months ago

They use compute shaders, which always support the render condition.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <mesa/mesa!28725>


radeonsi: don't flush CB and DB if there have been no draw calls · 9a47fbec
Marek Olšák authored 9 months ago and Marge Bot committed 9 months ago
```
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <mesa/mesa!28725>
```
radeonsi: don't flush CB in si_launch_grid_internal_images if not needed · f0160443
Marek Olšák authored 9 months ago and Marge Bot committed 9 months ago
```
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <mesa/mesa!28725>
```

radeonsi: don't use si_get_flush_flags() for flushing images · 708f57e6

Marek Olšák authored 9 months ago and Marge Bot committed 9 months ago

si_make_{CB/DB}_shader_coherent are more correct.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <mesa/mesa!28725>


radeonsi: disable VRS flat shading for selected 8xMSAA and thick tiling cases · 38f74d62
Marek Olšák authored 10 months ago and Marge Bot committed 9 months ago
```
for better slow clear performance

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <mesa/mesa!28725>
```

radeonsi/gfx11: implement DCC clear to "single" for fast non-0/1 clears · 86131c25

Marek Olšák authored 9 months ago and Marge Bot committed 9 months ago

If the clear color isn't 0 or 1, we used a slow clear. This adds a new
DCC clear where the DCC buffer is cleared to a special value and the clear
color is stored at the beginning of each 256B block in the image.

It can be very fast, but it's not always faster than a slow clear.
There is a heuristic that determines whether this new fast clear is
better.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <mesa/mesa!28725>
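
As a rough CPU-side picture of the layout this message describes, the sketch below copies a clear color to the start of every 256B block of an image buffer. It is only a model under stated assumptions: `store_clear_color_per_block` is a hypothetical name, the buffer is assumed to be a whole number of blocks, and neither the special DCC value nor the heuristic mentioned above is modeled. How the driver actually performs this on the GPU is not shown in the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 256 /* block size in bytes, taken from the commit message */

/* Hypothetical CPU-side model: write the clear color at the beginning
 * of each 256B block; the rest of each block is left untouched. */
static void store_clear_color_per_block(unsigned char *image, size_t image_size,
                                        const unsigned char *color, size_t color_size)
{
   assert(color_size <= BLOCK_SIZE);
   assert(image_size % BLOCK_SIZE == 0);

   for (size_t offset = 0; offset < image_size; offset += BLOCK_SIZE)
      memcpy(image + offset, color, color_size);
}
```

The amount of data written scales with the number of blocks rather than the number of pixels, which is one plausible reason such a clear can be fast.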


radeonsi: don't call resource_copy_region in pipe->blit · 10ec4689

Marek Olšák authored 9 months ago and Marge Bot committed 9 months ago

It's slower because it forces preservation of NaNs.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <mesa/mesa!28725>


radeonsi: change allow_flat_shading to make it a single condition · 26a59558
Marek Olšák authored 10 months ago and Marge Bot committed 9 months ago
```
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <mesa/mesa!28725>
```

radeonsi: remove si_use_compute_copy_for_float_formats · 494cad56

Marek Olšák authored 9 months ago and Marge Bot committed 9 months ago

Gfx blits preserve NaNs now, so this is no longer needed.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <mesa/mesa!28725>


radeonsi: use simpler UINT fallback formats for draw-based resource_copy_region · 18b7b2c8
Marek Olšák authored 9 months ago and Marge Bot committed 9 months ago
```
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <mesa/mesa!28725>
```

radeonsi: preserve NaNs in draw-based resource_copy_region · 8235d3aa

Marek Olšák authored 9 months ago and Marge Bot committed 9 months ago

Gfx copies are faster sometimes, so they should be able to copy anything.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <mesa/mesa!28725>


radeonsi: move blitter clear_render_target impl into si_gfx_clear_render_target · a03df53d
Marek Olšák authored 10 months ago and Marge Bot committed 9 months ago
```
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <mesa/mesa!28725>
```
radeonsi: move blitter resource_copy_region implementation to si_gfx_copy_image · 82e63db9
Marek Olšák authored 10 months ago and Marge Bot committed 9 months ago
```
for a new performance test.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <mesa/mesa!28725>
```

radeonsi: allow input NIR to use descriptors in image opcodes · e9481320

Marek Olšák authored 9 months ago and Marge Bot committed 9 months ago

Skip lowering because there is nothing to lower.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <mesa/mesa!28725>


radeonsi: don't expose samples_identical and don't lower FMASK if it's disabled · 30fab15f
Marek Olšák authored 9 months ago and Marge Bot committed 9 months ago
```
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <!28725>
```

radeonsi: fix initialization of occlusion query buffers for disabled RBs · dab4295c

Marek Olšák authored 9 months ago and Marge Bot committed 9 months ago

GFX9+ should assume the enabled RB results are packed (no holes).
Same as PAL.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <mesa/mesa!28725>
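
One way to read "packed (no holes)" is sketched below. It assumes, purely for illustration, that with packing the enabled RBs write the first N result slots (N being the number of enabled RBs), whereas without packing each RB writes the slot matching its own index. `slot_is_unwritten` and `count_bits` are hypothetical helpers, not functions from radeonsi or PAL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Count the set bits in a mask (portable popcount). */
static unsigned count_bits(uint32_t v)
{
   unsigned n = 0;
   while (v) {
      n += v & 1;
      v >>= 1;
   }
   return n;
}

/* Hypothetical model: returns true if result slot `slot` is never written
 * by hardware and therefore has to be initialized by the driver. */
static bool slot_is_unwritten(uint32_t enabled_rb_mask, unsigned slot, bool packed)
{
   assert(slot < 32);

   if (packed) /* assumed GFX9+ behavior: results fill slots 0..N-1 */
      return slot >= count_bits(enabled_rb_mask);

   /* assumed older behavior: slot index equals RB index */
   return !(enabled_rb_mask & (UINT32_C(1) << slot));
}
```

With an enabled-RB mask of `0xB` (RB 2 disabled), the packed model leaves slot 3 unwritten, while the unpacked model leaves slot 2 unwritten.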


radeonsi: move TCS epilog key bits to the key->ge.opt section · aad2302c

Marek Olšák authored 9 months ago and Marge Bot committed 9 months ago

Since the TCS epilog is no more, this is required to apply those bits
to monolithic shaders.

tessfactors_are_def_in_all_invocs was unused.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <mesa/mesa!28725>


radeonsi: check has_stable_pstate in the winsys · d29d215d

Marek Olšák authored 10 months ago and Marge Bot committed 9 months ago

so that we don't duplicate the condition everywhere

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <mesa/mesa!28725>


radeonsi: add the radeonsi_optimize_io option into the shader cache key · a094339d

Marek Olšák authored 10 months ago and Marge Bot committed 9 months ago

otherwise the options would be ignored if the shader cache had already
cached the same shader with the option inverted.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <mesa/mesa!28725>
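
The problem this message describes is a general property of caches keyed on too little state. A minimal sketch, with a hypothetical `cache_key` struct that is not the actual radeonsi shader cache key layout:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cache key: a hash of the shader plus one option bit. */
struct cache_key {
   uint64_t shader_hash;
   uint8_t optimize_io; /* models the radeonsi_optimize_io option */
};

/* Two entries may be shared only if the shader and the option both match. */
static bool cache_key_equal(const struct cache_key *a, const struct cache_key *b)
{
   return a->shader_hash == b->shader_hash && a->optimize_io == b->optimize_io;
}
```

If `optimize_io` were left out of the comparison, a shader cached with the option disabled would also be returned when the option is enabled, so the option would appear to have no effect.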


radeonsi: use the same nir_lower_subgroups_options as RADV · 3630c11c

Marek Olšák authored 10 months ago and Marge Bot committed 9 months ago

Some FREE calls are removed because nir_options is always NULL there.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <mesa/mesa!28725>


radeonsi/gfx11: enable DCC fast clears for 8-bit and 16-bit formats · adde1dba
Marek Olšák authored 10 months ago and Marge Bot committed 9 months ago
```
They seem to work fine.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <mesa/mesa!28725>
```

radeonsi/gfx11: don't prefetch constants in binaries into the instruction cache · d478693d

Marek Olšák authored 10 months ago and Marge Bot committed 9 months ago

Only prefetch shader instructions. There will be more GFX versions
in that list.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <mesa/mesa!28725>


radeonsi/ci: update gfx11 failures · 71ae7b85
Marek Olšák authored 10 months ago and Marge Bot committed 9 months ago
```
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <mesa/mesa!28725>
```
ac/surface: constify and reindent NIR meta address-from-coord function params · 665df08a
Marek Olšák authored 9 months ago and Marge Bot committed 9 months ago
```
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <mesa/mesa!28725>
```
ac/llvm: always trim components of texture instructions, trim DMASK · cce1aa47
Marek Olšák authored 9 months ago and Marge Bot committed 9 months ago
```
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <mesa/mesa!28725>
```
ac/llvm: fix assertions for texture instructions with 16-bit LOD bias · 83a601d4
Marek Olšák authored 9 months ago and Marge Bot committed 9 months ago
```
A16 dictates the type.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <mesa/mesa!28725>
```

intel/dev: Read GFX IP version during runtime · 708b0a7c

José Roberto de Souza authored 11 months ago and Marge Bot committed 9 months ago

Starting with MTL, there are registers in HW for reading the IP versions of
the graphics, media and display IPs; those registers are called GMD.

IPs can be used in any combination to form a SoC/platform, and each IP
has its own stepping/revision, which makes it complex to track each IP's
stepping using just the PCI revision.

Since MTL will be supported by default by the i915 KMD, which doesn't have
a uAPI to fetch IP versions, this feature will only be supported on LNL
and newer platforms that are backed by the Xe KMD.

Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <mesa/mesa!26908>


intel: Sync xe_drm.h · 4d3fee0b

José Roberto de Souza authored 1 year ago and Marge Bot committed 9 months ago

Sync xe_drm.h with 31ced035ecde ("drm/xe/uapi: Restore flags VM_BIND_FLAG_READONLY and VM_BIND_FLAG_IMMEDIATE").

Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <mesa/mesa!26908>


etnaviv/nn: Keep track of the sign bit when decrementing to zero · a78e98f1
Tomeu Vizoso authored 9 months ago and Marge Bot committed 9 months ago
```
To avoid underflow.

Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Part-of: <mesa/mesa!28879>
```

etnaviv/nn: Don't shortcut ZRL bits calculation · 9bac40b7

Tomeu Vizoso authored 9 months ago and Marge Bot committed 9 months ago

In some (probably malformed) cases, even weights BOs for strided or depthwise
convolutions can become bigger when using ZRL compression.

To avoid running out of space in the BO, play it safe and calculate the
actual optimum ZRL bit count. This does slow compilation quite a bit,
though (2x slower for MobileNetV1).

Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Part-of: <mesa/mesa!28879>


etnaviv/nn: Enable image cache · d46e68c8

Tomeu Vizoso authored 11 months ago and Marge Bot committed 9 months ago

By using the on-chip SRAM to cache the input image we can save some more
bandwidth and increase the utilization of the NN cores, with the
following improvements:

MobileNetV1: 9.991ms -> 6.2ms
SSDLite MobileDet: 27ms -> 24.3ms

Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Part-of: <mesa/mesa!28879>


etnaviv/nn: Move unused field to its right place in the struct · d6045ca5

Tomeu Vizoso authored 11 months ago and Marge Bot committed 9 months ago

The blob sets it in some cases, but doesn't seem to make any difference.

Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Part-of: <mesa/mesa!28879>


etnaviv/nn: Fix calculation of remaining out channels · c75b5126

Tomeu Vizoso authored 11 months ago and Marge Bot committed 9 months ago

We were wrongly counting the remaining number of output channels in the
last superblock, when the former isn't divisible by the latter.

MobileNetV1: 9.991ms -> 9.991ms
SSDLite MobileDet: 32.692ms -> 27ms

Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Part-of: <mesa/mesa!28879>
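
The kind of remainder handling this message refers to can be sketched generically. The helper below is hypothetical and does not reproduce the etnaviv code; it assumes the output channels are split into equally sized chunks, with the last superblock taking whatever is left over.

```c
#include <assert.h>

/* Hypothetical helper: number of output channels handled by superblock
 * `index` when `total` channels are split into chunks of `per_superblock`. */
static unsigned channels_in_superblock(unsigned total, unsigned per_superblock,
                                       unsigned index)
{
   assert(per_superblock > 0);

   unsigned start = index * per_superblock;
   assert(start < total);

   unsigned remaining = total - start;
   return remaining < per_superblock ? remaining : per_superblock;
}
```

With 10 channels and 4 per superblock, superblocks 0 and 1 each get 4 channels and the last one gets the remaining 2.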


etnaviv/nn: Ensure tile_y is > 0 · baebd6f4

Tomeu Vizoso authored 11 months ago and Marge Bot committed 9 months ago

A zero tile dimension doesn't make sense.

Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Part-of: <mesa/mesa!28879>

