- Apr 24, 2024
-
NIR can't handle those component counts, so we have to split it into 2 SGPR vectors where each has max 4 components. Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <mesa/mesa!28725>
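A minimal sketch of the split described above, assuming the argument has at most 8 components and that the first vector is filled before the second. The `sgpr_split` struct and `split_sgpr_components` helper are hypothetical names used only for illustration; they are not taken from the driver.

```c
/* Assumed per-vector limit, taken from the commit text ("max 4 components"). */
#define MAX_VEC_COMPONENTS 4u

/* Hypothetical description of how a wide SGPR argument is split in two. */
struct sgpr_split {
   unsigned lo_components; /* components placed in the first vector */
   unsigned hi_components; /* components placed in the second vector (may be 0) */
};

/* Fill the first vector up to the limit and put the rest in the second.
 * Returns 0 on success, -1 if the total does not fit in two vectors. */
static int split_sgpr_components(unsigned total, struct sgpr_split *out)
{
   if (total > 2 * MAX_VEC_COMPONENTS)
      return -1;

   out->lo_components = total < MAX_VEC_COMPONENTS ? total : MAX_VEC_COMPONENTS;
   out->hi_components = total - out->lo_components;
   return 0;
}
```

With this sketch, a 6-component argument becomes a 4-component vector plus a 2-component vector, and an 8-component argument becomes two 4-component vectors.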
-
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <mesa/mesa!28725>
-
It optimizes constants in srcs to 16 bits. Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <mesa/mesa!28725>
-
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <mesa/mesa!28725>
-
This is from when we had TGSI and NIR was a debug option. Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <mesa/mesa!28725>
-
This extracts the relevant parts from si_fast_clear. Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <mesa/mesa!28725>
-
They use compute shaders, which always support the render condition. Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <mesa/mesa!28725>
-
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <mesa/mesa!28725>
-
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <mesa/mesa!28725>
-
si_make_{CB/DB}_shader_coherent are more correct. Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <mesa/mesa!28725>
-
for better slow clear performance Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <mesa/mesa!28725>
-
Previously, if the clear color wasn't 0 or 1, we used a slow clear. This adds a new DCC clear where the DCC buffer is cleared to a special value and the clear color is stored at the beginning of each 256B block in the image. It can be very fast, but it's not always faster than a slow clear, so a heuristic determines whether the new fast clear is the better choice. Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <mesa/mesa!28725>
-
It's slower because it forces preservation of NaNs. Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <mesa/mesa!28725>
-
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <mesa/mesa!28725>
-
Gfx blits preserve NaNs now, so this is no longer needed. Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <mesa/mesa!28725>
-
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <mesa/mesa!28725>
-
Gfx copies are faster sometimes, so they should be able to copy anything. Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <mesa/mesa!28725>
-
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <mesa/mesa!28725>
-
for a new performance test. Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <mesa/mesa!28725>
-
Skip lowering because there is nothing to lower. Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <mesa/mesa!28725>
-
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <!28725>
-
GFX9+ should assume the enabled RB results are packed (no holes). Same as PAL. Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <mesa/mesa!28725>
-
Since the TCS epilog is no more, this is required to apply those bits to monolithic shaders. tessfactors_are_def_in_all_invocs was unused. Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <mesa/mesa!28725>
-
so that we don't duplicate the condition everywhere Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <mesa/mesa!28725>
-
otherwise the options would be ignored if the shader cache had already cached the same shader with the option inverted. Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <mesa/mesa!28725>
-
Some FREE calls are removed because nir_options is always NULL there. Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <mesa/mesa!28725>
-
They seem to work fine. Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <mesa/mesa!28725>
-
Only prefetch shader instructions. There will be more GFX versions in that list. Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <mesa/mesa!28725>
-
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <mesa/mesa!28725>
-
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <mesa/mesa!28725>
-
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <mesa/mesa!28725>
-
A16 dictates the type. Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <mesa/mesa!28725>
-
Starting from MTL, there are registers in HW to read the IP versions of the graphics, media and display IPs; those registers are called GMD. IPs can be used in any combination to form a SoC/platform, and each IP has its own stepping/revision, which makes it complex to track each IP stepping using just the PCI revision. Since MTL will be supported by default by the i915 KMD, which doesn't have a uAPI to fetch IP versions, this feature will only be supported on LNL and newer, which are backed by the Xe KMD. Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <mesa/mesa!26908>
-
Sync xe_drm.h with 31ced035ecde ("drm/xe/uapi: Restore flags VM_BIND_FLAG_READONLY and VM_BIND_FLAG_IMMEDIATE"). Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <mesa/mesa!26908>
-
To avoid underflow. Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Part-of: <mesa/mesa!28879>
-
In some (probably malformed) cases, even weights BOs for strided or depthwise convolutions can become bigger when using ZRL compression. To avoid running out of space in the BO, play it safe and calculate the actual optimum ZRL bit count. This slows compilation down quite a bit, though (2x slower for MobileNetV1). Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Part-of: <mesa/mesa!28879>
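The idea of computing the optimum bit count can be sketched as an exhaustive search: encode the weights once per candidate bit count and keep the smallest result. The cost model below is a simplified assumption (each symbol is a zero-run counter of `zrl_bits` bits followed by an 8-bit value) and does not reproduce the actual hardware stream format; `MAX_ZRL_BITS`, `zrl_encoded_bits` and `best_zrl_bits` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_ZRL_BITS 8u /* assumed upper bound, for illustration only */

/* Size in bits under the simplified model: every emitted symbol costs
 * zrl_bits (count of preceding zeros) plus 8 bits (the value itself). */
static size_t zrl_encoded_bits(const uint8_t *w, size_t n, unsigned zrl_bits)
{
   const size_t max_run = ((size_t)1 << zrl_bits) - 1;
   size_t symbols = 0, run = 0;

   for (size_t i = 0; i < n; i++) {
      if (w[i] == 0 && run < max_run) {
         run++;     /* zero absorbed into the pending run */
      } else {
         symbols++; /* emit (run, value); the value may be an explicit 0 */
         run = 0;
      }
   }
   if (run > 0)
      symbols++;    /* flush a trailing run of zeros */

   return symbols * (zrl_bits + 8);
}

/* Try every candidate bit count and return the one with the smallest output.
 * A result of 0 means that run-length coding does not help for this data. */
static unsigned best_zrl_bits(const uint8_t *w, size_t n)
{
   unsigned best = 0;
   size_t best_size = zrl_encoded_bits(w, n, 0);

   for (unsigned b = 1; b <= MAX_ZRL_BITS; b++) {
      size_t size = zrl_encoded_bits(w, n, b);
      if (size < best_size) {
         best_size = size;
         best = b;
      }
   }
   return best;
}
```

Under this model, dense weights with no zeros get larger for any nonzero bit count, which is the failure mode the commit guards against. The search encodes the data once per candidate, which is consistent with the compile-time cost mentioned above.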
-
By using the on-chip SRAM to cache the input image we can save some more bandwidth and increase the utilization of the NN cores, with the following improvements: MobileNetV1: 9.991ms -> 6.2ms; SSDLite MobileDet: 27ms -> 24.3ms. Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Part-of: <mesa/mesa!28879>
-
The blob sets it in some cases, but it doesn't seem to make any difference. Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Part-of: <mesa/mesa!28879>
-
We were wrongly counting the remaining number of output channels in the last superblock, when the former isn't divisible by the latter. MobileNetV1: 9.991ms -> 9.991ms; SSDLite MobileDet: 32.692ms -> 27ms. Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Part-of: <mesa/mesa!28879>
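One way to picture the non-divisible case, as a sketch under assumptions: output channels are spread over superblocks in equal, rounded-up shares, and the last superblock takes whatever is left. The helper name `channels_in_superblock` and the rounding policy are hypothetical and may not match the driver's actual distribution.

```c
/* Hypothetical helper: number of output channels handled by superblock
 * `index` when `total` channels are spread over `count` superblocks.
 * `count` must be nonzero. */
static unsigned channels_in_superblock(unsigned total, unsigned count,
                                       unsigned index)
{
   unsigned per_block = (total + count - 1) / count; /* round up */
   unsigned start = per_block * index;

   if (start >= total)
      return 0; /* nothing left for this superblock */

   unsigned left = total - start;
   return left < per_block ? left : per_block;
}
```

For 10 channels over 4 superblocks, this gives 3, 3, 3 and 1 channels; assuming a full share for the last superblock would overcount by 2.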
-
A zero tile dimension doesn't make sense. Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Part-of: <mesa/mesa!28879>
-