Commit ab37109d authored by Rob Clark's avatar Rob Clark 💬 Committed by Marge Bot
Browse files

freedreno/a6xx: Updates for tess_use_shared



The formula for calculating these two values seems to depend on
tess_use_shared, ie. a6xx_gen3 and a6xx_gen4 match.  The existing
calculation matches a6xx_gen1 and a6xx_gen2.

The new formula is based on traces varying # of output (from VS)
varyings from (1..31)*vec4 and vertices from (1..31) and coming up
with something that matches the blob.  Once hs_input_size*4 divided
by tcs_vertices_out goes above 64, this deviates a bit from the
blob, but AFAICT it is safe to pick a larger values.
Signed-off-by: Rob Clark's avatarRob Clark <robdclark@chromium.org>
Part-of: <mesa/mesa!12497>
parent 31835ac3
......@@ -639,29 +639,45 @@ setup_stateobj(struct fd_ringbuffer *ring, struct fd_context *ctx,
OUT_PKT4(ring, REG_A6XX_PC_TESS_NUM_VERTEX, 1);
OUT_RING(ring, hs_info->tess.tcs_vertices_out);
/* Total attribute slots in HS incoming patch. */
OUT_PKT4(ring, REG_A6XX_PC_HS_INPUT_SIZE, 1);
OUT_RING(ring, hs_info->tess.tcs_vertices_out * vs->output_size / 4);
const uint32_t wavesize = 64;
const uint32_t max_wave_input_size = 64;
const uint32_t patch_control_points = hs_info->tess.tcs_vertices_out;
if (ctx->screen->info->a6xx.tess_use_shared) {
unsigned hs_input_size = 6 + (3 * (vs->output_size - 1));
unsigned wave_input_size =
MIN2(64, DIV_ROUND_UP(hs_input_size * 4,
hs_info->tess.tcs_vertices_out));
OUT_PKT4(ring, REG_A6XX_PC_HS_INPUT_SIZE, 1);
OUT_RING(ring, hs_input_size);
OUT_PKT4(ring, REG_A6XX_SP_HS_WAVE_INPUT_SIZE, 1);
OUT_RING(ring, wave_input_size);
} else {
uint32_t hs_input_size =
hs_info->tess.tcs_vertices_out * vs->output_size / 4;
/* Total attribute slots in HS incoming patch. */
OUT_PKT4(ring, REG_A6XX_PC_HS_INPUT_SIZE, 1);
OUT_RING(ring, hs_input_size);
const uint32_t wavesize = 64;
const uint32_t max_wave_input_size = 64;
const uint32_t patch_control_points = hs_info->tess.tcs_vertices_out;
/* note: if HS is really just the VS extended, then this
* should be by MAX2(patch_control_points, hs_info->tess.tcs_vertices_out)
* however that doesn't match the blob, and fails some dEQP tests.
*/
uint32_t prims_per_wave = wavesize / hs_info->tess.tcs_vertices_out;
uint32_t max_prims_per_wave = max_wave_input_size * wavesize /
(vs->output_size * patch_control_points);
prims_per_wave = MIN2(prims_per_wave, max_prims_per_wave);
/* note: if HS is really just the VS extended, then this
* should be by MAX2(patch_control_points, hs_info->tess.tcs_vertices_out)
* however that doesn't match the blob, and fails some dEQP tests.
*/
uint32_t prims_per_wave = wavesize / hs_info->tess.tcs_vertices_out;
uint32_t max_prims_per_wave = max_wave_input_size * wavesize /
(vs->output_size * patch_control_points);
prims_per_wave = MIN2(prims_per_wave, max_prims_per_wave);
uint32_t total_size =
vs->output_size * patch_control_points * prims_per_wave;
uint32_t wave_input_size = DIV_ROUND_UP(total_size, wavesize);
uint32_t total_size =
vs->output_size * patch_control_points * prims_per_wave;
uint32_t wave_input_size = DIV_ROUND_UP(total_size, wavesize);
OUT_PKT4(ring, REG_A6XX_SP_HS_WAVE_INPUT_SIZE, 1);
OUT_RING(ring, wave_input_size);
OUT_PKT4(ring, REG_A6XX_SP_HS_WAVE_INPUT_SIZE, 1);
OUT_RING(ring, wave_input_size);
}
shader_info *ds_info = &ds->shader->nir->info;
OUT_PKT4(ring, REG_A6XX_PC_TESS_CNTL, 1);
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment