WIP: lima: Split indexed draws
The blob splits indexed draws when the set of indices used is not
contiguous. It does so as soon as there is a gap > 8 between two
neighbouring index values (after sorting) in the index buffer.
index_buffer = [0, 10, 2] -> results in a single draw (0~10)
index_buffer = [0, 11, 2] -> results in two draws (0~2), (11)
So it seems that the blob uses a gap of 11 - 2 = 9 as the point from which on draws are split.
This effort tries to implement this behaviour.
What it mainly does:
- Allocate shadow CPU memory when a buffer for indices is created.
This is currently only done if the buffer is bound with
There is a dEQP test (functional.buffer.write.use.index_array.array)
which uses PIPE_BIND_VERTEX_BUFFER to upload the indices. This case isn't
handled in this MR, although the blob does split the draw there as well.
- Copy the dirty part of the shadow buffer back to GPU memory during unmapping.
- Within the indexed draw we do some more or less magic things :)
We memcpy the shadow memory to a temporary buffer, qsort it, and count
how many draws we need when splitting wherever the gap between two
used indices is > 8.
This can still be improved: we could use a better algorithm to find the gaps,
and we should probably think about how to avoid the memcpy and qsort on every draw.
- Fill an array of structs (ctx->minmax) with the corresponding min/max values,
one entry per calculated draw, for use in VS CMD stream creation.
- Use correct addresses for varyings and attributes including the right offset
- Use the ctx->minmax array and create the VS CMDs while iterating through the array.
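
The splitting step described above can be sketched roughly as follows. This is
only an illustration, not the code in this MR: `struct index_range`,
`split_index_ranges()` and `MAX_INDEX_GAP` are hypothetical names, the indices
are assumed to be already widened to uint32_t, and the threshold of 8 is just
the value observed from the blob.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: gap threshold as observed from the blob. */
#define MAX_INDEX_GAP 8

/* Hypothetical stand-in for one entry of the ctx->minmax array. */
struct index_range {
   uint32_t min;
   uint32_t max;
};

static int
cmp_u32(const void *a, const void *b)
{
   uint32_t x = *(const uint32_t *)a;
   uint32_t y = *(const uint32_t *)b;
   return (x > y) - (x < y);
}

/* Copies the indices, sorts the copy and starts a new range whenever two
 * neighbouring sorted indices are more than MAX_INDEX_GAP apart.
 * Returns the number of ranges written, 0 for an empty index list, or -1
 * if the allocation fails or more than max_ranges ranges would be needed. */
static int
split_index_ranges(const uint32_t *indices, size_t count,
                   struct index_range *ranges, size_t max_ranges)
{
   if (count == 0)
      return 0;
   if (max_ranges == 0)
      return -1;

   uint32_t *sorted = malloc(count * sizeof(*sorted));
   if (!sorted)
      return -1;
   memcpy(sorted, indices, count * sizeof(*sorted));
   qsort(sorted, count, sizeof(*sorted), cmp_u32);

   size_t n = 0;
   ranges[0].min = ranges[0].max = sorted[0];
   for (size_t i = 1; i < count; i++) {
      if (sorted[i] - sorted[i - 1] > MAX_INDEX_GAP) {
         /* Gap too large: close the current range and open a new one. */
         if (n + 1 == max_ranges) {
            free(sorted);
            return -1;
         }
         n++;
         ranges[n].min = sorted[i];
      }
      ranges[n].max = sorted[i];
   }

   free(sorted);
   return (int)(n + 1);
}
```

With the two example index buffers from above, [0, 10, 2] yields one range
(0~10) and [0, 11, 2] yields two ranges, (0~2) and (11). The copy and sort on
every draw is exactly the cost the benchmarks need to weigh against the
savings from smaller vertex ranges.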
The main open issue is that we need benchmarks to show whether this really is
an optimization and, if so, what the best value for the gap is.
We have to get those benchmarks before dropping the WIP.
There are no regressions with dEQP-GLES2 tests.