RFC: Refactor how accumulators are handled
Currently in Orc, accumulators are handled as special opcodes. Such opcodes have the flag ORC_STATIC_OPCODE_ACCUMULATOR
https://gitlab.freedesktop.org/gstreamer/orc/-/blob/main/orc/orcopcode.h#L28 and the current logic is:
- The accumulators always end up adding the current accumulator value to the actual opcode operation
- In case a different kind of operation to accumulate is needed, a new opcode has to be created for that particular case
- The initial value of the accumulators is always set to zero (through pxor, for example)
- Given that the accumulator is added at every loop iteration, and the operation is always an addition, loop_shifts (partial data processing) are not a problem, adding zero does not alter the final value.
The opcodes accw
, accl
and accsadubl
are the current accumulator opcodes.
My actual requirement is to implement a maxf
accumulator, that is, do a maxf
for an array of floats, accumulate the maximum value and keep it. The approach would be to do the same as point 2 above. Create a new opcode like accmaxf
and change the conditions of point 3 and point 4 to handle partial data processing and initialize the data with MINF
instead of 0.
Now, I think this will pollute Orc with more opcodes where the actual operation of the accumulation (the maxf
) already exists. My proposal would be to just use regular opcodes as accumulator setters.
- Add a new flag to mark opcodes that can accumulate, something like
ORC_STATIC_OPCODE_CAN_ACCUMULATE
(actually commutative operations). - Add a sanity check that when using an opcode into a destination variable that is an accumulator, force to make src arg1 also the same variable, i.e
a1 = a1 + s1
- Pass the initial value of the accumulator as part of the OrcExecutor, similar to the current situation of storing the value, but loading it too.
- At each loop iteration (depending on the loop shift), correctly blend the initial accumulator value with the actual parameter to use. That means, if the initial value is 1 1 1 1 (double words on SSE, because of a multiplication) and the loop shift only holds 4 bytes of data (one double word), blend the value to be 1 1 1 X.
- Out of every loop, re-use the same opcode to reduce the vector into a single value.
Of course all of that, maintaining compatibility with old acc*
based opcodes. All of that is already implemented in https://gitlab.freedesktop.org/turran/orc/-/compare/main...accumulators?from_project_id=1360&straight=false with backwards compatibility and without breaking any test.
Please, let me know your thoughts.
PS: I sincerely think that a refactoring in Orc is needed, adding new features is complicated. I can manage that, but need the maintainers opinion on this topic. Maybe I should create another issue to discuss there?