RFC: Refactor how accumulators are handled

Currently in Orc, accumulators are handled as special opcodes. Such opcodes have the flag ORC_STATIC_OPCODE_ACCUMULATOR https://gitlab.freedesktop.org/gstreamer/orc/-/blob/main/orc/orcopcode.h#L28 and the current logic is:

The accumulators always end up adding the current accumulator value to the actual opcode operation
In case a different kind of operation to accumulate is needed, a new opcode has to be created for that particular case
The initial value of the accumulators is always set to zero (through pxor, for example)
Given that the accumulator is added at every loop iteration, and the operation is always an addition, loop_shifts (partial data processing) are not a problem, adding zero does not alter the final value.

The opcodes accw, accl and accsadubl are the current accumulator opcodes.

My actual requirement is to implement a maxf accumulator, that is, do a maxf for an array of floats, accumulate the maximum value and keep it. The approach would be to do the same as point 2 above. Create a new opcode like accmaxf and change the conditions of point 3 and point 4 to handle partial data processing and initialize the data with MINF instead of 0.

Now, I think this will pollute Orc with more opcodes where the actual operation of the accumulation (the maxf) already exists. My proposal would be to just use regular opcodes as accumulator setters.

Add a new flag to mark opcodes that can accumulate, something like ORC_STATIC_OPCODE_CAN_ACCUMULATE (actually commutative operations).
Add a sanity check that when using an opcode into a destination variable that is an accumulator, force to make src arg1 also the same variable, i.e a1 = a1 + s1
Pass the initial value of the accumulator as part of the OrcExecutor, similar to the current situation of storing the value, but loading it too.
At each loop iteration (depending on the loop shift), correctly blend the initial accumulator value with the actual parameter to use. That means, if the initial value is 1 1 1 1 (double words on SSE, because of a multiplication) and the loop shift only holds 4 bytes of data (one double word), blend the value to be 1 1 1 X.
Out of every loop, re-use the same opcode to reduce the vector into a single value.

Of course all of that, maintaining compatibility with old acc* based opcodes. All of that is already implemented in https://gitlab.freedesktop.org/turran/orc/-/compare/main...accumulators?from_project_id=1360&straight=false with backwards compatibility and without breaking any test.

Please, let me know your thoughts.

PS: I sincerely think that a refactoring in Orc is needed, adding new features is complicated. I can manage that, but need the maintainers opinion on this topic. Maybe I should create another issue to discuss there?

Edited Jan 18, 2024 by Jorge Zapata