
Draft: RFC: Split out tensor decoders from the ONNX plugin and make the Tensor meta public.

The current design of the ONNX code is that we attach tensors as a GstMeta to the source buffer, and then a second element does something useful with them. This allows us to split the "engine" (ONNX Runtime, TensorFlow, etc.) from the model-specific code.

I'm splitting it out of the ONNX plugin into the library as we intend to also build a TFLite plugin.

The thing I'm unsure about is how to describe tensors. Should we put them in caps like NNStreamer does? I'm not convinced by their approach to dimensions (all in one big string instead of something structured). They're also missing tensor names and the model name, which are needed to make the whole thing understandable.

My current suggestion would be to use a caps-in-caps approach, something like:

video/x-raw,
  tensors=(GstCaps)...

Where the inner caps would be along the lines of:

data/tensors, model-name="ssd", tensors=(GstCaps)<...>

and each tensor described as:

data/tensor, tensor="detections", dimensions=<2,3,4>, type="float16"

The tensor decoder can then precisely describe its "input": maybe it's an image with a tensor attached, maybe it's just a tensor, or it could be something else.
