WIP: rtpsession: use internal RTPSource with correct sender ssrc in bundled session
Hello all, I am opening this MR to start a discussion and would be grateful for your input. I am equally happy to open an issue instead if preferred.
Problem:
When bundling audio and video on the same RTP session we may send PLI, FIR or NACK messages with the Sender-SSRC
field set to the SSRC of the local audio stream rather than the video stream.
According to RFC 8108 section 5.4.1
o RTCP feedback packets relating to a particular media type SHOULD
be sent by an SSRC that receives that media type. For example,
when audio and video are multiplexed onto a single RTP session,
endpoints will use their audio SSRC to send feedback on the audio
received from other participants.
...
Cause:
The GstForceKeyUnit request sent by the decoder when an intra is required contains the correct SSRC for the incoming media stream. When this event reaches rtpsession a flag is set on the correct remote RTPSource and RTCP generation is immediately triggered. Unfortunately the RTCP code in rtp_session_on_timeout just iterates over all internal sources using g_hash_table_foreach and generates RTCP for all remote sources. If this happens to run for the internal audio source first then any PLIs generated, for instance, will have the wrong Sender-SSRC.
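To make the ordering dependence concrete, here is a minimal model of the behaviour described above. It is not the rtpsession code: the types and the function `model_on_timeout` are hypothetical stand-ins, and a plain array stands in for the hash table, with array order playing the role of the unspecified iteration order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for internal and remote RTPSources. */
typedef struct { uint32_t ssrc; } InternalSource;
typedef struct { uint32_t ssrc; int send_pli; } RemoteSource;

/* One generated PLI: who sends it and which media stream it refers to. */
typedef struct { uint32_t sender_ssrc; uint32_t media_ssrc; } Pli;

/* Models the described behaviour: visit internal sources in iteration
 * order, and let each one emit feedback for every remote source that
 * still has a pending request. The first internal source visited
 * therefore "wins", whatever media type it carries. */
static size_t
model_on_timeout (const InternalSource *internal, size_t n_internal,
    RemoteSource *remote, size_t n_remote, Pli *out, size_t out_cap)
{
  size_t n_out = 0;

  assert (internal != NULL && remote != NULL && out != NULL);

  for (size_t i = 0; i < n_internal; i++) {
    for (size_t j = 0; j < n_remote; j++) {
      if (!remote[j].send_pli || n_out == out_cap)
        continue;
      out[n_out].sender_ssrc = internal[i].ssrc;
      out[n_out].media_ssrc = remote[j].ssrc;
      n_out++;
      remote[j].send_pli = 0;   /* request consumed */
    }
  }
  return n_out;
}
```

With an internal audio source listed before the internal video source, a pending PLI for a remote video stream comes out with the audio SSRC as sender; swapping the order of the two internal sources gives the expected video SSRC.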
Discussion:
At present rtpsession doesn't really have a nice way to tie particular internal and external RTP sources together by media type. The implementation attempts to be as stateless as possible in this regard, purely containing a number of RTPSources that are created on the fly based on SSRC.
So part of the solution could involve keeping a hashtable to map corresponding Sender-SSRCs to Media-SSRCs.
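As a rough illustration of such a mapping, the sketch below keeps pairs of (remote media SSRC, local sender SSRC) and looks up which local SSRC to use when sending feedback about a given remote stream. All names (`SsrcMap`, `ssrc_map_set`, `ssrc_map_lookup`) are hypothetical, and a small fixed-size array is used only to keep the example self-contained; inside rtpsession a GHashTable would be the more natural choice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SSRC_MAP_CAPACITY 16

/* Hypothetical pair: the remote stream and the local SSRC that should
 * appear as Sender-SSRC in feedback about it. */
typedef struct { uint32_t media_ssrc; uint32_t sender_ssrc; } SsrcPair;

typedef struct {
  SsrcPair entries[SSRC_MAP_CAPACITY];
  size_t len;
} SsrcMap;

/* Insert or update a mapping. Returns false if the map is full. */
static bool
ssrc_map_set (SsrcMap *map, uint32_t media_ssrc, uint32_t sender_ssrc)
{
  assert (map != NULL);

  for (size_t i = 0; i < map->len; i++) {
    if (map->entries[i].media_ssrc == media_ssrc) {
      map->entries[i].sender_ssrc = sender_ssrc;
      return true;
    }
  }
  if (map->len == SSRC_MAP_CAPACITY)
    return false;

  map->entries[map->len].media_ssrc = media_ssrc;
  map->entries[map->len].sender_ssrc = sender_ssrc;
  map->len++;
  return true;
}

/* Look up the local sender SSRC for a remote media SSRC. Returns false
 * when nothing is known, in which case the caller keeps its current
 * behaviour. */
static bool
ssrc_map_lookup (const SsrcMap *map, uint32_t media_ssrc,
    uint32_t *sender_ssrc)
{
  assert (map != NULL && sender_ssrc != NULL);

  for (size_t i = 0; i < map->len; i++) {
    if (map->entries[i].media_ssrc == media_ssrc) {
      *sender_ssrc = map->entries[i].sender_ssrc;
      return true;
    }
  }
  return false;
}
```

Keying on the remote media SSRC matches the direction of the lookup that RTCP generation would need: the remote RTPSource with the pending request is known, and the open question is which internal source should send.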
The first prototype here is naive, although it has the merit of containing all the logic within rtpsession itself. Here we can infer the map of SSRCs from the RTCP feedback messages we receive off the wire. This method, however, relies on the remote sender having a correct implementation, and on us having received a first feedback packet to build the map before sending our own; otherwise we fall back to the current behaviour. This prototype may also go further than the RFC, which states that the Sender-SSRC only needs to match as far as the media type, for example in scenarios with one audio and multiple video sources.
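The inference step could look roughly like the sketch below. It is not the prototype's code: `LearnedPair`, `learn_from_feedback` and `choose_sender_ssrc` are hypothetical names, and only the common feedback header from RFC 4585 is parsed (packet types 205 and 206, Sender-SSRC at byte offset 4, Media-SSRC at byte offset 8). It assumes the remote end follows the RFC guidance quoted above, so that the SSRC it uses to send feedback about one of our streams belongs to the same media type as that stream.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RTCP_PT_RTPFB 205   /* transport layer feedback, e.g. NACK */
#define RTCP_PT_PSFB  206   /* payload-specific feedback, e.g. PLI, FIR */

/* Hypothetical record of one learned association. */
typedef struct {
  uint32_t remote_ssrc;   /* Sender-SSRC seen in the incoming feedback */
  uint32_t local_ssrc;    /* Media-SSRC it referred to, i.e. one of ours */
  bool valid;
} LearnedPair;

static uint32_t
read_be32 (const uint8_t *p)
{
  return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16) |
      ((uint32_t) p[2] << 8) | (uint32_t) p[3];
}

/* Extract (remote, local) SSRCs from one RTCP feedback packet. Returns
 * false for anything that is not a usable feedback packet. */
static bool
learn_from_feedback (const uint8_t *pkt, size_t len, LearnedPair *pair)
{
  assert (pkt != NULL && pair != NULL);

  if (len < 12 || (pkt[0] >> 6) != 2)
    return false;
  if (pkt[1] != RTCP_PT_RTPFB && pkt[1] != RTCP_PT_PSFB)
    return false;

  uint32_t media = read_be32 (pkt + 8);
  /* FIR sets the header Media-SSRC to 0 and carries its target in the
   * FCI, which this sketch does not parse. */
  if (media == 0)
    return false;

  pair->remote_ssrc = read_be32 (pkt + 4);
  pair->local_ssrc = media;
  pair->valid = true;
  return true;
}

/* Pick the Sender-SSRC for feedback about remote_media_ssrc, falling
 * back to the caller's current choice when nothing has been learned. */
static uint32_t
choose_sender_ssrc (const LearnedPair *pair, uint32_t remote_media_ssrc,
    uint32_t fallback_ssrc)
{
  assert (pair != NULL);

  if (pair->valid && pair->remote_ssrc == remote_media_ssrc)
    return pair->local_ssrc;
  return fallback_ssrc;
}
```

A real implementation would also need to check that the learned Media-SSRC is actually one of the session's internal sources before trusting it, and would keep one entry per remote SSRC rather than a single pair.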