rtpmanager/rtpsession: race conditions leading to critical warnings
Conditions
While testing the implementation for insertable streams in `webrtcsink` & `webrtcsrc`, I encountered multiple critical warnings, which turned out to result from two race conditions in `rtpsession`. Both race conditions produce:
GLib-CRITICAL: g_hash_table_foreach: assertion 'version == hash_table->version' failed
In its simplest form, the test consists of two pipelines and a signalling server:
- pipelines_sink: `audiotestsrc ! webrtcsink`
- pipelines_src: `webrtcsrc ! appsink`

The test then performs the following steps:
- Set `pipelines_sink` to `Playing`.
- The signalling server delivers the `producer_id`.
- Initialize `pipelines_src` to establish a session with `producer_id`.
- Set `pipelines_src` to `Playing`.
- Wait for a buffer to be received by the `appsink`.
- Set `pipelines_src` to `Null`.
- Set `pipelines_sink` to `Null`.
First race condition
The first race condition happens in the following sequence:
- `webrtcsink` runs a task to periodically retrieve statistics from `webrtcbin`. This transitively ends up executing `rtp_session_create_stats`.
- `pipelines_sink` is set to `Null`.
- In the `Paused` to `Ready` transition, `gst_rtp_session_change_state()` calls `rtp_session_reset()`.
- The assertion failure occurs when `rtp_session_reset` is called while `rtp_session_create_stats` is executing.
This is because `rtp_session_create_stats` acquires the lock on `session` prior to calling `g_hash_table_foreach`, but `rtp_session_reset` doesn't acquire the lock before calling `g_hash_table_remove_all`.
Acquiring the lock in `rtp_session_reset` fixes the issue and was implemented in this branch. I can open an MR if this seems acceptable.
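To make the lock mismatch concrete, here is a minimal single-threaded sketch in plain C. It is not the rtpsession code: `ToySession` and every `toy_*` function are hypothetical names, the lock is modelled as a flag, and the `version` counter only imitates the kind of check behind the `g_hash_table_foreach` assertion. Where a real mutex would block, the locked reset simply returns `false`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the session: a lock flag plus a tiny "table". */
typedef struct {
  bool locked;       /* models the session lock being held */
  int n_sources;     /* models the number of entries in the source table */
  unsigned version;  /* bumped on every table mutation */
} ToySession;

static bool toy_try_lock(ToySession *s) {
  if (s->locked)
    return false;
  s->locked = true;
  return true;
}

static void toy_unlock(ToySession *s) { s->locked = false; }

/* Models g_hash_table_remove_all: empties the table and bumps the version. */
static void toy_remove_all(ToySession *s) {
  s->n_sources = 0;
  s->version++;
}

/* Shape of the problem: the table is cleared without taking the lock. */
static void toy_reset_unlocked(ToySession *s) { toy_remove_all(s); }

/* Shape of the fix: the table is cleared only once the lock is obtained.
 * Returns false where a real mutex would block the caller instead. */
static bool toy_reset_locked(ToySession *s) {
  if (!toy_try_lock(s))
    return false;
  toy_remove_all(s);
  toy_unlock(s);
  return true;
}

/* Simulates a reset arriving while the stats iteration holds the lock.
 * Returns true if the table changed under the iteration. */
static bool toy_stats_disturbed(ToySession *s, bool use_locked_reset) {
  bool got = toy_try_lock(s);
  assert(got);
  unsigned seen = s->version;  /* version recorded when iteration starts */
  if (use_locked_reset)
    (void) toy_reset_locked(s);
  else
    toy_reset_unlocked(s);
  bool disturbed = (seen != s->version);
  toy_unlock(s);
  return disturbed;
}
```

With the unlocked reset, `toy_stats_disturbed` reports that the table changed mid-iteration; with the locked reset, the table is left alone until the iteration has released the lock.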
Second race condition
The second race condition happens right after the first payload is received:
- `rtp_session_on_timeout` acquires the lock on `session` and proceeds with its processing.
- `rtp_session_process_rtcp` is called (debug log: received RTCP packet) and attempts to acquire the lock on `session`, which is still held by `rtp_session_on_timeout`.
- As part of a hash table iteration, `rtp_session_on_timeout` transitively invokes `source_caps`, which releases the lock on `session` so as to call `session->callbacks.caps`.
- Since `rtp_session_process_rtcp` was waiting for the lock to be released, it succeeds in acquiring it and proceeds with `rtp_session_process_rr`, which transitively calls `g_hash_table_insert` via `add_source`.
- After `source_caps` re-acquires the lock and returns control to `rtp_session_on_timeout`, the hash table has been modified under the running iteration, resulting in the assertion failure.
I'm not quite sure how to fix this without risking deadlocks.
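For reference, the sequence of the second race can be reproduced deterministically with a small single-threaded model in plain C. This is only a sketch under stated assumptions: `ToySession`, the `toy_*` functions and the `while_unlocked` hook are hypothetical, the lock is a flag, and the hook stands in for whatever another thread does during the window in which `source_caps` has released the lock. It illustrates the problem and does not attempt a fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX 8

typedef struct ToySession ToySession;
typedef void (*ToyHook)(ToySession *s);

/* Hypothetical stand-in for the session and its source table. */
struct ToySession {
  bool locked;             /* models the session lock being held */
  int ssrcs[TOY_MAX];      /* models the source hash table */
  size_t n;
  unsigned version;        /* bumped on every table mutation */
  ToyHook while_unlocked;  /* simulates another thread running in the unlocked window */
};

static void toy_lock(ToySession *s)   { assert(!s->locked); s->locked = true; }
static void toy_unlock(ToySession *s) { assert(s->locked);  s->locked = false; }

/* Models add_source calling g_hash_table_insert; the lock must be held. */
static void toy_add_source(ToySession *s, int ssrc) {
  assert(s->locked);
  if (s->n < TOY_MAX) {
    s->ssrcs[s->n++] = ssrc;
    s->version++;
  }
}

/* Models rtp_session_process_rtcp, which was waiting for the lock. */
static void toy_process_rtcp(ToySession *s) {
  toy_lock(s);
  toy_add_source(s, 42);
  toy_unlock(s);
}

/* Models source_caps: the lock is dropped around the caps callback. */
static void toy_source_caps(ToySession *s) {
  toy_unlock(s);
  /* the real code would call session->callbacks.caps here */
  if (s->while_unlocked)
    s->while_unlocked(s);
  toy_lock(s);
}

/* Models rtp_session_on_timeout iterating over the table under the lock.
 * Returns false if the table version changed during the iteration. */
static bool toy_on_timeout(ToySession *s) {
  bool ok = true;
  toy_lock(s);
  unsigned seen = s->version;
  size_t count = s->n;
  for (size_t i = 0; i < count; i++) {
    toy_source_caps(s);
    if (seen != s->version) {  /* same idea as the failing assertion */
      ok = false;
      break;
    }
  }
  toy_unlock(s);
  return ok;
}
```

Setting `while_unlocked` to `toy_process_rtcp` makes `toy_on_timeout` return `false` on the first iteration step, while leaving the hook `NULL` lets the iteration complete normally.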