rtsp-server: memory corruption due to a race in the session timeout handling

I have been seeing strange crashes in an application using gst-rtsp-server, mostly in glibc's malloc() or the GLib slice allocator (when G_SLICE=always-malloc is not set), and in other parts of the code that at first glance seemed unrelated to gst-rtsp-server. I eventually managed to narrow the problem down to memory corruption caused by the session timeout mechanism.

Problem 1)

Under some circumstances a client triggers the session timeout and sends an RTSP TEARDOWN at the same time. gst_rtsp_session_filter(), which is called by the session timeout mechanism, then drops the reference that 'priv->medias' owns on the GstRTSPSessionMedia, and handle_teardown() tries to do the very same thing. This would not cause any issue if gst_rtsp_session_filter() did not temporarily release the lock while calling the GstRTSPSessionFilterFunc.

This is the problematic piece of code:

gst_rtsp_session_filter():

      g_mutex_unlock (&priv->lock);

      res = func (sess, media, user_data);

      g_mutex_lock (&priv->lock);

While the lock is released, handle_teardown() can take it and drop the reference held by the private structure; gst_rtsp_session_filter() then does the same once it re-acquires the lock. At that point the ref count is still greater than zero because the internal hash table, 'visited', also holds a ref, so the extra unref itself does not touch freed memory. It does drop the count to zero, however, so when the hash table is freed at the end of the function its unref is on an already freed object.
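To make the sequence easier to follow, here is a minimal single-threaded model of the ref counting, with the interleaving hard-coded in the order described above. It is only a sketch, not gst-rtsp-server code: ModelMedia, ModelSession and all model_* functions are hypothetical names, and the object is never really freed so that the bad unref can be counted instead of crashing. The check_membership variant is one possible guard (only unref if the media is still in the list after re-locking); I am assuming it for illustration, not claiming it is the right fix.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for GstRTSPSessionMedia: only the ref count is modelled. */
typedef struct {
  int refcount;      /* live references */
  bool finalized;    /* set once the ref count reaches zero */
  int bad_unrefs;    /* unrefs attempted after finalization */
} ModelMedia;

/* Hypothetical stand-in for the session; in_list models membership in priv->medias. */
typedef struct {
  ModelMedia *media;
  bool in_list;
} ModelSession;

static void
model_unref (ModelMedia *m)
{
  if (m->finalized) {
    m->bad_unrefs++;            /* in the real code: unref on freed memory */
    return;
  }
  if (--m->refcount == 0)
    m->finalized = true;
}

/* Models handle_teardown(): remove the media and drop the list's reference. */
static void
model_teardown (ModelSession *s)
{
  if (s->in_list) {
    s->in_list = false;
    model_unref (s->media);
  }
}

/* Models one filter iteration whose callback asks for removal.
 * Returns the number of unrefs that hit an already finalized object. */
static int
model_filter_remove (bool check_membership)
{
  ModelMedia media = { 1, false, 0 };   /* one ref, owned by the medias list */
  ModelSession sess = { &media, true };

  media.refcount++;             /* 'visited' takes its own ref */

  /* lock released, callback runs; the teardown wins the race here */
  model_teardown (&sess);

  /* lock re-acquired, callback asked for removal */
  if (!check_membership || sess.in_list) {
    sess.in_list = false;
    model_unref (&media);       /* second drop of the list's reference */
  }

  model_unref (&media);         /* 'visited' is destroyed at function exit */
  return media.bad_unrefs;
}
```

In this model, model_filter_remove (false) reports one unref on a finalized object, while model_filter_remove (true) reports none because the list's reference is only dropped once.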

Problem 2)

The handle_*() functions in the client can be called while the session is being finalized due to a session timeout. They call:

sessmedia = gst_rtsp_session_get_media (session, path, &matched);

which is a transfer-none call, so sessmedia may be freed by gst_rtsp_session_filter() in the meantime. One solution could be to implement a variant that is transfer-full?
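A rough sketch of what I mean by transfer-full, again as a stand-alone model rather than real gst-rtsp-server code: the getter takes its own reference while the session lock is still held, so the caller's pointer stays valid even if the session drops its reference right afterwards. ModelMedia, ModelSession and the model_* functions are hypothetical names, and a plain pthread mutex stands in for priv->lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical ref-counted stand-in for GstRTSPSessionMedia. */
typedef struct {
  int refcount;
} ModelMedia;

/* Hypothetical stand-in for the session: one media slot guarded by a lock. */
typedef struct {
  pthread_mutex_t lock;
  ModelMedia *media;            /* the session owns one ref while non-NULL */
} ModelSession;

/* Transfer-none: returns a borrowed pointer; nothing keeps it alive
 * once the lock has been released. */
static ModelMedia *
model_get_media (ModelSession *s)
{
  pthread_mutex_lock (&s->lock);
  ModelMedia *m = s->media;
  pthread_mutex_unlock (&s->lock);
  return m;
}

/* Transfer-full: takes a reference while the lock is held;
 * the caller owns that reference and has to drop it when done. */
static ModelMedia *
model_dup_media (ModelSession *s)
{
  pthread_mutex_lock (&s->lock);
  ModelMedia *m = s->media;
  if (m != NULL)
    m->refcount++;
  pthread_mutex_unlock (&s->lock);
  return m;
}

/* Models the timeout path dropping the session's reference. */
static void
model_release_media (ModelSession *s)
{
  pthread_mutex_lock (&s->lock);
  if (s->media != NULL) {
    s->media->refcount--;
    s->media = NULL;
  }
  pthread_mutex_unlock (&s->lock);
}

/* Returns the ref count a handler would see if the timeout ran right
 * after it looked up the media; 0 means the real object would be freed. */
static int
model_refs_after_timeout (int transfer_full)
{
  ModelMedia media = { 1 };
  ModelSession sess = { PTHREAD_MUTEX_INITIALIZER, &media };
  ModelMedia *m = transfer_full ? model_dup_media (&sess)
                                : model_get_media (&sess);

  model_release_media (&sess);
  return m->refcount;
}
```

With the transfer-none getter the count seen by the handler is already zero, with the transfer-full one the handler still holds one reference. The cost is that every handle_*() caller would have to unref sessmedia on all of its exit paths.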

Do you have any suggestions on how to solve this correctly?

Edited Oct 01, 2021 by Tim-Philipp Müller