rtsp-server: fix race when merging POST-tunnel with GET-tunnel
The problem is seen when tunneling RTSP over HTTP, when the second channel is received. In the description below The GET-channel is setup first, and the problem arises when the POST-channel is set up.
When the rtsp server receives a new connection it creates a new client object and calls gst_rtsp_client_attach(), supplying the client object and a context, of the client thread newly created by the server. In gst_rtsp_client_attach() a new watch is created and attached to the context supplied by the server. After that it takes the clients priv->lock while creating priv->rtsp_ctrl_timeout, a timer for the RTSP control channel. When done it releases priv->lock.
But what if the client thread gets to execute before the thread executing gst_rtsp_client_attach() takes priv->lock? The connection watch will call tunnel_post(), which will call handle_tunnel(). It will move the connection from the new client object to the already existing one and destroy priv->watch of the new one, which will eventually call client_watch_notify(), which calls rtsp_ctrl_timeout_remove(), which destroys priv->rtsp_ctrl_timeout, if it exists. Which, when things go wrong, it doesn't (because the server thread executing gst_rtsp_client_attach() was intercepted by the client thread). And eventually the client object is unreffed and we end up in gst_rtsp_client_finalize(), which in the beginning asserts that priv->rtsp_ctrl_timeout is NULL.
Now what happens if the server thread gets to execute again after the client thread has tried to destroy priv->rtsp_ctrl_timeout, but before it got to the assert in gst_rtsp_client_finalize()? Well, gst_rtsp_client_attach() may manage to create the timer source and assign it to priv->rtsp_ctrl_timeout. And the assert in gst_rtsp_client_finalize() will fail. This is what happen often in our environment.
But even if we got past the assert in the finalize function we would still have a problem. Eventually gst_rtsp_client_attach() would be allowed to continue and will try to assign the priv->rtsp_ctrl_timeout variable of an object which was just finalized.
I thought of two solutions to the problem. One was to put the creation of priv->rtsp_ctrl_timeout in the client thread. But that would mean having to wait for some kind of input on the connection. But the whole point of that timer is to make sure that we don't wait forever for the client to send us something. So that's not really a solution.
And the second idea I had was to simply take priv->lock before attaching the client to the context. That should fix it since both tunnel_get() and tunnel_post() will take the lock before proceeding.
Does this seem like a good solution?