Skip to content

utils/togglerecord: fix race condition checking other streams EOS state

Function check_and_update_stream_start checks whether other streams reached EOS. The stream being checked might already have locked its state. If it's about to check other streams too, this results in a deadlock.

The problem was due to the main_state guard being dropped handling event StreamStart checking whether the main stream is EOS:

    let main_is_eos = if let Some(main_state) = main_state {
        main_state.eos
    } else {
        false
    };

In the above code, main_state main state is comsumed and dropped after evaluating main_state.eos.

This is also the case before handling event Eos.

This revealed another deadlock in sink_chain due to an attempt to acquire to Mutexes without locking the main state first. See 2d commit.

Tested running the 21 tests set 20,000 in a loop.

Previous solution

For the record, this was the initial solution proposed in this MR. It is left here so as to understand the discussions below.

The solution in this commit moves the eos flag in an AtomicBool out of the state Mutex so that it can be read even if state is locked in another thread.

I suspect similar deadlocks could also occur checking other streams flushing or reading current_running_time so I also considered using a parking_lot::RwLock on the state, but I stepped back as it proved to be a much more involving solution.

I'm never too confident choosing the Ordering variant. My intention here was to make sure the most up to date stored value is available when loading, so load should always occur after any concurrent store.

Edited by François Laignel

Merge request reports