Jack modules use locking primitives in process callback
Summary
The JACK sink and source modules use locking primitives where they are expressly forbidden by the JACK usage guidelines. Both modules end up calling pthread_mutex_lock() inside the JACK process callback (jack-module-sink locks a mutex via pa_asyncmsgq_send(), jack-module-source via pa_asyncmsgq_post()).
Additionally, it looks like the source module might be allocating heap memory, which is also forbidden in a JACK realtime context, but I haven't had a closer look yet.
JACK API usage (frothing-at-the-mouth rant)
From the jack_set_process_callback() API documentation:
The code in the supplied function must be suitable for real-time execution. That means that it cannot call functions that might block for a long time. This includes all I/O functions (disk, TTY, network), malloc, free, printf, pthread_mutex_lock, sleep, wait, poll, select, pthread_join, pthread_cond_wait, etc, etc.
To many (most?) developers this may appear as needlessly strict and forbidding. Judging by how many programs use mutexes in the JACK callback I feel that a subtle disconnect between pro-audio developers and the wider community has taken place. To me it seems like the requirements outlined in JACK's API documentation aren't being taken as necessary implementation details, but rather as an abstract statement of principle reminiscent of declarations by hard-core FLOSS aficionados. "Yeah man, nice ideas you got there, but I'm happy with MIT-licensed code."
(I'm not making any statements regarding licenses here, it's just an analogy. Calm down. No, really, stop.)
The fact is, however, that if you put mutexes or anything of that sort in the JACK callback, your application is putting significant strain on the entire chain of clients in the graph. The mutex might mostly work, i.e., there are no audible pops or cracks in the audio stream, but you're very likely to just barely return from the callback in time. If another process causes spikes in CPU load, and/or you add more JACK clients to the signal chain, your "works for me" implementation will drag the entire graph down and force you to run at stupid high latencies.
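To put a number on "in time": the whole graph has to finish within one period, which lasts nframes / sample_rate seconds. At 64 frames and 48 kHz that is roughly 1.33 ms for all clients together; at 1024 frames it is roughly 21.3 ms. The helper below is only an illustration of that arithmetic, and the name period_budget_us is made up for this report.

```c
#include <stdint.h>

/* Time available for one JACK process cycle, in microseconds.
 * Hypothetical helper, not part of JACK or PulseAudio. The budget is
 * shared by every client that has to run during the cycle. */
double period_budget_us(uint32_t nframes, uint32_t sample_rate)
{
    return 1e6 * (double)nframes / (double)sample_rate;
}
```

A mutex that is held by a non-realtime thread for even a fraction of a millisecond eats a large share of that budget at small buffer sizes.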
Replicating
Configure JACK to run at a reasonably low latency, without PulseAudio. Launch QJackCtl. Observe the RT utilization percentage. It should be fairly low when not actively running a bunch of effects and stuff.
Now do the same but load the PulseAudio JACK sink and/or source. Observe QJackCtl's RT utilization. At least on my Ryzen 3700X it's at a constant 100%. The sound does not crackle, but the utilization hints at clients behaving badly. Note that my CPU load is just a few percent, nothing heavy is actually happening in the system, but the realtime context is getting bottlenecked by locking primitives.
How to fix it
PulseAudio's JACK source and sink modules must use non-blocking synchronization primitives when copying audio buffers in their JACK callbacks. Converting the current pa_asyncmsgq-based handoff to simple double buffering with atomic flipping would likely work. The JACK source callback must be able to immediately write to an available buffer, signal Pulse with a non-blocking primitive after copying is done, then return. The JACK sink callback must be able to read the most recent data from a buffer without waiting, signal Pulse, and return. Nothing else. No juggling of locks between the JACK callback context and Pulse threads.
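A minimal sketch of the kind of handoff described above, assuming exactly one writer thread and one reader thread. This is not PulseAudio code: the period_queue type, the pq_* functions, and the MAX_FRAMES limit are all hypothetical names invented here. It uses two period-sized slots and C11 atomic counters instead of a mutex, so neither side ever waits; when no slot is free the writer drops the period and counts an overrun. Waking the other thread with a non-blocking primitive is deliberately left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_FRAMES 1024u  /* assumed upper bound on the period size */
#define NUM_SLOTS  2u     /* two slots: double buffering */

/* Hypothetical single-producer/single-consumer queue of audio periods. */
typedef struct {
    float       data[NUM_SLOTS][MAX_FRAMES];
    size_t      frames[NUM_SLOTS];
    atomic_uint write_count;  /* periods published by the writer */
    atomic_uint read_count;   /* periods consumed by the reader  */
    atomic_uint overruns;     /* periods dropped by the writer   */
} period_queue;

/* Setup code, called outside the realtime context. */
void pq_init(period_queue *q)
{
    assert(q != NULL);
    atomic_init(&q->write_count, 0);
    atomic_init(&q->read_count, 0);
    atomic_init(&q->overruns, 0);
}

/* Writer side (the JACK callback in the source case). Never blocks:
 * copies into a free slot, or drops the period if both are in use. */
bool pq_try_write(period_queue *q, const float *in, size_t nframes)
{
    unsigned w = atomic_load_explicit(&q->write_count, memory_order_relaxed);
    unsigned r = atomic_load_explicit(&q->read_count, memory_order_acquire);

    if (nframes > MAX_FRAMES || w - r >= NUM_SLOTS) {
        atomic_fetch_add_explicit(&q->overruns, 1, memory_order_relaxed);
        return false;
    }
    unsigned slot = w % NUM_SLOTS;
    memcpy(q->data[slot], in, nframes * sizeof(float));
    q->frames[slot] = nframes;
    /* Release: the copy above becomes visible before the new count. */
    atomic_store_explicit(&q->write_count, w + 1, memory_order_release);
    return true;
}

/* Reader side (the JACK callback in the sink case). Never blocks:
 * returns the number of frames copied, or 0 if nothing is queued. */
size_t pq_try_read(period_queue *q, float *out, size_t max_frames)
{
    unsigned r = atomic_load_explicit(&q->read_count, memory_order_relaxed);
    unsigned w = atomic_load_explicit(&q->write_count, memory_order_acquire);

    if (w == r)
        return 0;  /* underrun: the caller would output silence */

    unsigned slot = r % NUM_SLOTS;
    size_t n = q->frames[slot] < max_frames ? q->frames[slot] : max_frames;
    memcpy(out, q->data[slot], n * sizeof(float));
    /* Release: hand the slot back to the writer after the copy. */
    atomic_store_explicit(&q->read_count, r + 1, memory_order_release);
    return n;
}
```

In the source module the JACK callback would be the writer and a Pulse thread the reader; in the sink module the roles are reversed. Dropping a period on overrun is a design choice made for the sketch: a glitch in one client is preferable to stalling the whole graph.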
(This bug report is still a work in progress.)