poor audio performance in qemu

When running pipewire inside qemu with an audio output client on it, the audio performance is very poor; there are underruns continuously (not just clicks and pops). I am aware that audio in qemu is in general not that good, but plain gstreamer alsasink works fine for instance (with a few clicks and pops sometimes, but that's nothing compared to how pipewire sounds)

Let's debug this. To aid debugging, I added a printout of the current time in the log output (I plan to submit a PR for this later).

[D][000006164.677904][alsa-utils.c:1079 spa_alsa_start()] alsa 0x55ee4a278280: start 1024 slave:0
Hardware PCM card 0 'HDA Intel' device 0 subdevice 0
Its setup is:
  stream       : PLAYBACK
  access       : MMAP_INTERLEAVED
  format       : S16_LE
  subformat    : STD
  channels     : 2
  rate         : 48000
  exact rate   : 48000 (48000/1)
  msbits       : 16
  buffer_size  : 16384
  period_size  : 1024
  period_time  : 21333
  tstamp_mode  : ENABLE
  tstamp_type  : MONOTONIC
  period_step  : 1
  avail_min    : 1024
  period_event : 0
  start_threshold  : 9223372036854775807
  stop_threshold   : 16384
  silence_threshold: 0
  silence_size : 0
  boundary     : 4611686018427387904
  appl_ptr     : 1536
  hw_ptr       : 540
[T][000006164.680999][alsa-utils.c:708 spa_alsa_write()] begin 0 16384 1024
[T][000006164.681012][alsa-utils.c:768 spa_alsa_write()] silence 2048
[T][000006164.681019][alsa-utils.c:773 spa_alsa_write()] commit 0 2048 93097

We start with this configuration, writing 2048 samples of silence at t=681 msec.

[T][000006164.681025][alsa-utils.c:788 spa_alsa_write()] snd_pcm_start 2048
[T][000006164.681563][alsa-utils.c:1014 alsa_on_timeout_event()] timeout 2048 6164915672416 6119302269295 45613403121 1024 95145
[T][000006164.681575][alsa-utils.c:939 handle_play()] early wakeup 2048 1024

At the same time, we have the chance to pull data from the client, but we don't, because we still have 2048 samples of silence to play, so we go to sleep for a while.

[T][000006164.702828][alsa-utils.c:1014 alsa_on_timeout_event()] timeout 832 6164937327019 6164937005749 321270 1024 95145
[T][000006164.702866][alsa-utils.c:655 update_time()] slave:0 6164937327019 1.000412 832 -192.000000 1024.421666 1024

At t=702 msec, we wake up and there is only 832 samples left in the device. The time difference is 21.3 msec, which should have been 1024 samples at the 48kHz rate we are running, but it looks like the device has consumed the equivalent of 25.3 msec instead.

[T][000006164.703286][alsa-utils.c:708 spa_alsa_write()] begin 2048 14336 1024
[T][000006164.703290][alsa-utils.c:752 spa_alsa_write()] alsa-util 0x55ee4a278280: reuse buffer 0
[T][000006164.703293][port.c:147 schedule_mix_reuse_buffer()] port 0x55ee4a276aa0: reuse buffer 0 0
[T][000006164.703296][alsa-utils.c:773 spa_alsa_write()] commit 2048 1024 95145
[T][000006164.703300][node.c:719 process_node()] node 0x55ee4a277b50: graph completed wait:366084 run:26061

At this time we try to pull a buffer from the client, and we end up committing it ~0.5msec later:

[T][000006164.725638][alsa-utils.c:1014 alsa_on_timeout_event()] timeout 511 6164958792282 6164958651571 140711 1024 96169
[T][000006164.725666][alsa-utils.c:655 update_time()] slave:0 6164958792282 1.001689 511 -513.000000 1025.729781 1024

Waking up again at t=725.6 msec. The device has 511 samples left, while at t=702.8 it had 832 and we also wrote another 1024, so 832+1024-511=1345 samples consumed, the equivalent of 28msec, while the clock advanced only 22.8 msec. Again, it looks like the device consumes samples faster.

[T][000006164.726094][alsa-utils.c:708 spa_alsa_write()] begin 3072 13312 1024
[T][000006164.726104][alsa-utils.c:752 spa_alsa_write()] alsa-util 0x55ee4a278280: reuse buffer 0
[T][000006164.726108][port.c:147 schedule_mix_reuse_buffer()] port 0x55ee4a276aa0: reuse buffer 0 0
[T][000006164.726111][alsa-utils.c:773 spa_alsa_write()] commit 3072 1024 96169
[T][000006164.726115][node.c:719 process_node()] node 0x55ee4a277b50: graph completed wait:398729 run:32142

The next write happens at t=726.1 msec and we write another 1024 samples.

[E][000006164.745599][alsa-utils.c:544 alsa_recover()] 0x55ee4a278280: xrun of 64456 usec 3093 0.000000

Now in the next wakeup, at t=745.5 msec, we underrun. The time diff from the previous wakeup is 19.9 msec, but the device has consumed more than 1024+511=1535 (i.e. 31.9 msec) of samples (well, I guess it tries to pull multiples of 1024 to send to pulseaudio on the host and it can't find enough data, triggering the underrun. The 31.9 msec is probably wrong, but in any case we can see that the device is going much faster than anticipated).

This motif continues in loops.

[E][000006165.023541][alsa-utils.c:544 alsa_recover()] 0x55ee4a278280: xrun of 62540 usec 3001 0.000000
[T][000006165.024963][alsa-utils.c:708 spa_alsa_write()] begin 0 16384 1024
[T][000006165.024972][alsa-utils.c:768 spa_alsa_write()] silence 2048
[T][000006165.024976][alsa-utils.c:773 spa_alsa_write()] commit 0 2048 130651
[T][000006165.024980][alsa-utils.c:788 spa_alsa_write()] snd_pcm_start 2048
[T][000006165.025070][alsa-utils.c:1014 alsa_on_timeout_event()] timeout 2048 6165257825549 6165257782932 42617 1024 132699
[T][000006165.025076][alsa-utils.c:939 handle_play()] early wakeup 2048 1024

This is another underrun of the same client, about 300ms later. We detect it and we write 2048 samples of silence, going to sleep right after.

[T][000006165.044950][alsa-utils.c:1014 alsa_on_timeout_event()] timeout 711 6165279297098 6165279158882 138216 1024 132699
[T][000006165.044976][alsa-utils.c:655 update_time()] slave:0 6165279297098 1.000671 711 -313.000000 1024.687403 1024

Waking up ~19ms later to find out that ~1300 samples are already gone.

[T][000006165.045388][alsa-utils.c:773 spa_alsa_write()] commit 2048 1024 132699

Commit 1024 samples about 0.4 msec later.

[T][000006165.066305][alsa-utils.c:1014 alsa_on_timeout_event()] timeout 356 6165300636771 6165300616120 20651 1024 133723

Waking up ~20ms later and more samples are gone.

[T][000006165.066781][alsa-utils.c:773 spa_alsa_write()] commit 3072 1024 133723
[T][000006165.066785][node.c:719 process_node()] node 0x55ee4a277b50: graph completed wait:390204 run:23833
[E][000006165.087656][alsa-utils.c:544 alsa_recover()] 0x55ee4a278280: xrun of 62563 usec 3003 0.000000

Commit at t=066.7 msec and of course when we wake up again around ~21 msec later (this window is increasing for some reason!?), the device has underrun.

To conclude, I think there are 2 problems with the algorithm that is driving alsa-sink.

We don't pull samples early enough when we start (either at the beginning of playback or right after an underrun) but we wait for the first timeout. I think we could pull earlier, as long as there is space in the ringbuffer.
The timeout window should be much more sensitive to the device rate. If the device is consuming the equivalent of 25 msec of data in just 21 msec, then the timeout window needs to aggressively shrink to something like 15 msec. And it definitely should not increase.

Edited Dec 04, 2020 by George Kiagiadakis