cleanup after cairo / xwayland / font map hash table destroy
Starting with commit 823580e0, the headless backend explicitly calls cleanup_after_cairo(), which seems to trigger an assert within cairo when cairo_debug_reset_static_data() is called. This happens only when we run xwayland-test, and only once every few runs, suggesting a race somewhere.
I initially thought this might be the result of cairo_xcb_surface_create_with_xrender_format() still holding a reference somehow. But digging a bit deeper, the assert always concerns the same hash table, the font_map one (for which the current code should release all possible references), while cairo_xcb_surface_create_with_xrender_format() operates on a distinct hash table.
The trace looks like this:
#0 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
#1 0x00007f29e32dbd2f in __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2 0x00007f29e328cef2 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3 0x00007f29e3277472 in __GI_abort () at ./stdlib/abort.c:79
#4 0x00007f29e3277395 in __assert_fail_base
(fmt=0x7f29e33eba70 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x7f29e357e47b "hash_table->live_entries == 0", file=file@entry=0x7f29e357e467 "../src/cairo-hash.c", line=line@entry=219, function=function@entry=0x7f29e357e5c0 <__PRETTY_FUNCTION__.8> "_cairo_hash_table_destroy") at ./assert/assert.c:92
#5 0x00007f29e3285df2 in __GI___assert_fail
(assertion=0x7f29e357e47b "hash_table->live_entries == 0", file=0x7f29e357e467 "../src/cairo-hash.c", line=219, function=0x7f29e357e5c0 <__PRETTY_FUNCTION__.8> "_cairo_hash_table_destroy") at ./assert/assert.c:101
#6 0x00007f29e346ee6d in _cairo_hash_table_destroy (hash_table=0x55cf02eaf320) at ../src/cairo-hash.c:219
#7 0x00007f29e34bc48a in _cairo_scaled_font_map_destroy () at ../src/cairo-scaled-font.c:460
#8 0x00007f29e34622c1 in cairo_debug_reset_static_data () at ../src/cairo-debug.c:67
#9 0x00007f29e36dde10 in cleanup_after_cairo () at ../shared/cairo-util.c:700
#10 0x00007f29e36db9fa in headless_destroy (backend=0x55cf02e0b160) at ../libweston/backend-headless/headless.c:511
#11 0x00007f29e371438b in weston_compositor_destroy (compositor=0x55cf02e01c30) at ../libweston/compositor.c:8968
#12 0x00007f29e3776597 in wet_main (argc=1, argv=0x55cf02dfff00, test_data=0x7ffc85f2eff0) at ../compositor/main.c:4220
#13 0x000055cf02bb3d97 in execute_compositor (setup=0x7ffc85f2f060, data=0x55cf02dff998) at ../tests/weston-test-fixture-compositor.c:410
#14 0x000055cf02bb56e5 in weston_test_harness_execute_as_client (harness=0x55cf02dff980, setup=0x7ffc85f2f060) at ../tests/weston-test-runner.c:534
#15 0x000055cf02badb72 in fixture_setup (harness=0x55cf02dff980) at ../tests/xwayland-test.c:61
#16 0x000055cf02badb90 in fixture_setup_run_ (harness=0x55cf02dff980, arg_=0x0) at ../tests/xwayland-test.c:63
#17 0x000055cf02bb597d in main (argc=1, argv=0x7ffc85f2f248) at ../tests/weston-test-runner.c:682
The way we handled this previously, as in fb57ce17, was to explicitly call pango_cairo_font_map_set_default(NULL). Instrumenting cairo when creating, inserting, removing and destroying entries in that font_map hash table shows that there's still a cached hash entry, but no holdover that still references it. In normal runs, _cairo_scaled_font_map_destroy() removes entries from the font_map hash table when it detects at least one holdover, and by the time _cairo_hash_table_destroy() is called, there are no entries left in that font_map hash table.
For runs that trigger the assert, _cairo_scaled_font_map_destroy() skips the removal of entries because the holdover count is 0. This seems to suggest that there's a window of time where the holdovers and the hash table entries are out of sync, or some kind of thread race happening internally.
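To make the suspected mismatch concrete, here is a toy model in C. It is not cairo's code: the struct, its fields and the function name are all hypothetical, and it only mirrors the behaviour described above, where teardown evicts entries only for known holdovers and the table's destroy then expects zero live entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: names are hypothetical and do not match cairo's internals. */
struct toy_font_map {
    size_t live_entries;   /* entries currently stored in the hash table */
    size_t num_holdovers;  /* cached entries the map knows it must evict */
};

/*
 * Mirrors the described teardown: evict one table entry per holdover, then
 * report whether a "live_entries == 0" check on destroy would pass.
 */
bool
toy_font_map_destroy_is_clean(struct toy_font_map *map)
{
    assert(map != NULL);

    while (map->num_holdovers > 0 && map->live_entries > 0) {
        map->num_holdovers--;
        map->live_entries--;
    }

    return map->live_entries == 0;
}
```

With one entry and one holdover the function returns true (the normal runs). With one entry and zero holdovers it returns false, which corresponds to the state in which the real assert fires.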
For pango_cairo_font_map_set_default(), the docs say:
" * Note that since Pango 1.32.6, the default fontmap is per-thread.
* This function only changes the default fontmap for
* the current thread. Default fontmaps of existing threads
* are not changed. Default fontmaps of any new threads will
* still be created using [func@PangoCairo.FontMap.new]."
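The per-thread behaviour in that note can be illustrated with a small self-contained C sketch. It does not use Pango at all: a thread-local flag stands in for the default fontmap, and every name here is hypothetical. It only shows why clearing the default from one thread would leave another thread's default untouched, assuming more than one thread ever touches the fontmap.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Toy stand-in for a per-thread default fontmap; not Pango's implementation. */
static _Thread_local int toy_default_fontmap_set = 1;

/*
 * Hypothetical analogue of pango_cairo_font_map_set_default(NULL):
 * it only clears the calling thread's default.
 */
static void
toy_clear_default_fontmap(void)
{
    toy_default_fontmap_set = 0;
}

static void *
toy_worker(void *arg)
{
    int *still_set = arg;

    assert(still_set != NULL);
    /* The worker reads its own thread-local copy, not the caller's. */
    *still_set = toy_default_fontmap_set;
    return NULL;
}

/*
 * Clears the default in the calling thread, then returns what a second
 * thread observes for its own default (1 = still set), or -1 on error.
 */
int
toy_other_thread_default_after_clear(void)
{
    pthread_t thread;
    int still_set = -1;

    toy_clear_default_fontmap();
    if (pthread_create(&thread, NULL, toy_worker, &still_set) != 0)
        return -1;
    if (pthread_join(thread, NULL) != 0)
        return -1;

    return still_set;
}
```

If something similar happens here, a pango_cairo_font_map_set_default(NULL) call made on one thread would not release a fontmap that was created on a different thread.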
What I've also noticed is that when building with -fsanitize=address I can't trigger the issue, even after a few hundred runs. That might explain why I have not seen it before in CI, but I'm not excluding that it could happen there too.
So far, I've tried a couple of things, but I can still trigger it:
- add pango_cairo_font_map_set_default(NULL) in weston_wm_destroy() and weston_wm_window_destroy(), respectively
- in weston-test-runner.c, add an atexit(3) handler that explicitly calls cleanup_after_cairo(), rather than doing it in the headless backend.
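For reference, the atexit(3) attempt has roughly the following shape. This is a minimal sketch, not the actual patch: toy_cleanup_after_cairo() is a stub standing in for cleanup_after_cairo() so that the snippet is self-contained, and the helper names and call counter are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical counter, only here to make the stub observable. */
static int toy_cleanup_calls;

/*
 * Stub standing in for cleanup_after_cairo(); in the real experiment the
 * actual function would be registered instead.
 */
static void
toy_cleanup_after_cairo(void)
{
    toy_cleanup_calls++;
}

/* Register the cleanup to run at normal process exit. */
int
toy_register_cairo_cleanup(void)
{
    int rc = atexit(toy_cleanup_after_cairo);

    /* A failed registration would silently skip the cleanup, so flag it in debug builds. */
    assert(rc == 0);
    return rc;
}

/* Number of times the stub cleanup has run so far. */
int
toy_cleanup_call_count(void)
{
    return toy_cleanup_calls;
}
```

Handlers registered with atexit() run only on normal termination (return from main() or exit()), so this moves the cleanup as late as possible in the process lifetime, but it does not by itself change which thread's state gets released.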
Maybe another set of eyes going over this might suggest some follow-up I could try.