XDamage events are inserted between GetImage header and data
TLDR: pDraw->pScreen->GetImage() call in dix/dispatch.c can emit XDamage events that end up between GetImage response header and its data, crashing clients that use XDamage extension, and violating X11 wire protocol.
Long story:
For over a year vino-server was crashing on me on startup, complaining that unknown sequence was received from X server:
[xcb] Unknown sequence number while processing queue
[xcb] Most likely this is a multi-threaded client and XInitThreads has not been called
[xcb] Aborting, sorry about that.
vino-server: ../../src/xcb_io.c:260: poll_for_event: Assertion `!xcb_xlib_threads_sequence_lost' failed.
Aborted (core dumped)
I've always restarted it, and sooner or later it did not crash. Unfortunately after last xfce4 update few weeks ago problem became 100% reproducible - vino-server would crash on connect each and every time - unless I run it under strace or valgrind. They both slowed down vino-server a bit, and apparently that was enough to avoid crash.
This weekend I finally got annoyed enough to try to figure out what is going on. When looking at the message that triggered the problem, it looked like memory corruption, as whole message was filled with repeated value 63422 (0xF7BE). After adding logging into vino, libx11, and libxcb, I've noticed that crash happens on next packet after 1976 bytes of data are read, and also that read of the image data got split into 32 bytes read, and 1944 bytes read. So I modified libxcb to die when 1976 bytes response arrives, and one attempt later here was the stack trace:
(gdb) bt
#0 0x00007fb954978081 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007fb954963535 in __GI_abort () at abort.c:79
#2 0x00007fb95496340f in __assert_fail_base
(fmt=0x7fb954ac5710 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x7fb953bf1260 "len != 1976", file=0x7fb953bf121c "../../src/xcb_in.c", line=413, function=<optimized out>)
at assert.c:92
#3 0x00007fb954970b92 in __GI___assert_fail
(assertion=assertion@entry=0x7fb953bf1260 "len != 1976", file=file@entry=0x7fb953bf121c "../../src/xcb_in.c", line=line@entry=413, function=function@entry=0x7fb953bf12d0 <__PRETTY_FUNCTION__.11208
> "read_block") at assert.c:101
#4 0x00007fb953be0ddc in read_block (len=<optimized out>, buf=0x557810253fd0, fd=3) at ../../src/xcb_in.c:413
#5 0x00007fb953be0ddc in _xcb_in_read_block (c=c@entry=0x55780ffe0e60, buf=buf@entry=0x557810253fb0, len=len@entry=2008) at ../../src/xcb_in.c:1087
#6 0x00007fb953be110f in read_packet (c=0x55780ffe0e60) at ../../src/xcb_in.c:268
#7 0x00007fb953be110f in _xcb_in_read (c=c@entry=0x55780ffe0e60) at ../../src/xcb_in.c:1042
#8 0x00007fb953bdef7f in _xcb_conn_wait (c=c@entry=0x55780ffe0e60, cond=cond@entry=0x7fffe9524360, vector=vector@entry=0x0, count=count@entry=0x0) at ../../src/xcb_conn.c:515
#9 0x00007fb953be061f in wait_for_reply (c=c@entry=0x55780ffe0e60, request=request@entry=467, e=e@entry=0x7fffe9524420) at ../../src/xcb_in.c:525
#10 0x00007fb953be0792 in xcb_wait_for_reply64 (c=c@entry=0x55780ffe0e60, request=467, e=e@entry=0x7fffe9524420) at ../../src/xcb_in.c:569
#11 0x00007fb955183e38 in _XReply (dpy=dpy@entry=0x55780ffdfba0, rep=rep@entry=0x7fffe9524480, extra=extra@entry=0, discard=discard@entry=0) at ../../src/xcb_io.c:609
#12 0x00007fb9551691dc in XGetImage (dpy=0x55780ffdfba0, d=1001, x=1465, y=1162, width=26, height=38, plane_mask=18446744073709551615, format=2) at ../../src/GetImage.c:76
#13 0x00007fb95516946a in XGetSubImage
(dpy=<optimized out>, d=<optimized out>, x=x@entry=1465, y=y@entry=1162, width=width@entry=26, height=height@entry=38, plane_mask=18446744073709551615, format=2, dest_image=0x5578101502e0, dest_x=
1465, dest_y=1162) at ../../src/GetImage.c:138
#14 0x000055780fe076fa in vino_fb_xdamage_idle_handler (vfb=0x55781017cd00 [VinoFB]) at server/vino-fb.c:602
#15 0x00007fb955881d8e in g_main_dispatch (context=0x55781000fc40) at ../../../glib/gmain.c:3179
#16 0x00007fb955881d8e in g_main_context_dispatch (context=context@entry=0x55781000fc40) at ../../../glib/gmain.c:3844
#17 0x00007fb955882140 in g_main_context_iterate (context=0x55781000fc40, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at ../../../glib/gmain.c:3917
#18 0x00007fb955882413 in g_main_loop_run (loop=0x557810056210) at ../../../glib/gmain.c:4111
#19 0x000055780fe06f7d in main (argc=<optimized out>, argv=<optimized out>) at server/vino-main.c:319
We can see from stack trace that XGetImage() was invoked, and from local variables that response header was received with 494 dwords of data to follow:
(gdb) frame 5
#5 _xcb_in_read_block (c=c@entry=0x55780ffe0e60, buf=buf@entry=0x557810253fb0, len=len@entry=2008) at ../../src/xcb_in.c:1087
1087 int ret = read_block(c->fd, (char *) buf + done, len - done);
(gdb) print *(xcb_generic_event_t*) buf
$1 = {response_type = 1 '\001', pad0 = 16 '\020', sequence = 467, pad = {494, 33, 0, 0, 0, 0, 0}, full_sequence = 30605658}
But when I look at image data, they do not look too much as image data. Instead they start with something that looks pretty much as XDamage event (xdpyinfo confirms that 90 is XDAMAGE event):
(gdb) frame 4
#4 0x00007fb953be0ddc in read_block (len=<optimized out>, buf=0x557810253fd0, fd=3) at ../../src/xcb_in.c:413
413 assert(len != 1976);
(gdb) print *(xcb_generic_event_t*) buf
$2 = {response_type = 90 'Z', pad0 = 1 '\001', sequence = 467, pad = {1001, 37748742, 1583889968, 76678587, 1441814, 0, 78644800}, full_sequence = 4156487614}
So somehow XDamage event got inserted between GetImage response header, and image data. Note that drawable returned by XDamage (pad[0], 1001) is same drawable on which client issued XGetImage. So 32 bytes of XDamage event is interpreted as image data, and last 32 bytes of image data are interpreted as X server response - resulting in instant crash as image data interpreted as X11 sequence will almost never match.
So problem is not in vino-server, nor libx11 nor libxcb, but in X server itself. And indeed, little instrumentation added into DoGetImage() and DamageExtNotify() revealed that there is a code path that generates XDamage event for drawable X from GetImage call for drawable X, inserting XDamage header between GetImage response header and its body (or possibly between two parts of the body if GetImage could not be completed in one response):
Thread 1 "Xorg" hit Breakpoint 1, DamageInsideGetImage () at ../../../../damageext/damageext.c:121
121 ../../../../damageext/damageext.c: No such file or directory.
(gdb) bt
#0 0x000056203f7405cf in DamageInsideGetImage () at ../../../../damageext/damageext.c:121
#1 0x000056203f7405cf in DamageExtNotify (pDamageExt=0x562040e766f0, pBoxes=0x7fff2a10fc20, nBoxes=1) at ../../../../damageext/damageext.c:121
#2 0x000056203f74811b in DamageReportDamage (pDamage=pDamage@entry=0x562040e77540, pDamageRegion=0x562040e77598) at ../../../../../miext/damage/damage.c:1950
#3 0x000056203f7488f8 in damageRegionProcessPending (pDrawable=<optimized out>) at ../../../../../miext/damage/damage.c:294
#4 0x000056203f748f9d in damageComposite
(op=<optimized out>, pSrc=<optimized out>, pMask=<optimized out>, pDst=0x562040adc6a0, xSrc=<optimized out>, ySrc=<optimized out>, xMask=0, yMask=0, xDst=0, yDst=1161, width=1600, height=39)
at ../../../../../miext/damage/damage.c:518
#5 0x000056203f6f2d7b in compWindowUpdateAutomatic (pWin=<optimized out>) at ../../../../composite/compwindow.c:705
#6 0x000056203f6f2d7b in compPaintWindowToParent (pWin=pWin@entry=0x562040ce7790) at ../../../../composite/compwindow.c:729
#7 0x000056203f6f3c48 in compPaintChildrenToWindow (pWin=0x562040771b30) at ../../../../composite/compwindow.c:744
#8 0x000056203f6f3c48 in compPaintChildrenToWindow (pWin=pWin@entry=0x562040771b30) at ../../../../composite/compwindow.c:736
#9 0x000056203f6f14ef in compGetImage (pDrawable=0x562040771b30, sx=1465, sy=1162, w=26, h=38, format=2, planemask=4294967295, pdstLine=0x562040e82980 "") at ../../../../composite/compinit.c:153
#10 0x000056203f6667ab in DoGetImage
(planemask=<optimized out>, height=<optimized out>, width=<optimized out>, y=<optimized out>, x=<optimized out>, drawable=<optimized out>, format=<optimized out>, client=0x562040cc5f80)
at ../../../../dix/dispatch.c:2214
#11 0x000056203f6667ab in ProcGetImage (client=0x562040cc5f80) at ../../../../dix/dispatch.c:2276
#12 0x000056203f669944 in Dispatch () at ../../../../dix/dispatch.c:478
#13 0x000056203f66d914 in dix_main (argc=6, argv=0x7fff2a110078, envp=<optimized out>) at ../../../../dix/main.c:276
#14 0x00007fbfcefeabbb in __libc_start_main (main=0x56203f657700 <main>, argc=6, argv=0x7fff2a110078, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fff2a110068)
at ../csu/libc-start.c:308
#15 0x000056203f65773a in _start () at ../../../../dix/gc.c:814
For now I've modified X server's DoGetImage() to first buffer all image in memory before sending out header, so XDamage events are emitted before GetImage() response, rather than during the response getimage.patch It is definitely not optimal from memory consumption stand point, but changing WriteToClient() to buffer event notifications while someone is in the middle of writing multipart response looked too involved to me.
Bug is also tracked in Debian as https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=941863
Thanks, Petr Vandrovec