eglMakeCurrent does not always ensure dri_drawable->update_drawable_info has been called for a new EGLSurface if another has been created and destroyed first
Submitted by Johan Helsing
Assigned to Johan Helsing
Created attachment 138907 Apitrace of the failing test
This is pretty timing-sensitive, and I haven't been able to reproduce it on anything except a Qt unit test running against a headless Weston compositor.
Start Weston with: weston --backend=headless-backend.so -i 0
Qt unit test: https://codereview.qt-project.org/#/c/225522/2//ALL
The symptom is that glReadPixels returns only 1 pixel instead of 16, which is the surface size.
Why it fails:
In the test, a temporary EGLSurface is created as part of the window setup. After that surface has been destroyed, the real window surface is created. When that happens, the malloc in driCreateNewDrawable may return the same address that the first surface's drawable had. Consequently, when dri_make_current later tries to determine whether it should update the texture_stamp, it compares the surface's drawable pointer against the drawable from the last call to dri_make_current and assumes it's the same surface (which it isn't).
When texture_stamp is left unset, dri_st_framebuffer_validate thinks it has already called update_drawable_info for that drawable. That is why, in the test above, the size of the surface is not updated to match the Wayland window, and glReadPixels later returns only one pixel.
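To make the failure mode easier to follow, here is a minimal C model of it. This is only a sketch and not the Mesa code: every name in it (toy_drawable, toy_context, toy_make_current and so on) is made up for illustration, the last_draw field merely plays the role of dri_context::dPriv, and a one-slot allocator stands in for malloc so that the address reuse happens every time instead of occasionally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the real Mesa structures. */
struct toy_drawable {
    int  width;         /* size the context currently believes in */
    bool info_updated;  /* set when the context decides to re-query the size */
};

struct toy_context {
    struct toy_drawable *last_draw;  /* plays the role of dri_context::dPriv */
};

/* One-slot "allocator": every drawable gets the same address, which makes
 * the address reuse deterministic instead of depending on malloc. */
static struct toy_drawable toy_slot;

static struct toy_drawable *toy_create_drawable(void)
{
    toy_slot.width = 1;             /* placeholder size until info is updated */
    toy_slot.info_updated = false;
    return &toy_slot;
}

/* Destroying the drawable does not touch the context, so
 * ctx->last_draw keeps pointing at the released slot. */
static void toy_destroy_drawable(struct toy_drawable *draw)
{
    (void)draw;
}

/* Pointer comparison only: a new drawable at a recycled address looks
 * identical to the old one, so the size refresh is skipped. */
static void toy_make_current(struct toy_context *ctx,
                             struct toy_drawable *draw, int real_width)
{
    if (ctx->last_draw != draw) {
        ctx->last_draw = draw;
        draw->width = real_width;
        draw->info_updated = true;
    }
}

/* Replays the sequence from the test: temporary surface, destroy,
 * real window surface. Returns the width the second surface ends up with. */
static int toy_width_seen_by_second_surface(void)
{
    struct toy_context ctx = { NULL };

    struct toy_drawable *tmp = toy_create_drawable();
    toy_make_current(&ctx, tmp, 1);
    toy_destroy_drawable(tmp);

    struct toy_drawable *win = toy_create_drawable();
    assert(win == tmp);               /* same address, different surface */
    toy_make_current(&ctx, win, 16);  /* refresh is wrongly skipped */
    return win->width;
}
```

In this model toy_width_seen_by_second_surface() returns 1 rather than 16, which mirrors the symptom seen with glReadPixels.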
The solution is to clear the dangling pointer to the destroyed drawable, dri_context::dPriv. I've confirmed that this fixes the flakiness of my test and will post a patch on the mailing list shortly.
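The idea behind the fix can be sketched in C as follows. Again, this is a simplified model under my own assumptions, not the actual patch: the sketch_* names are hypothetical, last_draw stands in for dri_context::dPriv, and where exactly the real code should clear the pointer is not shown here. A single static slot is used so that the second drawable always lands on the address of the first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the real Mesa structures. */
struct sketch_drawable {
    bool info_updated;  /* set when the context decides to refresh the info */
};

struct sketch_context {
    struct sketch_drawable *last_draw;  /* plays the role of dri_context::dPriv */
};

static struct sketch_drawable sketch_slot;  /* single, always-reused address */

static struct sketch_drawable *sketch_create(void)
{
    sketch_slot.info_updated = false;
    return &sketch_slot;
}

/* On destruction, forget the drawable if the context still remembers it,
 * so no dangling pointer survives to confuse a later comparison. */
static void sketch_destroy(struct sketch_context *ctx,
                           struct sketch_drawable *draw)
{
    if (ctx->last_draw == draw)
        ctx->last_draw = NULL;
}

static void sketch_make_current(struct sketch_context *ctx,
                                struct sketch_drawable *draw)
{
    if (ctx->last_draw != draw) {
        ctx->last_draw = draw;
        draw->info_updated = true;
    }
}

/* Returns true if the second drawable is refreshed despite address reuse. */
static bool sketch_second_surface_refreshed(void)
{
    struct sketch_context ctx = { NULL };

    struct sketch_drawable *tmp = sketch_create();
    sketch_make_current(&ctx, tmp);
    sketch_destroy(&ctx, tmp);        /* clears ctx.last_draw */

    struct sketch_drawable *win = sketch_create();
    assert(win == tmp);               /* address is reused as before */
    sketch_make_current(&ctx, win);   /* NULL != win, so the refresh runs */
    return win->info_updated;
}
```

Because the remembered pointer is reset to NULL when the drawable goes away, the comparison in sketch_make_current can no longer match a recycled address, and the new surface is treated as new.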
I've tagged this bug with Drivers/Gallium/llvmpipe, but it really affects all Gallium DRI drivers (I've reproduced the issue with softpipe as well).
Related bug report in Qt: https://bugreports.qt.io/browse/QTBUG-67678