Ensure next buffer at the end of SwapBuffers() to reduce perceived latency
I'm not sure whether this feature request belongs to Gallium or to some other Mesa component. Currently, swapchain contention backpressure appears to hit the client app during the first draw operation of each frame. As an illustration, here is where the GL client ends up blocking when no buffers are available (using the glxgears benchmark as a test case):
#0 0x00007f7d91c23b2f in poll () from /usr/lib/libc.so.6
#1 0x00007f7d91a2463b in poll (__timeout=-1, __nfds=1, __fds=0x7ffceb950968)
at /usr/include/bits/poll2.h:46
#2 _xcb_conn_wait (c=c@entry=0x5614a1216420, cond=cond@entry=0x5614a13bd438,
vector=vector@entry=0x0, count=count@entry=0x0) at xcb_conn.c:480
#3 0x00007f7d91a26b1b in xcb_wait_for_special_event (c=0x5614a1216420,
se=0x5614a13bd410) at xcb_in.c:795
#4 0x00007f7d919e1419 in dri3_wait_for_event_locked (full_sequence=0x0,
draw=0x5614a13bca18) at ../mesa/src/loader/loader_dri3_helper.c:582
#5 dri3_wait_for_event_locked (draw=0x5614a13bca18, full_sequence=0x0)
at ../mesa/src/loader/loader_dri3_helper.c:563
#6 0x00007f7d919e201a in dri3_find_back (draw=draw@entry=0x5614a13bca18)
at ../mesa/src/loader/loader_dri3_helper.c:711
#7 0x00007f7d919e39be in dri3_get_buffer (format=format@entry=4098,
buffer_type=buffer_type@entry=loader_dri3_buffer_back,
draw=draw@entry=0x5614a13bca18, driDrawable=<optimized out>)
at ../mesa/src/loader/loader_dri3_helper.c:1882
#8 0x00007f7d919e3ec9 in loader_dri3_get_buffers (
driDrawable=<optimized out>, format=4098, stamp=0x5614a13bcb80,
loaderPrivate=0x5614a13bca18, buffer_mask=<optimized out>,
buffers=<optimized out>) at ../mesa/src/loader/loader_dri3_helper.c:2106
#9 0x00007f7d9009da0b in dri_image_drawable_get_buffers (
statts_count=<optimized out>, statts=<optimized out>,
images=<optimized out>, drawable=<optimized out>)
at ../mesa/src/gallium/frontends/dri/dri2.c:283
#10 dri2_allocate_textures (ctx=0x5614a122c140, drawable=0x5614a13bcb80,
statts=0x5614a13bd238, statts_count=2)
at ../mesa/src/gallium/frontends/dri/dri2.c:418
#11 0x00007f7d9009f91e in dri_st_framebuffer_validate (stctx=<optimized out>,
stfbi=<optimized out>, statts=0x5614a13bd238, count=2, out=0x7ffceb950e10)
at ../mesa/src/gallium/frontends/dri/dri_drawable.c:82
#12 0x00007f7d900fbe8a in st_framebuffer_validate (
stfb=stfb@entry=0x5614a13bcde0, st=st@entry=0x5614a13a63b0)
at ../mesa/src/mesa/state_tracker/st_manager.c:222
#13 0x00007f7d900fc509 in st_manager_validate_framebuffers (st=0x5614a13a63b0)
at ../mesa/src/mesa/state_tracker/st_manager.c:1197
#14 0x00007f7d9011d016 in st_validate_state (st=st@entry=0x5614a13a63b0,
pipeline=pipeline@entry=ST_PIPELINE_CLEAR)
at ../mesa/src/mesa/state_tracker/st_atom.c:203
#15 0x00007f7d90122557 in st_Clear (ctx=0x7f7d8434c010, mask=18)
at ../mesa/src/mesa/state_tracker/st_cb_clear.c:442
#16 0x00005614a1015053 in draw ()
#17 0x00005614a10153f0 in draw_gears ()
#18 0x00005614a10154c1 in draw_frame ()
#19 0x00005614a10162ab in event_loop ()
#20 0x00005614a1016666 in main ()
Typical naive client apps, including most games, follow the pattern "sample time, simulate, draw, present, repeat". In GPU-bound scenarios, the swapchain length is the primary mechanism that makes an app block so that it does not get too far ahead of the GPU. Because the block currently happens inside the first draw call, after the app has already sampled time and simulated, the frame is rendered from state that has aged for the whole duration of the wait. If the next framebuffer were instead validated at the end of SwapBuffers(), before returning to the client app, the user would see one frame less of latency in the scenario above. It would also match what other drivers and graphics stacks do.
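To make the latency argument concrete, here is a rough toy model of the loop described above. It is only a sketch, not Mesa code: the function name `simulate` and all of its parameters are made up for illustration. It assumes that a back buffer becomes free again as soon as the GPU finishes the frame that used it (scanout and vblank timing are ignored) and that the GPU processes frames strictly in order.

```python
def simulate(frames, gpu_ms, cpu_ms, buffers, acquire_in_swap):
    """Return per-frame latency: GPU completion time minus input sample time.

    Hypothetical model. Assumes the buffer used by frame i is released
    when the GPU finishes frame i, and that there are `buffers` back buffers.
    """
    done = []        # done[i]: time at which the GPU finishes frame i
    latencies = []
    now = 0          # clock of the client thread, in ms

    def acquire(frame):
        # The buffer for `frame` is free once frame (frame - buffers) is done.
        nonlocal now
        prev = frame - buffers
        if prev >= 0:
            now = max(now, done[prev])   # block until that buffer is released

    for i in range(frames):
        sampled = now                    # "sample time"
        now += cpu_ms                    # "simulate"
        if not acquire_in_swap:
            acquire(i)                   # current behavior: block in first draw
        gpu_free = done[-1] if done else 0
        start = max(now, gpu_free)       # "draw": GPU starts once it is idle
        done.append(start + gpu_ms)
        latencies.append(done[i] - sampled)
        if acquire_in_swap:
            acquire(i + 1)               # proposed: block before swap returns
    return latencies


if __name__ == "__main__":
    # GPU-bound case: 10 ms of GPU work, 1 ms of CPU work, 2 back buffers.
    print("block in first draw:", simulate(10, 10, 1, 2, False)[-1], "ms")
    print("block in SwapBuffers:", simulate(10, 10, 1, 2, True)[-1], "ms")
```

Under these assumptions the steady-state latency drops from 30 ms to 20 ms, i.e. by one GPU frame time, because the time sample is taken after the wait instead of before it. Real numbers depend on the compositor, the presentation mode, and when buffers are actually released.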