va/d3d12: Implement vaSyncBuffer for ffmpeg encode with async_depth
This MR implements vaSyncBuffer in the VA frontend, and then implements support for it in the d3d12 gallium video driver.
Quoting the FFmpeg VAAPI encoder docs: "async_depth
Maximum processing parallelism. Increase this to improve single channel performance. This option doesn’t work if driver doesn’t implement vaSyncBuffer function. Please make sure there are enough hw_frames allocated if a large number of async_depth is used."
When FFmpeg detects that vaSyncBuffer is available, the gallium driver must support multiple pipe_video_codec.end_frame encode calls (i.e. vaEndPicture) that queue in-flight GPU work and return without waiting for the GPU to finish. The driver blocks for GPU sync only when pipe_video_codec.get_feedback (i.e. vaSyncBuffer) is called to retrieve the compressed bitstream sizes, which therefore have to be kept stored for each in-flight frame until then. When vaSyncBuffer is not available, FFmpeg makes synchronous, consecutive calls to vaEndPicture + vaSyncSurface for every frame.
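The call pattern described above can be sketched as a small simulation. This is only an illustration under stated assumptions, not the d3d12 driver code: the `encoder` and `inflight_frame` structs, the `ASYNC_DEPTH` constant, the fence counters, and the `simulated_size` parameter are all hypothetical, and the "GPU wait" is modelled by simply advancing a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ASYNC_DEPTH 4 /* hypothetical: maximum frames in flight */

/* Hypothetical per-frame record the driver keeps until feedback is read. */
struct inflight_frame {
    uint64_t fence_value;    /* fence value signalled when the encode completes */
    uint32_t bitstream_size; /* only meaningful once the fence has signalled */
    bool     in_use;
};

struct encoder {
    struct inflight_frame slots[ASYNC_DEPTH];
    uint64_t next_fence;      /* last fence value submitted */
    uint64_t completed_fence; /* last fence value the simulated GPU reached */
    unsigned blocking_waits;  /* how many times a caller actually had to wait */
};

/* Models end_frame: record the in-flight work and return immediately.
 * simulated_size stands in for the size the GPU would eventually report. */
int end_frame(struct encoder *enc, uint32_t simulated_size)
{
    uint64_t fence = ++enc->next_fence;
    int slot = (int)(fence % ASYNC_DEPTH);

    /* The caller must not exceed ASYNC_DEPTH frames in flight. */
    assert(!enc->slots[slot].in_use);

    enc->slots[slot].fence_value = fence;
    enc->slots[slot].bitstream_size = simulated_size;
    enc->slots[slot].in_use = true;
    return slot; /* no GPU wait happened here */
}

/* Models get_feedback: block only if this frame's fence has not signalled yet,
 * then hand back the stored bitstream size and release the slot. */
uint32_t get_feedback(struct encoder *enc, int slot)
{
    struct inflight_frame *frame = &enc->slots[slot];
    assert(frame->in_use);

    if (enc->completed_fence < frame->fence_value) {
        /* Stand-in for a blocking fence wait on the GPU. */
        enc->completed_fence = frame->fence_value;
        enc->blocking_waits++;
    }

    frame->in_use = false;
    return frame->bitstream_size;
}
```

In this sketch, submitting three frames back to back performs no waits at all; the first wait happens only when feedback is requested, which is the property FFmpeg relies on when async_depth is greater than one.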
As this MR is a breaking change relative to FFmpeg's behavior when vaSyncBuffer is not available, a new cap, PIPE_VIDEO_CAP_ENC_SUPPORTS_ASYNC_OPERATION, is defined so that vaSyncBuffer returns VA_STATUS_ERROR_UNIMPLEMENTED unless the gallium driver explicitly supports this async pipe_video_codec call pattern.
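The gating described above might look roughly like the following. This is a hedged sketch, not the frontend code from this MR: `sketch_status`, `sketch_screen`, and `sketch_sync_buffer` are hypothetical stand-ins for the real VA status codes, the gallium screen cap query, and the vaSyncBuffer entry point.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for VA_STATUS_SUCCESS and
 * VA_STATUS_ERROR_UNIMPLEMENTED; real code would use va.h. */
enum sketch_status {
    SKETCH_STATUS_SUCCESS = 0,
    SKETCH_STATUS_ERROR_UNIMPLEMENTED = 1
};

/* Hypothetical model of a driver screen; the flag represents the answer to
 * a PIPE_VIDEO_CAP_ENC_SUPPORTS_ASYNC_OPERATION query. */
struct sketch_screen {
    bool enc_supports_async_operation;
};

/* Models the frontend entry point: report "unimplemented" unless the driver
 * opted in, so clients fall back to the synchronous call pattern. */
enum sketch_status sketch_sync_buffer(const struct sketch_screen *screen)
{
    if (!screen->enc_supports_async_operation)
        return SKETCH_STATUS_ERROR_UNIMPLEMENTED;

    /* A real implementation would wait for the encode feedback here. */
    return SKETCH_STATUS_SUCCESS;
}
```

The point of the opt-in is that drivers which have not been updated keep seeing the synchronous vaEndPicture + vaSyncSurface sequence, since the client has no async entry point to detect.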
Quick testing on a couple of video clips with the d3d12 driver shows 66% more relative GPU utilization and 30% to 50% faster times when FFmpeg detects async support.