va/d3d12: Implement vaSyncBuffer for ffmpeg encode with async_depth (!18715)

Sil Vilerino requested to merge sivileri/mesa:d3d12_video_async_optimizations into main Sep 20, 2022

This MR implements vaSyncBuffer in the VA frontend, and then implements support for it in the d3d12 gallium video driver.

Quoting the FFmpeg VAAPI encoder docs: "async_depth Maximum processing parallelism. Increase this to improve single channel performance. This option doesn’t work if driver doesn’t implement vaSyncBuffer function. Please make sure there are enough hw_frames allocated if a large number of async_depth is used."

When FFmpeg detects that vaSyncBuffer is available, the gallium driver must support multiple pipe_video_codec.end_frame encode calls (i.e. vaEndPicture) that each queue in-flight GPU work and return without waiting for the GPU to finish. The driver blocks for GPU sync only when pipe_video_codec.get_feedback (i.e. vaSyncBuffer) is called to retrieve the compressed bitstream size, so the per-frame sizes must also be kept stored until then. When vaSyncBuffer is not available, FFmpeg makes synchronous, consecutive calls to vaEndPicture + vaSyncSurface for every frame.
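The call pattern described above can be sketched with a small mock. This is an illustration only, not code from this MR: every name here (`mock_encoder`, `mock_end_frame`, `mock_get_feedback`, `MAX_IN_FLIGHT`) is hypothetical, and the "GPU" is simulated by a pair of fence counters. It shows submissions returning without waiting, and the blocking wait being deferred to the feedback call, which also returns the stored bitstream size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_IN_FLIGHT 4 /* hypothetical async depth, not a value from the MR */

/* Hypothetical per-frame bookkeeping; not the d3d12 driver's real structures. */
struct mock_frame_slot {
   bool     in_flight;      /* work submitted, feedback not yet collected */
   uint64_t fence_value;    /* value the mock GPU reaches when the frame is done */
   uint32_t bitstream_size; /* kept stored so it can be reported later */
};

struct mock_encoder {
   struct mock_frame_slot slots[MAX_IN_FLIGHT];
   uint64_t next_fence;      /* last fence value handed out */
   uint64_t completed_fence; /* last fence value the mock GPU has reached */
   unsigned wait_count;      /* how many blocking waits were performed */
};

/* Stand-in for end_frame: records the submission and returns immediately.
 * Returns the slot index, or -1 if every slot is still in flight. */
static int mock_end_frame(struct mock_encoder *enc, uint32_t encoded_size)
{
   unsigned slot = (unsigned)(enc->next_fence % MAX_IN_FLIGHT);
   struct mock_frame_slot *s = &enc->slots[slot];

   if (s->in_flight)
      return -1; /* caller must collect feedback before submitting more */

   s->in_flight = true;
   s->fence_value = ++enc->next_fence;
   s->bitstream_size = encoded_size;
   return (int)slot;
}

/* Stand-in for get_feedback: the only place where a wait happens. */
static uint32_t mock_get_feedback(struct mock_encoder *enc, int slot)
{
   struct mock_frame_slot *s = &enc->slots[slot];
   assert(s->in_flight);

   if (enc->completed_fence < s->fence_value) {
      /* A real driver would block on a GPU fence here; the mock just
       * advances the completed counter and counts the wait. */
      enc->completed_fence = s->fence_value;
      enc->wait_count++;
   }

   s->in_flight = false;
   return s->bitstream_size;
}
```

In this sketch, two back-to-back `mock_end_frame` calls leave `wait_count` at zero; the count only increases once `mock_get_feedback` is called for each frame.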

Because this changes the call pattern FFmpeg uses compared to when vaSyncBuffer is not available, a new cap PIPE_VIDEO_CAP_ENC_SUPPORTS_ASYNC_OPERATION is defined. vaSyncBuffer returns VA_STATUS_ERROR_UNIMPLEMENTED unless the gallium driver explicitly reports support for this async pipe_video_codec call pattern.
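A minimal sketch of that gating, assuming a boolean capability on a mock screen object. The types and names below (`mock_screen`, `mock_status`, `mock_sync_buffer`) are hypothetical stand-ins and do not reproduce the VA frontend's actual code or the real VA status values; the point is only that the entry point reports "unimplemented" unless the driver opts in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical status codes standing in for VA_STATUS_SUCCESS and
 * VA_STATUS_ERROR_UNIMPLEMENTED; the numeric values are arbitrary. */
enum mock_status {
   MOCK_STATUS_SUCCESS = 0,
   MOCK_STATUS_ERROR_UNIMPLEMENTED = 1
};

/* Hypothetical screen object; the flag stands in for a driver answering
 * the PIPE_VIDEO_CAP_ENC_SUPPORTS_ASYNC_OPERATION query. */
struct mock_screen {
   bool supports_async_operation;
};

/* Stand-in for the frontend's vaSyncBuffer entry point. */
static enum mock_status mock_sync_buffer(const struct mock_screen *screen)
{
   assert(screen != NULL);

   /* Drivers that have not opted in keep the old behavior: the caller
    * sees "unimplemented" and stays on the synchronous path. */
   if (!screen->supports_async_operation)
      return MOCK_STATUS_ERROR_UNIMPLEMENTED;

   /* A real frontend would wait for the encode feedback here. */
   return MOCK_STATUS_SUCCESS;
}
```

Making the cap opt-in means drivers that were not written for deferred waits are never exposed to the async call pattern.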

Quick testing on a couple of video clips with the d3d12 driver shows 66% more relative GPU utilization and 30% to 50% faster times when FFmpeg detects async support.

Edited Sep 21, 2022 by Sil Vilerino
