
i915g: Screen corruption with ENOBUFS caused by fence register shortage

GKraats requested to merge GKraats/mesa_alu:blit into main

The routines i915_fill_blit and i915_copy_blit in src/gallium/drivers/i915/i915_blit.c call i915_winsys_validate_buffers to check whether executing the command would cause a shortage of fence registers. The first problem is that this check does not include the last buffer. The second problem is that a buffer's fence register usage is only recorded later, at OUT_RELOC_FENCED, so it is not yet visible at validation time. As a result, the limit of 14 available fence registers is exceeded if all buffers in the batch are different. With dual monitors this always causes the current batchbuffer to be lost with error ENOBUFS. If only one monitor is used, the effective limit is 15 and the batchbuffer is only aborted if it needs 16 fence registers, which is only possible if i915_copy_blit is the last command. Losing a batchbuffer causes screen corruption. If mesa is compiled for debug, the running program crashes at an assert; gnome-shell logs out.

The best solution would be to call a libdrm routine that updates the needed fence registers for each buffer before the validation, but I did not find such a routine, so it would have to be added to libdrm, which creates a troublesome dependency.

This MR contains the num_of_buffers fix for the validation problem described above. It solves the problem in the blit functions by checking the number of fence registers after updating the batch. If too many registers are used, the batch entries and relocs for the current blit function are removed by resetting batch->ptr and reloc_count to their values before the blit call and calling drm_intel_gem_bo_clear_relocs. This truncated batch is flushed, and the batch is then updated again for the current blit function. This solution is simple and clear and costs hardly any performance.
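The rollback-and-flush sequence can be illustrated with a small standalone sketch. It is not the code from this MR: struct mock_batch and all mock_* functions are hypothetical stand-ins, fence usage is approximated as the number of distinct fenced buffers in the batch, and the limit of 14 is the one quoted above.

```c
#include <stdbool.h>
#include <stddef.h>

#define FENCE_LIMIT 14   /* limit quoted in the description */
#define MAX_RELOCS  64   /* arbitrary capacity for this sketch */

/* Hypothetical stand-in for the driver's batch; not the real i915 struct. */
struct mock_batch {
    size_t used;                /* plays the role of batch->ptr */
    int reloc_count;            /* number of relocations emitted so far */
    int fenced_bo[MAX_RELOCS];  /* buffer id of each fenced relocation */
    int flushes;                /* how often the batch was flushed */
};

/* Count the distinct fenced buffers referenced by the batch. */
static int mock_fences_used(const struct mock_batch *b)
{
    int distinct = 0;
    for (int i = 0; i < b->reloc_count; i++) {
        bool seen = false;
        for (int j = 0; j < i; j++) {
            if (b->fenced_bo[j] == b->fenced_bo[i]) {
                seen = true;
                break;
            }
        }
        if (!seen)
            distinct++;
    }
    return distinct;
}

/* Submit the batch and start over with an empty one. */
static void mock_flush(struct mock_batch *b)
{
    b->used = 0;
    b->reloc_count = 0;
    b->flushes++;
}

/* Emit the commands and the two fenced relocations of a copy blit. */
static void mock_emit_copy(struct mock_batch *b, int src, int dst)
{
    b->used += 8;  /* pretend the blit takes 8 dwords */
    b->fenced_bo[b->reloc_count++] = src;
    b->fenced_bo[b->reloc_count++] = dst;
}

static void mock_copy_blit(struct mock_batch *b, int src, int dst)
{
    if (b->reloc_count + 2 > MAX_RELOCS)
        mock_flush(b);  /* keep the sketch within its fixed array */

    /* Remember the batch state before this blit. */
    size_t saved_used = b->used;
    int saved_relocs = b->reloc_count;

    mock_emit_copy(b, src, dst);

    if (mock_fences_used(b) > FENCE_LIMIT) {
        /* Roll back this blit (the real driver would also call
         * drm_intel_gem_bo_clear_relocs here), flush the truncated
         * batch, then emit the blit again into the empty batch. */
        b->used = saved_used;
        b->reloc_count = saved_relocs;
        mock_flush(b);
        mock_emit_copy(b, src, dst);
    }
}
```

In this sketch, seven copy blits on distinct buffers fill the batch to exactly 14 fenced buffers; an eighth one triggers the rollback, a single flush, and a re-emit into the fresh batch.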

Another simple solution is to maintain shadow bookkeeping in the blit routines. Every call of a blit routine increments the shadow-fence-count. If the shadow-fence-count exceeds the limit, the batchbuffer is flushed. The flush routine has to reset the shadow-fence-count to 0. This solution assumes that only the blit routines use fence registers.
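A minimal sketch of this shadow-count alternative, under stated assumptions: struct shadow_batch, shadow_flush and shadow_reserve are hypothetical names, a fill blit is assumed to need one fence register and a copy blit two, and the check is done before the blit is emitted so the count never passes the limit. Repeated buffers are counted again each time, so the count is conservative.

```c
#define SHADOW_FENCE_LIMIT 14  /* limit quoted in the description */

/* Hypothetical bookkeeping kept next to the batch; not a real i915 struct. */
struct shadow_batch {
    int shadow_fence_count;  /* fences claimed by blits since the last flush */
    int flushes;             /* how often the batch was flushed */
};

/* The flush routine must reset the shadow count. */
static void shadow_flush(struct shadow_batch *b)
{
    b->flushes++;
    b->shadow_fence_count = 0;
}

/*
 * Called at the start of a blit routine, before emitting anything.
 * fences_needed is assumed to be 1 for a fill blit and 2 for a copy blit.
 */
static void shadow_reserve(struct shadow_batch *b, int fences_needed)
{
    if (b->shadow_fence_count + fences_needed > SHADOW_FENCE_LIMIT)
        shadow_flush(b);
    b->shadow_fence_count += fences_needed;
}
```

The trade-off against the rollback approach is that no batch state has to be undone, but the count may trigger flushes earlier than strictly necessary, and it is only valid if nothing outside the blit routines uses fence registers.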

Closes: #6963

Closes: #6587

Edited by GKraats
