ffmpeg/iva triggered GPU crash GPU HANG: ecode 12:4:2ffffffd, in ffmpeg, fail to reset in drm-tip
I can reproduce this 100% of the time from an ffmpeg session running in Frigate against a few H.264 cameras, with the following ffmpeg options:
- -hwaccel
- vaapi
- -hwaccel_device
- /dev/dri/renderD128
- -hwaccel_output_format
- yuv420p
It usually works for 10-30 minutes, then crashes.
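To help check whether the hang also reproduces outside Frigate, the same decoder options can be passed to a standalone ffmpeg run. The following is only a sketch: the `build_command` helper, the camera URL, and the `-f null -` output (which discards decoded frames so that only the decode path is exercised) are assumptions for illustration and are not part of the Frigate setup described here.

```python
import shlex

# Decoder options copied from the report.
HWACCEL_ARGS = [
    "-hwaccel", "vaapi",
    "-hwaccel_device", "/dev/dri/renderD128",
    "-hwaccel_output_format", "yuv420p",
]


def build_command(input_url):
    """Return an ffmpeg argv that decodes input_url with VAAPI and discards the frames.

    Hypothetical helper: the hwaccel options are input options, so they are
    placed before "-i". The "-f null -" sink is an assumption, not something
    Frigate uses.
    """
    return ["ffmpeg", *HWACCEL_ARGS, "-i", input_url, "-f", "null", "-"]


def format_command(input_url):
    """Return the same command as a shell-quoted string for pasting into a terminal."""
    return shlex.join(build_command(input_url))
```

Running `subprocess.run(build_command(url))` with one camera's stream URL, while following `dmesg -w` in another terminal, should show whether the hang depends on Frigate at all.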
HW: Asus Z690-P DDR4, i7-12700K
OS: x86_64, 5.17.0-rc2-custom built from drm-tip
I have all the latest firmware files from linux-firmware installed. This also happens with stock Ubuntu HWE 20.04.
Relevant dmesg:
[ 3801.565943] i915 0000:00:02.0: [drm] Resetting vcs0 for preemption time out
[ 3801.569650] i915 0000:00:02.0: [drm] GPU HANG: ecode 12:4:2ffffffd, in ffmpeg [2537]
[ 3812.260280] i915 0000:00:02.0: [drm] GPU HANG: ecode 12:4:28fffffd, in ffmpeg [3989]
[ 3812.261304] i915 0000:00:02.0: [drm] Resetting vcs1 for stopped heartbeat on vcs1
[ 3812.261857] i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on vcs1
[ 3812.464850] [drm:__uc_sanitize [i915]] ERROR Failed to reset GuC, ret = -110
[ 3812.553885] i915 0000:00:02.0: [drm] ERROR Failed to reset chip
[ 3812.553888] i915 0000:00:02.0: [drm:add_taint_for_CI [i915]] CI tainted:0x9 by intel_gt_reset+0x258/0x2d0 [i915]
[ 3812.759406] [drm:__uc_sanitize [i915]] ERROR Failed to reset GuC, ret = -110
[ 3812.761034] i915 0000:00:02.0: [drm] ffmpeg[2537] context reset due to GPU hang
[ 3812.761046] i915 0000:00:02.0: [drm] ffmpeg[3989] context reset due to GPU hang
[ 3812.776687] intel_gt_invalidate_tlbs: 54 callbacks suppressed
[ 3812.776699] i915 0000:00:02.0: [drm] ERROR rcs0 TLB invalidation did not complete in 4ms!
[ 3812.782913] i915 0000:00:02.0: [drm] ERROR bcs0 TLB invalidation did not complete in 4ms!
[ 3812.788268] i915 0000:00:02.0: [drm] ERROR rcs0 TLB invalidation did not complete in 4ms!
[ 3812.793893] i915 0000:00:02.0: [drm] ERROR bcs0 TLB invalidation did not complete in 4ms!
[ 3812.798941] i915 0000:00:02.0: [drm] ERROR rcs0 TLB invalidation did not complete in 4ms!
[ 3812.804344] i915 0000:00:02.0: [drm] ERROR bcs0 TLB invalidation did not complete in 4ms!
[ 3812.809756] i915 0000:00:02.0: [drm] ERROR rcs0 TLB invalidation did not complete in 4ms!
[ 3812.814904] i915 0000:00:02.0: [drm] ERROR bcs0 TLB invalidation did not complete in 4ms!
[ 3812.820228] i915 0000:00:02.0: [drm] ERROR rcs0 TLB invalidation did not complete in 4ms!
[ 3812.825357] i915 0000:00:02.0: [drm] ERROR bcs0 TLB invalidation did not complete in 4ms!
[ 3816.810711] Fence expiration time out i915-0000:00:02.0:ffmpeg[3989]:b1fc!
[ 3823.905825] show_signal_msg: 22 callbacks suppressed
[ 3823.905828] ffmpeg[4731]: segfault at 0 ip 0000000000000000 sp 00007fff9ef65ae8 error 14 in libigdgmm.so.11.3.1176[7f6f304d6000+7000]