System hangs during parallel media transcode operations after enabling VT-d
After enabling "VT-d" in the BIOS of a BXT J4205 machine (which makes the IOMMU get enabled for GFX, i.e. for i915), that machine started to suffer from system hangs.
They happen when running multiple GPU transcode operations in parallel (I'm testing with 5 in parallel).
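To confirm that the BIOS switch actually took effect on the kernel side, a quick check could look like the sketch below. It assumes sysfs is mounted at /sys, where Intel IOMMU (DMAR) units normally show up under /sys/class/iommu/ as dmar0, dmar1 and so on; the number and naming of units vary per platform.

```shell
#!/bin/sh
# Sketch: list the IOMMU units the kernel registered (assumes sysfs at /sys).
iommu_count=0
for unit in /sys/class/iommu/*; do
  [ -e "$unit" ] || continue   # glob did not match: no IOMMU units registered
  echo "IOMMU unit: ${unit##*/}"
  iommu_count=$((iommu_count + 1))
done
echo "IOMMU units found: $iommu_count"
# The kernel command line shows any intel_iommu=... options that were passed.
cat /proc/cmdline 2>/dev/null
```

Zero units with VT-d enabled in the BIOS would suggest the IOMMU is not actually active in the kernel.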
The transcode can be either a heavy HEVC one:
ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi -i Netflix_FoodMarket_4096x2160_10bit_420_100mbs_600.h265 -c:v hevc_vaapi -b:v 20M -an -vframes 300 -y 0099_4K20.h265
or a somewhat more lightweight AVC one:
ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi -i 1920x1080i_29.97_20mb_mpeg2_high.mpv -c:v h264_vaapi -b:v 6000K -compression_level 7 -an -vframes 1200 -y 0024_HD17i7_1.0.h264
The parallel test case is then repeated a few times.
When this happens, there's nothing in the "dmesg -w" output followed over a remote ssh connection; the connection just dies.
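The parallel-and-repeat procedure above could be scripted roughly as follows. This is a sketch, not the exact script used here: JOBS=5 matches the report, ROUNDS=3 is an arbitrary stand-in for "a few times", and the per-job output names (out_<round>_<job>.h264) are hypothetical, chosen so parallel jobs don't overwrite one shared output file. It uses the AVC command; swap in the HEVC one for the heavier case.

```shell
#!/bin/sh
# Hypothetical reproducer: run JOBS transcodes in parallel, repeated ROUNDS times.
JOBS=${JOBS:-5}      # 5 parallel transcodes, as in the report
ROUNDS=${ROUNDS:-3}  # assumption: "a few times"
INPUT=${INPUT:-1920x1080i_29.97_20mb_mpeg2_high.mpv}

transcode() {
  # $1: unique output file so parallel jobs don't clobber each other
  ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi -i "$INPUT" \
    -c:v h264_vaapi -b:v 6000K -compression_level 7 -an -vframes 1200 -y "$1"
}

round=1
while [ "$round" -le "$ROUNDS" ]; do
  echo "round $round: starting $JOBS parallel transcodes"
  job=1
  while [ "$job" -le "$JOBS" ]; do
    # stdin from /dev/null so background ffmpeg never waits on the terminal
    transcode "out_${round}_${job}.h264" </dev/null >/dev/null 2>&1 &
    job=$((job + 1))
  done
  wait  # block until every transcode of this round has exited
  round=$((round + 1))
done
echo "all rounds completed"
```

If the machine hangs, the last "round N" line printed on the ssh session shows roughly how far the test got.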
I've also enabled VT-d on SkullCanyon (SKL GT4e) and TGL-H (GT1), but those do not suffer from the same problem. I think GLK would, though, as it's also an Atom.
These system hangs happen with both the latest drm-tip and a few-months-old drm-tip, so this kernel bug may have been there for a longer time, or it may be a HW/FW issue needing a workaround (WA).
(I first thought this was due to a media-driver update, but rolling back to the media stack version from before the VT-d change did not help.)