Skylake GPU HANG while gstreamer H264 vaapi encoding from MJPEG vaapi decode on drm-tip
Submitted by Andy Nicholas
Assigned to Intel GFX Bugs mailing list
Link to original bug (#110394)
Description
Created attachment 143919
GPU crash info plus dmesg log
Similar to Bug #110297 which I filed.
Skylake GPU hang when encoding a video stream to H.264 using VAAPI. The input is an MJPEG stream read from a file and decoded with VAAPI. We run test loops that transcode this stream over and over, thousands of times. This GPU hang happened on iteration 1026.
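For anyone trying to reproduce this, the test loop described above can be sketched roughly as follows. This is not our actual harness. The pipeline elements (jpegparse, vaapijpegdec, vaapih264enc, h264parse), the function names, and the timeout are assumptions chosen for illustration, and the right demuxer or parser depends on how the MJPEG file is packaged.

```python
import subprocess


def build_transcode_command(src, dst):
    """Return a gst-launch-1.0 argv for MJPEG -> H.264 via VAAPI.

    The element chain is an assumed pipeline, not the one from the report.
    """
    return [
        "gst-launch-1.0", "-e",
        "filesrc", f"location={src}", "!",
        "jpegparse", "!",
        "vaapijpegdec", "!",
        "vaapih264enc", "!",
        "h264parse", "!",
        "filesink", f"location={dst}",
    ]


def run_transcode_loop(src, dst, iterations, timeout_s=120, runner=subprocess.run):
    """Run the transcode repeatedly.

    Returns the 1-based iteration that failed or timed out, or None if
    every iteration exited with status 0.
    """
    for i in range(1, iterations + 1):
        try:
            result = runner(build_transcode_command(src, dst), timeout=timeout_s)
        except subprocess.TimeoutExpired:
            # A hung GPU often shows up as a pipeline that never finishes.
            return i
        if result.returncode != 0:
            return i
    return None
```

Calling run_transcode_loop("input.mjpeg", "/tmp/out.h264", 5000) would report the first failing iteration, which in our case would have been 1026.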
Running on an Intel Compute Stick STK2mV64CC. We have locked the minimum and maximum clock speeds of the GPU to 500 MHz in an attempt to avoid this issue.
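One way to pin the clocks is through the i915 sysfs files gt_min_freq_mhz and gt_max_freq_mhz. The sketch below assumes the GPU is card0 and that the script runs as root. The function name is hypothetical and this is not necessarily how we set it on the test machine.

```python
import os


def lock_gpu_frequency(mhz, card_dir="/sys/class/drm/card0"):
    """Set the i915 minimum and maximum GT frequency to the same value.

    card_dir is an assumption; the GPU may be exposed under another card index.
    Returns the (min, max) values read back after writing.
    """
    min_path = os.path.join(card_dir, "gt_min_freq_mhz")
    max_path = os.path.join(card_dir, "gt_max_freq_mhz")

    with open(max_path) as f:
        current_max = int(f.read().strip())

    # The driver rejects min > max, so raise max first when moving up,
    # and raise min first when the target is at or below the current max.
    order = [max_path, min_path] if mhz > current_max else [min_path, max_path]
    for path in order:
        with open(path, "w") as f:
            f.write(str(mhz))

    with open(min_path) as f_min, open(max_path) as f_max:
        return int(f_min.read().strip()), int(f_max.read().strip())
```

lock_gpu_frequency(500) should return (500, 500) if the hardware supports that frequency. The driver may round to the nearest supported step.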
We are running this test because one of our products needs to have this exact configuration: read an MJPEG stream from a V4L camera and transcode into H264. This configuration needs to be super stable. Crashing once in 1026 iterations is not considered "stable".
Using Ubuntu 18.04 plus a DRM-TIP kernel from about 3 weeks ago, which corresponds to 5.1-rc1.
Using GStreamer 1.14.1:
shield@tobeprovisioned1804:~$ gst-launch-1.0 --version
gst-launch-1.0 version 1.14.1
GStreamer 1.14.1
https://launchpad.net/distros/ubuntu/+source/gstreamer1.0
Full GPU hang log and dmesg enclosed. This is related to a similar bug which I previously filed.
Especially concerning is that the machine is usable (but the GPU seems dead) after this crash. We would like to figure out a way of determining that the GPU has died and to kernel panic so that we can, eventually, reboot. Modifying the kernel is A-OK to avoid this issue, so if Intel doesn't have a mechanism then I will try to add something myself.
Leaving the machine in this "half dead" state is bad. We can't use the gstreamer process termination as the "reboot the machine" trigger as we may have other, less severe, bugs where we simply want to restart the gstreamer process.
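As a stopgap in userspace, something like the following might work. It is a rough sketch under several assumptions: that debugfs is mounted at /sys/kernel/debug, that the GPU is DRM minor 0, that reading i915_wedged returns a nonzero value once the driver has declared the GPU terminally wedged, and that the script runs as root. Writing "c" to /proc/sysrq-trigger deliberately crashes the kernel, and the machine only reboots afterwards if the kernel.panic sysctl is set to a positive timeout. The function names are hypothetical. This would only catch the case where the driver itself gives up on the GPU, not a hang that it believes it recovered from.

```python
def gpu_is_wedged(wedged_path="/sys/kernel/debug/dri/0/i915_wedged"):
    """Return True if the i915 debugfs file reports a wedged GPU.

    Assumes the file reads as "0" when healthy and nonzero when wedged.
    """
    with open(wedged_path) as f:
        return f.read().strip() not in ("", "0")


def force_panic(trigger_path="/proc/sysrq-trigger"):
    """Ask the kernel to crash itself via the sysrq 'c' command."""
    with open(trigger_path, "w") as f:
        f.write("c")


def check_once(wedged_path="/sys/kernel/debug/dri/0/i915_wedged",
               trigger_path="/proc/sysrq-trigger"):
    """Panic the machine if the GPU is wedged; return whether it was."""
    if gpu_is_wedged(wedged_path):
        force_panic(trigger_path)
        return True
    return False
```

A small daemon could call check_once() every few seconds. Because it looks at the driver state rather than at the gstreamer process, a plain gstreamer crash would still be handled by restarting the process and would not reboot the machine.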
Attachment 143919, "GPU crash info plus dmesg log":
crash2_192.168.16.38