GPU Hang with OpenVINO's "object_detection_demo.py" on Kernels 6.2 and 6.5.0 (i915 Driver)
Steps to Reproduce:
- Set up an environment with Ubuntu 22.04.
- Install kernel 6.2 (HWE) or 6.5.0-060500rc2drmintelnext20230817-generic. The problem does not occur with kernel 5.15.
- Use the official OpenVINO 2023.0.1 Docker image.
- Run the demo "object_detection_demo.py" from Open Model Zoo:
object_detection_demo.py -m yolox-tiny/FP16/yolox-tiny.xml -at yolox -i test.jpg --no_show -d GPU
- Observe the GPU hang reported in the kernel log (dmesg).
Frequency of Issue:
The hang occurs on every run of the demo with the kernel versions listed above.
Additional Information:
Error log:
[ 80.761763] i915 0000:00:02.0: [drm:i915_gem_open [i915]]
[ 92.302537] i915 0000:00:02.0: [drm] Resetting rcs0 for preemption time out
[ 92.302702] i915 0000:00:02.0: [drm] python3[2103] context reset due to GPU hang
[ 92.302745] i915 0000:00:02.0: [drm:mark_guilty [i915]] context python3[2103]: guilty 1, banned
[ 92.303647] i915 0000:00:02.0: [drm:mark_guilty [i915]] client python3[2103]: gained 4 ban score, now 4
[ 92.314159] i915 0000:00:02.0: [drm] GPU HANG: ecode 9:1:e757fefe, in python3 [2103]
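When re-testing on other kernels, the hang summary lines can be pulled out of a longer dmesg capture with a small script. The sketch below is a minimal, hypothetical helper: it assumes the log has already been saved to text (for example with `sudo dmesg > dmesg.txt`), and the regex only matches the `GPU HANG: ecode ..., in <process> [<pid>]` format shown in the log above. The function and class names are illustrative, not part of any i915 or OpenVINO tool.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List

# Hypothetical pattern for the i915 "GPU HANG" summary line seen above, e.g.
# "[ 92.314159] i915 0000:00:02.0: [drm] GPU HANG: ecode 9:1:e757fefe, in python3 [2103]"
HANG_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+i915\s+(?P<pci>[0-9a-f:.]+):\s+\[drm\]\s+"
    r"GPU HANG: ecode (?P<ecode>[0-9a-f:]+), in (?P<proc>\S+) \[(?P<pid>\d+)\]"
)


@dataclass
class GpuHang:
    timestamp: float  # seconds since boot, as printed by dmesg
    pci_address: str  # PCI address of the i915 device
    ecode: str        # error code reported by the driver
    process: str      # name of the process that owned the hung context
    pid: int


def find_gpu_hangs(lines: Iterable[str]) -> List[GpuHang]:
    """Return one GpuHang per i915 'GPU HANG' summary line; other lines are ignored."""
    hangs = []
    for line in lines:
        m = HANG_RE.match(line.strip())
        if m:
            hangs.append(
                GpuHang(float(m["ts"]), m["pci"], m["ecode"], m["proc"], int(m["pid"]))
            )
    return hangs
```

Run against the log above, `find_gpu_hangs(open("dmesg.txt"))` would yield a single entry with `ecode="9:1:e757fefe"`, `process="python3"` and `pid=2103`; the preceding reset and `mark_guilty` lines are skipped.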
System information:
System architecture: x86_64
Kernel version: 6.5.0-060500rc2drmintelnext20230817-generic
Linux distribution: Ubuntu Server 22.04
DMI: Intel(R) Client Systems NUC7CJYHN/NUC7JYB
Attempted Solution:
Tried increasing the rcs0 preemption timeout to 10000 ms (10 s) with:
echo 10000 | sudo tee /sys/class/drm/card0/engine/rcs0/preempt_timeout_ms
However, the hang still occurred with the longer timeout.
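When repeating this workaround across kernels, it helps to record the original value and restore it afterwards. The following is a minimal sketch of the same sysfs write done from Python, assuming the engine directory is `/sys/class/drm/card0/engine/rcs0` as in the command above (the card index may differ on other machines) and that it runs as root. The helper names are hypothetical; `engine_dir` is a parameter so the logic can be exercised against a scratch directory.

```python
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator

# Path taken from the command above; card0/rcs0 may differ on other systems.
DEFAULT_ENGINE_DIR = Path("/sys/class/drm/card0/engine/rcs0")


def read_preempt_timeout_ms(engine_dir: Path = DEFAULT_ENGINE_DIR) -> int:
    """Read the current preemption timeout (milliseconds) for one engine."""
    return int((engine_dir / "preempt_timeout_ms").read_text().strip())


def write_preempt_timeout_ms(value_ms: int, engine_dir: Path = DEFAULT_ENGINE_DIR) -> None:
    """Same effect as `echo <value_ms> | sudo tee .../preempt_timeout_ms` (needs root on real sysfs)."""
    (engine_dir / "preempt_timeout_ms").write_text(f"{value_ms}\n")


@contextmanager
def preempt_timeout(value_ms: int, engine_dir: Path = DEFAULT_ENGINE_DIR) -> Iterator[int]:
    """Temporarily set the timeout; yields the previous value and restores it on exit."""
    previous = read_preempt_timeout_ms(engine_dir)
    write_preempt_timeout_ms(value_ms, engine_dir)
    try:
        yield previous
    finally:
        # Restore the original value even if the test run raises.
        write_preempt_timeout_ms(previous, engine_dir)
```

Wrapping the demo run in `with preempt_timeout(10000):` applies the same 10 s setting as the `tee` command and puts the original value back when the block exits, so later runs start from the driver default.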
Attachment: error.bz2