DG1 VCS engine timeout after Coarse Power Gating enabling
Since DG1 support in the i915 KMD is not done, which is mainly broken in media-driver/iHD and I can't use DKMS that is not upstreamed, so I have been testing the Xe KMD, which was working fine until I upgraded from 6.10 to 6.11 kernel. It fails when decoding HEVC videos with ffmpeg/vaapi, and the VCS engine hangs.
I bisected the commit that caused it, which is 38e8c4184ea0eebe478a9e6fc8d1062561223429, which enabled Coarse Power Gating. After reverting it everything is back to normal.
I would be very grateful if someone could take a look at this.
How to reproduce:
ffmpeg -vaapi_device /dev/dri/renderD128 -hwaccel vaapi -hwaccel_output_format vaapi \
-i "https://test-videos.co.uk/vids/bigbuckbunny/mp4/h265/1080/Big_Buck_Bunny_1080_10s_30MB.mp4" -an -sn -f null -
Logs:
lspci -nn | grep VGA
03:00.0 VGA compatible controller [0300]: Intel Corporation DG1 [Iris Xe MAX Graphics] [8086:4905] (rev 01)
sudo dmesg | grep xe
[ 0.000000] Command line: BOOT_IMAGE=/casper/vmlinuz --- quiet splash i915.force_probe=!4905 xe.force_probe=4905
[ 0.000000] NX (Execute Disable) protection: active
[ 0.000996] MTRR map: 5 entries (3 fixed + 2 variable; max 23), built from 10 variable MTRRs
[ 0.055998] Kernel command line: BOOT_IMAGE=/casper/vmlinuz --- quiet splash i915.force_probe=!4905 xe.force_probe=4905
[ 0.152100] ... fixed-purpose events: 4
[ 0.002407] ... fixed-purpose events: 3
[ 0.325850] pci 0000:00:1f.4: BAR 4 [io 0xefa0-0xefbf]
[ 7.281724] systemd[1]: Set up automount proc-sys-fs-binfmt_misc.automount - Arbitrary Executable File Formats File System Automount Point.
[ 8.465322] RAPL PMU: API unit is 2^-32 Joules, 3 fixed counters, 655360 ms ovfl timer
[ 8.885689] xe 0000:03:00.0: vgaarb: deactivate vga console
[ 8.886026] xe 0000:03:00.0: [drm] Found DG1 (device ID 4905) display version 12.00 stepping B0
[ 8.893496] xe 0000:03:00.0: [drm] Using GuC firmware from i915/dg1_guc_70.bin version 70.29.2
[ 8.898129] xe 0000:03:00.0: [drm] Using HuC firmware from i915/dg1_huc.bin version 7.9.3
[ 8.928235] xe 0000:03:00.0: [drm] VISIBLE VRAM: 0x0000006000000000, 0x0000000100000000
[ 8.928414] xe 0000:03:00.0: [drm] VRAM[0, 0]: Actual physical size 0x0000000100000000, usable size exclude stolen 0x00000000fb800000, CPU accessible size 0x00000000fb800000
[ 8.928417] xe 0000:03:00.0: [drm] VRAM[0, 0]: DPA range: [0x0000000000000000-100000000], io range: [0x0000006000000000-60fb800000]
[ 8.928420] xe 0000:03:00.0: [drm] Total VRAM: 0x0000006000000000, 0x0000000100000000
[ 8.928421] xe 0000:03:00.0: [drm] Available VRAM: 0x0000006000000000, 0x00000000fb800000
[ 8.941884] xe 0000:03:00.0: [drm] Finished loading DMC firmware i915/dg1_dmc_ver2_02.bin (v2.2)
[ 9.366657] xe 0000:03:00.0: [drm] vcs1 fused off
[ 9.366659] xe 0000:03:00.0: [drm] vcs3 fused off
[ 9.366660] xe 0000:03:00.0: [drm] vcs4 fused off
[ 9.366661] xe 0000:03:00.0: [drm] vcs5 fused off
[ 9.366661] xe 0000:03:00.0: [drm] vcs6 fused off
[ 9.366662] xe 0000:03:00.0: [drm] vcs7 fused off
[ 9.366662] xe 0000:03:00.0: [drm] vecs1 fused off
[ 9.366663] xe 0000:03:00.0: [drm] vecs2 fused off
[ 9.366664] xe 0000:03:00.0: [drm] vecs3 fused off
[ 9.369897] xe 0000:03:00.0: [drm] GT0: using 65535 GUC IDs
[ 9.408941] [drm] Initialized xe 1.1.0 for 0000:03:00.0 on minor 2
[ 9.465038] fbcon: xedrmfb (fb0) is primary device
[ 9.465042] xe 0000:03:00.0: [drm] fb0: xedrmfb frame buffer device
[ 9.474591] xe 0000:03:00.0: [drm] *ERROR* PCODE Mailbox failed: 4 Illegal Subcommand
[ 9.474622] xe 0000:03:00.0: [drm] *ERROR* PCODE Mailbox failed: 4 Illegal Subcommand
[ 9.474663] xe 0000:03:00.0: [drm] *ERROR* PCODE Mailbox failed: 4 Illegal Subcommand
[ 9.609169] snd_hda_intel 0000:04:00.0: bound 0000:03:00.0 (ops i915_audio_component_bind_ops [xe])
[ 71.166149] xe 0000:03:00.0: [drm] *ERROR* Atomic update failure on pipe A (start=3696 end=3697) time 2067 us, min 1073, max 1079, scanline start 1060, end 1060
[ 95.526703] xe 0000:03:00.0: [drm] *ERROR* Atomic update failure on pipe A (start=5158 end=5159) time 2074 us, min 1073, max 1079, scanline start 944, end 1084
[ 599.545866] audit: type=1400 audit(1734341228.403:92): apparmor="DENIED" operation="change_onexec" class="file" info="label not found" error=-2 profile="unconfined" name="ubuntu_pro_apt_news" pid=7871 comm="(python3)"
[ 599.548706] audit: type=1400 audit(1734341228.406:93): apparmor="DENIED" operation="change_onexec" class="file" info="label not found" error=-2 profile="unconfined" name="ubuntu_pro_esm_cache" pid=7872 comm="(python3)"
[ 710.044059] xe 0000:03:00.0: [drm] GT0: Engine reset: engine_class=vcs, logical_mask: 0x3, guc_id=19
[ 710.044221] xe 0000:03:00.0: [drm] GT0: Timedout job: seqno=4294967169, lrc_seqno=4294967169, guc_id=19, flags=0x0
[ 710.044563] xe 0000:03:00.0: [drm] Xe device coredump has been created
[ 710.044576] xe 0000:03:00.0: [drm] Check your /sys/class/drm/card2/device/devcoredump/data
sudo cat /sys/class/drm/card2/device/devcoredump/data
**** Xe Device Coredump ****
kernel: 6.11.0-8-generic
module: xe
Snapshot time: 1734341338.908194137
Uptime: 709.854912753
Process: ffmpeg
PCI ID: 0x4905
PCI revision: 0x01
GT id: 0
Type: main
IP ver: 0.0.0
CS reference clock: 19200000
**** GuC CT ****
H2G CTB (all sizes in DW):
size: 1024
resv_space: 0
head: 635
tail: 971
space: 687
broken: 0
head (memory): 971
tail (memory): 971
status (memory): 0x0
G2H CTB (all sizes in DW):
size: 4096
resv_space: 1024
head: 700
tail: 0
space: 3071
broken: 0
head (memory): 700
tail (memory): 700
status (memory): 0x0
g2h outstanding: 0
GuC ID: 19
Name: vcs19
Class: 1
Logical mask: 0x3
Width: 1
Ref: 17
Timeout: 5000 (ms)
Timeslice: 1000 (us)
Preempt timeout: 640000 (us)
HW Context Desc: 0x00efa000
HW Indirect Ring State: 0x00000000
LRC Head: (memory) 0
LRC Tail: (internal) 1672, (memory) 1672
Start seqno: (memory) -127
Seqno: (memory) -128
Timestamp: 0x00000000
Job Timestamp: 0x00000000
[HWSP].length: 0x1000
[HWSP].data: zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzs8W+Lzs8W+Mzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz
[HWCTX].length: 0x1000
[HWCTX].data: z&I8pc!#u&.s8N')!#ttqz!#ttm!!!4Z!#ttu!:O:q!#tu$!!"\R!#u#Qz!#u#)!7W+o!#u"Nz!#u$Tz!#u$Xz!#u$\z!#u#iz!#u'Izzzzzz&I8p[!#u*>z!#u'!z!#u&rz!#u&nz!#u&jz!#u&fz!#u&bz!#u&^z!#u&Zpl)ujzzzzzzzzzzzzzzzzzzzzzzzzzzzzz&I8qX!#u/uz!#u/uz!#u/uz!#u/uz!#u/uz!#u/uz!#ttes8N'!!#u!/rr2ru!#u!Ss8N'!!#u#a!!!!"!#u#e!#/O8!#u)CGGrDH!#u#Yz!#u#9!7W+k!#u#=z!#u#A!!!!"!#u+]z!#u1Cz!#u1Gz!#u1Kz!#u1Oz!#u1Sz!#u1Wz!#u1[z!#u1_z!#u1cz!#u1gz!#u1kz!#u1oz!#u1sz!#u2"z!#u2&z!#u2*z!#u2.z!#u22z!#u26z!#u2:z!#u2>z!#u2Bz!#u2Fz!#u2Jz!#u2Nz!#u2Rz!#u2Vz!#u2Zz!#u2^z!#u2bz!#u2fz!#u2jz!#tuP!e:7N%Y+NZzz&HWLS!!#D0z!!#D4z!!#D8z!!#D<z!!#D@z!!#DDz!!#=sz!!#>"zzzzzzz"TSN'zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz
Schedule State: 0x441
Flags: 0x0
Job: seqno=-126, fence=0, finished=0
Job: seqno=-125, fence=0, finished=0
Job: seqno=-124, fence=0, finished=0
Job: seqno=-123, fence=0, finished=0
Job: seqno=-122, fence=0, finished=0
Job: seqno=-121, fence=0, finished=0
Job: seqno=-120, fence=0, finished=0
Job: seqno=-119, fence=0, finished=0
Job: seqno=-118, fence=0, finished=0
Job: seqno=-117, fence=0, finished=0
Job: seqno=-116, fence=0, finished=0
Job: seqno=-115, fence=0, finished=0
Job: seqno=-114, fence=0, finished=0
Job: seqno=-113, fence=0, finished=0
Job: seqno=-112, fence=0, finished=0
**** Job ****
batch_addr[0]: 0x000001ffffff0000
**** HW Engines ****
vcs0 (physical), logical instance=0
Forcewake: domain 0x8, ref 1
HWSTAM: 0x00000000
RING_HWS_PGA: 0x0115c000
RING_EXECLIST_STATUS: 0x0000000000000001
RING_EXECLIST_SQ_CONTENTS: 0x0000000000000000
RING_START: 0x0000000000000000
RING_HEAD: 0x00000000
RING_TAIL: 0x00000000
RING_CTL: 0x00000000
RING_MI_MODE: 0x00000200
RING_MODE: 0x00000000
RING_IMR: 0x00000000
RING_ESR: 0x00000000
RING_EMR: 0xffffffff
RING_EIR: 0x00000000
ACTHD: 0x0000000000000000
BBADDR: 0x0000000000000000
DMA_FADDR: 0x0000000000000000
INDIRECT_RING_STATE: 0x00000000
IPEHR: 0x00000000
RING_INSTDONE: 0x7ffffffe
vcs2 (physical), logical instance=1
Forcewake: domain 0x20, ref 1
HWSTAM: 0xffffffff
RING_HWS_PGA: 0x01163000
RING_EXECLIST_STATUS: 0x0000000000000001
RING_EXECLIST_SQ_CONTENTS: 0x0000000000000000
RING_START: 0x0000000000000000
RING_HEAD: 0x00000000
RING_TAIL: 0x00000000
RING_CTL: 0x00000000
RING_MI_MODE: 0x00000200
RING_MODE: 0x00000008
RING_IMR: 0x00000000
RING_ESR: 0x00000000
RING_EMR: 0xffffffff
RING_EIR: 0x00000000
ACTHD: 0x0000000000000000
BBADDR: 0x0000000000000000
DMA_FADDR: 0x0000000000000000
INDIRECT_RING_STATE: 0x00000000
IPEHR: 0x00000000
RING_INSTDONE: 0x7ffffffe
**** VM state ****
[0].error: -19