rpl-p: Some errors and warnings starting with '*ERROR* gt0: TLB invalidation time'd out, seqno=21248, recv=21247'
Easily reproduced with the Mesa piglit test suite. Here is the beginning of the errors and failures:
[ 174.300394] xe 0000:00:02.0: [drm] *ERROR* gt0: TLB invalidation time'd out, seqno=21248, recv=21247
[ 174.564391] xe 0000:00:02.0: [drm] *ERROR* gt0: TLB invalidation time'd out, seqno=21249, recv=21247
[ 174.828403] xe 0000:00:02.0: [drm] *ERROR* gt0: TLB invalidation time'd out, seqno=21250, recv=21247
[ 175.017074] xe 0000:00:02.0: [drm] *ERROR* GuC engine reset request failed on 0:0 because 0x00000000
[ 175.026324] xe 0000:00:02.0: [drm] GT0: trying reset
[ 175.026332] xe 0000:00:02.0: [drm] GT0: reset queued
[ 175.026561] xe 0000:00:02.0: [drm] GT0: reset started
[ 175.036294] xe 0000:00:02.0: [drm] *ERROR* GuC PC Shutdown failed
[ 175.042501] ------------[ cut here ]------------
[ 175.042504] WARNING: CPU: 2 PID: 9 at drivers/gpu/drm/xe/xe_guc.c:801 xe_guc_stop_prepare+0x15/0x20 [xe]
[ 175.042552] Modules linked in: xe drm_ttm_helper gpu_sched drm_suballoc_helper i2c_algo_bit drm_buddy ttm drm_display_helper drm_kms_helper syscopyarea sysfillrect sysimgblt mei_pxp mei_hdcp x86_pkg_temp_thermal pmt_telemetry pmt_class coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hwdep kvm_intel snd_hda_core e1000e i2c_i801 mei_me snd_pcm ptp i2c_smbus pps_core mei wmi_bmof intel_vsec video wmi fuse
[ 175.042596] CPU: 2 PID: 9 Comm: kworker/u32:0 Not tainted 6.3.0+zeh-xe+ #965
[ 175.042599] Hardware name: Intel Corporation Raptor Lake Client Platform/RaptorLake-P LP5 RVP, BIOS RPLPFWI1.R00.4081.A00.2302200847 02/20/2023
[ 175.042600] Workqueue: gt-ordered-wq gt_reset_worker [xe]
[ 175.042631] RIP: 0010:xe_guc_stop_prepare+0x15/0x20 [xe]
[ 175.042663] Code: 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 48 81 c7 08 0b 00 00 e8 e4 60 00 00 85 c0 75 05 c3 cc cc cc cc <0f> 0b c3 cc cc cc cc 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90
[ 175.042665] RSP: 0018:ffffc900000dfe28 EFLAGS: 00010282
[ 175.042668] RAX: 00000000fffffffb RBX: ffff88811a2325d0 RCX: 0000000000000001
[ 175.042669] RDX: 0000000000000005 RSI: ffffffff824c2c33 RDI: ffff88811a230000
[ 175.042671] RBP: ffff8881001e5038 R08: 00000000fffdffff R09: 0000000000000001
[ 175.042672] R10: 00000000fffdffff R11: ffff8882ae9fe000 R12: ffff88811a233c10
[ 175.042673] R13: ffff88811a232308 R14: 0000000000000000 R15: ffff88811a258805
[ 175.042675] FS: 0000000000000000(0000) GS:ffff8882a6300000(0000) knlGS:0000000000000000
[ 175.042676] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 175.042678] CR2: 00007fec68000020 CR3: 0000000117ae4003 CR4: 0000000000770ee0
[ 175.042679] PKRU: 55555554
[ 175.042680] Call Trace:
[ 175.042682] <TASK>
[ 175.042683] gt_reset_worker+0xd6/0x290 [xe]
[ 175.042714] process_one_work+0x260/0x520
[ 175.042723] worker_thread+0x4a/0x390
[ 175.042726] ? __pfx_worker_thread+0x10/0x10
[ 175.042729] kthread+0xed/0x120
[ 175.042732] ? __pfx_kthread+0x10/0x10
[ 175.042735] ret_from_fork+0x29/0x50
[ 175.042742] </TASK>
[ 175.042743] irq event stamp: 288749
[ 175.042744] hardirqs last enabled at (288755): [<ffffffff811dade9>] __up_console_sem+0x59/0x80
[ 175.042748] hardirqs last disabled at (288760): [<ffffffff811dadce>] __up_console_sem+0x3e/0x80
[ 175.042750] softirqs last enabled at (288318): [<ffffffff8114b42e>] irq_exit_rcu+0xbe/0x130
[ 175.042754] softirqs last disabled at (288311): [<ffffffff8114b42e>] irq_exit_rcu+0xbe/0x130
[ 175.042756] ---[ end trace 0000000000000000 ]---
Full log: dmesg.txt
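For anyone triaging the full log, a minimal Python sketch for pulling the TLB invalidation timeouts out of dmesg output may help. It only relies on the message format quoted above. The helper name `parse_tlb_timeouts` and the "pending" figure (`seqno - recv`) are illustrative assumptions on my part, not something the xe driver reports, and reading `recv` as the last acknowledged seqno is my interpretation of the message.

```python
import re

# Matches the xe timeout lines exactly as they appear in the log above.
# Hypothetical helper for triage only; not part of the driver or of piglit.
TLB_TIMEOUT = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\].*TLB invalidation time'd out, "
    r"seqno=(?P<seqno>\d+), recv=(?P<recv>\d+)"
)


def parse_tlb_timeouts(lines):
    """Return a (timestamp, seqno, recv) tuple for each timeout line."""
    events = []
    for line in lines:
        m = TLB_TIMEOUT.search(line)
        if m:
            events.append((float(m["ts"]), int(m["seqno"]), int(m["recv"])))
    return events


# The first three errors from this report, copied verbatim.
sample = [
    "[ 174.300394] xe 0000:00:02.0: [drm] *ERROR* gt0: TLB invalidation time'd out, seqno=21248, recv=21247",
    "[ 174.564391] xe 0000:00:02.0: [drm] *ERROR* gt0: TLB invalidation time'd out, seqno=21249, recv=21247",
    "[ 174.828403] xe 0000:00:02.0: [drm] *ERROR* gt0: TLB invalidation time'd out, seqno=21250, recv=21247",
]

for ts, seqno, recv in parse_tlb_timeouts(sample):
    # "pending" is an assumed measure: how far recv lags the timed-out seqno.
    print(f"{ts:.6f} seqno={seqno} recv={recv} pending={seqno - recv}")
```

To run it against the attached log, pass an open file instead of `sample`, for example `parse_tlb_timeouts(open("dmesg.txt"))`. In the excerpt above `recv` stays at 21247 while `seqno` keeps advancing, which is the pattern this makes easy to spot.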