drm/xe: Hold a ref to hw fence when it is in the irq_list

Matthew Brost requested to merge (removed):xe into xe

Should fix the kernel panic below, reported in #42 (closed):

[25262.722349] kernel BUG at drivers/gpu/drm/xe/xe_hw_fence.c:188!
[25262.728499] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[25262.733740] CPU: 5 PID: 14190 Comm: kworker/5:2 Tainted: G        W         5.18.0-xe+ #1853
[25262.742176] Hardware name: Intel Corporation Tiger Lake Client Platform/TigerLake U DDR4 SODIMM RVP, BIOS TGLSFWI1.R00.3243.A01.2006102133 06/10/2020
[25262.755551] Workqueue: events drm_sched_main [gpu_sched]
[25262.760869] RIP: 0010:xe_hw_fence_release+0xf1/0x100 [xe]
[25262.766294] Code: 80 3d 47 54 02 00 00 75 d7 48 c7 c2 30 75 14 a0 be f0 00 00 00 48 c7 c7 af b8 13 a0 c6 05 2b 54 02 00 01 e8 42 62 9a e1 eb b6 <0f> 0b 0f 0b 31 ed e9 1a ff ff ff 0f 1f 40 00 41 57 41 56 41 55 41
[25262.785036] RSP: 0018:ffffc9000414bd98 EFLAGS: 00010297
[25262.790267] RAX: ffff88816f75a640 RBX: ffff88818e8d4300 RCX: 0000000000000000
[25262.797404] RDX: 0000000000000001 RSI: ffffffff822e5fe8 RDI: ffffffff82341287
[25262.804543] RBP: ffff88818e8d4358 R08: 0000000000000001 R09: 0000000000000000
[25262.811680] R10: 0000000000000001 R11: 0000000000000001 R12: ffff88810b1f8328
[25262.818816] R13: ffff88810b1f8420 R14: 00000000ffffffff R15: ffff88810b1f8388
[25262.825951] FS:  0000000000000000(0000) GS:ffff88849fa80000(0000) knlGS:0000000000000000
[25262.834038] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[25262.839786] CR2: 00007f097e89a000 CR3: 0000000005612006 CR4: 0000000000770ee0
[25262.846923] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[25262.854058] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[25262.861198] PKRU: 55555554
[25262.863921] Call Trace:
[25262.866385]  <TASK>
[25262.868500]  drm_sched_fence_release_scheduled+0x5c/0x70 [gpu_sched]
[25262.874870]  drm_sched_entity_pop_job+0x39e/0x450 [gpu_sched]
[25262.880625]  drm_sched_main+0x1bb/0x5c0 [gpu_sched]
[25262.885516]  process_one_work+0x272/0x5c0
[25262.889542]  worker_thread+0x37/0x370
[25262.893219]  ? process_one_work+0x5c0/0x5c0
[25262.897414]  kthread+0xed/0x120
[25262.900573]  ? kthread_complete_and_exit+0x20/0x20
[25262.905378]  ret_from_fork+0x1f/0x30
[25262.908968]  </TASK>
[25262.911167] Modules linked in: xe fuse snd_hda_codec_hdmi x86_pkg_temp_thermal snd_hda_intel snd_intel_dspcfg coretemp snd_hda_codec snd_hwdep snd_hda_core mei_me snd_pcm mei crct10dif_pclmul crc32_pclmul e1000e ptp i2c_i801 drm_ttm_helper ghash_clmulni_intel i2c_smbus ttm pps_core gpu_sched intel_lpss_pci drm_suballoc_helper [last unloaded: xe]
[25262.941770] ---[ end trace 0000000000000000 ]---
[25263.510736] RIP: 0010:xe_hw_fence_release+0xf1/0x100 [xe]
[25263.516174] Code: 80 3d 47 54 02 00 00 75 d7 48 c7 c2 30 75 14 a0 be f0 00 00 00 48 c7 c7 af b8 13 a0 c6 05 2b 54 02 00 01 e8 42 62 9a e1 eb b6 <0f> 0b 0f 0b 31 ed e9 1a ff ff ff 0f 1f 40 00 41 57 41 56 41 55 41
[25263.534941] RSP: 0018:ffffc9000414bd98 EFLAGS: 00010297
[25263.540177] RAX: ffff88816f75a640 RBX: ffff88818e8d4300 RCX: 0000000000000000
[25263.547329] RDX: 0000000000000001 RSI: ffffffff822e5fe8 RDI: ffffffff82341287
[25263.554504] RBP: ffff88818e8d4358 R08: 0000000000000001 R09: 0000000000000000
[25263.561652] R10: 0000000000000001 R11: 0000000000000001 R12: ffff88810b1f8328
[25263.568816] R13: ffff88810b1f8420 R14: 00000000ffffffff R15: ffff88810b1f8388
[25263.575963] FS:  0000000000000000(0000) GS:ffff88849fa80000(0000) knlGS:0000000000000000
[25263.584075] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[25263.589837] CR2: 00007f097e89a000 CR3: 000000019ae84005 CR4: 0000000000770ee0
[25263.596989] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[25263.604133] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[25263.611284] PKRU: 55555554

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
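
For readers unfamiliar with the pattern named in the title, here is a minimal userspace sketch of a list that holds its own reference to each queued object. It is not the xe driver code: every `sketch_*` name is hypothetical, the refcount is a plain `int` with no locking, and the assertion in `sketch_fence_release()` is only assumed to resemble the check that fired in `xe_hw_fence_release` in the trace above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted hw fence (not the xe types). */
struct sketch_fence {
	int refcount;              /* plain int: single-threaded sketch only */
	struct sketch_fence *next; /* link in the pending (irq) list */
	bool on_irq_list;          /* true while linked into the list */
	bool released;             /* set once the last reference is dropped */
};

/* Hypothetical stand-in for the object that owns the irq list. */
struct sketch_irq {
	struct sketch_fence *pending;
};

static void sketch_fence_init(struct sketch_fence *f)
{
	f->refcount = 1; /* the creator's reference */
	f->next = NULL;
	f->on_irq_list = false;
	f->released = false;
}

static void sketch_fence_get(struct sketch_fence *f)
{
	f->refcount++;
}

static void sketch_fence_release(struct sketch_fence *f)
{
	/* Assumed invariant: a fence is never released while still linked. */
	assert(!f->on_irq_list);
	f->released = true;
}

static void sketch_fence_put(struct sketch_fence *f)
{
	assert(f->refcount > 0);
	if (--f->refcount == 0)
		sketch_fence_release(f);
}

/* Queue a fence: the list takes its own reference before linking. */
static void sketch_irq_add(struct sketch_irq *irq, struct sketch_fence *f)
{
	sketch_fence_get(f);
	f->next = irq->pending;
	irq->pending = f;
	f->on_irq_list = true;
}

/* Drain the list: unlink each fence first, then drop the list's reference. */
static int sketch_irq_run(struct sketch_irq *irq)
{
	int handled = 0;

	while (irq->pending) {
		struct sketch_fence *f = irq->pending;

		irq->pending = f->next;
		f->next = NULL;
		f->on_irq_list = false;
		sketch_fence_put(f); /* may release; f is not touched afterwards */
		handled++;
	}
	return handled;
}
```

In this sketch, a caller can drop its own reference with `sketch_fence_put()` while the fence is still queued. The list's reference keeps the fence alive until `sketch_irq_run()` unlinks it, so the release path never sees a linked fence.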