xe_exec_basic --r once-basic-defer-mmap fails on PVC if BO not validated in CPU handler
Related to discussion on: !192 (closed)
From the ftrace everything looks correct. A VM bind is done, which triggers a validate, the validate results in a move, and the bind is done by the CPU after the move is complete. For some reason the job which uses the bound BO hangs on the jump to the batch.
Dmesg:
[ 237.383499] [IGT] xe_exec_basic: executing
[ 237.387888] [IGT] xe_exec_basic: starting subtest once-basic-defer-mmap
[ 242.551471] xe 0000:8c:00.0: [drm] Timedout job: seqno=1, guc_id=2, flags=0x4
[ 242.551690] *ERROR*
H2G CTB (all sizes in DW):
[ 242.557842] *ERROR* size: 1024
[ 242.561114] *ERROR* resv_space: 0
[ 242.564647] *ERROR* head: 0
[ 242.567660] *ERROR* tail: 238
[ 242.570851] *ERROR* space: 785
[ 242.574116] *ERROR* broken: 0
[ 242.577292] *ERROR* head (memory): 238
[ 242.581249] *ERROR* tail (memory): 238
[ 242.585197] *ERROR* status (memory): 0x0
[ 242.589317] *ERROR*
G2H CTB (all sizes in DW):
[ 242.595338] *ERROR* size: 4096
[ 242.598588] *ERROR* resv_space: 1024
[ 242.602356] *ERROR* head: 56
[ 242.605432] *ERROR* tail: 0
[ 242.608423] *ERROR* space: 3071
[ 242.611757] *ERROR* broken: 0
[ 242.614924] *ERROR* head (memory): 56
[ 242.618776] *ERROR* tail (memory): 56
[ 242.622626] *ERROR* status (memory): 0x0
[ 242.626742] *ERROR* g2h outstanding: 0
[ 242.630683] *ERROR*
GuC ID: 2
[ 242.635224] *ERROR* Name: bcs2
[ 242.638465] *ERROR* Class: 3
[ 242.641534] *ERROR* Logical mask: 0x1
[ 242.645389] *ERROR* Width: 1
[ 242.648456] *ERROR* Ref: 2
[ 242.651352] *ERROR* Timeout: 5000 (ms)
[ 242.655298] *ERROR* Timeslice: 1000 (us)
[ 242.659418] *ERROR* Preempt timeout: 640000 (us)
[ 242.664222] *ERROR* HW Context Desc: 0x01114000
[ 242.668945] *ERROR* LRC Head: (memory) 0
[ 242.673063] *ERROR* LRC Tail: (internal) 88, (memory) 88
[ 242.678565] *ERROR* Start seqno: (memory) 1
[ 242.682943] *ERROR* Seqno: (memory) 0
[ 242.686795] *ERROR* Schedule State: 0x3
[ 242.690817] *ERROR* Flags: 0x4
[ 242.694059] *ERROR* bcs0 (physical), logical instance=0
[ 242.699387] *ERROR* Forcewake: domain 0x2, ref 1
[ 242.704188] *ERROR* MMIO base: 0x00022000
[ 242.708387] *ERROR* HWSTAM: 0xffffffff
[ 242.712328] *ERROR* RING_HWS_PGA: 0x00d10000
[ 242.716793] *ERROR* RING_EXECLIST_STATUS_LO: 0x00003098
[ 242.722204] *ERROR* RING_EXECLIST_STATUS_HI: 0x00000100
[ 242.727615] *ERROR* RING_EXECLIST_SQ_CONTENTS_LO: 0x01114119
[ 242.733485] *ERROR* RING_EXECLIST_SQ_CONTENTS_HI: 0x0111411d
[ 242.739331] *ERROR* RING_EXECLIST_CONTROL: 0x00000000
[ 242.744574] *ERROR* RING_START: 0x01110000
[ 242.748862] *ERROR* RING_HEAD: 0x0000003c
[ 242.753144] *ERROR* RING_TAIL: 0x00000058
[ 242.757435] *ERROR* RING_CTL: 0x00003001
[ 242.761555] *ERROR* RING_MODE: 0x00001000
[ 242.765755] *ERROR* RING_MODE_GEN7: 0x00000008
[ 242.770395] *ERROR* RING_IMR: 0x00000000
[ 242.774680] *ERROR* RING_ESR: 0x00000001
[ 242.778969] *ERROR* RING_EMR: 0xffffffff
[ 242.783261] *ERROR* RING_EIR: 0x00000000
[ 242.787551] *ERROR* ACTHD: 0x001a0010_00000000
[ 242.792276] *ERROR* BBADDR: 0x001a0011_00000000
[ 242.796999] *ERROR* DMA_FADDR: 0x001a0200_00000000
[ 242.801978] *ERROR* IPEIR: 0x00000000
[ 242.805834] *ERROR* IPEHR: 0x00000003
[ 242.811661] xe 0000:8c:00.0: [drm] Engine reset: guc_id=2
[ 242.846420] [IGT] xe_exec_basic: exiting, ret=98
[ 242.856403] xe_exec_basic (1906) used greatest stack depth: 12024 bytes left
Ftrace:
xe_exec_basic-1906 [007] ..... 237.950122: xe_engine_create: 3:0xff, gt=0, width=1, guc_id=1, guc_state=0x0, flags=0x10
xe_exec_basic-1906 [007] ..... 237.950244: xe_engine_create: 3:0xff, gt=1, width=1, guc_id=1, guc_state=0x0, flags=0x10
xe_exec_basic-1906 [007] ..... 237.950247: xe_vm_create: vm=0xffff888111185000, asid=0x00000
xe_exec_basic-1906 [007] ..... 237.950612: xe_engine_create: 3:0x1, gt=1, width=1, guc_id=2, guc_state=0x0, flags=0x4
xe_exec_basic-1906 [007] ..... 237.950754: xe_bo_cpu_fault: size=65536, flags=0x84000112, vm=0xffff888111185000
kworker/u32:7-152 [009] ..... 237.950906: xe_vma_evict: vma=0xffff8881321d9a00, asid=0x0000d, start=0x0000001a0000, end=0x0000001affff, ptr=0x000000000000,
kworker/u32:7-152 [009] ..... 237.950954: xe_vma_evict: vma=0xffff8881321d9a00, asid=0x0000d, start=0x0000001a0000, end=0x0000001affff, ptr=0x000000000000,
kworker/u32:7-152 [009] ..... 237.950955: xe_bo_move: size=65536, flags=0x84000112, vm=0xffff888111185000
kworker/u32:7-152 [009] ..... 237.951069: xe_hw_fence_create: ctx=0x000000000000002e, fence=0xffff88812016ccc0, seqno=1
kworker/u32:7-152 [009] ..... 237.951070: xe_sched_job_create: fence=0xffff88812016ccc0, seqno=1, guc_id=0, batch_addr=0x004800400000, guc_state=0x0, flags=0x2, error=0
kworker/u32:7-152 [009] ..... 237.951072: xe_sched_job_exec: fence=0xffff88812016ccc0, seqno=1, guc_id=0, batch_addr=0x004800400000, guc_state=0x0, flags=0x2, error=0
kworker/9:3-495 [009] ..... 237.951115: xe_sched_job_run: fence=0xffff88812016ccc0, seqno=1, guc_id=0, batch_addr=0x004800400000, guc_state=0x0, flags=0x2, error=0
kworker/9:3-495 [009] ..... 237.951116: xe_engine_register: 3:0x100, gt=1, width=1, guc_id=0, guc_state=0x1, flags=0x2
kworker/9:3-495 [009] ..... 237.951138: xe_engine_scheduling_enable: 3:0x100, gt=1, width=1, guc_id=0, guc_state=0x7, flags=0x2
kworker/u32:7-152 [009] ..... 237.951244: xe_bo_move: new_mem->mem_type=3
kworker/u32:7-152 [009] ..... 237.951245: xe_vma_bind: vma=0xffff8881321d9a00, asid=0x0000d, start=0x0000001a0000, end=0x0000001affff, ptr=0x000000000000,
<idle>-0 [014] d.h2. 237.951345: xe_hw_fence_try_signal: ctx=0x000000000000002e, fence=0xffff88812016ccc0, seqno=1
<idle>-0 [014] dNh2. 237.951376: xe_hw_fence_signal: ctx=0x000000000000002e, fence=0xffff88812016ccc0, seqno=1
kworker/u32:8-153 [006] ..... 237.951390: xe_engine_scheduling_done: 3:0x100, gt=1, width=1, guc_id=0, guc_state=0x7, flags=0x2
kworker/14:1-230 [014] ..... 237.951402: xe_sched_job_free: fence=0xffff88812016ccc0, seqno=1, guc_id=0, batch_addr=0x004800400000, guc_state=0x3, flags=0x2, error=0
kworker/u32:7-152 [009] ..... 237.951667: xe_vm_cpu_bind: vm=0xffff888111185000, asid=0x0000d
kworker/u32:7-152 [009] ..... 237.952159: xe_vm_cpu_bind: vm=0xffff888111185000, asid=0x0000d
xe_exec_basic-1906 [007] ..... 237.952288: xe_hw_fence_create: ctx=0x00000000000000e3, fence=0xffff88812016cf40, seqno=1
xe_exec_basic-1906 [007] ..... 237.952289: xe_sched_job_create: fence=0xffff88812016cf40, seqno=1, guc_id=2, batch_addr=0x0000001a0000, guc_state=0x0, flags=0x4, error=0
xe_exec_basic-1906 [007] ..... 237.952290: xe_vma_rebind_exec: vma=0xffff8881321d9a00, asid=0x0000d, start=0x0000001a0000, end=0x0000001affff, ptr=0x000000000000,
xe_exec_basic-1906 [007] ..... 237.952290: xe_vma_bind: vma=0xffff8881321d9a00, asid=0x0000d, start=0x0000001a0000, end=0x0000001affff, ptr=0x000000000000,
xe_exec_basic-1906 [007] ..... 237.952324: xe_vm_cpu_bind: vm=0xffff888111185000, asid=0x0000d
xe_exec_basic-1906 [007] ..... 237.952352: xe_vm_cpu_bind: vm=0xffff888111185000, asid=0x0000d
xe_exec_basic-1906 [007] ..... 237.952387: xe_sched_job_exec: fence=0xffff88812016cf40, seqno=1, guc_id=2, batch_addr=0x0000001a0000, guc_state=0x0, flags=0x4, error=0
kworker/7:1-244 [007] ..... 237.952429: xe_sched_job_run: fence=0xffff88812016cf40, seqno=1, guc_id=2, batch_addr=0x0000001a0000, guc_state=0x0, flags=0x4, error=0
kworker/7:1-244 [007] ..... 237.952430: xe_engine_register: 3:0x1, gt=1, width=1, guc_id=2, guc_state=0x1, flags=0x4
kworker/7:1-244 [007] ..... 237.952449: xe_engine_scheduling_enable: 3:0x1, gt=1, width=1, guc_id=2, guc_state=0x7, flags=0x4
kworker/u32:7-152 [009] ..... 237.952692: xe_engine_scheduling_done: 3:0x1, gt=1, width=1, guc_id=2, guc_state=0x7, flags=0x4
kworker/6:2-233 [006] ..... 237.953086: xe_hw_fence_free: ctx=0x0000000000000010, fence=0xffff88812016c180, seqno=11
kworker/u32:7-152 [009] ..... 237.953289: xe_vm_free: vm=0xffff888132068800, asid=0x0000b
kworker/u32:7-152 [009] ..... 237.953657: xe_vm_free: vm=0xffff888132fd5000, asid=0x0000c
kworker/u32:8-153 [013] ..... 243.321724: xe_sched_job_timedout: fence=0xffff88812016cf40, seqno=1, guc_id=2, batch_addr=0x0000001a0000, guc_state=0x3, flags=0x4, error=0
kworker/u32:8-153 [013] ..... 243.321794: xe_engine_scheduling_disable: 3:0x1, gt=1, width=1, guc_id=2, guc_state=0x19, flags=0x5
kworker/u32:7-152 [001] ..... 243.322380: xe_engine_reset: 3:0x1, gt=1, width=1, guc_id=2, guc_state=0x19, flags=0x5
kworker/u32:7-152 [001] ..... 243.322415: xe_engine_scheduling_done: 3:0x1, gt=1, width=1, guc_id=2, guc_state=0x59, flags=0x5
kworker/u32:7-152 [001] ..... 243.322506: xe_engine_deregister: 3:0x1, gt=1, width=1, guc_id=2, guc_state=0x51, flags=0x5
kworker/u32:7-152 [001] ..... 243.322637: xe_engine_deregister_done: 3:0x1, gt=1, width=1, guc_id=2, guc_state=0x51, flags=0x5
kworker/u32:8-153 [013] ...1. 243.322723: xe_sched_job_set_error: fence=0xffff88812016cf40, seqno=1, guc_id=2, batch_addr=0x0000001a0000, guc_state=0x50, flags=0x5, error=-62
kworker/u32:8-153 [013] d.h2. 243.322730: xe_hw_fence_try_signal: ctx=0x00000000000000e3, fence=0xffff88812016cf40, seqno=1
kworker/u32:8-153 [013] dNh2. 243.322739: xe_hw_fence_signal: ctx=0x00000000000000e3, fence=0xffff88812016cf40, seqno=1
kworker/13:1-112 [013] ..... 243.322745: xe_sched_job_free: fence=0xffff88812016cf40, seqno=1, guc_id=2, batch_addr=0x0000001a0000, guc_state=0x50, flags=0x5, error=-62
kworker/u32:8-153 [013] ..... 243.323009: xe_vma_unbind: vma=0xffff8881321d9a00, asid=0x0000d, start=0x0000001a0000, end=0x0000001affff, ptr=0x000000000000,
kworker/u32:8-153 [013] ..... 243.323013: xe_vm_cpu_bind: vm=0xffff888111185000, asid=0x0000d
kworker/u32:8-153 [013] ..... 243.323079: xe_vm_cpu_bind: vm=0xffff888111185000, asid=0x0000d
xe_exec_basic-1906 [007] ..... 243.323222: xe_bo_cpu_fault: size=65536, flags=0x84000112, vm=0xffff888111185000
xe_exec_basic-1906 [007] ..... 243.365882: xe_engine_kill: 3:0x1, gt=1, width=1, guc_id=2, guc_state=0x50, flags=0x5
xe_exec_basic-1906 [007] ..... 243.365899: drm_sched_msg_add: guc_id=2, opcode=1
kworker/7:1-244 [007] ..... 243.366176: drm_sched_msg_recv: guc_id=2, opcode=1
kworker/7:1-244 [007] ..... 243.366177: xe_engine_cleanup_entity: 3:0x1, gt=1, width=1, guc_id=2, guc_state=0xd0, flags=0x5
kworker/u32:7-152 [001] ..... 243.366202: xe_engine_destroy: 3:0x1, gt=1, width=1, guc_id=2, guc_state=0xd0, flags=0x5
kworker/u32:7-152 [001] ..... 243.366216: xe_hw_fence_free: ctx=0x00000000000000e3, fence=0xffff88812016cf40, seqno=1
xe_exec_basic-1906 [007] ..... 243.366315: xe_engine_kill: 3:0xff, gt=0, width=1, guc_id=1, guc_state=0x0, flags=0x10
xe_exec_basic-1906 [007] ..... 243.366325: drm_sched_msg_add: guc_id=1, opcode=1
xe_exec_basic-1906 [007] .N... 243.366483: xe_engine_kill: 3:0xff, gt=1, width=1, guc_id=1, guc_state=0x0, flags=0x10
kworker/7:1-244 [007] ..... 243.366494: drm_sched_msg_recv: guc_id=1, opcode=1
kworker/7:1-244 [007] ..... 243.366495: xe_engine_cleanup_entity: 3:0xff, gt=0, width=1, guc_id=1, guc_state=0x80, flags=0x10
xe_exec_basic-1906 [007] ..... 243.366508: drm_sched_msg_add: guc_id=1, opcode=1
kworker/u32:7-152 [001] ..... 243.366514: xe_engine_destroy: 3:0xff, gt=0, width=1, guc_id=1, guc_state=0x80, flags=0x10
kworker/7:1-244 [007] ..... 243.366518: drm_sched_msg_recv: guc_id=1, opcode=1
kworker/7:1-244 [007] ..... 243.366519: xe_engine_cleanup_entity: 3:0xff, gt=1, width=1, guc_id=1, guc_state=0x80, flags=0x10
kworker/u32:9-154 [003] ..... 243.366537: xe_engine_destroy: 3:0xff, gt=1, width=1, guc_id=1, guc_state=0x80, flags=0x10
kworker/u32:9-154 [003] ..... 243.367142: xe_vm_free: vm=0xffff888111185000, asid=0x0000d
Test output:
Starting subtest: once-basic-defer-mmap
(xe_exec_basic:1906) CRITICAL: Test assertion failure function test_exec, file ../tests/xe/xe_exec_basic.c:205:
(xe_exec_basic:1906) CRITICAL: Failed assertion: data[i].data == 0xc0ffee
(xe_exec_basic:1906) CRITICAL: error: 411041793 != 12648430
Stack trace:
#0 ../lib/igt_core.c:1745 __igt_fail_assert()
#1 ../tests/xe/xe_exec_basic.c:185 test_exec()
#2 [<unknown>+0x0]
Subtest once-basic-defer-mmap failed.
**** DEBUG ****
(xe_exec_basic:1906) CRITICAL: Test assertion failure function test_exec, file ../tests/xe/xe_exec_basic.c:205:
(xe_exec_basic:1906) CRITICAL: Failed assertion: data[i].data == 0xc0ffee
(xe_exec_basic:1906) CRITICAL: error: 411041793 != 12648430
(xe_exec_basic:1906) igt_core-INFO: Stack trace:
(xe_exec_basic:1906) igt_core-INFO: #0 ../lib/igt_core.c:1745 __igt_fail_assert()
(xe_exec_basic:1906) igt_core-INFO: #1 ../tests/xe/xe_exec_basic.c:185 test_exec()
(xe_exec_basic:1906) igt_core-INFO: #2 [<unknown>+0x0]
**** END ****
Subtest once-basic-defer-mmap: FAIL (5.458s)