Enhance error capture
Going forward, issues are becoming more and more complex, and instrumenting the user side is also non-trivial. Once mr187 is merged, I would like to add the capability to:
- list the VMAs and, under a Kconfig.debug option, also print a "compact" hexdump of all of them or only of the guilty batch (BBADDR: 0x0000d556aa620385)
- report the "VM root" address so we can verify its content from the user side.
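To make the "compact" format concrete, here is a minimal user-side sketch in Python of the layout I have in mind: eight little-endian dwords per row, a four-digit hex offset, and runs of identical rows collapsed into a single `*` (as the `hexdump` tool does). The function name `compact_hexdump` is only illustrative and is not part of the POC; the kernel side would of course be written in C.

```python
import struct


def compact_hexdump(data: bytes, dwords_per_row: int = 8) -> list[str]:
    """Format data as rows of little-endian dwords.

    A row identical to the previous one is not printed; the first such
    repeat emits a single "*" line and further repeats are skipped.
    Trailing bytes that do not fill a whole dword are ignored.
    """
    row_bytes = 4 * dwords_per_row
    usable = len(data) - len(data) % 4
    lines = []
    prev = None
    collapsed = False
    for off in range(0, usable, row_bytes):
        row = data[off:min(off + row_bytes, usable)]
        if row == prev:
            if not collapsed:
                lines.append("*")
                collapsed = True
            continue
        prev, collapsed = row, False
        words = struct.unpack(f"<{len(row) // 4}I", row)
        lines.append(f"[{off:04x}] " + " ".join(f"{w:08x}" for w in words))
    return lines


if __name__ == "__main__":
    # One distinct row, three all-zero rows, one more distinct row.
    sample = (struct.pack("<8I", *range(1, 9))
              + bytes(32 * 3)
              + struct.pack("<8I", *([0xDEADBEEF] * 8)))
    print("\n".join(compact_hexdump(sample)))
```

Collapsing repeats keeps the log short for buffers that are mostly padding, while the offset on the row after a `*` still shows how much was skipped.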
Current POC on PVC:
[65804.193900] *ERROR*
H2G CTB (all sizes in DW):
[65804.199936] *ERROR* size: 1024
[65804.203189] *ERROR* resv_space: 0
[65804.206678] *ERROR* head: 0
[65804.209635] *ERROR* tail: 223
[65804.212785] *ERROR* space: 800
[65804.216026] *ERROR* broken: 0
[65804.219172] *ERROR* head (memory): 223
[65804.223097] *ERROR* tail (memory): 223
[65804.227026] *ERROR* status (memory): 0x0
[65804.231124] *ERROR*
G2H CTB (all sizes in DW):
[65804.237135] *ERROR* size: 4096
[65804.240363] *ERROR* resv_space: 1024
[65804.244118] *ERROR* head: 40
[65804.247172] *ERROR* tail: 0
[65804.250142] *ERROR* space: 3071
[65804.253446] *ERROR* broken: 0
[65804.256591] *ERROR* head (memory): 40
[65804.260431] *ERROR* tail (memory): 40
[65804.264268] *ERROR* status (memory): 0x0
[65804.268365] *ERROR* g2h outstanding: 0
[65804.272289] *ERROR*
GuC ID: 2
[65804.276824] *ERROR* Name: ccs2
[65804.280051] *ERROR* Class: 5
[65804.283108] *ERROR* Logical mask: 0x1
[65804.286948] *ERROR* Width: 1
[65804.290002] *ERROR* Ref: 3
[65804.292871] *ERROR* Timeout: 1 (ms)
[65804.296538] *ERROR* Timeslice: 1000 (us)
[65804.300637] *ERROR* Preempt timeout: 640000 (us)
[65804.305431] *ERROR* HW Context Desc: 0x01184000
[65804.310141] *ERROR* LRC Head: (memory) 144
[65804.314414] *ERROR* LRC Tail: (internal) 280, (memory) 280
[65804.320079] *ERROR* Start seqno: (memory) 2
[65804.324438] *ERROR* Seqno: (memory) 1
[65804.328278] *ERROR* Schedule State: 0x43
[65804.332373] *ERROR* Flags: 0x8
[65804.335602] *ERROR* ccs0 (physical), logical instance=0
[65804.340918] *ERROR* Forcewake: domain 0x2, ref 1
[65804.345712] *ERROR* MMIO base: 0x0001a000
[65804.349897] *ERROR* HWSTAM: 0x00000000
[65804.353811] *ERROR* RING_HWS_PGA: 0x00e90000
[65804.358260] *ERROR* RING_EXECLIST_STATUS_LO: 0x00003098
[65804.363663] *ERROR* RING_EXECLIST_STATUS_HI: 0x00000100
[65804.369065] *ERROR* RING_EXECLIST_SQ_CONTENTS_LO: 0x01184119
[65804.374901] *ERROR* RING_EXECLIST_SQ_CONTENTS_HI: 0x0118411d
[65804.380738] *ERROR* RING_EXECLIST_CONTROL: 0x00000000
[65804.385967] *ERROR* RING_START: 0x01180000
[65804.390243] *ERROR* RING_HEAD: 0x000000dc
[65804.394516] *ERROR* RING_TAIL: 0x00000118
[65804.398791] *ERROR* RING_CTL: 0x00003001
[65804.402890] *ERROR* RING_MODE: 0x00001000
[65804.407079] *ERROR* RING_MODE_GEN7: 0x00000000
[65804.411699] *ERROR* RING_IMR: 0x00000000
[65804.415975] *ERROR* RING_ESR: 0x00000000
[65804.420249] *ERROR* RING_EMR: 0xffffffff
[65804.424525] *ERROR* RING_EIR: 0x00000000
[65804.428800] *ERROR* ACTHD: 0x0000d556aa620384
[65804.433426] *ERROR* BBADDR: 0x0000d556aa620385
[65804.438049] *ERROR* DMA_FADDR: 0x0000d556aa620780
[65804.442931] *ERROR* IPEIR: 0x00000000
[65804.446768] *ERROR* IPEHR: 0x00000005
[65804.452087] *ERROR* GEN12_RCU_MODE: 0x00000001
[65804.456706] *ERROR* VM root: A:0xd71000
[65804.460722] *ERROR* [00008000fffc0000-00008000fffcffff] S:0x0000000000010000 A:00000000011e0000 LMEM
[65804.470146] *ERROR* [0000] 80100061 7f054220 00000000 00000000 80000065 7f258220 02000004 ffffffc0
[65804.479199] *ERROR* [0020] 80000065 7f058110 01000024 00ff00ff 80001a40 7f258220 01007f24 00400040
[65804.488247] *ERROR* [0040] 8000195b 7f048220 01017f24 00c07f04 00012031 01140000 fa007f8f f6380003
(cut)
[65805.026894] *ERROR* [0fc0] 18800101 001b0f90 00000000 13004002 01524204 00000000 00000366 01000000
[65805.035943] *ERROR* [0fe0] 04000001 02800000 02800101 13244002 000000d4 00000000 ffffffff 02800100
[65805.044996] *ERROR* [00008000fffd0000-00008000fffdffff] S:0x0000000000010000 A:00000000011c0000 LMEM
[65805.054424] *ERROR* [0000] 00000000 00000000 00000000 00000001 00000001 00000001 00000000 00000000
[65805.063476] *ERROR* [0020] fffe0000 ff00ffff 40400000 000000c8 00000001 00000001 00000001 00000000
(cut)
[65806.203774] *ERROR* [0fe0] 04000001 02800000 02800101 13244002 000000d4 00000000 ffffffff 02800100
[65806.212823] *ERROR* [00008000fffe0000-00008000fffeffff] S:0x0000000000010000 A:0000000001110000 LMEM
[65806.222244] *ERROR* [0000] 80100061 7f054220 00000000 00000000 80000065 7f258220 02000004 ffffffc0
[65806.231294] *ERROR* [0020] 80000065 7f058110 01000024 00ff00ff 80001a40 7f258220 01007f24 00400040
(cut)
[65806.457576] *ERROR* [0340] 00140040 20010aa0 0a102000 000004a4 00140040 20010aa0 0a102000 000004a4
[65806.466625] *ERROR* *
[65806.468974] *ERROR* [0b00] 80001940 04f58660 050004f4 00010001 00140040 20010aa0 0a102000 000004a4
[65806.478027] *ERROR* [0b20] 80541970 00010220 520004f4 000004b4 00140040 7b050aa0 0a102000 000004a4
[65806.487073] *ERROR* [0b40] 84400020 00004000 00000000 fffff7e8 00162331 00000000 fb087724 01007b14
[65806.496124] *ERROR* [0b60] 341c0061 0000007f 8000c431 020c0000 fa3e000c 00000000 bc188461 00000200
[65806.505172] *ERROR* [0b80] 800c1131 00000004 30207f0c 00000000 00000060 00000000 00000000 00000000
[65806.514227] *ERROR* [0ba0] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[65806.523273] *ERROR* *
[65806.525622] *ERROR* [0000aaad553ef000-0000aaad553effff] S:0x0000000000001000 A:00000000bca6c000 USR
[65806.540950] *ERROR* [0000] 0000002d 00000000 e93c15e0 0000ffff e93c1757 0000ffff e93c1748 0000ffff
[65806.549998] *ERROR* [0020] e93c1070 0000ffff ba7be800 0000aaaa e93c1050 0000ffff ba6977a8 0000aaaa
[65806.559050] *ERROR* [0040] dd663a00 0000aaaa ba7be800 0000aaaa 00000004 00000000 000000c8 00000000
[65806.568099] *ERROR* [0060] 447a0000 00000000 40400000 00000000 494e7e80 00000000 76613200 02a1ac70
(cut)
[65811.465909] *ERROR* [0000d556aa620000-0000d556aa66ffff] S:0x0000000000050000 A:0000000000fc0000 LMEM <<< this one is of highest interest
[65811.475390] *ERROR* [0000] 61090001 814e0000 0000ffff 0e009003 00000002 aa5c0000 0000d556 00000000
[65811.484442] *ERROR* [0020] 04800001 18800101 aa620030 0000d556 7a000004 00140480 00000000 00000000
[65811.493488] *ERROR* [0040] 00000000 00000000 18800101 aa480000 0000d556 0e009003 00000003 aa5c0000
[65811.502539] *ERROR* [0060] 0000d556 00000000 04800001 18800101 aa620078 0000d556 18800101 aa4a00c0
[65811.511590] *ERROR* [0080] 0000d556 0e009003 00000004 aa5c0000 0000d556 00000000 04800001 18800101
[65811.520641] *ERROR* [00a0] aa6200a8 0000d556 18800101 aa4a0180 0000d556 0e009003 00000005 aa5c0000
[65811.529687] *ERROR* [00c0] 0000d556 00000000 04800001 18800101 aa6200d8 0000d556 18800101 aa4a0240
[65811.538734] *ERROR* [00e0] 0000d556 0e009003 00000006 aa5c0000 0000d556 00000000 04800001 18800101
[65811.547782] *ERROR* [0100] aa620108 0000d556 18800101 aa4a0300 0000d556 0e009003 00000007 aa5c0000
[65811.556832] *ERROR* [0120] 0000d556 00000000 04800001 18800101 aa620138 0000d556 18800101 aa4a03c0
[65811.565885] *ERROR* [0140] 0000d556 0e009003 00000008 aa5c0000 0000d556 00000000 04800001 18800101
[65811.574936] *ERROR* [0160] aa620168 0000d556 18800101 aa4a0480 0000d556 0e009003 00000009 aa5c0000
[65811.583985] *ERROR* [0180] 0000d556 00000000 04800001 18800101 aa620198 0000d556 18800101 aa4a0540
[65811.593037] *ERROR* [01a0] 0000d556 0e009003 0000000a aa5c0000 0000d556 00000000 04800001 18800101
[65811.602085] *ERROR* [01c0] aa6201c8 0000d556 18800101 aa4a0600 0000d556 0e009003 0000000b aa5c0000
[65811.611135] *ERROR* [01e0] 0000d556 00000000 04800001 18800101 aa6201f8 0000d556 18800101 aa4a06c0
[65811.620182] *ERROR* [0200] 0000d556 0e009003 0000000c aa5c0000 0000d556 00000000 04800001 18800101
[65811.629234] *ERROR* [0220] aa620228 0000d556 18800101 aa4a0780 0000d556 0e009003 0000000d aa5c0000
[65811.638285] *ERROR* [0240] 0000d556 00000000 04800001 18800101 aa620258 0000d556 7a000004 00140480
[65811.647339] *ERROR* [0260] 00000000 00000000 00000000 00000000 18800101 aa480100 0000d556 0e009003
[65811.656386] *ERROR* [0280] 0000000e aa5c0000 0000d556 00000000 04800001 18800101 aa6202a0 0000d556
[65811.665436] *ERROR* [02a0] 7a000a04 00141c8c 00000000 00000000 00000000 00000000 7a000a04 00104100
[65811.674485] *ERROR* [02c0] aa5c0040 0000d556 00000001 00000000 04800000 05000000 00000000 00000000
[65811.683537] *ERROR* [02e0] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[65811.692585] *ERROR* [0300] 0e009003 0000000f aa5c0000 0000d556 00000000 04800001 18800101 aa620324
[65811.701638] *ERROR* [0320] 0000d556 7a000004 00140480 00000000 00000000 00000000 00000000 18800101
[65811.710685] *ERROR* [0340] aa480140 0000d556 0e009003 00000010 aa5c0000 0000d556 00000000 04800001
[65811.719733] *ERROR* [0360] 18800101 aa62036c 0000d556 7a000a04 00141c8c 00000000 00000000 00000000
[65811.728780] *ERROR* [0380] 00000000 7a000a04 00104100 aa5c0040 0000d556 00000002 00000000 04800000
[65811.737835] *ERROR* [03a0] 05000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
....
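Finding the VMA that contains the faulting address is a simple range lookup over the VMA header lines above. The sketch below is a hypothetical user-side helper in Python, not part of the POC: the regular expression is inferred from the header format shown in this dump, and masking the two low bits of BBADDR assumes they are flag bits rather than address bits (ACTHD reads 0x...384 while BBADDR reads 0x...385).

```python
import re
from typing import NamedTuple, Optional

# Matches e.g. "[0000d556aa620000-0000d556aa66ffff] S:0x0000000000050000 A:0000000000fc0000 LMEM"
# (format inferred from the dump in this issue; the dmesg timestamp does not match it).
VMA_RE = re.compile(
    r"\[([0-9a-f]{16})-([0-9a-f]{16})\] S:0x([0-9a-f]+) A:([0-9a-f]+) (\w+)"
)


class Vma(NamedTuple):
    start: int  # first GPU virtual address (inclusive)
    end: int    # last GPU virtual address (inclusive)
    size: int
    addr: int   # the "A:" field of the header line
    kind: str   # e.g. "LMEM" or "USR"


def parse_vmas(log: str) -> list[Vma]:
    """Collect every VMA header line found in a log excerpt."""
    vmas = []
    for m in VMA_RE.finditer(log):
        start, end, size, addr = (int(g, 16) for g in m.groups()[:4])
        vmas.append(Vma(start, end, size, addr, m.group(5)))
    return vmas


def locate(vmas: list[Vma], gpu_addr: int) -> Optional[tuple[Vma, int]]:
    """Return (vma, offset inside vma) for gpu_addr, or None if unmapped."""
    for vma in vmas:
        if vma.start <= gpu_addr <= vma.end:
            return vma, gpu_addr - vma.start
    return None


if __name__ == "__main__":
    log = """
[65806.525622] *ERROR* [0000aaad553ef000-0000aaad553effff] S:0x0000000000001000 A:00000000bca6c000 USR
[65811.465909] *ERROR* [0000d556aa620000-0000d556aa66ffff] S:0x0000000000050000 A:0000000000fc0000 LMEM
"""
    bbaddr = 0x0000D556AA620385 & ~0x3  # assumption: low bits are flags
    hit = locate(parse_vmas(log), bbaddr)
    if hit:
        vma, offset = hit
        print(f"{vma.kind} VMA at {vma.start:#x}, offset {offset:#x}")
```

With the values from this dump, the lookup lands in the LMEM VMA starting at 0000d556aa620000, at offset 0x384.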
The batch at 0000d556aa620000 can be decoded directly (details omitted here):
1 : STATE_SYSTEM_MEM_FENCE_ADDRESS
2 : MI_SEMAPHORE_WAIT
3 : MI_MEM_FENCE
4 : MI_BATCH_BUFFER_START
5 : PIPE_CONTROL
6 : MI_BATCH_BUFFER_START
7 : MI_SEMAPHORE_WAIT
8 : MI_MEM_FENCE
9 : MI_BATCH_BUFFER_START
10 : MI_BATCH_BUFFER_START
11 : MI_SEMAPHORE_WAIT
12 : MI_MEM_FENCE
13 : MI_BATCH_BUFFER_START
14 : MI_BATCH_BUFFER_START
15 : MI_SEMAPHORE_WAIT
16 : MI_MEM_FENCE
17 : MI_BATCH_BUFFER_START
18 : MI_BATCH_BUFFER_START
19 : MI_SEMAPHORE_WAIT
20 : MI_MEM_FENCE
21 : MI_BATCH_BUFFER_START
22 : MI_BATCH_BUFFER_START
23 : MI_SEMAPHORE_WAIT
24 : MI_MEM_FENCE
25 : MI_BATCH_BUFFER_START
26 : MI_BATCH_BUFFER_START
27 : MI_SEMAPHORE_WAIT
28 : MI_MEM_FENCE
29 : MI_BATCH_BUFFER_START
30 : MI_BATCH_BUFFER_START
31 : MI_SEMAPHORE_WAIT
32 : MI_MEM_FENCE
33 : MI_BATCH_BUFFER_START
34 : MI_BATCH_BUFFER_START
35 : MI_SEMAPHORE_WAIT
36 : MI_MEM_FENCE
37 : MI_BATCH_BUFFER_START
38 : MI_BATCH_BUFFER_START
39 : MI_SEMAPHORE_WAIT
40 : MI_MEM_FENCE
41 : MI_BATCH_BUFFER_START
42 : MI_BATCH_BUFFER_START
43 : MI_SEMAPHORE_WAIT
44 : MI_MEM_FENCE
45 : MI_BATCH_BUFFER_START
46 : MI_BATCH_BUFFER_START
47 : MI_SEMAPHORE_WAIT
48 : MI_MEM_FENCE
49 : MI_BATCH_BUFFER_START
50 : PIPE_CONTROL
51 : MI_BATCH_BUFFER_START
52 : MI_SEMAPHORE_WAIT
53 : MI_MEM_FENCE
54 : MI_BATCH_BUFFER_START
55 : PIPE_CONTROL
56 : PIPE_CONTROL
57 : MI_MEM_FENCE
58 : MI_BATCH_BUFFER_END
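For reference, a linear walk over the dumped dwords is enough to reproduce a command list like the one above. The Python sketch below is illustrative only: the header masks, opcode values and length rules in `COMMANDS` are inferred from this dump and from the decoded names, so they should be checked against the hardware documentation before being relied on. It does not follow MI_BATCH_BUFFER_START targets; it simply steps to the next header.

```python
# (mask, value, name, fixed length in dwords or None)
# None means: length = (header & 0xff) + 2.
# All entries are assumptions inferred from the dump in this issue.
COMMANDS = [
    (0xFF800000, 0x00000000, "MI_NOOP", 1),
    (0xFF800000, 0x04800000, "MI_MEM_FENCE", 1),
    (0xFF800000, 0x05000000, "MI_BATCH_BUFFER_END", 1),
    (0xFF800000, 0x0E000000, "MI_SEMAPHORE_WAIT", None),
    (0xFF800000, 0x18800000, "MI_BATCH_BUFFER_START", None),
    (0xFFFF0000, 0x61090000, "STATE_SYSTEM_MEM_FENCE_ADDRESS", None),
    (0xFFFF0000, 0x7A000000, "PIPE_CONTROL", None),
]


def decode(dwords: list[int]) -> list[tuple[int, str]]:
    """Return (byte offset, command name) for each command header found.

    Walks linearly and stops after MI_BATCH_BUFFER_END. An unrecognised
    header is reported as UNKNOWN and skipped one dword at a time.
    """
    out = []
    i = 0
    while i < len(dwords):
        hdr = dwords[i]
        for mask, value, name, fixed in COMMANDS:
            if (hdr & mask) == value:
                length = fixed if fixed is not None else (hdr & 0xFF) + 2
                break
        else:
            name, length = "UNKNOWN", 1
        out.append((i * 4, name))
        if name == "MI_BATCH_BUFFER_END":
            break
        i += length
    return out


if __name__ == "__main__":
    # Rows [0000], [0020] and [0040] of the batch dumped above.
    rows = """
    61090001 814e0000 0000ffff 0e009003 00000002 aa5c0000 0000d556 00000000
    04800001 18800101 aa620030 0000d556 7a000004 00140480 00000000 00000000
    00000000 00000000 18800101 aa480000 0000d556 0e009003 00000003 aa5c0000
    """
    for offset, name in decode([int(w, 16) for w in rows.split()]):
        print(f"{offset:04x} : {name}")
```

Keeping the opcode table as plain data makes it easy to extend once more command types show up in real dumps.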
That would help in understanding timeouts, hangs, etc.
Feedback?