GPU unstable on raven2 with Linux 6.5
- GPU: AMD Radeon Vega 3 Graphics (raven2, LLVM 15.0.6, DRM 3.54, 6.5.4-g8a16969a8434)
- Good kernel: 6.4.x
- Bad kernel: 6.5.x
Jobs:
- https://gitlab.freedesktop.org/mesa/mesa/-/jobs/49336521
- https://gitlab.freedesktop.org/mesa/mesa/-/jobs/49343504
- https://gitlab.freedesktop.org/mesa/mesa/-/jobs/49343854
- https://gitlab.freedesktop.org/mesa/mesa/-/jobs/49415553 (VA testing)
...
2023-09-22 12:23:48.022789: Pass: 4437, ExpectedFail: 8, Skip: 192, Flake: 1, Duration: 1:28, Remaining: 6:49
2023-09-22 12:23:48.022796: Pass: 4507, ExpectedFail: 8, Skip: 196, Flake: 1, Duration: 1:30, Remaining: 6:45
2023-09-22 12:23:53.144672: [ 224.000691] amdgpu 0000:04:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:5 pasid:32772, for process arb_shader_stor pid 12183 thread arb_shader:cs0 pid 12201)
2023-09-22 12:23:53.144876: [ 224.018634] amdgpu 0000:04:00.0: amdgpu: in page starting at address 0x0000000000000000 from IH client 0x1b (UTCL2)
2023-09-22 12:23:53.144933: [ 224.056161] amdgpu 0000:04:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00501031
2023-09-22 12:23:53.144975: [ 224.105150] amdgpu 0000:04:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
2023-09-22 12:23:53.145020: [ 224.147169] amdgpu 0000:04:00.0: amdgpu: MORE_FAULTS: 0x1
2023-09-22 12:23:53.145064: [ 224.159308] amdgpu 0000:04:00.0: amdgpu: WALKER_ERROR: 0x0
2023-09-22 12:23:53.145107: [ 224.167267] amdgpu 0000:04:00.0: amdgpu: PERMISSION_FAULTS: 0x3
2023-09-22 12:23:53.145134: [ 224.176024] amdgpu 0000:04:00.0: amdgpu: MAPPING_ERROR: 0x0
2023-09-22 12:23:53.145165: [ 224.183160] amdgpu 0000:04:00.0: amdgpu: RW: 0x0
2023-09-22 12:23:53.145193: [ 224.194284] amdgpu 0000:04:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:5 pasid:32772, for process arb_shader_stor pid 12183 thread arb_shader:cs0 pid 12201)
2023-09-22 12:23:53.145232: [ 224.209946] amdgpu 0000:04:00.0: amdgpu: in page starting at address 0x0000000000000000 from IH client 0x1b (UTCL2)
2023-09-22 12:23:53.145260: [ 224.220774] amdgpu 0000:04:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
2023-09-22 12:23:53.145286: [ 224.228305] amdgpu 0000:04:00.0: amdgpu: Faulty UTCL2 client ID: CB (0x0)
2023-09-22 12:23:53.145313: [ 224.235288] amdgpu 0000:04:00.0: amdgpu: MORE_FAULTS: 0x0
2023-09-22 12:23:53.145339: [ 224.240927] amdgpu 0000:04:00.0: amdgpu: WALKER_ERROR: 0x0
2023-09-22 12:23:53.145371: [ 224.246617] amdgpu 0000:04:00.0: amdgpu: PERMISSION_FAULTS: 0x0
2023-09-22 12:23:53.145398: [ 224.252726] amdgpu 0000:04:00.0: amdgpu: MAPPING_ERROR: 0x0
2023-09-22 12:23:53.145446: [ 224.258476] amdgpu 0000:04:00.0: amdgpu: RW: 0x0
2023-09-22 12:23:53.145485: [ 224.263328] amdgpu 0000:04:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:5 pasid:32772, for process arb_shader_stor pid 12183 thread arb_shader:cs0 pid 12201)
2023-09-22 12:23:53.145530: [ 224.278973] amdgpu 0000:04:00.0: amdgpu: in page starting at address 0x0000000000000000 from IH client 0x1b (UTCL2)
2023-09-22 12:23:53.145572: [ 224.289800] amdgpu 0000:04:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
2023-09-22 12:23:53.145625: [ 224.297297] amdgpu 0000:04:00.0: amdgpu: Faulty UTCL2 client ID: CB (0x0)
2023-09-22 12:23:53.145666: [ 224.304264] amdgpu 0000:04:00.0: amdgpu: MORE_FAULTS: 0x0
2023-09-22 12:23:53.145774: [ 224.309844] amdgpu 0000:04:00.0: amdgpu: WALKER_ERROR: 0x0
2023-09-22 12:23:53.145816: [ 224.315512] amdgpu 0000:04:00.0: amdgpu: PERMISSION_FAULTS: 0x0
2023-09-22 12:23:53.145857: [ 224.321614] amdgpu 0000:04:00.0: amdgpu: MAPPING_ERROR: 0x0
2023-09-22 12:23:53.145899: [ 224.327371] amdgpu 0000:04:00.0: amdgpu: RW: 0x0
2023-09-22 12:23:53.145939: [ 224.332194] amdgpu 0000:04:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:5 pasid:32772, for process arb_shader_stor pid 12183 thread arb_shader:cs0 pid 12201)
2023-09-22 12:23:53.145966: [ 224.347834] amdgpu 0000:04:00.0: amdgpu: in page starting at address 0x0000000000000000 from IH client 0x1b (UTCL2)
2023-09-22 12:23:53.145991: [ 224.358666] amdgpu 0000:04:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
2023-09-22 12:23:53.146028: [ 224.366155] amdgpu 0000:04:00.0: amdgpu: Faulty UTCL2 client ID: CB (0x0)
2023-09-22 12:23:53.146054: [ 224.373119] amdgpu 0000:04:00.0: amdgpu: MORE_FAULTS: 0x0
2023-09-22 12:23:53.146080: [ 224.378698] amdgpu 0000:04:00.0: amdgpu: WALKER_ERROR: 0x0
2023-09-22 12:23:53.146108: [ 224.384364] amdgpu 0000:04:00.0: amdgpu: PERMISSION_FAULTS: 0x0
2023-09-22 12:23:53.146134: [ 224.390468] amdgpu 0000:04:00.0: amdgpu: MAPPING_ERROR: 0x0
2023-09-22 12:23:53.146159: [ 224.396223] amdgpu 0000:04:00.0: amdgpu: RW: 0x0
2023-09-22 12:23:53.146194: [ 224.401045] amdgpu 0000:04:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:5 pasid:32772, for process arb_shader_stor pid 12183 thread arb_shader:cs0 pid 12201)
2023-09-22 12:23:53.146227: [ 224.416686] amdgpu 0000:04:00.0: amdgpu: in page starting at address 0x0000000000000000 from IH client 0x1b (UTCL2)
2023-09-22 12:23:53.146281: [ 224.427525] amdgpu 0000:04:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
2023-09-22 12:23:53.146319: [ 224.435012] amdgpu 0000:04:00.0: amdgpu: Faulty UTCL2 client ID: CB (0x0)
2023-09-22 12:23:53.146368: [ 224.441981] amdgpu 0000:04:00.0: amdgpu: MORE_FAULTS: 0x0
2023-09-22 12:23:53.146408: [ 224.447556] amdgpu 0000:04:00.0: amdgpu: WALKER_ERROR: 0x0
2023-09-22 12:23:53.146446: [ 224.453232] amdgpu 0000:04:00.0: amdgpu: PERMISSION_FAULTS: 0x0
2023-09-22 12:23:53.146490: [ 224.459332] amdgpu 0000:04:00.0: amdgpu: MAPPING_ERROR: 0x0
2023-09-22 12:23:53.146517: [ 224.465084] amdgpu 0000:04:00.0: amdgpu: RW: 0x0
2023-09-22 12:23:53.146720: ERROR - Piglit error: Unknown option: -fbo
2023-09-22 12:23:53.146757: Pass: 4588, ExpectedFail: 8, Skip: 199, Flake: 1, Duration: 1:32, Remaining: 6:44
2023-09-22 12:23:53.146784: Pass: 4663, ExpectedFail: 8, Skip: 204, Flake: 1, Duration: 1:34, Remaining: 6:41
2023-09-22 12:23:53.146815: Pass: 4717, ExpectedFail: 8, Skip: 208, Flake: 1, Duration: 1:36, Remaining: 6:34
2023-09-22 12:23:58.230026: Pass: 4860, ExpectedFail: 8, Skip: 213, Flake: 1, Duration: 1:38, Remaining: 6:31
2023-09-22 12:23:58.230133: Pass: 4896, ExpectedFail: 8, Skip: 217, Flake: 1, Duration: 1:40, Remaining: 6:28
2023-09-22 12:23:58.230151: Pass: 5039, ExpectedFail: 8, Skip: 221, Flake: 1, Duration: 1:42, Remaining: 6:25
2023-09-22 12:24:03.344302: [ 224.469906] amdgpu 0000:04:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:5 pasid:32772, for process arb_shader_stor pid 12183 thread arb_shader:cs0 pid 12201)
2023-09-22 12:24:03.344506: [ 224.485554] amdgpu 0000:04:00.0: amdgpu: in page starting at address 0x0000000000000000 from IH client 0x1b (UTCL2)
2023-09-22 12:24:03.344563: [ 224.496378] amdgpu 0000:04:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
2023-09-22 12:24:03.344596: [ 224.503913] amdgpu 0000:04:00.0: amdgpu: Faulty UTCL2 client ID: CB (0x0)
2023-09-22 12:24:03.344766: [ 224.510879] amdgpu 0000:04:00.0: amdgpu: MORE_FAULTS: 0x0
2023-09-22 12:24:03.344839: [ 224.516461] amdgpu 0000:04:00.0: amdgpu: WALKER_ERROR: 0x0
2023-09-22 12:24:03.344875: [ 224.522141] amdgpu 0000:04:00.0: amdgpu: PERMISSION_FAULTS: 0x0
2023-09-22 12:24:03.344919: [ 224.528240] amdgpu 0000:04:00.0: amdgpu: MAPPING_ERROR: 0x0
2023-09-22 12:24:03.344988: [ 224.533995] amdgpu 0000:04:00.0: amdgpu: RW: 0x0
2023-09-22 12:24:03.345049: [ 224.538822] amdgpu 0000:04:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:5 pasid:32772, for process arb_shader_stor pid 12183 thread arb_shader:cs0 pid 12201)
2023-09-22 12:24:03.345097: [ 224.554475] amdgpu 0000:04:00.0: amdgpu: in page starting at address 0x0000000000000000 from IH client 0x1b (UTCL2)
2023-09-22 12:24:03.345156: [ 224.565306] amdgpu 0000:04:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
2023-09-22 12:24:03.345193: [ 224.572817] amdgpu 0000:04:00.0: amdgpu: Faulty UTCL2 client ID: CB (0x0)
2023-09-22 12:24:03.345249: [ 224.579786] amdgpu 0000:04:00.0: amdgpu: MORE_FAULTS: 0x0
2023-09-22 12:24:03.345299: [ 224.585379] amdgpu 0000:04:00.0: amdgpu: WALKER_ERROR: 0x0
2023-09-22 12:24:03.345354: [ 224.591046] amdgpu 0000:04:00.0: amdgpu: PERMISSION_FAULTS: 0x0
2023-09-22 12:24:03.345387: [ 224.597148] amdgpu 0000:04:00.0: amdgpu: MAPPING_ERROR: 0x0
2023-09-22 12:24:03.345423: [ 224.602903] amdgpu 0000:04:00.0: amdgpu: RW: 0x0
2023-09-22 12:24:03.345456: [ 224.607743] amdgpu 0000:04:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:5 pasid:32772, for process arb_shader_stor pid 12183 thread arb_shader:cs0 pid 12201)
2023-09-22 12:24:03.345494: [ 224.623393] amdgpu 0000:04:00.0: amdgpu: in page starting at address 0x0000000000000000 from IH client 0x1b (UTCL2)
2023-09-22 12:24:03.345526: [ 224.634223] amdgpu 0000:04:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
2023-09-22 12:24:03.345562: [ 224.641713] amdgpu 0000:04:00.0: amdgpu: Faulty UTCL2 client ID: CB (0x0)
2023-09-22 12:24:03.345599: [ 224.648690] amdgpu 0000:04:00.0: amdgpu: MORE_FAULTS: 0x0
2023-09-22 12:24:03.345630: [ 224.654272] amdgpu 0000:04:00.0: amdgpu: WALKER_ERROR: 0x0
2023-09-22 12:24:03.345665: [ 224.659940] amdgpu 0000:04:00.0: amdgpu: PERMISSION_FAULTS: 0x0
2023-09-22 12:24:03.345696: [ 224.666042] amdgpu 0000:04:00.0: amdgpu: MAPPING_ERROR: 0x0
2023-09-22 12:24:03.345731: [ 224.671795] amdgpu 0000:04:00.0: amdgpu: RW: 0x0
2023-09-22 12:24:03.345762: [ 224.676616] amdgpu 0000:04:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:5 pasid:32772, for process arb_shader_stor pid 12183 thread arb_shader:cs0 pid 12201)
2023-09-22 12:24:03.345802: [ 224.692264] amdgpu 0000:04:00.0: amdgpu: in page starting at address 0x0000000000000000 from IH client 0x1b (UTCL2)
2023-09-22 12:24:03.345851: [ 224.703087] amdgpu 0000:04:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
2023-09-22 12:24:03.345917: [ 224.710576] amdgpu 0000:04:00.0: amdgpu: Faulty UTCL2 client ID: CB (0x0)
2023-09-22 12:24:03.345971: [ 224.717555] amdgpu 0000:04:00.0: amdgpu: MORE_FAULTS: 0x0
2023-09-22 12:24:03.346011: [ 224.723137] amdgpu 0000:04:00.0: amdgpu: WALKER_ERROR: 0x0
2023-09-22 12:24:03.346067: [ 224.728805] amdgpu 0000:04:00.0: amdgpu: PERMISSION_FAULTS: 0x0
2023-09-22 12:24:03.346117: [ 224.734908] amdgpu 0000:04:00.0: amdgpu: MAPPING_ERROR: 0x0
2023-09-22 12:24:03.346157: [ 224.740663] amdgpu 0000:04:00.0: amdgpu: RW: 0x0
2023-09-22 12:24:03.346189: [ 224.745493] amdgpu 0000:04:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:5 pasid:32772, for process arb_shader_stor pid 12183 thread arb_shader:cs0 pid 12201)
2023-09-22 12:24:03.346244: [ 224.761133] amdgpu 0000:04:00.0: amdgpu: in page starting at address 0x0000000000000000 from IH client 0x1b (UTCL2)
2023-09-22 12:24:03.346278: [ 224.771964] amdgpu 0000:04:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
2023-09-22 12:24:03.346308: [ 224.779463] amdgpu 0000:04:00.0: amdgpu: Faulty UTCL2 client ID: CB (0x0)
2023-09-22 12:24:03.346339: [ 224.786428] amdgpu 0000:04:00.0: amdgpu: MORE_FAULTS: 0x0
2023-09-22 12:24:03.346370: [ 224.792010] amdgpu 0000:04:00.0: amdgpu: WALKER_ERROR: 0x0
2023-09-22 12:24:03.346406: [ 224.797678] amdgpu 0000:04:00.0: amdgpu: PERMISSION_FAULTS: 0x0
2023-09-22 12:24:03.346437: [ 224.803780] amdgpu 0000:04:00.0: amdgpu: MAPPING_ERROR: 0x0
2023-09-22 12:24:03.346480: [ 224.809534] amdgpu 0000:04:00.0: amdgpu: RW: 0x0
2023-09-22 12:24:03.346725: Pass: 5079, ExpectedFail: 8, Skip: 222, Flake: 1, Duration: 1:44, Remaining: 6:22
2023-09-22 12:24:03.346760: Pass: 5235, ExpectedFail: 8, Skip: 225, Flake: 1, Duration: 1:46, Remaining: 6:16
...
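Only the first fault record carries a non-zero status word (0x00501031); every later record reads 0x00000000, and a client ID of 0 is what the driver prints as CB (0x0). The Python sketch below splits a status word into fields and tallies the fault headers per process. The bit layout in `FIELDS` is an assumption taken from the gfx9 register definitions and was checked only against the values amdgpu decoded in this log; `UTCL2_CLIENTS` lists just the two IDs seen here, and `decode_status`/`summarize` are hypothetical helpers, not part of amdgpu or Mesa. Under that assumption, the later CB entries may simply reflect a cleared status register rather than a second client faulting, but the sketch cannot confirm that.

```python
import re

# ASSUMED field layout (shift, width) of the gfx9 VM_L2_PROTECTION_FAULT_STATUS
# register; it reproduces every field amdgpu decoded in the log above.
FIELDS = {
    "MORE_FAULTS": (0, 1),
    "WALKER_ERROR": (1, 3),
    "PERMISSION_FAULTS": (4, 4),
    "MAPPING_ERROR": (8, 1),
    "CID": (9, 9),
    "RW": (18, 1),
    "VMID": (20, 4),
}

# Only the two UTCL2 client IDs that appear in this log; any other ID is
# reported as unknown rather than guessed.
UTCL2_CLIENTS = {0x0: "CB", 0x8: "TCP"}

# Matches the "retry page fault" header and the raw status line; re.search is
# used, so leading CI timestamps or leftover escape debris do not matter.
FAULT_RE = re.compile(
    r"retry page fault \(src_id:(\d+) ring:(\d+) vmid:(\d+) pasid:(\d+), "
    r"for process (\S+) pid (\d+)"
)
STATUS_RE = re.compile(r"VM_L2_PROTECTION_FAULT_STATUS:(0x[0-9a-fA-F]+)")


def decode_status(status):
    """Split a raw fault status word into its named bit fields."""
    fields = {
        name: (status >> shift) & ((1 << width) - 1)
        for name, (shift, width) in FIELDS.items()
    }
    cid = fields["CID"]
    fields["CLIENT"] = UTCL2_CLIENTS.get(cid, f"unknown (0x{cid:x})")
    return fields


def summarize(lines):
    """Count fault headers per (process, pid, vmid, pasid); collect status words in order."""
    counts = {}
    statuses = []
    for line in lines:
        header = FAULT_RE.search(line)
        if header:
            key = (header.group(5), int(header.group(6)),
                   int(header.group(3)), int(header.group(4)))
            counts[key] = counts.get(key, 0) + 1
            continue
        status = STATUS_RE.search(line)
        if status:
            statuses.append(int(status.group(1), 16))
    return counts, statuses


if __name__ == "__main__":
    for word in (0x00501031, 0x00000000):
        d = decode_status(word)
        # prints "0x501031 TCP 3 1 5", then "0x0 CB 0 0 0"
        print(hex(word), d["CLIENT"], d["PERMISSION_FAULTS"], d["MORE_FAULTS"], d["VMID"])
```

Fed the log lines above, `summarize` should attribute every retry page fault header to `arb_shader_stor` (pid 12183, vmid 5, pasid 32772), and the VMID field decoded from 0x00501031 agrees with the `vmid:5` in those headers.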
Edited by David Heidelberg