freedreno: skqp instability from GPU hangs
We are seeing GPU hangs during skqp runs. This run failed with a hang in a test that was not already known to hang:
22-06-22 17:46:14 R SERIAL-CPU> Starting: gles_colorfilterimagefilter
22-06-22 17:46:14 R SERIAL-CPU> [ 129.737053] adreno 5000000.gpu: [drm:a6xx_irq] *ERROR* gpu fault ring 0 fence 767 status 00E51005 rb 006a/006a ib1 0000000000000000/0000 ib2 0000000000000000/0000
22-06-22 17:46:14 R SERIAL-CPU> [ 129.752057] msm ae00000.mdss: [drm:recover_worker] *ERROR* A630: hangcheck recover!
22-06-22 17:46:14 R SERIAL-CPU> [ 129.759976] msm ae00000.mdss: [drm:recover_worker] *ERROR* A630: offending task: skqp:sq0 (/skqp/skqp /skqp/assets //results/gles gles_)
22-06-22 17:46:14 R SERIAL-CPU> [ 129.772747] revision: 630 (6.3.0.2)
22-06-22 17:46:14 R SERIAL-CPU> [ 129.776350] rb 0: fence: 1894/1895
22-06-22 17:46:14 R SERIAL-CPU> [ 129.780119] rptr: 12
22-06-22 17:46:14 R SERIAL-CPU> [ 129.782736] rb wptr: 106
22-06-22 17:46:14 R SERIAL-CPU> [ 129.785444] adreno 5000000.gpu: [drm:a6xx_recover] CP_SCRATCH_REG0: 0
22-06-22 17:46:14 R SERIAL-CPU> [ 129.792075] adreno 5000000.gpu: [drm:a6xx_recover] CP_SCRATCH_REG1: 0
22-06-22 17:46:14 R SERIAL-CPU> [ 129.798711] adreno 5000000.gpu: [drm:a6xx_recover] CP_SCRATCH_REG2: 0
22-06-22 17:46:14 R SERIAL-CPU> [ 129.805330] adreno 5000000.gpu: [drm:a6xx_recover] CP_SCRATCH_REG3: 0
22-06-22 17:46:14 R SERIAL-CPU> [ 129.811958] adreno 5000000.gpu: [drm:a6xx_recover] CP_SCRATCH_REG4: 0
22-06-22 17:46:14 R SERIAL-CPU> [ 129.818585] adreno 5000000.gpu: [drm:a6xx_recover] CP_SCRATCH_REG5: 0
22-06-22 17:46:14 R SERIAL-CPU> [ 129.825204] adreno 5000000.gpu: [drm:a6xx_recover] CP_SCRATCH_REG6: 0
22-06-22 17:46:14 R SERIAL-CPU> [ 129.831821] adreno 5000000.gpu: [drm:a6xx_recover] CP_SCRATCH_REG7: 1
22-06-22 17:46:14 R SERIAL-CPU> FAILED: gles_colorfilterimagefilter (255)
For comparison, here is an existing, known GPU hang that is marked as xfail (-1
in gl_rendertests.txt):
22-06-22 17:44:35 R SERIAL-CPU> Starting: gl_bug339297_as_clip
22-06-22 17:44:35 R SERIAL-CPU> [ 30.734114] adreno 5000000.gpu: [drm:a6xx_irq] *ERROR* gpu fault ring 0 fence ba status 00E59005 rb 0064/0064 ib1 00000001017DB000/0000 ib2 00000001024EE5D0/0000
22-06-22 17:44:35 R SERIAL-CPU> [ 30.749240] msm ae00000.mdss: [drm:recover_worker] *ERROR* A630: hangcheck recover!
22-06-22 17:44:35 R SERIAL-CPU> [ 30.757154] msm ae00000.mdss: [drm:recover_worker] *ERROR* A630: offending task: skqp:sq0 (/skqp/skqp /skqp/assets //results/gl gl_)
22-06-22 17:44:35 R SERIAL-CPU> [ 30.801549] revision: 630 (6.3.0.2)
22-06-22 17:44:35 R SERIAL-CPU> [ 30.805138] rb 0: fence: 184/186
22-06-22 17:44:35 R SERIAL-CPU> [ 30.808736] rptr: 12
22-06-22 17:44:35 R SERIAL-CPU> [ 30.811348] rb wptr: 100
22-06-22 17:44:35 R SERIAL-CPU> [ 30.814061] adreno 5000000.gpu: [drm:a6xx_recover] CP_SCRATCH_REG0: 0
22-06-22 17:44:35 R SERIAL-CPU> [ 30.820692] adreno 5000000.gpu: [drm:a6xx_recover] CP_SCRATCH_REG1: 0
22-06-22 17:44:35 R SERIAL-CPU> [ 30.827317] adreno 5000000.gpu: [drm:a6xx_recover] CP_SCRATCH_REG2: 0
22-06-22 17:44:35 R SERIAL-CPU> [ 30.833940] adreno 5000000.gpu: [drm:a6xx_recover] CP_SCRATCH_REG3: 0
22-06-22 17:44:35 R SERIAL-CPU> [ 30.840562] adreno 5000000.gpu: [drm:a6xx_recover] CP_SCRATCH_REG4: 0
22-06-22 17:44:35 R SERIAL-CPU> [ 30.847186] adreno 5000000.gpu: [drm:a6xx_recover] CP_SCRATCH_REG5: 0
22-06-22 17:44:35 R SERIAL-CPU> [ 30.847370] adreno 5000000.gpu: [drm:a6xx_irq] *ERROR* gpu fault ring 0 fence ba status 00E59005 rb 0064/0064 ib1 00000001017DB000/0000 ib2 00000001024EE5D0/0000
22-06-22 17:44:35 R SERIAL-CPU> [ 30.853808] adreno 5000000.gpu: [drm:a6xx_recover] CP_SCRATCH_REG6: 0
22-06-22 17:44:35 R SERIAL-CPU> [ 30.875222] adreno 5000000.gpu: [drm:a6xx_recover] CP_SCRATCH_REG7: 1
22-06-22 17:44:35 R SERIAL-CPU> [ 30.884461] msm ae00000.mdss: [drm:recover_worker] *ERROR* A630: hangcheck recover!
22-06-22 17:44:35 R SERIAL-CPU> Passed: gl_bug339297_as_clip
(It looks like that is the devcoredump from that job.)
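
To triage more of these, it may help to attribute each GPU fault to the test that was running when it fired. Below is a rough sketch in Python, assuming the serial log keeps the `Starting:` / `Passed:` / `FAILED:` markers and the `*ERROR* gpu fault` lines in the same shape as the two excerpts in this report. The function name `hangs_by_test` and the regexes are made up for illustration and are not part of skqp or the CI scripts.

```python
import re

# Patterns inferred from the log excerpts in this report (an assumption,
# not a documented format).
START_RE = re.compile(r"Starting: (\S+)")
END_RE = re.compile(r"(Passed|FAILED): (\S+)")
FAULT_RE = re.compile(
    r"\*ERROR\* gpu fault ring (\d+) fence ([0-9a-fA-F]+) status ([0-9a-fA-F]+)"
)


def hangs_by_test(lines):
    """Map each test name to its GPU fault count and reported result.

    Faults are attributed to the most recent "Starting:" marker that has
    not yet been closed by a "Passed:" or "FAILED:" line.
    """
    current = None
    results = {}
    for line in lines:
        m = START_RE.search(line)
        if m:
            current = m.group(1)
            results[current] = {"faults": 0, "result": None}
            continue
        m = END_RE.search(line)
        if m:
            entry = results.setdefault(m.group(2), {"faults": 0, "result": None})
            entry["result"] = m.group(1)
            current = None
            continue
        if current is not None and FAULT_RE.search(line):
            results[current]["faults"] += 1
    return results


def suspicious(results):
    """Return tests that hit at least one GPU fault, whatever they reported."""
    return {name: r for name, r in results.items() if r["faults"] > 0}
```

Something like `suspicious(hangs_by_test(open("serial.log")))` (with `serial.log` standing in for wherever the job's serial output is saved) would list both the failing test and the xfail'ed one, which is still reported as `Passed` even though the GPU faulted while it ran.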