Link shader command causes flakiness with the new CI
The new virglrenderer CI jobs (!694), which had not been merged yet when this issue was filed, revealed a regression that makes a number of tests flaky starting from commit 453017e3.
These flaky test regressions have an interesting property: they are very reproducible, as long as you remove the shader cache before every run.
After a lot of investigation, I found that the problem seems to be a state leak when linking geometry shaders or when linking a TCS without a TES. I call it a state leak because it affects tests that don't even use those kinds of shaders, but I haven't found the actual culprit yet.
The attached file is a set of tests that runs in about 1 minute on my PC and reproduces the bug. I couldn't reduce it further: when I tried, the flakes went away.
One last thing to note is that I can only reproduce it when running deqp-runner.sh with the DEQP_SUITE option.
I changed my deqp-virgl-gl.toml to use the attached file and reduced the tests to a single job. It looks like this:
[[deqp]]
deqp = "/install/crosvm-runner.sh"
caselists = ["/tmp/gles31.txt"]
deqp_args = [
    "/deqp/modules/gles31/deqp-gles31",
    "--deqp-surface-width=256",
    "--deqp-surface-height=256",
    "--deqp-surface-type=pbuffer",
    "--deqp-gl-config-name=rgba8888d24s8ms0",
    "--deqp-visibility=hidden"
]
timeout = 360.0
The exact command I'm running is the following:
FDO_CI_CONCURRENT=11 DEQP_RUNNER_OPTIONS="-vvv" GALLIUM_DRIVER=virgl CROSVM_GALLIUM_DRIVER=llvmpipe CROSVM_GPU_ARGS="gles=false,backend=virglrenderer,egl=true,surfaceless=true,width=1024,height=768" GALLIVM_PERF="nopt,no_quad_lod" DEQP_SUITE=virgl-gl GPU_VERSION=virgl-gl install/deqp-runner.sh 2>&1
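Since the flakes only reproduce reliably when the shader cache is removed before every run, a small wrapper can make that step repeatable. This is only a sketch. The function names and the SHADER_CACHE_DIR and LOG_DIR variables are hypothetical helpers, not part of deqp-runner or virglrenderer. It assumes the cache sits in Mesa's default per-user location (mesa_shader_cache under $XDG_CACHE_HOME or ~/.cache), so point SHADER_CACHE_DIR elsewhere if your setup keeps the cache somewhere else, for example inside the guest.

```shell
#!/bin/sh
# Hypothetical helper: clear the shader cache before each test run.
# Assumption: the cache is in Mesa's default per-user location.
# Set SHADER_CACHE_DIR explicitly if your setup stores it elsewhere.
SHADER_CACHE_DIR="${SHADER_CACHE_DIR:-${XDG_CACHE_HOME:-$HOME/.cache}/mesa_shader_cache}"

# Remove the cache directory, refusing obviously dangerous paths.
clear_shader_cache() {
    case "$SHADER_CACHE_DIR" in
        ''|/) echo "refusing to remove '$SHADER_CACHE_DIR'" >&2; return 1 ;;
    esac
    rm -rf -- "$SHADER_CACHE_DIR"
}

# Run a command N times, clearing the cache before every run.
# Each run's output goes to $LOG_DIR/run-<i>.log (LOG_DIR defaults to ".").
# Prints the number of runs that exited with a non-zero status.
run_with_clean_cache() {
    runs=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        i=$((i + 1))
        clear_shader_cache || return 1
        "$@" >"${LOG_DIR:-.}/run-$i.log" 2>&1 || failures=$((failures + 1))
    done
    echo "$failures"
}
```

To use it with the command above, source the file and pass the environment assignments through env, for example `run_with_clean_cache 5 env DEQP_SUITE=virgl-gl GPU_VERSION=virgl-gl install/deqp-runner.sh` with the remaining variables added in the same way. The exit status is only a rough signal, so the per-run logs are still the place to check which tests flaked.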
This issue was initially found by @cristicc. I just confirmed it also happens on my local runs.