ci: priority queue - "taking too long" issue

Documenting what was discussed on IRC:

We see several "CI is taking too long" replies from Marge. You can check them in this panel of the dashboard: https://ci-stats-grafana.freedesktop.org/d/Ae_TLIwVk/mesa-ci-quality-false-positives?orgId=1&viewPanel=15

It seems that sometimes this happens because the current gitlab policy is to run the oldest jobs first.

A few possible solutions were discussed:

Implement a new policy in gitlab (to allow users to have different priorities)
Implement two new endpoints in gitlab: one to get the queue, and another to pick a given job from the queue. This way the prioritization could happen on the runner side, which would choose which jobs to pick.
Soft-disable mechanism. We could have a daemon that checks whether the waiting queue of a given tag holds more than a certain number of jobs. When it does, the daemon would spawn more docker gitlab-runners with that tag that only execute /bin/true (so the job wouldn't actually be executed, but ignored) and report this somewhere (IRC?)
Increase the overall timeout
Adjust timeout of jobs
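
To make the runner-side prioritization and the soft-disable check a bit more concrete, here is a minimal sketch in Python. It does not call any gitlab API (the two endpoints above do not exist yet), so the queue is just a plain list. The `PendingJob` fields, the `priority` value, and `SOFT_DISABLE_THRESHOLD` are all hypothetical names made up for illustration.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical threshold: pending jobs per tag before soft-disable kicks in.
SOFT_DISABLE_THRESHOLD = 3


@dataclass
class PendingJob:
    job_id: int
    tag: str
    created_at: float  # seconds since epoch
    priority: int      # hypothetical field, higher means more urgent


def pick_next_job(queue, tag):
    """Pick the highest-priority job for a tag; ties go to the oldest job."""
    candidates = [job for job in queue if job.tag == tag]
    if not candidates:
        return None
    return min(candidates, key=lambda job: (-job.priority, job.created_at))


def overloaded_tags(queue, threshold=SOFT_DISABLE_THRESHOLD):
    """Return the tags whose waiting queue is longer than the threshold."""
    counts = Counter(job.tag for job in queue)
    return sorted(tag for tag, count in counts.items() if count > threshold)
```

With this shape, a runner would call `pick_next_job` instead of taking the oldest job, and the daemon would call `overloaded_tags` periodically to decide where to spawn the /bin/true runners. How `priority` gets assigned (for example, higher for Marge pipelines) is left open here.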

A few points: We need to check whether the bottleneck is in the runners, in the available DUTs, or in jobs failing and getting retried. It may be worth starting with an analysis of the main reasons pipelines time out.
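
As a starting point for that analysis, something like the following sketch could tally the likely cause per job of a timed-out pipeline. The record fields (`retries`, `runner_wait_s`, `dut_wait_s`) are assumptions about what data we could collect, not fields that gitlab is known to expose, and the classification rule is deliberately naive.

```python
from collections import Counter


def classify_job(job):
    """Guess the main delay for one job (hypothetical record layout)."""
    if job["retries"] > 0:
        return "retried"
    if job["dut_wait_s"] > job["runner_wait_s"]:
        return "dut"
    return "runner"


def summarize(jobs):
    """Count how many jobs fall under each suspected bottleneck."""
    return Counter(classify_job(job) for job in jobs)
```

Running `summarize` over the jobs of pipelines that hit the "taking too long" reply would show which of the three causes dominates, which in turn tells us which of the solutions above is worth pursuing first.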

Please let me know if I'm missing something, and share your feedback on this. Thanks

Edited Sep 04, 2023 by Helen Mae Koike Fornazier
