
Draft: WIP: manage confidence level on job results

Sergi Blanch Torné requested to merge sergi/ci-collate:confidence_level into main

There is a debate about the confidence level we have in the results provided by a job in a pipeline.

This tool was initially meant to collate information from a job or a pipeline, but its features went further when we added the capacity to update expectation files based on the results.csv and failures.csv files the jobs provide. It tries to do what a developer does manually, but the behavior differs depending on details in the source.

When one runs a testing pipeline, failing jobs are automatically retried based on GitLab CI rules. The tool then uses the information from those retries to check whether the results are consistent between runs. If they aren't, it is easy to conclude that the test is a Flake rather than a Fail. But a nightly pipeline doesn't retry its heavy jobs, so there is nothing to compare against and the confidence level in the results drops. One school of thought says that a single Fail is not enough to add a test to a *-fails.txt file, and that it should go to *-flakes.txt instead. But as long as we have no way to remove tests from the flakes files, we could end up adding everything to flakes and defeating one of the purposes of the CI.
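To make the discussion concrete, here is a minimal Python sketch of the decision described above. It is not the tool's current code: the names `Verdict` and `classify`, the simplified "Pass"/"Fail" statuses, and the idea of an explicit "uncertain" bucket for single-run failures are all assumptions made for illustration.

```python
from collections import defaultdict
from enum import Enum


class Verdict(Enum):
    # Hypothetical labels, not taken from ci-collate or deqp-runner.
    PASS = "pass"
    FAIL = "fail"            # failed consistently across several attempts
    FLAKE = "flake"          # results differ between attempts
    UNCERTAIN = "uncertain"  # failed, but only one attempt to judge from


def classify(runs):
    """Classify each test from a list of job attempts.

    `runs` is a list with one dict per attempt, mapping a test name to
    a simplified status string, "Pass" or "Fail" (an assumption; real
    result files carry more statuses).
    """
    statuses = defaultdict(list)
    for run in runs:
        for test, status in run.items():
            statuses[test].append(status)

    verdicts = {}
    for test, seen in statuses.items():
        if len(set(seen)) > 1:
            # Inconsistent between attempts: treat as a flake.
            verdicts[test] = Verdict.FLAKE
        elif seen[0] == "Pass":
            verdicts[test] = Verdict.PASS
        elif len(seen) >= 2:
            # Same failure on every retry: confident fail.
            verdicts[test] = Verdict.FAIL
        else:
            # Single failing attempt (e.g. a nightly job without
            # retries): not enough information to choose.
            verdicts[test] = Verdict.UNCERTAIN
    return verdicts
```

With a retried job, `classify([{"t1": "Fail", "t2": "Fail"}, {"t1": "Fail", "t2": "Pass"}])` marks `t1` as a fail and `t2` as a flake. With a single nightly attempt, `classify([{"t1": "Fail"}])` leaves `t1` as uncertain, which is exactly the case where the fails-versus-flakes policy still has to be decided.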

There are two open issues in deqp-runner that can affect this development (mesa/deqp-runner#48 and mesa/deqp-runner#49).

Meanwhile, this merge request will experiment with different ways to address the problem.
