Optimize artifact upload/download by gstreamer jobs
There are two main ways in which egress bandwidth is used:
- Runners downloading docker images
- Runners downloading artifacts for jobs (the cerbero-deps cache, and artifacts passed between jobs in the same pipeline)
To improve these, people have suggested using external caches and reducing the number of flaky tests we have.
Another thing we can do is reduce the size of the artifacts themselves. So far we've been optimizing for job time, but we now know there's a significant monetary cost associated with artifact size. That is the focus of this issue.
Concrete numbers will help to target this better, but we can make a start anyway.
The following has already been done:
What else we could do:
- Take another look at what we're putting in our tarballs and see if there's some more low-hanging fruit
- Use xz with parallel encoding and decoding; @ystreet's testing from 6 months ago showed a 30-40% reduction in size (see the sketch after this list)
- Schedule jobs from the same pipeline but different stages on the same runner, and use a local cache to transfer data from one to the other (uploading artifacts is free, downloading is expensive); see the local-cache sketch below
- Move to a different cerbero-deps storage
- Only run CI when assigned to marge bot
- Eliminate the need to transfer artifacts from build to test stages (for some jobs)
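
For the parallel-xz item, here is a minimal sketch of what the step could look like, as a Python wrapper around the xz CLI. The compression level, file names, and the wrapper itself are illustrative assumptions rather than current CI code. The xz behavior relied on: `-T` enables threaded compression since xz 5.2 and threaded decompression since 5.4, and threaded decompression only helps on archives that were themselves compressed multi-threaded (i.e. split into multiple blocks).

```python
#!/usr/bin/env python3
"""Sketch: compress/decompress a CI artifact tarball with multi-threaded xz.

Assumes xz >= 5.2 for threaded compression and xz >= 5.4 for threaded
decompression; everything here is illustrative, not current CI code.
"""
import subprocess
import sys


def compress(tar_path: str) -> str:
    """Compress with all available cores. -T also splits the stream into
    blocks, which is what enables parallel decompression later."""
    subprocess.run(["xz", "-T0", "-6", "--force", tar_path], check=True)
    return tar_path + ".xz"


def decompress(xz_path: str) -> str:
    """Decompress with all available cores; only parallel if the archive
    was created multi-threaded and the runner has xz >= 5.4."""
    subprocess.run(["xz", "-d", "-T0", "--force", xz_path], check=True)
    return xz_path.removesuffix(".xz")


if __name__ == "__main__":
    # e.g. ./xz_artifacts.py compress cerbero-deps.tar
    action, path = sys.argv[1], sys.argv[2]
    print(compress(path) if action == "compress" else decompress(path))
```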
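And a sketch of the local-cache idea: the build job drops its tarball into a runner-local directory keyed by pipeline ID, and the test job picks it up from there, falling back to the normal (expensive) artifact download when it landed on a different runner. The `/cache` path and the helper names are hypothetical and depend on the runner being configured with a persistent cache volume; `CI_PIPELINE_ID` is a standard GitLab CI variable.

```python
#!/usr/bin/env python3
"""Sketch: pass a build artifact between stages via a runner-local cache.

Assumes a directory such as /cache survives between jobs on the same
runner (an assumption about runner configuration, not a statement about
the current gstreamer runners).
"""
import os
import shutil

CACHE_ROOT = "/cache/pipeline-artifacts"  # hypothetical location


def store(artifact: str) -> None:
    """Build stage: copy the artifact into a pipeline-scoped cache dir."""
    dest = os.path.join(CACHE_ROOT, os.environ["CI_PIPELINE_ID"])
    os.makedirs(dest, exist_ok=True)
    shutil.copy2(artifact, dest)


def restore(name: str, fallback_download) -> str:
    """Test stage: take the artifact from the local cache if this job was
    scheduled on the same runner as the build; otherwise fall back to the
    artifact download from gitlab."""
    cached = os.path.join(CACHE_ROOT, os.environ["CI_PIPELINE_ID"], name)
    if os.path.exists(cached):
        return cached
    return fallback_download(name)
```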