Rethinking our RootFS/container strategy
Mesa's testing has grown massively since its beginning over 5 years ago, and our containers and rootfs reflect that organic growth.
As we are about to add testing on new boards, which would require either deep changes to the existing rootfs/containers or the introduction of yet another one, we decided to take a global look at the current state and take the first steps towards improving it, even if that makes things a little worse at first.
CI farm needs
Before analyzing the current strategy of Mesa CI, let’s first evaluate the requirements and capabilities of the different CI farms:
| Functionality / CI Farm | LAVA | Bare Metal | CI-Tron w/ Boot2container |
|---|---|---|---|
| Live execution log | Serial adapter, then SSH | Serial adapter | Serial adapter |
| RootFS | HTTP URLs to:<br>* Base RootFS tarball<br>* Overlay RootFS tarball<br>* Build tarball | Either:<br>* Pre-extracted in an NFS share<br>* All baked as an initrd | Container images |
| RootFS caching | On CI infra:<br>* Layers of HTTP caching proxies<br>On DUTs: N/A | On CI infra:<br>* Container local storage<br>* HTTP caching proxy<br>On DUTs: N/A | On CI infra:<br>* Pull-through container registry<br>On DUTs:<br>* Container local storage |
| Kernel / Initrd / DTB | * HTTP URLs for the kernel / initrd<br>* DTB file name, accessed via TFTP | Can come from anywhere | HTTP URLs |
| Build artefact caching | HTTP caching proxies | HTTP caching proxy | None (ci-tron#242) |
| Persistent data cache | HTTP caching proxies (?) | HTTP caching proxies (?) | * S3 buckets on the gateway<br>* Container volumes on DUTs |
Current Mesa CI RootFS/container strategy
The current strategy for generating & storing the different rootfs/containers needed to represent the test environments is very dependent on the drivers / jobs. I’ll ignore the Windows jobs since I don’t know anything about them and they mostly run on dedicated runners.
Let’s first look at the build jobs before looking into the test jobs.
Build job containers
Most build jobs run on the fd.o shared runners. Let’s have a look at the dependency tree (as seen in the layers), along with how many rebuilds happened this year (in parentheses):
- x86_64 runners:
  - alpine:edge
    - alpine/x86_64_build (7)
    - alpine/x86_64_lava_ssh_client (1)
  - debian:bookworm-slim
    - debian/x86_64_build-base (26)
      - debian/android_build (27)
      - debian/ppc64el_build (25)
      - debian/s390x_build (24)
      - debian/x86_32_build (24)
      - debian/x86_64_build (25)
  - fedora:38
    - fedora/x86_64_build (12)
- arm64 runners:
  - debian/arm64_build (26)
Thanks to the use of container layers, the 10 build containers share most of their content, which reduces execution time, storage, and bandwidth costs when building them, but also storage and bandwidth costs for the runners that use them to build the artefacts.
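As a quick way to see that sharing for yourself, here is a hedged sketch; the registry path and tag are examples, not exact references.

```sh
# Pull a base image and one of its children, then inspect the shared layers.
podman pull registry.freedesktop.org/mesa/mesa/debian/x86_64_build-base:some-tag
podman pull registry.freedesktop.org/mesa/mesa/debian/x86_64_build:some-tag

# The child image's tree shows the layers it inherits from the base.
podman image tree registry.freedesktop.org/mesa/mesa/debian/x86_64_build:some-tag

# The base image's tree can list which local images build on top of it.
podman image tree --whatrequires registry.freedesktop.org/mesa/mesa/debian/x86_64_build-base:some-tag
```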
Test job containers
As with the build jobs, the test containers are built on fd.o runners, but most of them are not run there. Let’s look at their dependency tree (as seen in the layers), along with how many times they were rebuilt this year (in parentheses) and who uses them:
- x86_64 runners:
  - debian:bookworm-slim
    - debian/x86_64_test-base (35)
      - debian/x86_64_test-gl (93): Used by:
        - llvmpipe on fd.o runners
        - Zink-on-radv on CI-Tron / Valve infra
      - debian/x86_64_test-vk (95): Used by:
        - lavapipe on fd.o runners
        - Radv/r300g on CI-Tron / Valve infra
    - debian/arm32_test (83): Used by:
      - vc4/v3dv on Bare-metal
      - Etnaviv on Bare-metal
      - Nouveau on Bare-metal
    - debian/arm64_test (89): Used by:
      - vc4/v3dv on Bare-metal
      - Etnaviv on Bare-metal
      - Nouveau on Bare-metal
Please note that the arm32/64 containers are actually x86_64 containers. They mostly contain a stripped-down Debian with all the packages needed by the bare-metal farms to set up the test environment for the DUTs. They get their name from the arm32/arm64 rootfs they also contain, copied from the result of the kernel+rootfs_arm32/64 jobs (a tarball stored on s3.freedesktop.org).
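Roughly speaking, building one of these images boils down to adding something like the following on top of that stripped-down Debian. This is a hedged sketch, not the actual recipe; the URL and destination path are made up for illustration.

```sh
# Hypothetical step of the arm64_test image build: fetch the rootfs tarball
# produced by the kernel+rootfs_arm64 job from s3.freedesktop.org and unpack
# it inside the (x86_64) image. URL and path are illustrative only.
curl --fail --location \
    "https://s3.freedesktop.org/mesa-rootfs/arm64/lava-rootfs.tar.zst" \
  | tar --zstd -x -C /rootfs/arm64/
```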
Building Mesa
Mesa gets built by runners using the appropriate *_build container; the resulting artefacts are then stored both in S3 (using ci-fairy) and as GitLab job artefacts.
The artefact is meant to provide a built version of Mesa, along with all the files necessary for testing, so that the git tree does not need to be re-downloaded on the DUT runners (a sketch of how a job consumes it follows the overview below).
Here is an overview:
- /b2c → Script + template necessary to queue a job in CI-Tron
- /ci-common → Various scripts that help set up the test environment
- /lava → Various scripts relied upon by lava
- install.tar → Archive to be extracted in the rootfs
- /install
- /bare-metal/ → Collection of scripts which should be the same as the ones found in debian/arm*_test
- /common/ → Same scripts as /ci-common
- /fossils/ → Various scripts and files needed to run fossils
- /lib/
- /piglit/ → Various scripts to run piglit
- /share/
- /vkd3d-proton/ → A script needed to execute vkd3d-proton tests
- /VERSION
- /*.txt → Various expectation files
- /*.toml → Various deqp-runner config files
- /*.sh → Various scripts that serve as entry points for testing
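To make the layout above concrete, here is a hedged sketch of what a test environment does with that artefact once it has booted; the URL, paths, and entry-point name are placeholders, not the exact ones used by Mesa CI.

```sh
# Sketch: fetch the build artefact and launch a test entry point.
# MESA_BUILD_URL stands in for wherever the job stores install.tar
# (s3.freedesktop.org or a GitLab job artefact).
set -eux

curl --fail --location "$MESA_BUILD_URL" -o /tmp/install.tar
tar -xf /tmp/install.tar -C /      # creates /install inside the rootfs

# The entry points, expectation files, and deqp-runner configs now live in
# /install, so no git checkout is needed on the DUT.
/install/run-some-testsuite.sh     # placeholder for one of the *.sh entry points
```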
Deployment flow per farm
| Step / CI Farm | Shared runners | LAVA | Bare-metal | CI-Tron |
|---|---|---|---|---|
| GitLab runner image | debian/x86_64_test-* | debian/x86_64_build | debian/arm.*_test | gfx-ci/ci-tron/mesa-trigger |
| GW: Artefact download | * RootFS: N/A<br>* Kernel/DTB: N/A<br>* Build: s3.fd.o | * RootFS: s3.fd.o<br>* Kernel/DTB: s3.fd.o<br>* Build: s3.fd.o | * RootFS: built into the runner image<br>* Kernel/DTB: built-in or s3.fd.o<br>* Build: s3.fd.o | * RootFS: debian/x86_64_test-*<br>* Kernel/DTB: HTTP link<br>* Build: GitLab job artefact |
| GW: Artefact deployment | * RootFS: N/A<br>* Build: untarred by the job<br>* Results: N/A | * RootFS: LAVA, from URL<br>* Kernel/DTB: LAVA, from URL<br>* Build: LAVA, from URL | * RootFS: copied by the job<br>* Kernel/DTB: copied by the job<br>* Build: untarred by the job | * RootFS: pull-through registry<br>* Kernel/DTB: local HTTP server<br>* Build: local S3 bucket, untarred on the DUT |
| DUT: Artefact download | N/A | TFTP / NFS | TFTP / NFS | TFTP / HTTP / S3 |
| GW: Results retrieval | N/A | NFS | NFS or no results | S3 |
Discussion
In my opinion, here are the pros and cons of the current solution.
Pros
- The build containers are as optimal as can be with the default container storage settings
- The test x86_64 containers are almost perfect too, aside from containing a kernel for venus testing
- The images are relatively small and well optimised for the current targets
Cons
- Duplicated code / package lists between the build, test, and rootfs jobs that need to be kept in sync, and a non-uniform test environment across CI farms, drivers, and architectures:
- Makes every job a little different → Increases the size and complexity of the CI code
- Potentially subtle breakages due to the build artefacts being generated against a different set of libraries (as encountered a few weeks ago)
- Makes dependency uprevs harder (Don’t Repeat Yourself)
- Possible solutions:
- Make both the build and test containers/rootfs inherit from the base test environment (the basic list of packages needed by Mesa); see the first sketch after this list
- Make the build containers contain all the build dependencies of Mesa
- Make the test containers contain all the test suites needed for testing
- Make the rootfs tarballs by extracting the test containers then adding farm-specific files (LAVA-only)
- Make the build and test containers/rootfs tags include the base test environment’s tag
- Container and rootfs images generated in pre-merge are not labelled as such and have the same lifespan as post-merge ones:
- Increases the storage costs for fd.o with no benefit
- Possible solution:
- Add pre-/post-merge to the tag, copy the pre-merge container/rootfs to the post-merge one when missing. Containers won’t duplicate the layers (~0 cost) but the rootfs will get duplicated…
- Hard to near-impossible to reproduce the test environment locally (depending on the job):
- Any job running on LAVA / Bare-metal requires intimate knowledge of the code to be able to reproduce
- Makes it very hard for developers to reason about the test environment
- Possible solution: We need single-line reproducers!
- Have the test environment be a container that is common between all farms
- Make the entrypoint a script that takes the URL of the build as an env var, then download/extract it before running
- Generate a one-liner developers can copy/paste to get a shell in the test environment (see the one-liner sketch after this list)
- Our testing environment is a crude reimplementation of container runtimes:
- Changing one byte in a rootfs requires a completely new rootfs to be stored on fd.o (no sharing at all), downloaded in full, and stored in full at every CI infra using it
- Increases both storage and egress costs for fd.o
- Increases bandwidths and storage costs at the CI Farms
- Rebuild counts will grow as new DUTs / Farms are added (assuming they can even share the same files)
- Potential solution:
- Use containers as a shared base for every test job
- Reduce CPU, storage, and bandwidth costs for fd.o by switching to zstd:chunked compression → zstd is faster to compress/decompress and produces smaller images than gzip
- Reduce CPU, storage, and bandwidth costs for CI farms by switching to zstd:chunked images and enabling composefs in the container storage → the same file is downloaded and stored only once on disk, even when it comes from different containers / tags
- The rootfs and debian/arm.*_test containers largely contain the same data
- Doubles storage cost for fd.o
- Potential solution: Make the bare-metal farms:
- Implement the previous solution to get a generic test container
- Have a generic trigger container that contains all the dependencies for setting up the test environment
- Extract the generic test container straight to the NFS share using podman export
- Avoid re-downloads of the test container by exposing a podman rootless socket inside the runner
- Make the podman rootless service unable to run containers by periodically killing any container being run on it
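To illustrate the shared base environment idea mentioned above, here is a sketch, not a proposed patch: image names, tags, and package lists are made up, only the layering structure matters.

```sh
# Sketch: one base environment, inherited by both build and test images.
cat > Containerfile.test-env-base <<'EOF'
FROM debian:bookworm-slim
# Runtime dependencies shared by the build and test environments.
RUN apt-get update && \
    apt-get install -y --no-install-recommends ca-certificates libdrm2 zstd
EOF
podman build -t mesa-test-env-base:example -f Containerfile.test-env-base .

cat > Containerfile.build <<'EOF'
FROM mesa-test-env-base:example
# Build dependencies stacked on top of the shared base.
RUN apt-get update && \
    apt-get install -y --no-install-recommends build-essential meson ninja-build pkg-config
EOF
podman build -t mesa-build:example -f Containerfile.build .

cat > Containerfile.test <<'EOF'
FROM mesa-test-env-base:example
# Test suites stacked on top of the same shared base
# (python3 is a stand-in for deqp-runner, piglit, etc.).
RUN apt-get update && \
    apt-get install -y --no-install-recommends python3
EOF
podman build -t mesa-test:example -f Containerfile.test .
```

Because the base layer is shared, rebuilding the test image does not re-download or re-store anything from the base, and the rootfs tarballs could then be generated by extracting the test image instead of reinstalling the same package list a third time.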
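For the single-line reproducer, something along these lines would do; a hedged sketch where the image path, tag, and artefact URL are placeholders, not the real ones.

```sh
# Sketch: drop into (roughly) the same environment as a CI job, with the
# build artefact extracted in place. Image name and URL are placeholders.
podman run --rm -it \
  -e MESA_BUILD_URL="https://example.invalid/path/to/install.tar" \
  registry.freedesktop.org/mesa/mesa/debian/x86_64_test-gl:some-tag \
  bash -c 'curl --fail -L "$MESA_BUILD_URL" | tar -x -C / && exec bash'
```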
Summary
The current solution works but has the following overall limitations:
- Hard to maintain
- Hard to reproduce locally
- Inefficient on the fd.o side (storage and bandwidth)
- Inefficient on the CI gateway / DUT side
But container runtimes should be able to help: switching to zstd:chunked + composefs would reduce fd.o storage costs (zstd compresses better and faster than gzip), bandwidth costs (by only downloading the files that changed between containers), and local storage use in the CI farms (by deduplicating files between containers, thanks to composefs).
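To give an idea of what that switch looks like in practice, here is a sketch assuming a recent podman; the image name and tag are placeholders.

```sh
# Push an image compressed with zstd:chunked instead of gzip.
podman push --compression-format zstd:chunked \
  registry.freedesktop.org/mesa/mesa/debian/x86_64_test-gl:some-tag

# On the pulling side, partial ("chunked") pulls and composefs-backed storage
# are opt-in through containers-storage.conf (e.g. the enable_partial_images
# pull option); the exact knobs depend on the containers-storage version, so
# refer to containers-storage.conf(5) instead of copying this blindly.
```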
Next steps?
Changes are needed in multiple projects to make the new strategy bear fruit.
Mesa
- Reduce code duplication between the build and test containers
- Make the x86_64 base container scripts architecture-independent, with per-arch adjustments possible
- Add arm64 -base, -vk, and -gl containers
- Make the LAVA rootfs and the bare-metal farms use the new test containers (either extracted during rootfs creation, or at runtime)
Bare-metal
- Add a new rootless podman daemon dedicated to container caching, exposed to the jobs through a unix socket set in an env var (DOCKER_CONTAINER_CACHE_SOCKET):
- Make sure any container that could be run through this socket does not have access to the filesystem or the internet
- Add periodic scripts to kill every running container
- Document the changes needed for every farm to reproduce the setup
- Drop the rootfs from the arm*_test containers and use podman export to extract the test container to the NFS share (see the sketch below)
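Here is a hedged sketch of what that extraction could look like from a job's point of view; the socket variable, image name, and NFS path are placeholders.

```sh
# Sketch: extract a test image's filesystem onto the NFS share through the
# shared rootless podman socket. All names and paths are placeholders.
export CONTAINER_HOST="unix://${DOCKER_CONTAINER_CACHE_SOCKET}"

image="registry.freedesktop.org/mesa/mesa/debian/x86_64_test-gl:some-tag"
podman --remote pull "$image"

# podman export operates on containers, so create one without running it,
# dump its filesystem, then clean it up.
ctr=$(podman --remote create "$image")
podman --remote export "$ctr" | tar -x -C /nfs/rootfs/x86_64_test-gl/
podman --remote rm "$ctr"
```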
LAVA
Not much can be done there, aside from maybe generating and caching the rootfs locally… and enabling composefs when the time comes.
CI-templates
Let’s reduce bandwidth usage by switching our containers to zstd:chunked.
Relevant upstream issue in podman's image library that is trying to make this the default: https://github.com/containers/image/issues/2189