Rethinking our RootFS/container strategy
Mesa's testing has grown massively since its beginning over 5 years ago, and our containers and rootfs reflect that organic growth.
As we are about to add testing on new boards, which would require either deep changes to the existing rootfs/containers or the introduction of yet another one, we decided to take a global look at the current state and take the first steps towards improving it, even if that makes things a little worse at first.
CI farm needs
Before analyzing the current strategy of Mesa CI, let’s first evaluate the requirements and capabilities of the different CI farms:
| Functionality / CI Farm | LAVA | Bare Metal | CI-Tron w/ Boot2container |
|---|---|---|---|
| Live execution log | Serial adapter, then SSH | Serial adapter | Serial adapter |
| RootFS | HTTP URLs to:<br>* Base RootFS tarball<br>* Overlay RootFS tarball<br>* Build tarball | Either:<br>* Pre-extracted in an NFS share<br>* All baked as an initrd | Container images |
| RootFS caching | On CI infra:<br>* Layers of HTTP caching proxies<br>On DUTs: N/A | On CI infra:<br>* Container local storage<br>* HTTP caching proxy<br>On DUTs: N/A | On CI infra:<br>* Pull-through container registry<br>On DUTs:<br>* Container local storage |
| Kernel / Initrd / DTB | * HTTP URLs for the kernel / initrd<br>* DTB file name, accessed via TFTP | Can come from anywhere | HTTP URLs |
| Build artefact caching | HTTP caching proxies | HTTP caching proxy | None (ci-tron#242) |
| Persistent data cache | HTTP caching proxies (?) | HTTP caching proxies (?) | * S3 buckets on the gateway<br>* Container volumes on DUTs |
Current Mesa CI RootFS/container strategy
The current strategy for generating & storing the different rootfs/containers needed to represent the test environments is very dependent on the drivers / jobs. I’ll ignore the Windows jobs since I don’t know anything about them and they mostly run on dedicated runners.
Let’s first look at the build jobs before looking into the test jobs.
Build job containers
Most build jobs run on the fd.o shared runners. Let’s have a look at the dependency tree (as seen in the layers), along with how many rebuilds happened this year (in parentheses):
- x86_64 runners:
  - alpine:edge
    - alpine/x86_64_build (7)
    - alpine/x86_64_lava_ssh_client (1)
  - debian:bookworm-slim
    - debian/x86_64_build-base (26)
      - debian/android_build (27)
      - debian/ppc64el_build (25)
      - debian/s390x_build (24)
      - debian/x86_32_build (24)
      - debian/x86_64_build (25)
  - fedora:38
    - fedora/x86_64_build (12)
- arm64 runners:
  - debian/arm64_build (26)
Thanks to the use of container layers, the 10 build containers share most of their content, which reduces execution time, storage, and bandwidth costs when building them, but also storage and bandwidth costs for the runners that use them to build the artefacts.
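As a quick way to see that sharing for yourself, here is a hedged sketch; the registry path and tag are examples, not exact references.

```sh
# Pull a base image and one of its children, then inspect the shared layers.
podman pull registry.freedesktop.org/mesa/mesa/debian/x86_64_build-base:some-tag
podman pull registry.freedesktop.org/mesa/mesa/debian/x86_64_build:some-tag

# The child image's tree shows the layers it inherits from the base.
podman image tree registry.freedesktop.org/mesa/mesa/debian/x86_64_build:some-tag

# The base image's tree can list which local images build on top of it.
podman image tree --whatrequires registry.freedesktop.org/mesa/mesa/debian/x86_64_build-base:some-tag
```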
Test job containers
As with the build jobs, the test containers are built on fd.o runners, but most of them are not run there. Let’s look at their dependency tree (as seen in the layers), along with how many times they were rebuilt this year (in parentheses) and who uses them:
- x86_64 runners:
  - debian:bookworm-slim
    - debian/x86_64_test-base (35)
      - debian/x86_64_test-gl (93): Used by:
        - llvmpipe on fd.o runners
        - Zink-on-radv on CI-Tron / Valve infra
      - debian/x86_64_test-vk (95): Used by:
        - lavapipe on fd.o runners
        - Radv/r300g on CI-Tron / Valve infra
    - debian/arm32_test (83): Used by:
      - vc4/v3dv on Bare-metal
      - Etnaviv on Bare-metal
      - Nouveau on Bare-metal
    - debian/arm64_test (89): Used by:
      - vc4/v3dv on Bare-metal
      - Etnaviv on Bare-metal
      - Nouveau on Bare-metal
Please note that the arm32/64 containers are actually x86_64 containers. They mostly contain a stripped-down Debian with all the packages needed by the bare-metal farms to set up the test environment for the DUTs. They get their name from the arm32/arm64 rootfs they also contain, copied from the result of the kernel+rootfs_arm32/64 jobs (a tarball stored on s3.freedesktop.org).
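Roughly speaking, building one of these images boils down to adding something like the following on top of that stripped-down Debian. This is a hedged sketch, not the actual recipe; the URL and destination path are made up for illustration.

```sh
# Hypothetical step of the arm64_test image build: fetch the rootfs tarball
# produced by the kernel+rootfs_arm64 job from s3.freedesktop.org and unpack
# it inside the (x86_64) image. URL and path are illustrative only.
curl --fail --location \
    "https://s3.freedesktop.org/mesa-rootfs/arm64/lava-rootfs.tar.zst" \
  | tar --zstd -x -C /rootfs/arm64/
```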
Building Mesa
Mesa gets built by runners using the appropriate *_build container; the resulting artefacts are then stored both in S3 (using ci-fairy) and as GitLab job artefacts.
The artefact is meant to provide a built version of Mesa, along with all the files necessary for testing, so that the git tree does not need to be re-downloaded on the DUT runners (a sketch of how a job consumes it follows the overview below).
Here is an overview:
- /b2c → Script + template necessary to queue a job in CI-Tron
- /ci-common → Various scripts that help set up the test environment
- /lava → Various scripts relied upon by lava
- install.tar → Archive to be extracted in the rootfs
- /install
- /bare-metal/ → Collection of scripts which should be the same as the ones found in debian/arm*_test
- /common/ → Same scripts as /ci-common
- /fossils/ → Various scripts and files needed to run fossils
- /lib/
- /piglit/ → Various scripts to run piglit
- /share/
- /vkd3d-proton/ → A script needed to execute vkd3d-proton tests
- /VERSION
- /*.txt → Various expectation files
- /*.toml → Various deqp-runner config files
- /*.sh → Various scripts that serve as entry points for testing
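To make the layout above concrete, here is a hedged sketch of what a test environment does with that artefact once it has booted; the URL, paths, and entry-point name are placeholders, not the exact ones used by Mesa CI.

```sh
# Sketch: fetch the build artefact and launch a test entry point.
# MESA_BUILD_URL stands in for wherever the job stores install.tar
# (s3.freedesktop.org or a GitLab job artefact).
set -eux

curl --fail --location "$MESA_BUILD_URL" -o /tmp/install.tar
tar -xf /tmp/install.tar -C /      # creates /install inside the rootfs

# The entry points, expectation files, and deqp-runner configs now live in
# /install, so no git checkout is needed on the DUT.
/install/run-some-testsuite.sh     # placeholder for one of the *.sh entry points
```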
Deployment flow per farm
| Step / CI Farm | Shared runners | LAVA | Bare-metal | CI-Tron |
|---|---|---|---|---|
| GitLab runner image | debian/x86_64_test-* | debian/x86_64_build | debian/arm.*_test | gfx-ci/ci-tron/mesa-trigger |
| GW: Artefact download | * RootFS: N/A<br>* Kernel/DTB: N/A<br>* Build: s3.fd.o | * RootFS: s3.fd.o<br>* Kernel/DTB: s3.fd.o<br>* Build: s3.fd.o | * RootFS: built into the runner image<br>* Kernel/DTB: built-in or s3.fd.o<br>* Build: s3.fd.o | * RootFS: debian/x86_64_test-*<br>* Kernel/DTB: HTTP link<br>* Build: GitLab job artefact |
| GW: Artefact deployment | * RootFS: N/A<br>* Build: untarred by the job<br>* Results: N/A | * RootFS: LAVA, from URL<br>* Kernel/DTB: LAVA, from URL<br>* Build: LAVA, from URL | * RootFS: copied by the job<br>* Kernel/DTB: copied by the job<br>* Build: untarred by the job | * RootFS: pull-through registry<br>* Kernel/DTB: local HTTP server<br>* Build: local S3 bucket, untarred on the DUT |
| DUT: Artefact download | N/A | TFTP / NFS | TFTP / NFS | TFTP / HTTP / S3 |
| GW: Results retrieval | N/A | NFS | NFS or no results | S3 |
Discussion
In my opinion, here are the pros and cons of the current solution.
Pros
- The build containers are as optimal as can be with the default container storage settings
- The test x86_64 containers are almost perfect too, aside from containing a kernel for venus testing
- The images are relatively small and well optimised for the current targets
Cons
- Duplicated code / package lists between the build, test, and rootfs jobs that need to be kept in sync, and a non-uniform test environment across CI farms, drivers, and architectures:
- Makes every job a little different → Increases the size and complexity of the CI code
- Potentially subtle breakages due to the build artefacts being generated against a different set of libraries (as encountered a few weeks ago)
- Makes dependency uprevs harder (Don’t Repeat Yourself)
- Possible solutions:
- Make both the build and test containers/rootfs inherit from the base test environment (the basic list of packages needed by Mesa); see the first sketch after this list
- Make the build containers contain all the build dependencies of Mesa
- Make the test containers contain all the test suites needed for testing
- Make the rootfs tarballs by extracting the test containers then adding farm-specific files (LAVA-only)
- Make the build and test containers/rootfs tags include the base test environment’s tag
- Container and rootfs images generated in pre-merge are not labelled as such and have the same lifespan as post-merge ones:
- Increases the storage costs for fd.o with no benefit
- Possible solution:
- Add pre-/post-merge to the tag, copy the pre-merge container/rootfs to the post-merge one when missing. Containers won’t duplicate the layers (~0 cost) but the rootfs will get duplicated…
- Hard to near-impossible to reproduce the test environment locally (depending on the job):
- Any job running on LAVA / Bare-metal requires intimate knowledge of the code to be able to reproduce
- Makes it very hard for developers to reason about the test environment
- Possible solution: We need single-line reproducers!
- Have the test environment be a container that is common between all farms
- Make the entrypoint a script that takes the URL of the build as an env var, then download/extract it before running
- Generate a one-liner developers can copy/paste to get a shell in the test environment (see the one-liner sketch after this list)
- Our testing environment is a crude reimplementation of container runtimes:
- Changing one byte in a rootfs requires a completely new rootfs to be stored on fd.o (no sharing at all), downloaded in full, and stored in full at every CI infra using it
- Increases both storage and egress costs for fd.o
- Increases bandwidths and storage costs at the CI Farms
- Rebuild counts will grow as new DUTs / Farms are added (assuming they can even share the same files)
- Potential solution:
- Use containers as a shared base for every test job
- Reduce CPU, storage, and bandwidth costs for fd.o by switching to zstd:chunked compression → zstd is faster to compress/decompress and produces smaller images than gzip
- Reduce CPU, storage, and bandwidth costs for CI farms by switching to zstd:chunked images and enabling composefs in the container storage → the same file is downloaded and stored only once on disk, even when it comes from different containers / tags
- The rootfs and debian/arm.*_test containers largely contain the same data
- Doubles storage cost for fd.o
- Potential solution: Make the bare-metal farms:
- Implement the previous solution to get a generic test container
- Have a generic trigger container that contains all the dependencies for setting up the test environment
- Extract the generic test container straight to the NFS share using podman export
- Avoid re-downloads of the test container by exposing a podman rootless socket inside the runner
- Make the podman rootless service unable to run containers by periodically killing any container being run on it
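To illustrate the shared base environment idea mentioned above, here is a sketch, not a proposed patch: image names, tags, and package lists are made up, only the layering structure matters.

```sh
# Sketch: one base environment, inherited by both build and test images.
cat > Containerfile.test-env-base <<'EOF'
FROM debian:bookworm-slim
# Runtime dependencies shared by the build and test environments.
RUN apt-get update && \
    apt-get install -y --no-install-recommends ca-certificates libdrm2 zstd
EOF
podman build -t mesa-test-env-base:example -f Containerfile.test-env-base .

cat > Containerfile.build <<'EOF'
FROM mesa-test-env-base:example
# Build dependencies stacked on top of the shared base.
RUN apt-get update && \
    apt-get install -y --no-install-recommends build-essential meson ninja-build pkg-config
EOF
podman build -t mesa-build:example -f Containerfile.build .

cat > Containerfile.test <<'EOF'
FROM mesa-test-env-base:example
# Test suites stacked on top of the same shared base
# (python3 is a stand-in for deqp-runner, piglit, etc.).
RUN apt-get update && \
    apt-get install -y --no-install-recommends python3
EOF
podman build -t mesa-test:example -f Containerfile.test .
```

Because the base layer is shared, rebuilding the test image does not re-download or re-store anything from the base, and the rootfs tarballs could then be generated by extracting the test image instead of reinstalling the same package list a third time.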
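For the single-line reproducer, something along these lines would do; a hedged sketch where the image path, tag, and artefact URL are placeholders, not the real ones.

```sh
# Sketch: drop into (roughly) the same environment as a CI job, with the
# build artefact extracted in place. Image name and URL are placeholders.
podman run --rm -it \
  -e MESA_BUILD_URL="https://example.invalid/path/to/install.tar" \
  registry.freedesktop.org/mesa/mesa/debian/x86_64_test-gl:some-tag \
  bash -c 'curl --fail -L "$MESA_BUILD_URL" | tar -x -C / && exec bash'
```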
Summary
The current solution works but has the following overall limitations:
- Hard to maintain
- Hard to reproduce locally
- Inefficient on the fd.o side (storage and bandwidth)
- Inefficient on the CI gateway / DUT side
But container runtimes should be able to help: switching to zstd:chunked + composefs would reduce fd.o storage costs (zstd compresses better and faster than gzip), bandwidth costs (by only downloading the files that changed between containers), and local storage use in the CI farms (by deduplicating files between containers, thanks to composefs).
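To give an idea of what that switch looks like in practice, here is a sketch assuming a recent podman; the image name and tag are placeholders.

```sh
# Push an image compressed with zstd:chunked instead of gzip.
podman push --compression-format zstd:chunked \
  registry.freedesktop.org/mesa/mesa/debian/x86_64_test-gl:some-tag

# On the pulling side, partial ("chunked") pulls and composefs-backed storage
# are opt-in through containers-storage.conf (e.g. the enable_partial_images
# pull option); the exact knobs depend on the containers-storage version, so
# refer to containers-storage.conf(5) instead of copying this blindly.
```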
Next steps?
Changes are needed in multiple projects to make the new strategy bear fruit.
Mesa
- Reduce code duplication between the build and test containers
- Make the x86_64 base container scripts architecture-independent, with per-arch adjustments possible
- Add arm64 -base, -vk, and -gl containers
- Make the LAVA rootfs and the bare-metal farms use the new test containers (either extracted during rootfs creation, or at runtime)
Bare-metal
- Add a new rootless podman daemon dedicated to container caching, exposed to the jobs through a unix socket set in an env var (DOCKER_CONTAINER_CACHE_SOCKET):
- Make sure any container that could be run through this socket does not have access to the filesystem or the internet
- Add periodic scripts to kill every running container
- Document the changes needed for every farm to reproduce the setup
- Drop the rootfs from the arm*_test containers and use podman export to extract the test container to the NFS share (see the sketch below)
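Here is a hedged sketch of what that extraction could look like from a job's point of view; the socket variable, image name, and NFS path are placeholders.

```sh
# Sketch: extract a test image's filesystem onto the NFS share through the
# shared rootless podman socket. All names and paths are placeholders.
export CONTAINER_HOST="unix://${DOCKER_CONTAINER_CACHE_SOCKET}"

image="registry.freedesktop.org/mesa/mesa/debian/x86_64_test-gl:some-tag"
podman --remote pull "$image"

# podman export operates on containers, so create one without running it,
# dump its filesystem, then clean it up.
ctr=$(podman --remote create "$image")
podman --remote export "$ctr" | tar -x -C /nfs/rootfs/x86_64_test-gl/
podman --remote rm "$ctr"
```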
LAVA
Not much can be done there, aside from maybe generating and caching the rootfs locally… and enabling composefs when the time comes.
CI-templates
Let’s reduce bandwidth usage by switching our containers to zstd:chunked.
Relevant upstream issue in podman's image library that is trying to make this the default: https://github.com/containers/image/issues/2189