Executor/Job: Generalize the use of artifacts
Right now, the only artifacts usable in CI-tron are a kernel, an initramfs, and containers (through the pull-through registries). This is however a little too limiting outside of x86 DUTs, where boot methods are more... diverse.
To give CI-tron users this flexibility without introducing layer violations, I think we should allow them to expose artifacts through multiple means, via the job description.
Here is an example of what it could look like:
```yaml
[...]
deployment:
  # Initial boot
  start:
    kernel:
      url: "{{ minio_url }}/test-kernel"
      cmdline:
        - b2c.container="docker://{{ pull_thru_registry }}/infra/machine-registration:latest check"
        - b2c.ntp_peer="ci-gateway" b2c.pipefail b2c.cache_device=auto
        - b2c.container="-v /container/tmp:/storage docker://10.42.0.1:8002/tests/mesa:12345"
        - console={{ local_tty_device }},115200 earlyprintk=vga,keep SALAD.machine_id={{ machine_id }}
    initramfs:
      url: "{{ minio_url }}/test-initramfs"
    storage:
      # ci-gateway:69 ?
      tftp:
        - path: "/job/config.txt"
          # Inline data editing
          data: |
            blabla
            blabla
            blabla
        - path: "/job/dtbs/blabla"
          url: "blabla"
      # http://ci-gateway:80 ?
      http:
        - path: "/config.txt"
          data: |
            blabla
            blabla
            blabla
        - path: "/job/dtbs/blabla"
          url: "blabla"
        - path: "/job/secrets"
          data: |
            MYSECRET_KEY: blabla
          lifetime:
            access_count: 1  # The resource will stop being available after N accesses (0 by default). This gets reset after every reboot
            first_read
      # The nbd server will likely have to be per job and boot, so we'll need a variable here
      nbd:
        - name: "block name"
          readonly: False
          url: https://path/to/my/disk.img
          # or define a filesystem that gets re-created for every job
          filesystem:
            type: ext4
            size: 20G
            data:
              url: https://path/to/my/tarball.tar.gz
              # or, extract a whole container
              container: "docker://alpine:latest"
          # or simply define a raw block of a wanted size
          raw:
            size: 20G
          reset_on: boot|job_start|never  # The disk will remain between jobs (not guaranteed)
          shared: False  # The disk will only be accessible to this DUT, and not be accessible to others
          expiration: 7 days  # The disk will be removed 7 days after the last job requested it
      s3:
        # TBD
      nfs:
        # TBD
```
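To make the intended semantics a bit more concrete, here is a minimal Python sketch of what a per-job HTTP artifact server could look like. None of this is an existing CI-tron API: the class names, the port, and the wiring at the bottom are assumptions. It only illustrates how inline `data` vs. `url` sources and the proposed `lifetime.access_count` (0 = unlimited, counters reset on every reboot) could behave.

```python
# Hypothetical per-job HTTP artifact server; names and port are made up.
import http.server
import urllib.request


class ArtifactEntry:
    def __init__(self, path, data=None, url=None, access_count=0):
        self.path = path
        self.data = data                  # inline payload, if provided
        self.url = url                    # upstream URL to fetch, if provided
        self.access_count = access_count  # 0 means "no limit"
        self.reads = 0

    def fetch(self):
        if self.access_count and self.reads >= self.access_count:
            return None  # resource no longer available for this boot
        self.reads += 1
        if self.data is not None:
            return self.data.encode()
        with urllib.request.urlopen(self.url) as r:
            return r.read()

    def reset(self):
        # Would be called on every reboot, as described in the job description
        self.reads = 0


class ArtifactHandler(http.server.BaseHTTPRequestHandler):
    artifacts = {}  # path -> ArtifactEntry, filled from the job description

    def do_GET(self):
        entry = self.artifacts.get(self.path)
        body = entry.fetch() if entry else None
        if body is None:
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # Hypothetical wiring: entries would come from the parsed "storage.http" section
    ArtifactHandler.artifacts = {
        "/config.txt": ArtifactEntry("/config.txt", data="blabla\n"),
        "/job/secrets": ArtifactEntry("/job/secrets",
                                      data="MYSECRET_KEY: blabla\n",
                                      access_count=1),
    }
    http.server.HTTPServer(("", 8100), ArtifactHandler).serve_forever()
```

A server like this could run inside the job process, which is what makes per-job lifetimes and resets cheap to implement; the shared protocols discussed below are a different story.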
Here are some things to think about:
- How do we distinguish between user paths and fixed paths like the executor's own REST API?
- Which part of the executor is in charge of which protocol? Any server that is shared between jobs/DUTs (HTTP/TFTP) would have to be hosted by the executor server, while anything that can run on a per-job basis could be handled by the job process.
- How can the job process reference an artifact? AKA, how does a client reconstruct the final URL for the artifact / what host:port should they use? Should these be well-known, or should we have a variable that defines the base per protocol, like we do with minio (`{{http-artifact-base}}`, `{{tftp-artifact-base}}`, ...)? See the sketch after this list.
- How do we guarantee that jobs don't interfere with each other? I guess that would mean always treating every request coming from a DUT as being in its own namespace... except for blessed paths like the executor's machine registration API or the boot artifacts on minio.
- Do we allow users to specify the port on which they want to expose a particular server? Do we allow them to set TLS certificates, and such?
- In case a user decides to define a file that would otherwise be auto-generated by boots (`http://ci-gateway/boot/<machine_id>/boot.ipxe` or `tftp://ci-gateway/config.txt`), do we use their file or ours? If we use theirs, do we just ignore anything found in `kernel` and `initramfs`?
- What's the impact on job folders? Should we allow creating more buckets through the job description?
- For nbd/nfs/s3, figure out a way for the user to decide the data's lifetime (when to reset it): every boot, beginning of job, never? Do we allow changing them in the `continue` deployment?
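To illustrate one possible answer to the URL-reconstruction and namespacing questions above, here is a small Python sketch. It is only a thought experiment: the function names, the `/jobs/<job_id>/` prefix, and the list of blessed paths are all made up, and Python's stdlib `string.Template` (`$var`, underscores because `$`-style names cannot contain hyphens) stands in for whatever templating the job description already uses for `{{ minio_url }}`-style variables.

```python
# Hypothetical per-protocol base variables and per-job path namespacing.
from string import Template


def artifact_variables(gateway_host: str, job_id: str) -> dict:
    # Per-protocol base URLs, in the spirit of the existing {{ minio_url }}
    # variable: the executor decides the host:port, the job only sees a base.
    return {
        "http_artifact_base": f"http://{gateway_host}/jobs/{job_id}",
        "tftp_artifact_base": f"tftp://{gateway_host}/jobs/{job_id}",
    }


def render(line: str, variables: dict) -> str:
    # Stand-in for whatever templating the job description already uses
    return Template(line).substitute(variables)


def namespaced_path(job_id: str, path: str,
                    blessed=("/boot/", "/machine-registration")) -> str:
    # Every request coming from a DUT is rewritten into its job's namespace,
    # except for a small list of blessed, executor-owned paths.
    if any(path.startswith(prefix) for prefix in blessed):
        return path
    return f"/jobs/{job_id}{path}"


if __name__ == "__main__":
    variables = artifact_variables("ci-gateway", "job-1234")
    print(render("$http_artifact_base/config.txt", variables))  # http://ci-gateway/jobs/job-1234/config.txt
    print(namespaced_path("job-1234", "/job/dtbs/blabla"))       # /jobs/job-1234/job/dtbs/blabla
    print(namespaced_path("job-1234", "/boot/boot.ipxe"))        # /boot/boot.ipxe (blessed, untouched)
```

The design choice this encodes is that the executor, not the user, decides the final host:port; the job description only ever sees opaque base variables, and everything a DUT requests is rewritten into its job's namespace unless the path is blessed.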