executor: split job execution to a separate process
By moving the execution of jobs to their own process, we tie the
resources used by a job more effectively to a process's lifetime.
This should make tracking of the DUT state more resilient, and will
enable updating the executor at runtime without affecting any
currently-executing job \o/.
This commit is relatively big, as I wanted to keep the series
bisectable. Here is the list of changes:
- executor.py:
- Run a flask server over a unix socket: this prevents multiple
instances from running concurrently, as only one process can listen
on the unix socket. Conversely, being able to connect to this
socket indicates that the machine is busy.
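The "connect() succeeds iff a job process is listening" check can be
sketched as follows; the socket path and function name are illustrative,
not taken from the actual executor code:

```python
import socket

# Hypothetical socket path; the real executor picks its own location.
SOCKET_PATH = "/tmp/executor.sock"

def machine_is_busy(path=SOCKET_PATH):
    """Return True if a job process is currently listening on `path`.

    Only one process can listen on a given unix socket, so a successful
    connect() proves a job process is alive; a refused connection or a
    missing socket file means the machine is free."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        s.connect(path)
        return True
    except OSError:  # FileNotFoundError, ConnectionRefusedError, ...
        return False
    finally:
        s.close()
```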
- __init__.py:
- Introduce the "executor run-job" command
- dut.py:
- Add a dependency on requests_unixsocket, used to make REST queries
to the job process (get its state, cancel the job, ...)
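requests_unixsocket addresses a unix socket by percent-encoding its path
into the host part of an http+unix:// URL. A small helper, with an
endpoint name that is purely illustrative:

```python
from urllib.parse import quote

def unix_url(socket_path, endpoint):
    """Build a requests_unixsocket-style URL by percent-encoding the
    socket path into the host component of an http+unix:// URL."""
    return "http+unix://" + quote(socket_path, safe="") + endpoint
```

A query would then look like
`requests_unixsocket.Session().get(unix_url("/tmp/executor.sock", "/state"))`,
assuming a hypothetical /state endpoint on the job process.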
- Start the per-job process, passing the job bucket initial tarball
by first writing it to a temporary file, then passing the fd to
the new process.
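A minimal sketch of the tempfile-then-fd handover; the function name is
hypothetical and the child command line stands in for the real
"executor run-job" invocation:

```python
import subprocess
import tempfile

def start_job_process(tarball: bytes, argv):
    """Spawn the per-job process, handing over the job bucket's initial
    tarball through an inherited file descriptor.

    `argv` is the child command line; the fd number is appended to it
    as a string so the child knows where to read the tarball from."""
    tmp = tempfile.TemporaryFile()  # unlinked: vanishes with the fd
    tmp.write(tarball)
    tmp.flush()
    tmp.seek(0)
    fd = tmp.fileno()
    # pass_fds keeps the descriptor open (and inheritable) in the child
    proc = subprocess.Popen(argv + [str(fd)], pass_fds=(fd,))
    return proc, tmp  # keep tmp referenced so the fd stays valid
```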
- To prevent a race condition where the job process is still starting
up when another job comes in, and the machine would be considered
free because the job process is not yet listening on the unix socket,
we wait for up to 5 seconds after the job got queued for the unix
socket to become active; otherwise we kill the process and fail the
call.
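The wait-or-kill logic amounts to polling the socket until the deadline;
this is a sketch with illustrative names, not the executor's actual code:

```python
import socket
import time

def wait_for_socket(path, proc, timeout=5.0, poll=0.1):
    """Wait up to `timeout` seconds for the freshly-spawned job process
    `proc` to start listening on the unix socket at `path`; on timeout,
    kill the process and report failure."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            s.connect(path)
            return True          # the job process is up: machine is busy
        except OSError:
            time.sleep(poll)     # not listening yet, retry
        finally:
            s.close()
    proc.kill()                  # never came up: fail the call
    return False
```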
- Move MachineState to dut.py, and rename it to DUTState
- Sergent Hartman:
- Move to dut.py, as we cannot have cross-dependencies between dut.py
and executor.py
- Introduce execute_next_task(), which queues the next training
task, executes it, then reports the result back. It thus had to
implement a minimal client.
- Prefix the next_task() and report() functions with an underscore
- Run the different tasks in a per-DUT thread
Fixes: #64