executor/jobs: allow marking machines unfit for service + add user watchdogs
Machine unfit for service
I added a "machine_unfit_for_service" console pattern that allows jobs to mark machines as broken.
This can be used to drop
Watchdogs
Given the following job description:
timeouts:
  overall:
    hours: 1
    retries: 0
    # no retries possible here
  watchdogs:
    custom:
      seconds: 30
      retries: 1
console_patterns:
  session_end:
    regex: "^.*It's now safe to turn off your computer\r$"
  watchdogs:
    custom:
      start:
        regex: "CUSTOM START"
      reset:
        regex: "CUSTOM RESET"
      stop:
        regex: "CUSTOM STOP"
[...]
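To make the intended semantics of these settings concrete, here is a minimal Python sketch of a watchdog driven by start/reset/stop patterns. It is only an illustration under my own assumptions: the `Watchdog` class, its `feed()` and `check()` methods, and the "retry"/"fail" return values are hypothetical names and are not taken from the executor's code.

```python
import re
from dataclasses import dataclass
from datetime import timedelta
from typing import List, Optional


@dataclass
class Watchdog:
    """Hypothetical model of a user watchdog (not the executor's real class)."""
    name: str
    timeout: timedelta
    max_retries: int
    start: "re.Pattern[str]"
    reset: "re.Pattern[str]"
    stop: "re.Pattern[str]"
    deadline: Optional[float] = None  # None while the watchdog is not running
    retried: int = 0

    def feed(self, line: str, now: float) -> List[str]:
        """Update the state from one console line; return the matched pattern names."""
        matched = []
        if self.start.search(line):
            matched.append(f"{self.name}.start")
            if self.deadline is None:
                self.deadline = now + self.timeout.total_seconds()
        if self.reset.search(line):
            matched.append(f"{self.name}.reset")
            if self.deadline is not None:
                # Push the deadline back by a full timeout period
                self.deadline = now + self.timeout.total_seconds()
        if self.stop.search(line):
            matched.append(f"{self.name}.stop")
            self.deadline = None
        return matched

    def check(self, now: float) -> Optional[str]:
        """Return "retry" or "fail" if the deadline has passed, else None."""
        if self.deadline is None or now < self.deadline:
            return None
        self.deadline = None
        self.retried += 1
        return "retry" if self.retried <= self.max_retries else "fail"
```

With the values from the job description (30 seconds, 1 retry), a start match arms the watchdog, a reset match pushes the deadline back by 30 seconds, and a stop match disarms it. The first expiry yields "retry" and the second one "fail".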
We got the following log while using a job in an interactive session:
root@boot2container:/app# echo "CUSTOM START"
CUSTOM +213.257s: Matched the following patterns: custom.start
START
root@boot2container:/app# +243.274s: Hit the timeout <Timeout custom: value=0:00:30, retries=1/1> --> Try again!
[...]
root@boot2container:/app# echo "CUSTOM START"
CUSTO+572.804s: Matched the following patterns: custom.start
M START
root@boot2container:/app# echo "CUSTOM RESET"
+581.698s: Matched the following patterns: custom.reset
CUSTOM RESET
root@boot2container:/app# echo "CUSTOM STOP+587.173s: Matched the following patterns: custom.reset, custom.stop
CUSTOM STOP
As you can see, when the stop pattern is entered, the reset pattern also matches. This is because bash is doing something funky with backspaces to erase characters, which SALAD completely ignores... I guess I can live with this, as it would only happen in interactive sessions.
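For illustration only, here is a small Python sketch of how erased characters can still trigger a match on the raw console stream, and how applying the backspaces before matching would avoid it. The `raw` string is a made-up example of an edited command line, not the bytes captured in the session above, and `apply_backspaces()` is a hypothetical helper, not something SALAD or the executor provides.

```python
import re

RESET = re.compile("CUSTOM RESET")
STOP = re.compile("CUSTOM STOP")


def apply_backspaces(raw: str) -> str:
    """Return the text as a terminal would show it after processing "\b"."""
    out = []
    for ch in raw:
        if ch == "\b":
            if out:
                out.pop()  # a backspace erases the previous character
        else:
            out.append(ch)
    return "".join(out)


# Hypothetical stream: a recalled command is erased with "\b \b" sequences
# and then retyped with a different pattern.
raw = 'echo "CUSTOM RESET"' + "\b \b" * 13 + 'CUSTOM STOP"'

# Matching on the raw stream sees both the erased and the retyped text
raw_matches = [bool(RESET.search(raw)), bool(STOP.search(raw))]

# Matching on the rendered line only sees what the user ended up with
rendered = apply_backspaces(raw)
rendered_matches = [bool(RESET.search(rendered)), bool(STOP.search(rendered))]
```

Real terminals also use escape sequences to redraw lines, which this sketch does not handle, so it should be read as an explanation of the symptom rather than a proposed fix.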