Executor: the job setup can hang for days
Oct 24 08:38:54 keywords-gateway executor[1715582]: 2023-10-24 08:38:54,095 [MainThread] [INFO] log: +1.303s: Setup the infrastructure
Oct 24 08:38:54 keywords-gateway executor[1715582]: 2023-10-24 08:38:54,095 [MainThread] [INFO] _cache_remote_artifacts: Caching the kernel...
Oct 24 08:38:54 keywords-gateway executor[1715582]: 2023-10-24 08:38:54,095 [MainThread] [INFO] log: +1.303s: Caching https://gitlab.freedesktop.org/gfx-ci/ci-tron/-/package_files/519/download into minio...
Oct 24 08:38:54 keywords-gateway executor[1715582]: <local> - - [24/Oct/2023 08:38:54] "GET /api/v1/state HTTP/1.1" 200 -
Oct 24 08:38:54 keywords-gateway executor[1715582]: <local> - - [24/Oct/2023 08:38:54] "GET /api/v1/state HTTP/1.1" 200 -
Oct 24 08:38:55 keywords-gateway executor[1715582]: <local> - - [24/Oct/2023 08:38:55] "GET /api/v1/state HTTP/1.1" 200 -
Oct 24 08:38:55 keywords-gateway executor[1715582]: <local> - - [24/Oct/2023 08:38:55] "GET /api/v1/state HTTP/1.1" 200 -
Oct 24 08:38:56 keywords-gateway executor[1715582]: <local> - - [24/Oct/2023 08:38:56] "GET /api/v1/state HTTP/1.1" 200 -
Oct 24 08:38:56 keywords-gateway executor[1715582]: <local> - - [24/Oct/2023 08:38:56] "GET /api/v1/state HTTP/1.1" 200 -
Oct 24 08:38:56 keywords-gateway executor[1715582]: <local> - - [24/Oct/2023 08:38:56] "GET /api/v1/state HTTP/1.1" 200 -
[...]
Oct 26 09:48:21 keywords-gateway executor[1715582]: <local> - - [26/Oct/2023 09:48:21] "GET /api/v1/state HTTP/1.1" 200 -
Oct 26 09:48:21 keywords-gateway executor[1715582]: <local> - - [26/Oct/2023 09:48:21] "GET /api/v1/state HTTP/1.1" 200 -
Oct 26 09:48:21 keywords-gateway executor[1715582]: <local> - - [26/Oct/2023 09:48:21] "GET /api/v1/state HTTP/1.1" 200 -
Oct 26 09:48:22 keywords-gateway executor[1715582]: <local> - - [26/Oct/2023 09:48:22] "GET /api/v1/state HTTP/1.1" 200 -
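The last setup message in the log above is the kernel caching request from
Oct 24 08:38:54. On Oct 26 09:48:22, more than two days later, the executor
is still only answering state requests.
We should probably introduce a watchdog at the run-job
level, and maybe another one at the executor server level.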
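
To make the proposal concrete, here is a minimal sketch of what a run-job level watchdog could look like. It is not the executor's actual code: the `Watchdog` class, its `feed()` method and the `on_expire` callback are hypothetical names chosen for illustration, and the sketch only assumes Python's standard `threading.Timer`.

```python
import threading


class Watchdog:
    """Hypothetical helper: calls `on_expire` if `feed()` is not called
    again within `timeout` seconds of the last start()/feed()."""

    def __init__(self, timeout, on_expire):
        self.timeout = timeout
        self.on_expire = on_expire
        self._timer = None
        self._lock = threading.Lock()

    def _arm(self):
        # threading.Timer runs the callback in its own thread after `timeout`
        self._timer = threading.Timer(self.timeout, self.on_expire)
        self._timer.daemon = True  # never keep the process alive on its own
        self._timer.start()

    def start(self):
        with self._lock:
            self._arm()

    def feed(self):
        """Report progress: cancel the pending timer and arm a new one."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
            self._arm()

    def stop(self):
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
                self._timer = None

    def __enter__(self):
        self.start()
        return self

    def __exit__(self, *exc):
        self.stop()
        return False
```

Under these assumptions, the setup phase would be wrapped in `with Watchdog(timeout, abort_job) as wd:` and would call `wd.feed()` after each step that completes (for example after each artifact is cached). Here `abort_job` stands for whatever the executor would do to fail the job, which is left open. Note that the callback runs in a separate thread and cannot interrupt a blocked download by itself, so it would need to cancel the transfer or terminate the job process.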