Unpacking the initial job folder can fail after a wait
Noticed this in https://gitlab.freedesktop.org/tanty/mesa-valve-ci/-/jobs/13619310
python3 /usr/local/bin/client.py run -w b2c.yml.jinja2 -j deqp-renoir-valve -s results
Packing up the share_directory
--> Wrote 224 bytes...
No machines available for the job, waiting: ...........................................................................................................................................................................................................................................Waiting for the executor to connect to our local port 33983
Connection established: Switch to proxy mode
+0.000s: Job console state changed from CREATED -> ACTIVE
+0.000s: Setup the infrastructure
+0.005s: An exception got caught: Traceback (most recent call last):
  File "/app/executor/executor.py", line 788, in run
    execute_job()
  File "/app/executor/executor.py", line 674, in execute_job
    self._cache_remote_artifacts()
  File "/app/executor/executor.py", line 596, in _cache_remote_artifacts
    self.job_bucket.setup()
  File "/app/executor/executor.py", line 472, in setup
    self.minio.extract_archive(self.initial_state_tarball_file, self.name)
  File "/app/executor/minioclient.py", line 112, in extract_archive
    with TarFile.open(fileobj=archive_fileobj, mode='r') as archive:
  File "/usr/local/lib/python3.9/tarfile.py", line 1616, in open
    raise ReadError("file could not be opened successfully")
tarfile.ReadError: file could not be opened successfully
+2.008s: Job console state changed from ACTIVE -> OVER
Traceback (most recent call last):
  File "/usr/local/bin/client.py", line 279, in _read_executor_message_v1
    msg = Message.next_message(job_socket)
  File "/usr/local/bin/message.py", line 75, in next_message
    length = struct.unpack("!I", self.recv(sock, 4))[0]
  File "/usr/local/bin/message.py", line 66, in recv
    raise EOFError("The connection got interrupted before receiving the end of the message")
EOFError: The connection got interrupted before receiving the end of the message
2021-09-13 15:32:15,118 [INFO] run_job: status: JobStatus.INCOMPLETE [MainThread]
I had the Renoir held up in an interactive session. As soon as I closed that session, I saw the CI job try to start, and the above happened.
Let's investigate why this happens...
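As a starting point for the investigation, here is a minimal sketch that might help tell apart an empty, truncated, or otherwise unreadable archive. It is not the executor's code: `probe_archive` is a hypothetical helper, and the idea that the tarball arrives empty or cut short after the long wait is only an assumption. It relies on nothing beyond the standard `tarfile` and `io` modules, and it opens the file object the same way the traceback shows (`fileobj=..., mode='r'`).

```python
import io
import tarfile


def probe_archive(fileobj):
    """Hypothetical helper: return (size_in_bytes, error_or_None).

    Expects a seekable binary file object. The read position is
    restored afterwards so the caller can still extract the archive.
    """
    start = fileobj.tell()
    # Measure how many bytes are available from the current position.
    size = fileobj.seek(0, io.SEEK_END) - start
    fileobj.seek(start)
    try:
        # Same open call as in the traceback: transparent compression
        # detection, read-only.
        with tarfile.open(fileobj=fileobj, mode="r") as archive:
            archive.getmembers()  # walk all headers to catch late truncation
        return size, None
    except tarfile.ReadError as exc:
        return size, exc
    finally:
        fileobj.seek(start)


def make_sample_tar():
    """Build a small in-memory tar archive, for demonstration only."""
    payload = b"hello"
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as archive:
        info = tarfile.TarInfo(name="hello.txt")
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    good = make_sample_tar()
    empty = io.BytesIO(b"")
    truncated = io.BytesIO(good.getvalue()[:100])  # cut inside the first header
    for label, candidate in (("good", good), ("empty", empty), ("truncated", truncated)):
        size, error = probe_archive(candidate)
        print(f"{label}: {size} bytes, error={error!r}")
```

Logging the size next to the `ReadError` in this way would show whether the object fetched from the bucket is empty, too short to hold a tar header, or a full-size file with the wrong content.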
Edited by Charlie Turner