executorctl: try downloading the job bucket up to 3 times

Martin Roukala requested to merge executorctl_mcli_retry into master

This should work around the following mcli bug:

panic: sync: WaitGroup is reused before previous Wait has returned
goroutine 118 [running]:
sync.(*WaitGroup).Wait(0xc000a2e540?)
	sync/waitgroup.go:141 +0x85
github.com/minio/mc/cmd.(*ParallelManager).stopAndWait(...)
	github.com/minio/mc/cmd/parallel-manager.go:221
github.com/minio/mc/cmd.(*mirrorJob).mirror.func3()
	github.com/minio/mc/cmd/mirror-main.go:771 +0x4b
created by github.com/minio/mc/cmd.(*mirrorJob).mirror
	github.com/minio/mc/cmd/mirror-main.go:769 +0x1de
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/valve_gfx_ci/executor/client/client.py", line 256, in _read_executor_message_v1
    self._handle_end_message(msg)
  File "/usr/local/lib/python3.10/dist-packages/valve_gfx_ci/executor/client/client.py", line 239, in _handle_end_message
    subprocess.check_call(["mcli", "--no-color", "mirror",
  File "/usr/lib/python3.10/subprocess.py", line 369, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['mcli', '--no-color', 'mirror', '--overwrite', '--remove', 'client/job-24-4b-fe-8c-18-db-vkcts-navi21-valve-2-2', 'job_folder']' returned non-zero exit status 2.
2022-12-01 15:35:31,389 [INFO] run_job: status: JobStatus.INCOMPLETE [MainThread]

If the download still fails after 3 attempts, we just print a WARNING and ignore the fact that some artifacts may be missing.
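The retry behaviour described here could look roughly like the sketch below. It is not the code from this merge request: the function name `download_job_bucket`, the `run` parameter (added only so the command runner can be swapped out in tests), and the log messages are all assumptions made for illustration.

```python
import logging
import subprocess

logger = logging.getLogger(__name__)


def download_job_bucket(cmd, attempts=3, run=subprocess.check_call):
    """Run `cmd` up to `attempts` times.

    Returns True as soon as one attempt succeeds, False if all of them fail.
    `download_job_bucket` and `run` are hypothetical names, not the project's API.
    """
    for attempt in range(1, attempts + 1):
        try:
            run(cmd)
            return True
        except subprocess.CalledProcessError as exc:
            # A crash such as the mcli panic surfaces here as a non-zero exit status.
            logger.warning("download attempt %d/%d failed with exit status %d",
                           attempt, attempts, exc.returncode)
    # Do not raise: the job result is kept even if some artifacts are missing.
    logger.warning("giving up after %d attempts; some artifacts may be missing",
                   attempts)
    return False
```

With the command from the traceback above, a call would be `download_job_bucket(["mcli", "--no-color", "mirror", "--overwrite", "--remove", bucket_path, "job_folder"])`, where `bucket_path` stands in for the job's bucket. Returning a boolean instead of raising keeps a failed download from aborting the rest of the end-of-job handling.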
