ci/lava: Enhance error handling and Job submission logic
Overview
This merge request enhances the wait for job mechanism, making it smarter by adding a stop condition based on the remaining execution time. Using a new environment variable, `EXPECTED_JOB_DURATION_SEC`, which is customizable in the job definition, the script now stops waiting and fails the job if insufficient time remains. Jobs that cannot finish in time therefore fail sooner, and the merge queue is updated more efficiently.
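
The stop condition described above can be sketched roughly as follows. This is a minimal illustration, not the code from this merge request: the names `wait_for_job`, `InsufficientTimeError`, `read_expected_duration` and the 600-second default are assumptions made for the example; only the `EXPECTED_JOB_DURATION_SEC` variable comes from the description.

```python
import os
import time


class InsufficientTimeError(Exception):
    """Hypothetical error: too little CI time is left to run the LAVA job."""


def read_expected_duration(env=os.environ, default_sec=600):
    # EXPECTED_JOB_DURATION_SEC is set in the job definition.
    # The fallback value is an assumption made for this sketch.
    return int(env.get("EXPECTED_JOB_DURATION_SEC", default_sec))


def wait_for_job(is_job_started, ci_deadline, expected_duration_sec,
                 poll_interval_sec=1.0, now=time.monotonic, sleep=time.sleep):
    """Poll until the job starts, or fail early once the time left before
    ci_deadline is shorter than the expected job duration.

    Returns the number of seconds remaining when the job started.
    """
    while not is_job_started():
        remaining = ci_deadline - now()
        if remaining < expected_duration_sec:
            raise InsufficientTimeError(
                f"{remaining:.0f}s left, but the job is expected to take "
                f"{expected_duration_sec}s"
            )
        sleep(poll_interval_sec)
    return ci_deadline - now()
```

Passing `now` and `sleep` as parameters is a design choice for the sketch only: it lets the waiting logic be exercised with a fake clock instead of real delays.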
Improved Job Failure Timing
Job Link | Original Issue | New Behavior |
---|---|---|
Job 57451349 & Job 57452819 | The LAVA job didn't even start. | The job would fail 10 minutes earlier. |
Job 56981488 | The LAVA job had less than 9 minutes to run. | The job would fail 10 minutes earlier. |
Additional Enhancements
- Refactored Exception Hierarchy: The exception structure has been overhauled to clearly differentiate between errors that can be retried and those that are fatal.
  - `MesaCIRetriableException`: New base class for exceptions that should trigger a job retry.
  - `MesaCIFatalException`: Introduced for irremediable errors that necessitate an immediate halt.
- Improved Error Logging: Enhanced the logging mechanism to record the full exception message instead of merely the type, applicable only to structured logs.
- Clearer Script Interruption Messages: More explicit interruption messages have been implemented to facilitate quicker understanding and resolution of job submission failures.
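
As an illustration of how the retriable/fatal split and the fuller log records might fit together, here is a minimal sketch. Only the two exception class names come from this merge request; the shared `MesaCIException` parent, the `run_with_retries` helper, the log record layout, and the choice to raise a fatal error once retries are exhausted are all assumptions made for the example.

```python
class MesaCIException(Exception):
    """Assumed common parent for the sketch."""


class MesaCIRetriableException(MesaCIException):
    """Errors that should trigger a job retry."""


class MesaCIFatalException(MesaCIException):
    """Irremediable errors that require an immediate halt."""


def run_with_retries(attempt, max_attempts=3, log=None):
    """Call attempt() until it succeeds, retrying only on retriable errors.

    Each failure is appended to `log` (a list standing in for a structured
    log) with both the exception type and its full message.
    """
    log = [] if log is None else log
    for number in range(1, max_attempts + 1):
        try:
            return attempt()
        except MesaCIRetriableException as exc:
            # Retriable: record it and move on to the next attempt.
            log.append({"attempt": number,
                        "type": type(exc).__name__,
                        "message": str(exc)})
        except MesaCIFatalException as exc:
            # Fatal: record it and stop immediately.
            log.append({"attempt": number,
                        "type": type(exc).__name__,
                        "message": str(exc)})
            raise
    # Assumption for this sketch: running out of retries is itself fatal.
    raise MesaCIFatalException(f"Job failed after {max_attempts} attempts")
```

With this shape, a caller needs only two `except` clauses to decide between retrying and aborting, regardless of how many concrete error types derive from each class.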