ci: Rebalance LAVA jobs
To optimize our CI pipeline efficiency, we aim to achieve consistent job durations of approximately 10 minutes. This ensures balanced resource utilization, reduces wait times, and improves overall throughput.
Proposed Changes:
Based on the analysis of job durations and resource allocations, this MR proposes implementing the following adjustments to job fractions and parallelism:
Job Name | Mean Job Duration (min) | Current Fraction | New Fraction | Current Parallel | New Parallel |
---|---|---|---|---|---|
a618_gl | 17 | 1 | 2 | 4 | 2 |
a618_vk | 13 | 2 | 3 | 12 | 10 |
a660_gl | 17 | - | - | 2 | 3 |
a660_vk | 14 | 4 | 5 | - | - |
anv-jsl | 15 | - | - | 4 | 5 |
anv-jsl-angle | 20 | - | - | 1 | 2 |
iris-jsl-deqp | 18 | 4 | 8 | - | - |
radv-stoney-angle | 16 | 1 | 2 | - | - |
radv-stoney-vkcts | 15 | 11 | 15 | - | - |
radeonsi-stoney-gl | 19 | 1 | 2 | - | - |
zink-tu-a618 | 18 | 2 | 3 | - | - |
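
To sanity-check the table against the 10-minute target, the proposed values can be run through a rough scaling model. This is only a sketch: it assumes a job's duration is proportional to the share of the test list each shard runs, i.e. `1 / (fraction * parallel)`, and it ignores fixed costs such as device boot. The helper `estimate_duration` and the `JOBS` mapping are hypothetical names for illustration, not part of the CI configuration.

```python
# Hypothetical helper (not part of Mesa CI): scales a measured mean duration
# by the change in per-shard workload. Assumption: duration is proportional
# to 1 / (fraction * parallel); fixed costs such as device boot are ignored.
def estimate_duration(mean_min, fraction=(1, 1), parallel=(1, 1)):
    cur_f, new_f = fraction
    cur_p, new_p = parallel
    return mean_min * (cur_f * cur_p) / (new_f * new_p)


# Rows copied from the table as (current, new) pairs. A "-" column is left
# at the default (1, 1), i.e. treated as unchanged.
JOBS = {
    "a618_gl": dict(mean_min=17, fraction=(1, 2), parallel=(4, 2)),
    "a618_vk": dict(mean_min=13, fraction=(2, 3), parallel=(12, 10)),
    "a660_gl": dict(mean_min=17, parallel=(2, 3)),
    "a660_vk": dict(mean_min=14, fraction=(4, 5)),
    "anv-jsl": dict(mean_min=15, parallel=(4, 5)),
    "anv-jsl-angle": dict(mean_min=20, parallel=(1, 2)),
    "iris-jsl-deqp": dict(mean_min=18, fraction=(4, 8)),
    "radv-stoney-angle": dict(mean_min=16, fraction=(1, 2)),
    "radv-stoney-vkcts": dict(mean_min=15, fraction=(11, 15)),
    "radeonsi-stoney-gl": dict(mean_min=19, fraction=(1, 2)),
    "zink-tu-a618": dict(mean_min=18, fraction=(2, 3)),
}

for name, row in JOBS.items():
    estimate = estimate_duration(**row)
    print(f"{name}: {row['mean_min']} min -> ~{estimate:.1f} min (estimate)")
```

Under this model the two `a618_gl` changes cancel out, so any gain for that job would come from reduced device contention rather than a shorter run. Real durations will differ from these estimates because setup time does not shrink with the test list.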
Details:
Movement between Kingoftown and Limozeen Devices
To optimize job durations and resource utilization, we've adjusted workloads between the `kingoftown` and `limozeen` devices, knowing they are interchangeable for Mesa purposes.
Kingoftown Devices
We increased the fraction of the `a618_vk` job from 2 to 3 and reduced its parallelism from 12 to 10. This change makes the job more efficient and frees up two `kingoftown` devices. These devices are now reallocated to cover `limozeen` jobs, specifically `a618_traces` and `a618_skqp`. This reallocation balances the workload and reduces queue times for these jobs.
Limozeen Devices
With only six `limozeen` devices available but ten jobs competing for them, resource contention was an issue. The jobs using `limozeen` included:
- `a618_traces`
- `a618_egl`
- `a618_gl` (x4)
- `a618_piglit`
- `a618_skqp`
- `zink-tu-a618`
- `zink-tu-a618-traces`
To alleviate this, we adjusted the `a618_gl` job by changing its fraction from 1 to 2 and reducing its parallelism from 4 to 2. This adjustment reduced the number of `limozeen` devices needed for `a618_gl` from four to two, freeing up two `limozeen` devices. These freed devices can now support other `a618_*` jobs, decreasing wait times and improving efficiency across the board.
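
The device arithmetic above can be checked with a short tally. This is a sketch under stated assumptions: every listed job other than `a618_gl` is taken to occupy one device slot (which matches the count of ten), and `a618_traces` and `a618_skqp` are assumed to run entirely on the two freed `kingoftown` devices. The function `oversubscription` is a hypothetical name used only for illustration.

```python
LIMOZEEN_DEVICES = 6

# Device slots requested per job before the change. a618_gl ran with
# parallel 4; every other job is assumed to need a single slot.
demand_before = {
    "a618_traces": 1,
    "a618_egl": 1,
    "a618_gl": 4,
    "a618_piglit": 1,
    "a618_skqp": 1,
    "zink-tu-a618": 1,
    "zink-tu-a618-traces": 1,
}


def oversubscription(demand, devices):
    """Slots requested beyond the available devices (0 means no contention)."""
    return max(0, sum(demand.values()) - devices)


# Step 1: a618_gl parallelism drops from 4 to 2.
demand_after_gl = dict(demand_before, a618_gl=2)

# Step 2 (assumption): the two jobs covered by kingoftown devices no longer
# compete for limozeen devices at all.
moved_to_kingoftown = {"a618_traces", "a618_skqp"}
demand_after_move = {
    job: slots
    for job, slots in demand_after_gl.items()
    if job not in moved_to_kingoftown
}

for label, demand in [
    ("before", demand_before),
    ("after a618_gl change", demand_after_gl),
    ("after kingoftown cover", demand_after_move),
]:
    total = sum(demand.values())
    extra = oversubscription(demand, LIMOZEEN_DEVICES)
    print(f"{label}: {total} slots for {LIMOZEEN_DEVICES} devices, {extra} over")
```

If the assumptions hold, the two adjustments together bring the requested slots in line with the six available `limozeen` devices.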
Summary of Reallocations
By modifying job fractions and parallelism:
- Kingoftown Devices: Freed up two devices from `a618_vk` and reallocated them to `a618_traces` and `a618_skqp`.
- Limozeen Devices: Freed up two devices from `a618_gl` adjustments, improving resource distribution among `a618_*` jobs.
Note:
- The mean job durations are based on the last week's performance metrics.
- Adjustments are made with consideration for available resources and expected workload.
- We'll continue to monitor job durations and resource utilization to make further optimizations as needed.