ci: Rebalance LAVA jobs

Guilherme Gallo requested to merge gallo/mesa:ci-rebalance-lava-jobs into main

To make our CI pipeline more efficient, we aim for consistent job durations of approximately 10 minutes. Keeping jobs close to this target balances resource utilization, reduces wait times, and improves overall throughput.

Proposed Changes:

Based on an analysis of job durations and resource allocations, this MR makes the following adjustments to job fractions and parallelism:

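| Job Name           | Mean Job Duration (min) | Current Fraction | New Fraction | Current Parallel | New Parallel |
|--------------------|-------------------------|------------------|--------------|------------------|--------------|
| a618_gl            | 17                      | 1                | 2            | 4                | 2            |
| a618_vk            | 13                      | 2                | 3            | 12               | 10           |
| a660_gl            | 17                      | -                | -            | 2                | 3            |
| a660_vk            | 14                      | 4                | 5            | -                | -            |
| anv-jsl            | 15                      | -                | -            | 4                | 5            |
| anv-jsl-angle      | 20                      | -                | -            | 1                | 2            |
| iris-jsl-deqp      | 18                      | 4                | 8            | -                | -            |
| radv-stoney-angle  | 16                      | 1                | 2            | -                | -            |
| radv-stoney-vkcts  | 15                      | 11               | 15           | -                | -            |
| radeonsi-stoney-gl | 19                      | 1                | 2            | -                | -            |
| zink-tu-a618       | 18                      | 2                | 3            | -                | -            |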
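
A rough way to sanity-check the table against the 10-minute target is to scale each measured mean by the change in fraction and parallelism. The sketch below is a back-of-the-envelope model, not part of this MR. It assumes that a fraction of N means each run executes roughly 1/N of the test list, that tests are split evenly across parallel shards, that duration scales linearly with the number of tests per shard (ignoring fixed costs such as boot and setup), and that a dash in the table means the value is unchanged. `JobConfig` and `estimate_minutes` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobConfig:
    """One row of the table. A dash in the table is entered as 1 -> 1 (assumed unchanged)."""
    name: str
    mean_minutes: float  # mean duration measured with the current settings
    cur_fraction: int
    new_fraction: int
    cur_parallel: int
    new_parallel: int


def estimate_minutes(job: JobConfig) -> float:
    """Estimate the new duration, assuming time is proportional to tests per shard.

    Tests per shard are taken to be proportional to 1 / (fraction * parallel),
    which is an assumption of this sketch, not a measured relationship.
    """
    fraction_scale = job.cur_fraction / job.new_fraction
    parallel_scale = job.cur_parallel / job.new_parallel
    return job.mean_minutes * fraction_scale * parallel_scale


# Values copied from the table in this description.
JOBS = [
    JobConfig("a618_gl", 17, 1, 2, 4, 2),
    JobConfig("a618_vk", 13, 2, 3, 12, 10),
    JobConfig("a660_gl", 17, 1, 1, 2, 3),
    JobConfig("a660_vk", 14, 4, 5, 1, 1),
    JobConfig("anv-jsl", 15, 1, 1, 4, 5),
    JobConfig("anv-jsl-angle", 20, 1, 1, 1, 2),
    JobConfig("iris-jsl-deqp", 18, 4, 8, 1, 1),
    JobConfig("radv-stoney-angle", 16, 1, 2, 1, 1),
    JobConfig("radv-stoney-vkcts", 15, 11, 15, 1, 1),
    JobConfig("radeonsi-stoney-gl", 19, 1, 2, 1, 1),
    JobConfig("zink-tu-a618", 18, 2, 3, 1, 1),
]

for job in JOBS:
    print(f"{job.name:20s} {job.mean_minutes:5.1f} min -> ~{estimate_minutes(job):4.1f} min")
```

Under these assumptions a618_vk lands at about 10.4 minutes, while a618_gl stays at 17 minutes because halving the share of tests is offset by halving the shard count. That change is motivated by device availability, as described in the Limozeen Devices section.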

Details:

Movement between Kingoftown and Limozeen Devices

To optimize job durations and resource utilization, we've shifted workloads between the kingoftown and limozeen devices, since the two are interchangeable for Mesa's purposes.

Kingoftown Devices

We increased the fraction of the a618_vk job from 2 to 3 and reduced its parallelism from 12 to 10. This change makes the job more efficient and frees up two kingoftown devices. These devices are reallocated to cover jobs that previously ran on limozeen, specifically a618_traces and a618_skqp. This reallocation balances the workload and reduces queue times for these jobs.

Limozeen Devices

With only six limozeen devices available but ten jobs competing for them, resource contention was an issue. The jobs using limozeen included:

  • a618_traces
  • a618_egl
  • a618_gl (x4)
  • a618_piglit
  • a618_skqp
  • zink-tu-a618
  • zink-tu-a618-traces

To alleviate this, we adjusted the a618_gl job by changing its fraction from 1 to 2 and reducing its parallelism from 4 to 2. This adjustment reduced the number of limozeen devices needed for a618_gl from four to two, freeing up two limozeen devices. These freed devices can now support other a618_* jobs, decreasing wait times and improving efficiency across the board.
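
The device arithmetic above can be sketched as a simple count of concurrent job instances. This is an illustration only: it assumes each job instance occupies one device for its whole run, and that the two freed kingoftown devices fully take over a618_traces and a618_skqp. The `demand` helper and the dictionary names are hypothetical.

```python
LIMOZEEN_DEVICES = 6  # number of limozeen devices stated in the description

# Job name -> number of parallel instances, taken from the list above.
before = {
    "a618_traces": 1,
    "a618_egl": 1,
    "a618_gl": 4,
    "a618_piglit": 1,
    "a618_skqp": 1,
    "zink-tu-a618": 1,
    "zink-tu-a618-traces": 1,
}


def demand(jobs: dict[str, int]) -> int:
    """Total job instances that may want a device at the same time."""
    return sum(jobs.values())


# Step 1: a618_gl parallelism goes from 4 to 2.
after_gl = dict(before, a618_gl=2)

# Step 2 (assumption): a618_traces and a618_skqp run on the freed kingoftown devices.
moved_to_kingoftown = {"a618_traces", "a618_skqp"}
after_move = {name: n for name, n in after_gl.items() if name not in moved_to_kingoftown}

for label, jobs in [("before", before), ("after a618_gl change", after_gl),
                    ("after kingoftown takeover", after_move)]:
    print(f"{label:28s} demand={demand(jobs):2d} devices={LIMOZEEN_DEVICES}")
```

With these inputs the count drops from ten instances to eight after the a618_gl change, and to six if the two moved jobs no longer need limozeen devices at all.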

Summary of Reallocations

By modifying job fractions and parallelism:

  • Kingoftown Devices: Freed up two devices from a618_vk and reallocated them to a618_traces and a618_skqp.
  • Limozeen Devices: Freed up two devices by adjusting a618_gl, improving resource distribution among a618_* jobs.

Note:

  • The mean job durations are based on last week's performance metrics.
  • Adjustments are made with consideration for available resources and expected workload.
  • We'll continue to monitor job durations and resource utilization to make further optimizations as needed.
Edited by Guilherme Gallo