etl: Improve Data Extraction and Dashboards for Mesa CI Quality stats

The goal is to:

  • Optimize our ETL pipeline by reusing already-extracted data, turning ci-quality-stats.py into a transform script
  • Provide clearer insights into CI performance and make it easier to identify and address issues.

Transforming ci-quality-stats.py into a Data Transformation Script

Previously, ci-quality-stats.py was re-fetching much of the same data from GitLab that basic-stats.py and other scripts were already retrieving. This redundancy led to inefficiencies and longer processing times.

In this merge request, we have moved the data extraction responsibilities from ci-quality-stats.py to basic-stats.py. ci-quality-stats.py now serves as a transformation script that uses pandas for data manipulation and InfluxDB as its data source.

By consolidating the data extraction into basic-stats.py and using pandas for efficient data handling, we have significantly improved performance. The impact of the new extraction in basic-stats.py is minimal, adding only about 30 seconds of extraction time per day of data.
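
For illustration, here is a minimal sketch of the new flow, assuming the InfluxDB 1.x Python client (influxdb.DataFrameClient); the database, measurement, and column names below are hypothetical, not the actual schema:

```python
# Hedged sketch: ci-quality-stats.py now reads the data basic-stats.py already
# pushed to InfluxDB instead of re-fetching it from GitLab.
# Database, measurement, and column names are illustrative assumptions.
import pandas as pd
from influxdb import DataFrameClient  # InfluxDB 1.x client

client = DataFrameClient(host="localhost", port=8086, database="mesa_ci")

# query() returns a dict of DataFrames keyed by measurement name.
frames = client.query('SELECT * FROM "pipeline_job" WHERE time > now() - 30d')
jobs: pd.DataFrame = frames["pipeline_job"]

# All further shaping happens in pandas, with no extra GitLab API calls.
failure_rate = (
    jobs.assign(failed=jobs["status"].eq("failed"))
        .groupby("job_name")["failed"]
        .mean()
        .sort_values(ascending=False)
)
print(failure_rate.head(10))
```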

Underlying new data abstractions

  • Detailed Log Parsing: Implemented new scripts to parse GitLab CI logs more effectively, extracting valuable information such as unit-test failures, job durations, and failure reasons (see the parsing sketch after this list).
  • Structured Data Handling: Utilized pandas DataFrame for efficient data manipulation and processing, leading to faster and more reliable data handling.
  • Simplified Codebase: Refactored existing scripts to remove obsolete functions and improve maintainability.
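
As a rough illustration of what that parsing looks like, here is a hedged sketch; the regex patterns and returned fields are assumptions for the example, not Mesa CI's actual log format:

```python
# Hypothetical GitLab CI log parser in the spirit of the new scripts;
# patterns and field names are illustrative assumptions only.
import re
from typing import Optional

FAILED_TEST_RE = re.compile(r"^(?P<suite>[\w./:-]+),(?P<test>[\w./:-]+): fail", re.M)
DURATION_RE = re.compile(r"Duration: (?P<minutes>\d+) minutes (?P<seconds>\d+) seconds")

def parse_job_log(log_text: str) -> dict:
    """Pull unit-test failures and the job duration out of one raw job log."""
    failures = [m.groupdict() for m in FAILED_TEST_RE.finditer(log_text)]
    m = DURATION_RE.search(log_text)
    duration_s: Optional[int] = (
        int(m["minutes"]) * 60 + int(m["seconds"]) if m else None
    )
    return {"unit_test_failures": failures, "duration_s": duration_s}
```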

Updated Dashboards

Mesa CI Quality Dashboard

  • Accurate Metrics: Fixed issues with false-positive counters and refined metrics to focus on merge requests and CI pipeline performance.
  • More Useful Info: Changed web_url to the MR title in the streak pipelines/MRs table.
  • Visual Enhancements: Switched to more informative chart types (e.g., donut charts) for better data visualization.
  • Improved Panels: Enhanced panels to display additional details like test suites and device information, making it easier to pinpoint issues.
  • Reworked LAVA Row: With the help of structured logging, reworked the LAVA row with new, more relevant data (see the aggregation sketch after this list):
    • Distribution of DUT retries per device name
    • DUT jobs that take the most time on average
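
Conceptually, both panels reduce to simple pandas aggregations; here is a sketch under assumed column names (device_name, job_name, dut_retries, duration_s):

```python
import pandas as pd

def lava_row_metrics(jobs: pd.DataFrame) -> tuple[pd.Series, pd.Series]:
    """Aggregations feeding the reworked LAVA row (column names assumed)."""
    # Distribution of DUT retries per device name.
    retries_per_device = jobs.groupby("device_name")["dut_retries"].sum()
    # DUT jobs that take the most time on average.
    slowest_jobs = (
        jobs.groupby("job_name")["duration_s"]
            .mean()
            .sort_values(ascending=False)
    )
    return retries_per_device, slowest_jobs
```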

New DUT Job Details Dashboard

  • Device Insights: Added a dashboard specifically for Device Under Test (DUT) metrics, showing mean job duration, mean pending time, and the number of retries (a sketch of the underlying aggregation follows below).
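
Conceptually, the dashboard is fed by per-device aggregates written back to InfluxDB for Grafana to query; a hedged sketch reusing the assumed 1.x client, with the measurement and column names being assumptions as well:

```python
import pandas as pd
from influxdb import DataFrameClient

def publish_dut_metrics(client: DataFrameClient, jobs: pd.DataFrame) -> None:
    """Aggregate per-device stats and push them back for Grafana to chart.

    The measurement and column names are assumptions for this sketch.
    """
    per_device = (
        jobs.groupby("device_name")
            .agg(
                mean_duration_s=("duration_s", "mean"),
                mean_pending_s=("pending_s", "mean"),
                retries=("dut_retries", "sum"),
            )
            .reset_index()
    )
    # DataFrameClient expects a DatetimeIndex; stamp the whole batch "now".
    per_device.index = pd.DatetimeIndex(
        [pd.Timestamp.now(tz="UTC")] * len(per_device)
    )
    client.write_points(per_device, "dut_job_details", tag_columns=["device_name"])
```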