etl: Improve Data Extraction and Dashboards for Mesa CI Quality stats

The goal is to:

  • Optimize our ETL pipeline by reusing already-extracted data, turning ci-quality-stats.py into a transform script
  • Provide clearer insights into CI performance and make it easier to identify and address issues.

Transforming ci-quality-stats.py into a Data Transformation Script

Previously, ci-quality-stats.py was re-fetching much of the same data from GitLab that basic-stats.py and other scripts were already retrieving. This redundancy led to inefficiencies and longer processing times.

In this merge request, we have moved the data extraction responsibilities from ci-quality-stats.py to basic-stats.py. ci-quality-stats.py now serves as a transformation script that uses pandas for data manipulation and InfluxDB as its data source.

By consolidating the data extraction into basic-stats.py and using pandas for efficient data handling, we have significantly improved performance. The impact of the new extraction in basic-stats.py is minimal, adding only about 30 seconds of extraction time per day of data.
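
For illustration, here is a minimal sketch of the new flow, assuming the InfluxDB 1.x Python client (influxdb.DataFrameClient); the database, measurement, and column names below are hypothetical, not the actual schema:

```python
# Hedged sketch: ci-quality-stats.py now reads the data basic-stats.py already
# pushed to InfluxDB instead of re-fetching it from GitLab.
# Database, measurement, and column names are illustrative assumptions.
import pandas as pd
from influxdb import DataFrameClient  # InfluxDB 1.x client

client = DataFrameClient(host="localhost", port=8086, database="mesa_ci")

# query() returns a dict of DataFrames keyed by measurement name.
frames = client.query('SELECT * FROM "pipeline_job" WHERE time > now() - 30d')
jobs: pd.DataFrame = frames["pipeline_job"]

# All further shaping happens in pandas, with no extra GitLab API calls.
failure_rate = (
    jobs.assign(failed=jobs["status"].eq("failed"))
        .groupby("job_name")["failed"]
        .mean()
        .sort_values(ascending=False)
)
print(failure_rate.head(10))
```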

Underlying new data abstractions

  • Detailed Log Parsing: Implemented new scripts to parse GitLab CI logs more effectively, extracting valuable information such as unit-test failures, job durations, and failure reasons (see the parsing sketch after this list).
  • Structured Data Handling: Utilized pandas DataFrame for efficient data manipulation and processing, leading to faster and more reliable data handling.
  • Simplified Codebase: Refactored existing scripts to remove obsolete functions and improve maintainability.
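
As a rough illustration of what that parsing looks like, here is a hedged sketch; the regex patterns and returned fields are assumptions for the example, not Mesa CI's actual log format:

```python
# Hypothetical GitLab CI log parser in the spirit of the new scripts;
# patterns and field names are illustrative assumptions only.
import re
from typing import Optional

FAILED_TEST_RE = re.compile(r"^(?P<suite>[\w./:-]+),(?P<test>[\w./:-]+): fail", re.M)
DURATION_RE = re.compile(r"Duration: (?P<minutes>\d+) minutes (?P<seconds>\d+) seconds")

def parse_job_log(log_text: str) -> dict:
    """Pull unit-test failures and the job duration out of one raw job log."""
    failures = [m.groupdict() for m in FAILED_TEST_RE.finditer(log_text)]
    m = DURATION_RE.search(log_text)
    duration_s: Optional[int] = (
        int(m["minutes"]) * 60 + int(m["seconds"]) if m else None
    )
    return {"unit_test_failures": failures, "duration_s": duration_s}
```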

Updated Dashboards

Mesa CI Quality Dashboard

  • Accurate Metrics: Fixed issues with false-positive counters and refined metrics to focus on merge requests and CI pipeline performance.
  • More Useful Info: Changed web_url to the MR title in the streak pipelines/MRs table.
  • Visual Enhancements: Switched to more informative chart types (e.g., donut charts) for better data visualization.
  • Improved Panels: Enhanced panels to display additional details like test suites and device information, making it easier to pinpoint issues.
  • Reworked LAVA Row: With the help of structured logging, reworked the LAVA row with new, more relevant data (see the aggregation sketch after this list):
    • Distribution of DUT retries per device name
    • DUT jobs that take the most time on average
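
Conceptually, both panels reduce to simple pandas aggregations; here is a sketch under assumed column names (device_name, job_name, dut_retries, duration_s):

```python
import pandas as pd

def lava_row_metrics(jobs: pd.DataFrame) -> tuple[pd.Series, pd.Series]:
    """Aggregations feeding the reworked LAVA row (column names assumed)."""
    # Distribution of DUT retries per device name.
    retries_per_device = jobs.groupby("device_name")["dut_retries"].sum()
    # DUT jobs that take the most time on average.
    slowest_jobs = (
        jobs.groupby("job_name")["duration_s"]
            .mean()
            .sort_values(ascending=False)
    )
    return retries_per_device, slowest_jobs
```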

New DUT Job Details Dashboard

  • Device Insights: Added a dashboard specifically for Device Under Test (DUT) metrics, showing mean job duration, mean pending time, and the number of retries (a sketch of the underlying aggregation follows below).
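
Conceptually, the dashboard is fed by per-device aggregates written back to InfluxDB for Grafana to query; a hedged sketch reusing the assumed 1.x client, with the measurement and column names being assumptions as well:

```python
import pandas as pd
from influxdb import DataFrameClient

def publish_dut_metrics(client: DataFrameClient, jobs: pd.DataFrame) -> None:
    """Aggregate per-device stats and push them back for Grafana to chart.

    The measurement and column names are assumptions for this sketch.
    """
    per_device = (
        jobs.groupby("device_name")
            .agg(
                mean_duration_s=("duration_s", "mean"),
                mean_pending_s=("pending_s", "mean"),
                retries=("dut_retries", "sum"),
            )
            .reset_index()
    )
    # DataFrameClient expects a DatetimeIndex; stamp the whole batch "now".
    per_device.index = pd.DatetimeIndex(
        [pd.Timestamp.now(tz="UTC")] * len(per_device)
    )
    client.write_points(per_device, "dut_job_details", tag_columns=["device_name"])
```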