CI job monitoring
I've been working on Influx/Grafana monitoring for our CI pipelines. Unfortunately, the REST API for checking job status is not only really slow, it also doesn't give us the information we need. Right now the data lives in a bucket on Influx2 in the K3s cluster, but we need to automate collecting it, and keep extending and improving it.
The script I currently have needs to be wrapped up into a K8s Job that periodically executes against task-runner, I expect. My guess is that the right approach (as with backups) is to exec into a task-runner pod so the script can run against Rails, with the caveat that it requires the InfluxDB2 gem to be installed. Maybe it would be better to duplicate the task-runner definition and spin up a new pod specifically for this? I'm not sure off the top of my head how we'd achieve that, though; is there a way to extend the definition?
- Also needs a secret defined for the Influx token
- Also needs the links to services templated correctly (e.g. the influx/gitlab service names)
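
To make the "dedicated pod" option concrete, here is a rough sketch of what a periodic job could look like, written as a small Python generator that emits a `batch/v1` CronJob manifest as JSON (which `kubectl apply -f` accepts). Everything specific in it is an assumption for illustration: the job name, the script name `ci_job_monitoring.rb`, the environment variable names, the `token` secret key, and port 8086 for Influx. The image would have to be a task-runner-like image with the InfluxDB2 gem added; how we build that is still the open question above.

```python
import json


def build_cronjob(image, schedule, influx_service, gitlab_service, token_secret):
    """Return a batch/v1 CronJob manifest as a plain dict.

    All names below (job name, env vars, script path, secret key) are
    hypothetical placeholders, not taken from our actual charts.
    """
    container = {
        "name": "ci-job-monitoring",
        "image": image,
        # Hypothetical entry point: run the monitoring script inside Rails.
        "command": ["bundle", "exec", "rails", "runner", "ci_job_monitoring.rb"],
        "env": [
            # Service names are passed in so they can be templated per deployment.
            {"name": "INFLUX_URL", "value": f"http://{influx_service}:8086"},
            {"name": "GITLAB_URL", "value": f"http://{gitlab_service}"},
            # The Influx token comes from a Secret, never from the manifest itself.
            {
                "name": "INFLUX_TOKEN",
                "valueFrom": {"secretKeyRef": {"name": token_secret, "key": "token"}},
            },
        ],
    }
    return {
        "apiVersion": "batch/v1",
        "kind": "CronJob",
        "metadata": {"name": "ci-job-monitoring"},
        "spec": {
            "schedule": schedule,
            # Do not start a new run while the previous one is still going.
            "concurrencyPolicy": "Forbid",
            "jobTemplate": {
                "spec": {
                    "template": {
                        "spec": {
                            "restartPolicy": "OnFailure",
                            "containers": [container],
                        }
                    }
                }
            },
        },
    }


if __name__ == "__main__":
    manifest = build_cronjob(
        image="example/task-runner-influx:latest",  # hypothetical image
        schedule="*/15 * * * *",
        influx_service="influxdb2",                 # hypothetical service name
        gitlab_service="gitlab-webservice",         # hypothetical service name
        token_secret="influx-token",                # hypothetical Secret name
    )
    print(json.dumps(manifest, indent=2))
```

In practice this would more likely live as a template next to the existing task-runner definition; the sketch only shows which pieces (schedule, secret reference, service URLs) need to be parameterised.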
@bentiss How did you handle horizons for the nginx data? How do you know which time period you should be querying? We need to persist that data point somewhere ...
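
For discussion, one possible way to handle the horizon, sketched in Python: persist the end of the last successfully processed window, and on each run query from slightly before that point up to now. This is not how the nginx data is handled (I don't know that yet); the JSON state file, the function names, and the 5-minute overlap are all assumptions for illustration. In the cluster the state would need to live somewhere durable, for example a volume or Influx itself.

```python
import json
from datetime import datetime, timedelta, timezone
from pathlib import Path


def load_horizon(path, default):
    """Read the persisted horizon, or fall back to `default` on the first run."""
    p = Path(path)
    if not p.exists():
        return default
    return datetime.fromisoformat(json.loads(p.read_text())["horizon"])


def save_horizon(path, horizon):
    """Persist the horizon as an ISO 8601 timestamp."""
    Path(path).write_text(json.dumps({"horizon": horizon.isoformat()}))


def next_window(horizon, now, overlap=timedelta(minutes=5)):
    """Return the (start, stop) period to query on this run.

    The window starts a little before the horizon so that jobs which
    finished around the previous cutoff are picked up again.
    """
    return horizon - overlap, now


if __name__ == "__main__":
    state_file = "horizon.json"  # hypothetical location
    now = datetime.now(timezone.utc)
    horizon = load_horizon(state_file, default=now - timedelta(days=1))
    start, stop = next_window(horizon, now)
    print(f"querying jobs updated between {start.isoformat()} and {stop.isoformat()}")
    # ... fetch the jobs and write the points to Influx here ...
    save_horizon(state_file, stop)  # only advance after a successful run
```

Re-reading the overlap is harmless as long as each job always maps to the same measurement, tag set and timestamp, since Influx then overwrites the existing point instead of duplicating it.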