Enable logging and monitoring
We need to properly handle logging, monitoring, and alerting for our GitLab services.
The setup I'm currently thinking of is:
- Prometheus to scrape data from all our sources
- Grafana to visualise that data
- Sentry to handle error logging
Sentry should almost certainly be private unless we can confirm that none of the logs it receives contain sensitive information. Prometheus and Grafana can be public, though.
General:
- Create config repos for the above three (one public, one secret)
Prometheus:
- Install Prometheus chart
- Ingest Kubernetes pod information
- Double-check cluster RBAC
- Enable persistent storage using SSD storage class
- Ingest nginx information (need monitor installed?)
- Ingest GitLab information (monitor already running)
- Ingest GitLab-PostgreSQL information (need monitor installed)
- Ingest GitLab-Redis information (need monitor installed)
- Expose as prometheus.fd.o
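For "Ingest Kubernetes pod information", the work is done by Prometheus's Kubernetes service discovery. The sketch below builds a pod scrape job as a Python dict and prints it as JSON, which is for practical purposes valid YAML and so can go into a Prometheus config. It assumes pods opt in through the conventional `prometheus.io/scrape` and `prometheus.io/path` annotations (a community convention, not something Kubernetes enforces); the job name and scrape interval are placeholders. Discovering pods this way requires Prometheus's service account to be allowed to list and watch pods, which is what the RBAC check needs to cover. The community Prometheus chart may already ship an equivalent job in its default values.

```python
import json


def pod_scrape_job(job_name: str = "kubernetes-pods") -> dict:
    """Scrape job that discovers pods through the Kubernetes API.

    Assumption: pods opt in via the conventional prometheus.io/* annotations;
    Kubernetes itself attaches no meaning to them.
    """
    return {
        "job_name": job_name,  # hypothetical name
        "kubernetes_sd_configs": [{"role": "pod"}],
        "relabel_configs": [
            # Drop every pod not annotated with prometheus.io/scrape: "true".
            {
                "source_labels": ["__meta_kubernetes_pod_annotation_prometheus_io_scrape"],
                "action": "keep",
                "regex": "true",
            },
            # If prometheus.io/path is set (non-empty), use it as the metrics path.
            {
                "source_labels": ["__meta_kubernetes_pod_annotation_prometheus_io_path"],
                "action": "replace",
                "regex": "(.+)",
                "target_label": "__metrics_path__",
            },
            # Keep namespace and pod name as ordinary labels for dashboards.
            {
                "source_labels": ["__meta_kubernetes_namespace"],
                "action": "replace",
                "target_label": "namespace",
            },
            {
                "source_labels": ["__meta_kubernetes_pod_name"],
                "action": "replace",
                "target_label": "pod",
            },
        ],
    }


def prometheus_config(jobs: list) -> dict:
    """Top-level prometheus.yml structure; the 30s interval is a placeholder."""
    return {"global": {"scrape_interval": "30s"}, "scrape_configs": jobs}


if __name__ == "__main__":
    # JSON output is, for practical purposes, valid YAML for prometheus.yml.
    print(json.dumps(prometheus_config([pod_scrape_job()]), indent=2))
```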
Grafana:
- Install Grafana chart
- Connect to Prometheus
- Create GitLab auth provider and use this
- Configure outbound email?
- Enable persistent storage using SSD storage class
- Expose as grafana.fd.o
- Figure out a good Kubernetes dashboard
- Figure out a good GitLab dashboard
- Set up alerting for critical disk space
- Set up alerting for critical CPU usage
- Set up alerting for critical memory usage
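For "Connect to Prometheus" and "Create GitLab auth provider", Grafana can be configured from files rather than through the UI, which suits a config repo: datasources through provisioning files (read from Grafana's `provisioning/datasources` directory), and GitLab login through an `[auth.gitlab]` section in `grafana.ini`. The sketch below renders both. The Prometheus Service URL is a hypothetical in-cluster address, and the OAuth client ID and secret are placeholders for an application registered in GitLab; the secret belongs in the secret config repo rather than the public one.

```python
import configparser
import io
import json

# Assumption: in-cluster Service name and namespace for Prometheus (hypothetical).
PROMETHEUS_URL = "http://prometheus-server.monitoring.svc.cluster.local"
# GitLab instance acting as the OAuth provider.
GITLAB_URL = "https://gitlab.freedesktop.org"


def datasource_provisioning(prometheus_url: str = PROMETHEUS_URL) -> dict:
    """Grafana datasource provisioning document (apiVersion 1) for Prometheus."""
    return {
        "apiVersion": 1,
        "datasources": [
            {
                "name": "Prometheus",
                "type": "prometheus",
                # "proxy": the Grafana server queries Prometheus on the user's behalf.
                "access": "proxy",
                "url": prometheus_url,
                "isDefault": True,
            }
        ],
    }


def gitlab_auth_ini(client_id: str, client_secret: str, gitlab_url: str = GITLAB_URL) -> str:
    """Render an [auth.gitlab] section for grafana.ini.

    client_id/client_secret come from an OAuth application created in GitLab;
    the values passed in below are placeholders.
    """
    parser = configparser.ConfigParser()
    parser["auth.gitlab"] = {
        "enabled": "true",
        "client_id": client_id,
        "client_secret": client_secret,
        "auth_url": f"{gitlab_url}/oauth/authorize",
        "token_url": f"{gitlab_url}/oauth/token",
        "api_url": f"{gitlab_url}/api/v4",
    }
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if __name__ == "__main__":
    print(json.dumps(datasource_provisioning(), indent=2))
    print(gitlab_auth_ini("CLIENT_ID", "CLIENT_SECRET"))
```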
Sentry:
- Install Sentry chart
- Use SSD storage class for persistent volumes
- Create GitLab auth provider and use this
- Expose as sentry.fd.o
- Enable outbound email
- Ingest error logs from GitLab
- Ingest error logs from Kubernetes? (e.g. pod termination)
- Set up alerting for errors
- Allow reporting of errors as GitLab issues (to freedesktop/freedesktop?)
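Sentry has its own GitLab integration for creating issues from errors, which is probably the first thing to try. If a custom hook turns out to be needed instead, reporting an error as an issue boils down to one call to GitLab's issues API (`POST /projects/:id/issues`). The sketch below only builds that request with the Python standard library and does not send it; the bot token and issue text are placeholders, and `freedesktop/freedesktop` is the tentative target project from the item above.

```python
import urllib.parse
import urllib.request

# GitLab REST API v4 base URL for the freedesktop.org instance.
GITLAB_API = "https://gitlab.freedesktop.org/api/v4"


def build_issue_request(project_path: str, title: str, description: str,
                        token: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST /projects/:id/issues request.

    GitLab accepts a URL-encoded "namespace/project" path in place of the
    numeric project ID. `token` is a hypothetical access token for a bot
    account allowed to open issues in that project.
    """
    project_id = urllib.parse.quote(project_path, safe="")
    url = f"{GITLAB_API}/projects/{project_id}/issues"
    body = urllib.parse.urlencode({"title": title, "description": description}).encode()
    return urllib.request.Request(
        url, data=body, headers={"PRIVATE-TOKEN": token}, method="POST"
    )


if __name__ == "__main__":
    req = build_issue_request(
        "freedesktop/freedesktop",
        "Sentry: example error",  # hypothetical title
        "Link to the Sentry event goes here.",
        "REDACTED",
    )
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would actually create the issue.
```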
Some open questions:
- Where should we configure alerting? Both Prometheus and Grafana can do it: Grafana can define its own alerts or, through a plugin, ingest Prometheus alerts. I have no idea which approach is better.
- StackDriver can currently pull our logs, and so can Sentry. Do we need both?
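On the alerting question, one common split is for Prometheus to evaluate alerting rules and hand firing alerts to Alertmanager, which groups, deduplicates and routes them to notification channels, leaving Grafana for dashboards only. If we go that way, the three critical alerts from the Grafana list could start out as the rule file sketched below (again emitted as JSON, which is for practical purposes valid YAML). It assumes node_exporter is deployed and scraped, which is not yet in the Prometheus list; the rule names, the `severity` label, the `for` duration and all thresholds are hypothetical placeholders.

```python
import json


def critical_alert_rules(disk_free_pct: float = 10, cpu_busy_pct: float = 90,
                         mem_free_pct: float = 10, duration: str = "10m") -> dict:
    """Prometheus alerting-rule file for the three "critical" alerts.

    Assumptions: node_exporter (0.16 or later, for these metric names) is
    scraped, and every threshold here is a placeholder, not an agreed value.
    """
    rules = [
        {
            "alert": "DiskSpaceCritical",
            # Percentage of free space per filesystem; tmpfs is ignored as noise.
            "expr": (
                'node_filesystem_avail_bytes{fstype!="tmpfs"}'
                ' / node_filesystem_size_bytes{fstype!="tmpfs"}'
                f" * 100 < {disk_free_pct}"
            ),
        },
        {
            "alert": "CPUUsageCritical",
            # Busy percentage = 100 minus the average idle share over 5 minutes.
            "expr": (
                '100 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))'
                f" * 100 > {cpu_busy_pct}"
            ),
        },
        {
            "alert": "MemoryCritical",
            # Percentage of memory still available to new workloads.
            "expr": (
                "node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes"
                f" * 100 < {mem_free_pct}"
            ),
        },
    ]
    for rule in rules:
        rule["for"] = duration  # condition must hold this long before firing
        rule["labels"] = {"severity": "critical"}
    return {"groups": [{"name": "critical-resources", "rules": rules}]}


if __name__ == "__main__":
    print(json.dumps(critical_alert_rules(), indent=2))
```

If Grafana alerting wins instead, the same PromQL expressions can be reused as the queries behind Grafana's own alerts.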