Monitoring
Verified against `monitoring/*` and `src/app/observability/*` on 2026-05-07.
Stack
The current monitoring profile (`docker compose --profile monitoring`) starts:
- Prometheus
- Grafana
- Jaeger
flowchart LR
API["FastAPI /metrics"] --> Prom["Prometheus"]
W["Worker metrics"] --> Prom
Prom --> Graf["Grafana"]
API -. "OTLP" .-> J["Jaeger"]
W -. "OTLP" .-> J

Startup
Bring the stack up with Docker Compose using the monitoring profile (`--profile monitoring`).
Access:
- Grafana: http://localhost:3000
- Prometheus: http://localhost:9090
- Jaeger: http://localhost:16686
Key metrics (implemented)
| Metric | Type | Labels | Meaning |
|---|---|---|---|
| `webhook_events_received_total` | Counter | `event_type` | webhooks received |
| `webhook_events_processed_total` | Counter | `event_type`, `status` | processing outcome |
| `webhook_processing_duration_seconds` | Histogram | `event_type` | processing latency |
| `gitlab_api_requests_total` | Counter | `endpoint`, `status` | calls to the GitLab API |
| `gitlab_api_duration_seconds` | Histogram | `endpoint` | GitLab API latency |
| `http_requests_total` | Counter | `method`, `path`, `status` | HTTP request volume |
| `http_request_duration_seconds` | Histogram | `method`, `path` | HTTP latency |
| `rq_queue_depth` | Gauge | `queue_name` | queue depth |
| `rq_failed_jobs_total` | Gauge | - | number of failed jobs |
| `rq_workers_active` | Gauge | - | active workers |
| `metrics_compute_total` | Counter | `status` | metric computations |
| `metrics_compute_duration_seconds` | Histogram | - | metric computation time |
| `report_generation_duration_seconds` | Histogram | `format` | report generation time |
| `db_pool_size` | Gauge | - | DB pool size |
| `db_pool_checked_out` | Gauge | - | DB connections in use |
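To make the label columns concrete, the following sketch renders one counter in the Prometheus text exposition format, which is roughly what a scrape of `/metrics` returns. It uses only the standard library; in practice a Prometheus client library normally produces this output. The help text and the label values (`push`, `merge_request`, `success`, `error`) are hypothetical and are not taken from the project.

```python
def escape_label_value(value: str) -> str:
    """Escape backslash, double quote, and newline, as the text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_counter(name: str, help_text: str, samples: list[tuple[dict[str, str], float]]) -> str:
    """Render a counter and its labelled samples as exposition-format text."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples:
        if labels:
            pairs = ",".join(
                f'{key}="{escape_label_value(val)}"' for key, val in sorted(labels.items())
            )
            lines.append(f"{name}{{{pairs}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical sample values: each distinct label combination is its own time series.
text = render_counter(
    "webhook_events_processed_total",
    "Processed webhook events",
    [
        ({"event_type": "push", "status": "success"}, 42.0),
        ({"event_type": "merge_request", "status": "error"}, 3.0),
    ],
)
print(text)
```

Because every label combination becomes a separate series, labels such as `path` are usually kept to a bounded set of values.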
Grafana dashboard
Provisioned dashboard: `monitoring/grafana/dashboards/gitpulse.json`
Main panels:
- Webhook rate (received/processed)
- Processing latency (p50/p95/p99)
- Queue depth
- Success/failure rate
- DB pool utilization
- HTTP request rate
- GitLab API latency
- Report generation time
- Active workers
- Metrics compute duration
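The percentile panels (p50/p95/p99) are derived from histogram buckets rather than from raw observations. The sketch below is a simplified approximation of the linear interpolation that Prometheus's `histogram_quantile` performs over cumulative buckets. The bucket boundaries and counts are invented for illustration and do not reflect the buckets configured in the project.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Buckets must be sorted by upper bound and end with +Inf, mirroring the
    `le` series of a Prometheus histogram.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("buckets must end with a +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            in_bucket = count - lower_count
            if math.isinf(upper_bound) or in_bucket == 0:
                # Beyond the last finite bucket (or an empty bucket):
                # the best available estimate is the previous upper bound.
                return lower_bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - lower_count) / in_bucket
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Hypothetical cumulative buckets for a latency histogram, in seconds.
buckets = [(0.5, 60), (1.0, 90), (5.0, 98), (10.0, 100), (math.inf, 100)]
print(histogram_quantile(0.95, buckets))  # 3.5
```

The estimate is only as precise as the bucket layout: here the p95 lands somewhere in the 1 s to 5 s bucket, so finer buckets around the alert threshold give a more trustworthy panel.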
Alerts
Alert rules: `monitoring/prometheus/alerts.yml`
| Alert | Condition |
|---|---|
| QueueStuck | `rq_queue_depth > 50` for 5 min |
| HighFailureRate | `rate(webhook_events_processed_total{status="error"}[5m]) > 0.1` |
| DatabaseDown | `up{job="gitpulse-api"} == 0` |
| HighProcessingLatency | p95 webhook latency > 10 s |
| WorkerDown | `rq_workers_active == 0` |
| DBPoolExhausted | `db_pool_checked_out / db_pool_size > 0.9` |
| HighHTTPErrorRate | share of 5xx responses > 5% |
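The two ratio-based conditions can be illustrated with plain arithmetic. This is a sketch of the threshold logic only; the actual expressions live in `monitoring/prometheus/alerts.yml` and are not reproduced here. The function names, the zero-traffic handling, and the sample numbers are assumptions made for the example.

```python
def db_pool_exhausted(checked_out: float, pool_size: float, threshold: float = 0.9) -> bool:
    """Mirror of db_pool_checked_out / db_pool_size > 0.9."""
    if pool_size <= 0:
        # Assumption: an unreported or empty pool does not fire the alert.
        return False
    return checked_out / pool_size > threshold


def high_http_error_rate(requests_by_status: dict[str, float], threshold: float = 0.05) -> bool:
    """Fire when 5xx responses exceed the given share of all requests."""
    total = sum(requests_by_status.values())
    if total == 0:
        # Assumption: no traffic means no error-rate alert.
        return False
    errors = sum(count for status, count in requests_by_status.items() if status.startswith("5"))
    return errors / total > threshold


# Hypothetical readings: 19 of 20 connections in use is 95%, above the 90% threshold.
print(db_pool_exhausted(19, 20))  # True
# 100 of 1000 requests are 5xx, i.e. 10%, above the 5% threshold.
print(high_http_error_rate({"200": 900, "500": 60, "503": 40}))  # True
```

Both comparisons are strict, so a reading that sits exactly on the threshold (for example 18 of 20 connections) does not fire.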
Operational checklist
- `/health` and `/metrics` return 200.
- `http://localhost:9090/targets` shows `gitpulse-api` in the UP state.
- The Grafana datasource points to `http://prometheus:9090`.
- During incidents, correlate `correlation_id` across logs, metrics, and traces.
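The first two checklist items can be automated. The sketch below uses only the standard library and reads target health from the Prometheus HTTP API endpoint `/api/v1/targets`, which returns the same data that the `/targets` page displays. The API base URL `http://localhost:8000` is an assumption, since the source does not state the port the API listens on. The Grafana datasource and `correlation_id` items are not covered.

```python
import json
from typing import Callable
from urllib.error import HTTPError
from urllib.request import urlopen

API_BASE = "http://localhost:8000"  # assumption: adjust to where the API is exposed
PROMETHEUS_BASE = "http://localhost:9090"

Fetch = Callable[[str], tuple[int, str]]


def http_get(url: str, timeout: float = 5.0) -> tuple[int, str]:
    """Return (status, body); status 0 means the host could not be reached."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8")
    except HTTPError as exc:
        return exc.code, ""
    except OSError:
        return 0, ""


def target_is_up(targets_json: str, job: str) -> bool:
    """True when at least one active target for the job exists and all report health 'up'."""
    payload = json.loads(targets_json)
    active = payload.get("data", {}).get("activeTargets", [])
    matching = [t for t in active if t.get("labels", {}).get("job") == job]
    return bool(matching) and all(t.get("health") == "up" for t in matching)


def run_checklist(fetch: Fetch = http_get) -> dict[str, bool]:
    """Run the automatable checklist items and report pass/fail per item."""
    results: dict[str, bool] = {}
    for path in ("/health", "/metrics"):
        status, _ = fetch(API_BASE + path)
        results[f"{path} returns 200"] = status == 200
    status, body = fetch(PROMETHEUS_BASE + "/api/v1/targets")
    results["gitpulse-api target UP"] = status == 200 and target_is_up(body, "gitpulse-api")
    return results
```

Calling `run_checklist()` with the stack running returns a dictionary of item names mapped to `True` or `False`. The `fetch` parameter exists so the checks can be exercised against canned responses without a live stack.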