Custom Services

Built 5+ Python services to fill gaps that no off-the-shelf tool covered: exporters, automation, and data collection.

Python, Prometheus, Kubernetes, Docker, Rancher Fleet, VictoriaLogs

Architecture

Why Custom

The monitoring stack had gaps that no existing tool covered: business state awareness, DNS zone health, dynamic metric generation from label values, and automated health discovery across hundreds of namespaces. Each gap needed its own service.

Exporters

Health Check Exporter - auto-discovers services via the Kubernetes API, runs checks concurrently, and pushes structured results to VictoriaLogs. It is the data source for the entire Status Dashboard platform.
DNS Health Exporter - reads Cloud DNS zones and exposes per-zone health check status as Prometheus metrics.
Dynamic Metrics Exporter - generates new metrics from the label values of existing metrics, driven by a YAML config with hot-reload. Example: turning Kubernetes node version labels into version tracking metrics.
Environment Status Exporter - scrapes a reporting API to expose which environments are live, active, or in demo mode. This feeds into alerting priority and dashboard filtering.
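
To make the Dynamic Metrics Exporter idea concrete, here is a minimal sketch of deriving a new metric from label values. It is not the exporter's actual code: the rule shape, the `derive_metric` and `render` helpers, and the target metric name `node_version_count` are all hypothetical, and the source metric is assumed to be kube-state-metrics' `kube_node_info` with its `kubelet_version` label. The real service reads its rules from YAML and hot-reloads them, which this sketch leaves out.

```python
from collections import Counter

# Hypothetical rule; the real exporter's YAML schema is not shown here.
RULE = {
    "source_metric": "kube_node_info",
    "label": "kubelet_version",
    "target_metric": "node_version_count",  # assumed name, for illustration only
}


def derive_metric(samples, rule):
    """Count source series per distinct value of the configured label."""
    counts = Counter(
        s["labels"][rule["label"]]
        for s in samples
        if s["name"] == rule["source_metric"] and rule["label"] in s["labels"]
    )
    # Sort by label value so the output order is stable between scrapes.
    return {value: float(n) for value, n in sorted(counts.items())}


def render(rule, derived):
    """Render derived values in the Prometheus text exposition format.

    Label values are not escaped here; a real exporter would need to
    escape backslashes, quotes, and newlines.
    """
    lines = [f"# TYPE {rule['target_metric']} gauge"]
    for value, n in derived.items():
        lines.append(f'{rule["target_metric"]}{{{rule["label"]}="{value}"}} {n}')
    return "\n".join(lines)


# Example input: three nodes, two of them on the same kubelet version.
SAMPLES = [
    {"name": "kube_node_info", "labels": {"node": "a", "kubelet_version": "v1.29.1"}},
    {"name": "kube_node_info", "labels": {"node": "b", "kubelet_version": "v1.29.1"}},
    {"name": "kube_node_info", "labels": {"node": "c", "kubelet_version": "v1.30.0"}},
    {"name": "up", "labels": {"job": "node"}},
]

if __name__ == "__main__":
    print(render(RULE, derive_metric(SAMPLES, RULE)))
```

The derived series carries the version as a label and the node count as its value, so a version skew alert becomes an ordinary threshold or count expression.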

Automation

Silence Manager - checks each environment's business state and auto-creates AlertManager silences for non-live environments. At any given time its silences cover roughly 40% of environments.
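
The core decision in the Silence Manager can be sketched as a small pure function. This is an illustration under assumptions, not the service's real code: the `build_silence` helper, the `environment` label name, the one-hour duration, and the `silence-manager` author string are all hypothetical. The returned dictionary follows the silence body accepted by Alertmanager's v2 API (`POST /api/v2/silences`); the HTTP call itself is omitted.

```python
from datetime import datetime, timedelta, timezone


def build_silence(environment, state, now, duration=timedelta(hours=1)):
    """Return an Alertmanager silence payload, or None for live environments.

    `state` is assumed to be one of "live", "active", or "demo", as reported
    by the Environment Status Exporter; anything other than "live" is silenced.
    """
    if state == "live":
        return None
    return {
        "matchers": [
            {
                "name": "environment",  # assumed label name on the alerts
                "value": environment,
                "isRegex": False,
                "isEqual": True,
            }
        ],
        # Timezone-aware datetimes serialize to RFC 3339 timestamps.
        "startsAt": now.isoformat(),
        "endsAt": (now + duration).isoformat(),
        "createdBy": "silence-manager",
        "comment": f"auto-silence: environment is {state}",
    }


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    for env, state in [("shop-eu", "live"), ("shop-demo", "demo")]:
        print(env, build_silence(env, state, now))
```

Keeping the silences short-lived and re-creating them on each run is one reasonable design here: if an environment goes live, its silence simply expires instead of needing explicit cleanup.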

Pattern

All services follow the same pattern: containerized with Docker, deployed via Rancher Fleet, and emitting output that VictoriaMetrics or VictoriaLogs consumes. Each solves one specific problem that configuration alone could not.

Deep dive

Turning Prometheus Label Values Into Metrics You Can Alert On