Alert Pipeline

Custom severity model with alert correlation, suppression, and automated on-call routing

Alert Rules: 500+

Severity Levels: 3

Deployed via: Fleet

vmalert, AlertManager, OpsGenie, Jenkins, Rancher Fleet, Python

Architecture

The Problem

Alert rules were managed individually with no correlation between severity levels. Engineers had to write and maintain separate rules for each severity, leading to inconsistency and gaps in coverage.

Template System

Designed a template system where a single alert definition includes all three severity levels: critical, warning, and low. A Jenkins job transforms these templates into valid vmalert rules, grouping the severities of each alert together so they are managed consistently.
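A minimal sketch of how one template might expand into three vmalert rules. The template schema here (`name`, `expr`, `for`, `summary`, `thresholds`) and the `expand_template` helper are hypothetical, chosen for illustration; only the output shape (a rule group with `alert`, `expr`, `for`, `labels`, `annotations`) follows the Prometheus-compatible rule format that vmalert reads.

```python
# Hypothetical template schema; the real one is not shown on this page.
SEVERITIES = ("critical", "warning", "low")


def expand_template(template: dict) -> dict:
    """Expand one alert template into a rule group with one rule per severity."""
    rules = []
    for severity in SEVERITIES:
        threshold = template["thresholds"].get(severity)
        if threshold is None:
            continue  # this severity is not defined for the alert
        rules.append({
            # Same alert name for every severity, so rules can be correlated by name.
            "alert": template["name"],
            # Assumes the expression contains no braces other than {threshold}.
            "expr": template["expr"].format(threshold=threshold),
            "for": template.get("for", "5m"),
            "labels": {"severity": severity},
            "annotations": {"summary": template["summary"]},
        })
    return {"name": template["name"], "rules": rules}


template = {
    "name": "HighCpuUsage",
    "expr": "avg(cpu_usage_percent) by (instance) > {threshold}",
    "for": "10m",
    "summary": "CPU usage is above threshold",
    "thresholds": {"critical": 95, "warning": 85, "low": 75},
}

group = expand_template(template)
for rule in group["rules"]:
    print(rule["labels"]["severity"], "->", rule["expr"])
```

Keeping the alert name identical across severities and varying only the `severity` label is what makes name-based correlation possible downstream.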

Correlation and Suppression

AlertManager inhibition rules correlate alerts by name: when the critical severity of an alert fires, it suppresses the warning and low severities of the same alert. This prevents alert storms and ensures that on-call sees only the highest active severity.
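The suppression logic could be sketched roughly as follows. `build_inhibit_rules` emits dictionaries shaped like Alertmanager's `inhibit_rules` entries (`source_matchers`, `target_matchers`, `equal`), and `visible_alerts` is a small simulation of the intended effect, not Alertmanager's own code. Two assumptions: correlation is on `alertname` only, and warning also suppresses low.

```python
SEVERITY_ORDER = ["critical", "warning", "low"]  # highest first


def build_inhibit_rules(equal=("alertname",)):
    """Build inhibit rules where each severity mutes all lower ones."""
    rules = []
    for i, source in enumerate(SEVERITY_ORDER):
        lower = SEVERITY_ORDER[i + 1:]
        if not lower:
            continue  # the lowest severity inhibits nothing
        targets = "|".join(lower)
        rules.append({
            "source_matchers": [f'severity="{source}"'],
            "target_matchers": [f'severity=~"{targets}"'],
            # Source and target must agree on these labels for inhibition to apply.
            "equal": list(equal),
        })
    return rules


def visible_alerts(firing):
    """Simulate the outcome: keep only the highest firing severity per alert name."""
    visible = []
    for alert in firing:
        rank = SEVERITY_ORDER.index(alert["severity"])
        suppressed = any(
            other["alertname"] == alert["alertname"]
            and SEVERITY_ORDER.index(other["severity"]) < rank
            for other in firing
        )
        if not suppressed:
            visible.append(alert)
    return visible


firing = [
    {"alertname": "HighCpuUsage", "severity": "critical"},
    {"alertname": "HighCpuUsage", "severity": "warning"},
    {"alertname": "HighCpuUsage", "severity": "low"},
    {"alertname": "DiskFull", "severity": "warning"},
]
inhibit_rules = build_inhibit_rules()
paged = visible_alerts(firing)
```

In a multi-cluster setup the `equal` list would likely need further labels (for example a cluster label) so that a critical alert in one cluster does not mute a warning in another; that detail is not stated on this page.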

Silence Manager

Built a Python service that checks an external API for each operator's business state. If an operator has no live agents, the silence manager automatically creates silences for it in AlertManager. This prevents false pages for environments that are intentionally inactive.
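A rough sketch of the silence manager's core loop, under stated assumptions. The `operator` label, the `live_agents` field, the 15-minute duration, and the `silence-manager` creator name are all hypothetical. The payload shape and the `POST /api/v2/silences` endpoint follow the Alertmanager v2 API.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def build_silence(operator, now, duration=timedelta(minutes=15)):
    """Build an Alertmanager v2 silence payload for one operator."""
    return {
        "matchers": [
            # Assumes every alert carries an `operator` label (hypothetical).
            {"name": "operator", "value": operator, "isRegex": False, "isEqual": True}
        ],
        "startsAt": now.isoformat(),
        "endsAt": (now + duration).isoformat(),
        "createdBy": "silence-manager",
        "comment": f"No live agents for operator {operator}",
    }


def plan_silences(operator_states, now):
    """Return a silence for every operator that reports zero live agents."""
    return [
        build_silence(name, now)
        for name, state in operator_states.items()
        if state["live_agents"] == 0  # hypothetical field from the external API
    ]


def post_silence(alertmanager_url, silence):
    """Create the silence in Alertmanager and return its ID."""
    request = urllib.request.Request(
        f"{alertmanager_url}/api/v2/silences",
        data=json.dumps(silence).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)["silenceID"]


# Example with stubbed operator state instead of a live API call.
states = {"acme": {"live_agents": 0}, "globex": {"live_agents": 4}}
now = datetime(2024, 1, 1, tzinfo=timezone.utc)
planned = plan_silences(states, now)
```

Short-lived silences that are renewed on each run have the advantage that they expire on their own if the service stops, so an operator that becomes active again is not left muted indefinitely.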

Deployment

Alert rules live on a dedicated Git branch with CI validation. Jenkins validates the templates and transforms them into rules, and Fleet then deploys the generated rules to vmalert across all clusters automatically.
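The validation step might look something like the sketch below. The required keys and the `{threshold}` placeholder convention belong to a hypothetical template schema, not to vmalert; the actual checks run by the Jenkins job are not described here.

```python
# Hypothetical template schema used only for this illustration.
REQUIRED_KEYS = {"name", "expr", "summary", "thresholds"}
SEVERITIES = ("critical", "warning", "low")


def validate_template(template: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for key in sorted(REQUIRED_KEYS - set(template)):
        errors.append(f"missing required key: {key}")

    thresholds = template.get("thresholds", {})
    for severity in SEVERITIES:
        if severity not in thresholds:
            errors.append(f"missing threshold for severity: {severity}")
    for severity in thresholds:
        if severity not in SEVERITIES:
            errors.append(f"unknown severity: {severity}")

    # Without the placeholder, every severity would expand to the same expression.
    if "{threshold}" not in template.get("expr", ""):
        errors.append("expr has no {threshold} placeholder")
    return errors


good = {
    "name": "HighCpuUsage",
    "expr": "avg(cpu_usage_percent) by (instance) > {threshold}",
    "summary": "CPU usage is above threshold",
    "thresholds": {"critical": 95, "warning": 85, "low": 75},
}
bad = {"name": "NodeDown", "expr": "up == 0", "thresholds": {"critical": 1, "info": 2}}

good_errors = validate_template(good)
bad_errors = validate_template(bad)
```

A CI job could run this over every template on the branch and exit non-zero if any error list is non-empty, so that an invalid template never reaches the transform or deploy stages.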