2026-03-10 · 8 min · prometheus, victoriametrics, python, open-source

Turning Prometheus Label Values Into Metrics You Can Alert On

The Problem

Kubernetes exposes version information as labels on info metrics. kube_node_info has a minor label with the kubelet version, major for the major version, os_image for the OS, kernel_version for the kernel. The metric value is always 1. It's metadata.

I needed to track kubelet version drift across nodes and clusters. Specifically: alert when the spread between the lowest and highest minor version exceeds a threshold, graph per-node versions over time, and find nodes running outdated versions after a rolling upgrade.

None of that is possible when the version is a string label.

Why PromQL Can't Do This

The version data exists in VictoriaMetrics. It's right there in the labels. But PromQL treats labels as strings for filtering and grouping. There's no function to take a label value, convert it to a number, and use it as the metric value.

label_replace: Works on labels, not values. label_replace(kube_node_info, "ver", "$1", "minor", "(.*)") creates a new label. The metric value is still 1.

Recording rules: Store the result of a PromQL expression. But the only value an info metric contributes to any expression is 1, so no expression can recover the version number. You can't write expr: extract_label_as_value(kube_node_info, "minor") because no such function exists.

Relabeling (metric_relabel_configs): Operates at scrape time on labels. Can drop, rename, or filter labels. Can't turn a label into a metric value.

Count tricks: You can detect that drift exists:

count(count by (minor) (kube_node_info)) > 1

This fires when there are multiple unique minor versions across nodes. But it can't tell you whether the spread is 2 versions or 5. minor="28" is a string. You can't do max(minor) - min(minor) because PromQL doesn't do arithmetic on label values.

This is the fundamental gap: PromQL has no mechanism to convert a string label value into a numeric metric value. Info metrics store useful data in labels, but that data is invisible to numeric operations like <, >, graphing, and alerting thresholds.

The Exporter

I built a Python exporter that bridges this gap outside of PromQL. It queries VictoriaMetrics for a source metric, reads specified label values, and exposes them as new Prometheus metrics with the label value as the metric value.

The transformations are defined in YAML:

victoriametrics_url: "http://vmselect:8481/select/0/prometheus"
scrape_interval: 300
exporter_port: 8000

metric_transformations:
  - name: "kubelet_minor_version"
    description: "Minor version of kubelet extracted from kube_node_info"
    source_metric: "kube_node_info"
    metric_type: "gauge"
    value_from_label: "minor"
    preserve_labels:
      - cluster
      - node
      - instance
    default_value: 0

This takes kube_node_info{minor="28", node="worker-1", cluster="prod"} (value: 1) and produces kubelet_minor_version{node="worker-1", cluster="prod"} (value: 28).
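A minimal sketch of that per-series conversion, assuming the exporter falls back to default_value when the label is missing or not numeric. The function name transform_series is hypothetical; only the config keys come from the YAML above.

```python
from typing import Optional


def transform_series(
    labels: dict[str, str],
    value_from_label: str,
    preserve_labels: list[str],
    default_value: Optional[float] = None,
) -> Optional[tuple[dict[str, str], float]]:
    """Turn one source series into (new_labels, numeric_value)."""
    raw = labels.get(value_from_label, "")
    try:
        value = float(raw)
    except ValueError:
        # Assumption: use default_value when the label is missing or
        # non-numeric; drop the series if no default is configured.
        if default_value is None:
            return None
        value = float(default_value)
    # Keep only the requested labels that actually exist on the source series.
    kept = {k: labels[k] for k in preserve_labels if k in labels}
    return kept, value


source = {"__name__": "kube_node_info", "minor": "28", "node": "worker-1", "cluster": "prod"}
result = transform_series(source, "minor", ["cluster", "node", "instance"], default_value=0)
print(result)  # ({'cluster': 'prod', 'node': 'worker-1'}, 28.0)
```

The instance label is listed in preserve_labels but absent from this source series, so it is simply left out of the output.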

Now you can do everything PromQL is good at:

# Alert when version spread exceeds 2 minor versions
max(kubelet_minor_version) - min(kubelet_minor_version) > 2

# Find nodes below a specific version
kubelet_minor_version < 28

# Graph per-node version over time
kubelet_minor_version{cluster="prod"}

How It Works

The exporter runs a scrape loop on a configurable interval:

  1. For each transformation in the YAML config, it queries VictoriaMetrics for the source metric
  2. For each returned series, it reads the specified label (value_from_label) and converts it to a number
  3. It preserves the labels you specify (preserve_labels) from the source metric
  4. It exposes the result as a standard Prometheus gauge on /metrics
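The four steps can be sketched with the standard library alone. This is an illustration under assumptions, not the exporter's actual code: it uses the Prometheus-compatible /api/v1/query endpoint, skips series whose label is not numeric, and renders the text exposition format by hand. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def parse_series(payload: dict) -> list[dict[str, str]]:
    """Extract the label set of every series from an instant-query response."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error', 'unknown error')}")
    return [item["metric"] for item in payload["data"]["result"]]


def fetch_series(base_url: str, selector: str, timeout: float = 10.0) -> list[dict[str, str]]:
    """Step 1: run an instant query against the Prometheus-compatible API."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": selector})
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_series(json.load(resp))


def escape(value: str) -> str:
    """Escape a label value for the text exposition format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def run_cycle(transformation: dict, series: list[dict[str, str]]) -> str:
    """Steps 2-4: convert each series and render the result as a gauge."""
    name = transformation["name"]
    lines = [f"# TYPE {name} gauge"]
    for labels in series:
        try:
            value = float(labels.get(transformation["value_from_label"], ""))
        except ValueError:
            continue  # assumption: skip series with a missing or non-numeric label
        kept = {k: labels[k] for k in transformation["preserve_labels"] if k in labels}
        body = ",".join(f'{k}="{escape(v)}"' for k, v in kept.items())
        lines.append(f"{name}{{{body}}} {value:g}")
    return "\n".join(lines) + "\n"


# Canned response in the shape the query API returns, so the demo needs no network.
payload = {
    "status": "success",
    "data": {"resultType": "vector", "result": [
        {"metric": {"__name__": "kube_node_info", "minor": "28", "node": "worker-1", "cluster": "prod"},
         "value": [1767225600, "1"]},
        {"metric": {"__name__": "kube_node_info", "minor": "", "node": "worker-2", "cluster": "prod"},
         "value": [1767225600, "1"]},
    ]},
}
transformation = {
    "name": "kubelet_minor_version",
    "value_from_label": "minor",
    "preserve_labels": ["cluster", "node", "instance"],
}
output = run_cycle(transformation, parse_series(payload))
print(output, end="")
```

In a real loop, fetch_series(victoriametrics_url, source_metric) would supply the series on every interval, and the rendered text would be served on /metrics.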

Each transformation can also include a label-matcher filter (query_filter) to narrow the source query:

metric_transformations:
  # Worker nodes only (kubelet endpoints)
  - name: "kubelet_version_workers"
    source_metric: "kubernetes_build_info"
    metric_type: "gauge"
    value_from_label: "minor"
    preserve_labels:
      - cluster
      - instance
    query_filter: 'instance=~".*:10250"'

  # Master nodes only (API server endpoints)
  - name: "kubelet_version_masters"
    source_metric: "kubernetes_build_info"
    metric_type: "gauge"
    value_from_label: "minor"
    preserve_labels:
      - cluster
      - instance
    query_filter: 'instance=~".*:443"'
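One plausible way to apply query_filter is to place it inside the selector's braces. The helper below is a hypothetical sketch of that idea, assuming the filter is a comma-separated list of label matchers.

```python
def build_selector(source_metric: str, query_filter: str = "") -> str:
    """Combine a metric name and optional label matchers into a selector."""
    if not query_filter:
        return source_metric
    return f"{source_metric}{{{query_filter}}}"


workers = build_selector("kubernetes_build_info", 'instance=~".*:10250"')
print(workers)  # kubernetes_build_info{instance=~".*:10250"}
```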

Hot Reload

The exporter watches its config file for changes using a filesystem watcher with debouncing. In Kubernetes, mount the config as a ConfigMap. When you update the ConfigMap, the exporter detects the change, validates the new config, rebuilds its metric registry, and starts using the new transformations. No restart needed.

There's also a /reload HTTP endpoint for manual triggers and a /health endpoint for liveness probes.
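The debounce-then-validate flow can be sketched as follows. This is an assumption-laden illustration: the Debouncer and ConfigHolder classes and the validation rules are hypothetical, and a real filesystem watcher would call trigger() on each change event.

```python
import threading
import time


class Debouncer:
    """Collapse a burst of change events into a single delayed callback."""

    def __init__(self, delay: float, callback):
        self._delay = delay
        self._callback = callback
        self._timer = None
        self._lock = threading.Lock()

    def trigger(self) -> None:
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()  # restart the countdown on every event
            self._timer = threading.Timer(self._delay, self._callback)
            self._timer.daemon = True
            self._timer.start()


def validate(config: dict) -> bool:
    """Assumed minimal check: a non-empty list of well-formed transformations."""
    required = {"name", "source_metric", "value_from_label"}
    items = config.get("metric_transformations")
    if not isinstance(items, list) or not items:
        return False
    return all(isinstance(t, dict) and required <= t.keys() for t in items)


class ConfigHolder:
    def __init__(self, initial: dict):
        self.current = initial

    def reload(self, candidate: dict) -> bool:
        """Swap in the candidate only if it validates; otherwise keep the old config."""
        if not validate(candidate):
            return False
        self.current = candidate
        return True


reloads = []
debouncer = Debouncer(0.05, lambda: reloads.append("reload"))
for _ in range(5):  # a ConfigMap update typically produces several events
    debouncer.trigger()
time.sleep(0.3)
print(reloads)  # ['reload']
```

Keeping the old config when validation fails means a bad ConfigMap edit does not take the exporter down.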

Stale Metric Cleanup

In autoscaled clusters, nodes come and go. When a node is terminated, its metrics should disappear. By default, the exporter clears every labeled series of a transformation before each scrape cycle and repopulates them from the current source data. Nodes that no longer exist in the source simply don't get recreated.

This is configurable per transformation: set clear_stale_metrics: false if you want series to keep their last known value after they disappear from the source.
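A sketch of the clear-and-repopulate idea, using a plain dictionary in place of a real metrics registry. The GaugeFamily class is hypothetical; only the clear_stale_metrics flag comes from the config.

```python
class GaugeFamily:
    """Holds the current samples of one transformation, keyed by label set."""

    def __init__(self, name: str, clear_stale_metrics: bool = True):
        self.name = name
        self.clear_stale_metrics = clear_stale_metrics
        self._samples: dict[tuple, float] = {}

    def update(self, samples: list[tuple[dict[str, str], float]]) -> None:
        if self.clear_stale_metrics:
            # Drop everything first; only series present in this cycle come back.
            self._samples.clear()
        for labels, value in samples:
            self._samples[tuple(sorted(labels.items()))] = value

    def samples(self) -> list[tuple[dict[str, str], float]]:
        return [(dict(key), value) for key, value in self._samples.items()]


family = GaugeFamily("kubelet_minor_version")
family.update([({"node": "worker-1"}, 28.0), ({"node": "worker-2"}, 27.0)])
family.update([({"node": "worker-1"}, 28.0)])  # worker-2 was terminated
print(family.samples())  # [({'node': 'worker-1'}, 28.0)]
```

With clear_stale_metrics=False the second update would leave worker-2 in place at its last value.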

Beyond Kubelet Versions

I built this for one specific use case, but the design is generic. Any info-type metric where useful data lives in labels is a candidate:

metric_transformations:
  # OS information per node
  - name: "node_os_info"
    source_metric: "kube_node_info"
    metric_type: "info"
    value_from_label: "os_image"
    preserve_labels:
      - cluster
      - node
      - kernel_version

  # Container runtime version
  - name: "container_runtime_version"
    source_metric: "kube_node_info"
    metric_type: "info"  # values such as "containerd://1.7.2" are not numeric
    value_from_label: "container_runtime_version"
    preserve_labels:
      - cluster
      - node
    skip_empty_labels: true

Any metric where you've thought "I wish I could graph/alert on that label value" is a use case.

Limitations

This exporter is not polished production software. It works for the cases I've used it for, but it won't handle every edge case:

  • The gauge type only works with label values that can be converted to numbers. String values work with the info metric type, but those can't be used in numeric alerts.
  • The query runs against the VictoriaMetrics API on a fixed interval, not in real-time. The scrape interval determines how fresh the derived metrics are.
  • No support for counter or histogram types yet.
  • Error handling is basic. If VictoriaMetrics is temporarily unavailable, the exporter logs the error and retries on the next cycle.
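The log-and-retry behaviour in the last point amounts to a loop like the sketch below. The run_forever function and its max_cycles and sleep parameters are hypothetical; they exist only so the loop can be exercised without waiting.

```python
import logging
import time

logger = logging.getLogger("label_exporter")


def run_forever(scrape_once, interval: float, max_cycles=None, sleep=time.sleep) -> int:
    """Call scrape_once every `interval` seconds; log failures and carry on."""
    cycles = 0
    failures = 0
    while max_cycles is None or cycles < max_cycles:
        try:
            scrape_once()
        except Exception:
            failures += 1
            # No immediate retry: the next attempt is simply the next cycle.
            logger.exception("scrape cycle failed; retrying in %s seconds", interval)
        cycles += 1
        sleep(interval)
    return failures


calls = {"n": 0}


def scrape_once() -> None:
    calls["n"] += 1
    if calls["n"] == 2:
        raise ConnectionError("VictoriaMetrics unavailable")


failures = run_forever(scrape_once, interval=300, max_cycles=3, sleep=lambda seconds: None)
print(failures, calls["n"])  # 1 3
```

One consequence of this design is that a single failed cycle leaves the derived metrics up to two intervals old.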

It's a starting point, and contributions are welcome.

What's Next

The current implementation is in Python and works for my use cases. I'm planning to rewrite it in Go for better performance, lower resource footprint, and easier distribution as a single binary. The Go version will also be the foundation for a proper open-source release with broader metric type support and better error handling.

If you've hit the same "label value should be a metric value" problem, keep an eye on my GitHub for the release.