Alerting Configuration

Purpose

This section covers how automated alerts are set up and managed for the ShapeShift Unchained platform’s infrastructure and services. Operators need prompt notification of critical and warning conditions affecting blockchain node daemons, indexers, API services, and Kubernetes resources. Alertmanager is integrated with Discord so that alerts are grouped, routed, and delivered to the appropriate channels, which supports faster incident response.

Functionality

This configuration defines how Prometheus alerts are handled, grouped, inhibited, and routed to different Discord channels based on severity and environment.

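Key Configuration Elements

The excerpts later in this section show the main elements: routes whose `matchers` select alerts by label, a `receiver` for each route (for example `discord_critical`), the grouping controls `group_by`, `group_wait`, `group_interval`, and `repeat_interval`, and the `discord.title` and `discord.message` notification templates.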

Integration

This alerting configuration complements the overall Prometheus & Grafana monitoring setup by handling the notification aspect of observability. While Prometheus collects metrics and evaluates alerting rules (defined in `rules.json`), Alertmanager processes these alerts with this configuration to ensure the right stakeholders are informed.

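It integrates tightly with the Prometheus alert rules in `rules.json`, the Discord message templates in `discord.tmpl`, and the Discord channels that receive the notifications.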

Together, these pieces connect metric collection to notifications, so that each alert reaches the team responsible for responding to it.

Code Snippets Illustrating Core Interactions

Alert Routing Example (from config.yaml)

routes:
  - receiver: "discord_critical"
    group_by: ["alertname", "namespace", "statefulset"]
    group_wait: 5m
    group_interval: 30m
    repeat_interval: 1h
    matchers:
      - alertname = "UnchainedStatefulSetDown"
      - namespace = "unchained"
      - severity = "critical"

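This route sends critical `UnchainedStatefulSetDown` alerts from the `unchained` namespace to the `discord_critical` receiver. Matching alerts are grouped by alert name, namespace, and StatefulSet. The first notification for a new group is held for 5 minutes (`group_wait`), changes to an existing group are sent at most every 30 minutes (`group_interval`), and an unchanged group is re-sent after 1 hour (`repeat_interval`).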
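To make the matching and grouping behavior concrete, here is a minimal Python sketch. It is not Alertmanager's implementation: it only models equality matchers and `group_by`, and the alert label sets (including the `example-a` and `example-b` StatefulSet names, the `instance` values, and the `unchained-dev` namespace) are hypothetical.

```python
from collections import defaultdict

# Matchers and group_by labels copied from the route excerpt above.
ROUTE_MATCHERS = {
    "alertname": "UnchainedStatefulSetDown",
    "namespace": "unchained",
    "severity": "critical",
}
GROUP_BY = ("alertname", "namespace", "statefulset")


def route_matches(labels, matchers=ROUTE_MATCHERS):
    """Return True when every equality matcher agrees with the alert's labels."""
    return all(labels.get(name) == value for name, value in matchers.items())


def group_alerts(alerts, group_by=GROUP_BY):
    """Bucket the alerts that match the route by their group_by label values."""
    groups = defaultdict(list)
    for labels in alerts:
        if route_matches(labels):
            key = tuple(labels.get(name, "") for name in group_by)
            groups[key].append(labels)
    return dict(groups)


# Hypothetical alert label sets, for illustration only.
example_alerts = [
    {"alertname": "UnchainedStatefulSetDown", "namespace": "unchained",
     "severity": "critical", "statefulset": "example-a", "instance": "ksm-0"},
    {"alertname": "UnchainedStatefulSetDown", "namespace": "unchained",
     "severity": "critical", "statefulset": "example-a", "instance": "ksm-1"},
    {"alertname": "UnchainedStatefulSetDown", "namespace": "unchained",
     "severity": "critical", "statefulset": "example-b", "instance": "ksm-0"},
    # Different namespace: the route's matchers reject this alert.
    {"alertname": "UnchainedStatefulSetDown", "namespace": "unchained-dev",
     "severity": "critical", "statefulset": "example-a", "instance": "ksm-0"},
]

groups = group_alerts(example_alerts)
for key, members in groups.items():
    print(key, len(members))
```

In this sketch each group corresponds to one Discord notification, so a StatefulSet reported by several sources produces a single message rather than one per alert.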

Discord Notification Template (from discord.tmpl)

{{ define "discord.title" }}
Unchained Alert {{ .Status | title }}: {{ .GroupLabels.alertname }}
{{ end }}

{{ define "discord.message" }}
{{ range .Alerts }}
**{{ .Labels.severity | toUpper }}**

**Alert:** {{ .Annotations.summary }}
**Description:** {{ .Annotations.description }}

**Details:**
{{ range .Labels.SortedPairs }}- {{ .Name }}: {{ .Value }}
{{ end }}
{{ end }}
{{ end }}

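The `discord.title` template renders the notification status and the grouped alert name. The `discord.message` template renders, for each alert in the group, its severity in upper case, its summary and description annotations, and every label as a sorted `name: value` list.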
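As a rough illustration of what the template produces, the following Python sketch mirrors its structure. It is an approximation rather than a port: blank-line placement differs from what Go's template engine would emit, and the example alert (including the `example-a` StatefulSet name) is hypothetical.

```python
def render_title(status, group_labels):
    """Mirror "discord.title": title-cased status followed by the alert name."""
    return f"Unchained Alert {status.title()}: {group_labels.get('alertname', '')}"


def render_message(alerts):
    """Mirror "discord.message": one section per alert in the group."""
    sections = []
    for alert in alerts:
        labels = alert["labels"]
        annotations = alert["annotations"]
        lines = [
            f"**{labels.get('severity', '').upper()}**",
            "",
            f"**Alert:** {annotations.get('summary', '')}",
            f"**Description:** {annotations.get('description', '')}",
            "",
            "**Details:**",
        ]
        # SortedPairs orders labels by name; sorted() on the items does the same.
        lines += [f"- {name}: {value}" for name, value in sorted(labels.items())]
        sections.append("\n".join(lines))
    return "\n\n".join(sections)


# Hypothetical alert, shaped like the data Alertmanager passes to templates.
example_alert = {
    "labels": {
        "severity": "critical",
        "alertname": "UnchainedStatefulSetDown",
        "namespace": "unchained",
        "statefulset": "example-a",
    },
    "annotations": {
        "summary": "Unchained stateful set is currently down",
        "description": "Service example-a has been down for more than 15 minutes",
    },
}

title = render_title("firing", {"alertname": "UnchainedStatefulSetDown"})
message = render_message([example_alert])
print(title)
print(message)
```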

Alert Rule Example (from rules.json)

{
  "alert": "UnchainedStatefulSetDown",
  "annotations": {
    "summary": "Unchained stateful set is currently down",
    "description": "Service {{ $labels.statefulset }} has been down for more than 15 minutes"
  },
  "expr": "kube_statefulset_status_replicas_available == 0",
  "for": "15m",
  "labels": {
    "severity": "critical"
  }
}

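This Prometheus alert rule fires a critical alert when `kube_statefulset_status_replicas_available` has stayed at 0 for 15 minutes, that is, when a Kubernetes StatefulSet has had no available replicas for that long. Until the `for` duration has elapsed the alert is only pending, and no notification is sent.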
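The effect of the `for` clause can be sketched in Python as follows. This is a simplified model of the pending and firing states, not Prometheus code; the evaluation timestamps and the 5-minute evaluation spacing are assumptions made for the example.

```python
from datetime import datetime, timedelta

FOR_DURATION = timedelta(minutes=15)  # "for": "15m" in the rule above


def alert_state(samples, now, for_duration=FOR_DURATION):
    """Classify a series as "inactive", "pending", or "firing".

    samples is a time-ordered list of (timestamp, available_replicas)
    tuples, one per rule evaluation.
    """
    active_since = None
    for timestamp, available in samples:
        if available == 0:
            # Condition holds; remember when the current streak began.
            if active_since is None:
                active_since = timestamp
        else:
            # Any evaluation where the condition is false resets the timer.
            active_since = None
    if active_since is None:
        return "inactive"
    return "firing" if now - active_since >= for_duration else "pending"


start = datetime(2024, 1, 1, 12, 0)  # arbitrary example start time

# Down at 0, 5, 10, and 15 minutes: the condition has held for the full 15m.
down = [(start + timedelta(minutes=m), 0) for m in (0, 5, 10, 15)]
state_down = alert_state(down, now=start + timedelta(minutes=15))

# Down for only 10 minutes so far: still pending.
state_short = alert_state(down[:3], now=start + timedelta(minutes=10))

# A brief recovery at 5 minutes restarts the 15-minute window at 10 minutes.
flapping = [(start, 0), (start + timedelta(minutes=5), 1),
            (start + timedelta(minutes=10), 0)]
state_flapping = alert_state(flapping, now=start + timedelta(minutes=20))

print(state_down, state_short, state_flapping)
```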

Diagram

flowchart TD
    Prometheus[Prometheus] -->|Evaluates Alert Rules| Alertmanager[Alertmanager]
    Alertmanager -->|Applies Routing & Inhibition| Router[Routing Logic]
    Router -->|Sends Notification| DiscordCritical[Discord Critical Channel]
    Router -->|Sends Notification| DiscordWarning[Discord Warning Channel]
    Router -->|Sends Notification| DiscordDev[Discord Dev Channel]
    Alertmanager -->|Uses Templates| TemplateEngine[Message Templates]
    Prometheus -->|Scrapes Metrics| Kubernetes[Kubernetes & Services]

    classDef prod fill:#f96,stroke:#333,stroke-width:1px;
    classDef dev fill:#bbf,stroke:#333,stroke-width:1px;

    DiscordCritical:::prod
    DiscordWarning:::prod
    DiscordDev:::dev

This flowchart illustrates how Prometheus sends alerts to Alertmanager, which applies routing and inhibition rules, formats messages with templates, and dispatches notifications to appropriate Discord channels based on alert severity and environment.
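The flowchart names an inhibition step, but the inhibition rules themselves are not shown in this section. As a sketch of what such a step does, the Python below assumes a hypothetical rule in which a firing critical alert mutes warning alerts that share the same `alertname` and `namespace`. The rule contents and the alert label sets are illustrative assumptions, not the project's actual configuration.

```python
# Hypothetical inhibition rule; the real inhibit_rules are not shown here.
INHIBIT_RULE = {
    "source_matchers": {"severity": "critical"},
    "target_matchers": {"severity": "warning"},
    "equal": ("alertname", "namespace"),
}


def matches(labels, matchers):
    """Return True when every equality matcher agrees with the labels."""
    return all(labels.get(name) == value for name, value in matchers.items())


def is_inhibited(target, firing_alerts, rule=INHIBIT_RULE):
    """Return True if some firing source alert mutes the target alert."""
    if not matches(target, rule["target_matchers"]):
        return False
    for source in firing_alerts:
        if source is target:
            continue
        same_scope = all(
            source.get(name, "") == target.get(name, "") for name in rule["equal"]
        )
        if matches(source, rule["source_matchers"]) and same_scope:
            return True
    return False


# Hypothetical firing alerts.
critical = {"alertname": "UnchainedStatefulSetDown",
            "namespace": "unchained", "severity": "critical"}
warning_same = {"alertname": "UnchainedStatefulSetDown",
                "namespace": "unchained", "severity": "warning"}
warning_other = {"alertname": "UnchainedStatefulSetDown",
                 "namespace": "unchained-dev", "severity": "warning"}
firing = [critical, warning_same, warning_other]

# Only alerts that are not inhibited continue on to routing.
to_route = [alert for alert in firing if not is_inhibited(alert, firing)]
print(len(to_route))
```

Under this assumed rule, the warning in the same namespace is suppressed while the critical alert and the warning from the other namespace are still routed.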