Alerting Configuration
Purpose
This section covers setting up and managing automated alerts for the ShapeShift Unchained platform's infrastructure and services. The goal is to detect critical and warning conditions affecting blockchain node daemons, indexers, API services, and Kubernetes resources, and to notify operators promptly. Alertmanager is integrated with Discord so that alerts are grouped, routed, and delivered to the appropriate channels, which supports efficient incident response and system reliability.
Functionality
This configuration defines how Prometheus alerts are handled, grouped, inhibited, and routed to different Discord channels based on severity and environment. The key functionalities include:
Alert Routing: Alerts are classified and routed to receivers corresponding to critical, warning, or development-level notifications. For example, critical issues in the production namespace are sent to the `discord_critical` receiver, while warnings are sent to `discord_warning`.
Grouping and Deduplication: Alerts with similar labels (such as `alertname` and `namespace`) are grouped to avoid notification flooding. Group wait, group interval, and repeat interval settings control the timing of alert notifications.
Inhibition Rules: Certain alerts suppress others to reduce noise. For instance, critical alerts inhibit warnings and info alerts for the same namespace and alert name, preventing redundant notifications.
Discord Notification Templates: Custom templates format alert messages for Discord with clear titles and detailed descriptions including severity, summary, and labels to provide context.
Environment-Specific Handling: Separate routes and receivers handle alerts for production (`unchained`) and development (`unchained-dev`) namespaces with distinct timing and grouping parameters.
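The inhibition behavior described above can be sketched as an Alertmanager `inhibit_rules` entry. This is a minimal illustration, not the project's actual rule: the label names come from the description above, while the exact matcher expressions are assumptions.

```yaml
# Assumed sketch, not copied from the project's config.yaml.
inhibit_rules:
  # While a critical alert is firing...
  - source_matchers:
      - severity = "critical"
    # ...mute warning and info alerts...
    target_matchers:
      - severity =~ "warning|info"
    # ...but only when both alerts carry the same alertname and namespace.
    equal: ["alertname", "namespace"]
```

The `equal` list keeps the suppression narrow: a critical alert in `unchained` would not silence a warning in `unchained-dev`, because the `namespace` values differ.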
Key Configuration Elements
Route Configuration:
Routes specify matching criteria for alerts (e.g., alert name, namespace, severity) and assign them to specific Discord receivers. Each route also defines grouping and timing parameters to control notification behavior.
Receivers:
Each receiver uses a Discord webhook URL to send notifications and is dedicated to critical, warning, or development alerts.
Inhibit Rules:
Inhibit rules suppress lower-severity alerts while a higher-severity alert is active for the same issue, which helps avoid alert storms.
Templates:
The `discord.tmpl` file uses Go templating to generate human-readable messages for Discord, enhancing alert clarity.
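To show how a receiver and the template file might fit together, the sketch below assumes Alertmanager's native `discord_configs` integration, which is available in recent Alertmanager releases. The webhook URLs are placeholders and the template path is hypothetical; neither is taken from the project's configuration.

```yaml
# Assumed sketch: receiver wiring with Alertmanager's native Discord integration.
templates:
  - "/etc/alertmanager/templates/discord.tmpl"  # hypothetical path

receivers:
  - name: "discord_critical"
    discord_configs:
      - webhook_url: "<DISCORD_CRITICAL_WEBHOOK_URL>"  # placeholder, not a real URL
        # Render the title and body with the templates defined in discord.tmpl.
        title: '{{ template "discord.title" . }}'
        message: '{{ template "discord.message" . }}'
  - name: "discord_warning"
    discord_configs:
      - webhook_url: "<DISCORD_WARNING_WEBHOOK_URL>"   # placeholder, not a real URL
        title: '{{ template "discord.title" . }}'
        message: '{{ template "discord.message" . }}'
```

Using one receiver per channel keeps the routing decision entirely in the route tree, so the same templates can be reused without per-channel logic.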
Integration
This alerting configuration complements the overall Prometheus & Grafana monitoring setup by handling the notification aspect of observability. While Prometheus collects metrics and evaluates alerting rules (defined in `rules.json`), Alertmanager processes these alerts with this configuration to ensure the right stakeholders are informed.
It integrates tightly with:
Metrics Collection: Alerts trigger based on Prometheus metrics collected from Kubernetes resources and blockchain services. For example, alerts like `UnchainedStatefulSetDown` rely on Kubernetes replica availability metrics.
Prometheus Alerting Rules: The alert definitions in `rules.json` generate alert events that Alertmanager routes according to this configuration.
Discord Channels: Using webhook URLs, notifications are sent directly to Discord, facilitating team awareness and quick response without requiring direct access to the monitoring system.
This setup enhances operational visibility by bridging metric collection with actionable notifications, ensuring alerts are meaningful, timely, and directed to appropriate teams.
Code Snippets Illustrating Core Interactions
Alert Routing Example (from config.yaml)
```yaml
routes:
  - receiver: "discord_critical"
    group_by: ["alertname", "namespace", "statefulset"]
    group_wait: 5m
    group_interval: 30m
    repeat_interval: 1h
    matchers:
      - alertname = "UnchainedStatefulSetDown"
      - namespace = "unchained"
      - severity = "critical"
```
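This route sends critical `UnchainedStatefulSetDown` alerts from the production namespace to the `discord_critical` receiver, grouping them by alert name, namespace, and StatefulSet. Alertmanager waits 5 minutes before sending the first notification for a new group (`group_wait`), waits at least 30 minutes before notifying about changes to that group (`group_interval`), and re-sends a notification for still-firing alerts after 1 hour (`repeat_interval`).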
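For comparison, a sibling route for the development namespace might look like the following. The `discord_dev` receiver name and all timing values are assumptions for illustration; the source only states that development alerts use distinct timing and grouping parameters.

```yaml
# Assumed sketch: receiver name and intervals are illustrative, not the project's values.
routes:
  - receiver: "discord_dev"        # hypothetical receiver name
    group_by: ["alertname", "namespace"]
    group_wait: 1m                 # assumed value
    group_interval: 1h             # assumed value
    repeat_interval: 12h           # assumed value
    matchers:
      - namespace = "unchained-dev"
```

Alertmanager evaluates sibling routes in order and stops at the first match unless `continue: true` is set, so more specific routes are typically listed before broader ones like this.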
Discord Notification Template (from discord.tmpl)
```
{{ define "discord.title" }}
Unchained Alert {{ .Status | title }}: {{ .GroupLabels.alertname }}
{{ end }}
{{ define "discord.message" }}
{{ range .Alerts }}
**{{ .Labels.severity | toUpper }}**
**Alert:** {{ .Annotations.summary }}
**Description:** {{ .Annotations.description }}
**Details:**
{{ range .Labels.SortedPairs }}- {{ .Name }}: {{ .Value }}
{{ end }}
{{ end }}
{{ end }}
```
This template formats alert titles and messages sent to Discord channels, clearly showing status, severity, summary, description, and key labels for context.
Alert Rule Example (from rules.json)
```json
{
  "alert": "UnchainedStatefulSetDown",
  "annotations": {
    "summary": "Unchained stateful set is currently down",
    "description": "Service {{ $labels.statefulset }} has been down for more than 15 minutes"
  },
  "expr": "kube_statefulset_status_replicas_available == 0",
  "for": "15m",
  "labels": {
    "severity": "critical"
  }
}
```
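This Prometheus alert rule fires a critical alert when a Kubernetes StatefulSet reports zero available replicas continuously for 15 minutes. The `statefulset` label carried by the metric is interpolated into the description annotation.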
Diagram
```mermaid
flowchart TD
    Prometheus[Prometheus] -->|Evaluates Alert Rules| Alertmanager[Alertmanager]
    Alertmanager -->|Applies Routing & Inhibition| Router[Routing Logic]
    Router -->|Sends Notification| DiscordCritical[Discord Critical Channel]
    Router -->|Sends Notification| DiscordWarning[Discord Warning Channel]
    Router -->|Sends Notification| DiscordDev[Discord Dev Channel]
    Alertmanager -->|Uses Templates| TemplateEngine[Message Templates]
    Prometheus -->|Scrapes Metrics| Kubernetes[Kubernetes & Services]
    classDef prod fill:#f96,stroke:#333,stroke-width:1px;
    classDef dev fill:#bbf,stroke:#333,stroke-width:1px;
    DiscordCritical:::prod
    DiscordWarning:::prod
    DiscordDev:::dev
```
This flowchart illustrates how Prometheus sends alerts to Alertmanager, which applies routing and inhibition rules, formats messages with templates, and dispatches notifications to appropriate Discord channels based on alert severity and environment.