Claude Alert Analyzer
Claude-powered alert analyzers that receive monitoring webhooks, gather context, send everything to an LLM for root-cause analysis, and publish results to ntfy. Part of the claude-alert-analyzer shared repository.
For full documentation, architecture, configuration, and security details, see the README in the analyzer repository.
Kubernetes Analyzer
Receives Alertmanager webhooks and gathers cluster context (Prometheus metrics, K8s events, pod status, logs).
Architecture
API Endpoints
- GET /health -- liveness/readiness probe
- POST /webhook -- Alertmanager webhook receiver (requires Authorization: Bearer <WEBHOOK_SECRET>)
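A minimal sketch of how the two endpoints and the bearer check could be wired in Go. The names validBearer, newMux, and handleAlerts are hypothetical and not taken from the analyzer's source; the constant-time comparison is an assumption about good practice, not a statement about what the service does.

```go
package main

import (
	"crypto/subtle"
	"net/http"
	"strings"
)

// validBearer reports whether an Authorization header value carries the
// expected shared secret. The comparison is constant-time so a caller cannot
// learn how many leading bytes matched.
func validBearer(header, secret string) bool {
	const prefix = "Bearer "
	if secret == "" || !strings.HasPrefix(header, prefix) {
		return false // fail closed: no configured secret or no bearer prefix
	}
	token := strings.TrimPrefix(header, prefix)
	return subtle.ConstantTimeCompare([]byte(token), []byte(secret)) == 1
}

// newMux registers /health and /webhook. handleAlerts is a hypothetical
// callback standing in for the real alert processing.
func newMux(secret string, handleAlerts http.HandlerFunc) *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	mux.HandleFunc("/webhook", func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		if !validBearer(r.Header.Get("Authorization"), secret) {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		handleAlerts(w, r)
	})
	return mux
}
```

Passing the result of newMux to http.ListenAndServe would serve both routes on the configured port.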
Configuration
All configuration is via environment variables.
Required (fail-closed)
| Variable | Description |
|---|---|
| WEBHOOK_SECRET | Shared secret for Alertmanager webhook authentication |
| API_KEY | API key for the LLM provider (Anthropic or OpenRouter) |
Optional
| Variable | Default | Description |
|---|---|---|
| API_BASE_URL | https://api.anthropic.com/v1/messages | LLM API endpoint. Set to https://openrouter.ai/api/v1/messages for OpenRouter |
| CLAUDE_MODEL | claude-sonnet-4-6 | Model name. For OpenRouter use anthropic/claude-sonnet-4-6 |
| PORT | 8080 | HTTP server port |
| PROMETHEUS_URL | http://kube-prometheus-stack-prometheus.monitoring.svc.cluster.local:9090 | Prometheus query endpoint |
| COOLDOWN_SECONDS | 300 | Per-alert cooldown to avoid duplicate analyses |
| SKIP_RESOLVED | true | Skip resolved alerts |
| ALLOWED_NAMESPACES | monitoring,databases,media | Comma-separated namespace allowlist for pod log collection |
| MAX_LOG_BYTES | 2048 | Per-pod log truncation limit |
| NTFY_PUBLISH_URL | https://ntfy.geekbundle.org | ntfy server URL |
| NTFY_PUBLISH_TOPIC | kubernetes-analysis | ntfy topic |
| NTFY_PUBLISH_TOKEN | (empty) | ntfy authentication token |
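The fail-closed behaviour and the defaults above could be loaded roughly as follows. This is a sketch, not the analyzer's actual code: the Config struct, loadConfig, and the lookup parameter are hypothetical, and only a subset of the variables is shown. In a real process, os.LookupEnv would be passed as lookup.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Config holds a subset of the documented settings (hypothetical struct).
type Config struct {
	WebhookSecret     string
	APIKey            string
	Port              int
	CooldownSeconds   int
	SkipResolved      bool
	AllowedNamespaces []string
	MaxLogBytes       int
}

// loadConfig reads settings through lookup, applies the documented defaults,
// and returns an error when a required variable is missing (fail-closed).
func loadConfig(lookup func(string) (string, bool)) (Config, error) {
	get := func(key, def string) string {
		if v, ok := lookup(key); ok && v != "" {
			return v
		}
		return def
	}
	cfg := Config{
		WebhookSecret: get("WEBHOOK_SECRET", ""),
		APIKey:        get("API_KEY", ""),
	}
	if cfg.WebhookSecret == "" || cfg.APIKey == "" {
		return Config{}, fmt.Errorf("WEBHOOK_SECRET and API_KEY must be set")
	}
	var err error
	if cfg.Port, err = strconv.Atoi(get("PORT", "8080")); err != nil {
		return Config{}, fmt.Errorf("PORT: %w", err)
	}
	if cfg.CooldownSeconds, err = strconv.Atoi(get("COOLDOWN_SECONDS", "300")); err != nil {
		return Config{}, fmt.Errorf("COOLDOWN_SECONDS: %w", err)
	}
	if cfg.MaxLogBytes, err = strconv.Atoi(get("MAX_LOG_BYTES", "2048")); err != nil {
		return Config{}, fmt.Errorf("MAX_LOG_BYTES: %w", err)
	}
	if cfg.SkipResolved, err = strconv.ParseBool(get("SKIP_RESOLVED", "true")); err != nil {
		return Config{}, fmt.Errorf("SKIP_RESOLVED: %w", err)
	}
	// Split the comma-separated allowlist and drop empty entries.
	for _, ns := range strings.Split(get("ALLOWED_NAMESPACES", "monitoring,databases,media"), ",") {
		if ns = strings.TrimSpace(ns); ns != "" {
			cfg.AllowedNamespaces = append(cfg.AllowedNamespaces, ns)
		}
	}
	return cfg, nil
}
```

Taking the lookup function as a parameter keeps the loader testable without touching the process environment.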
Provider-specific authentication
The service detects the provider from API_BASE_URL:
- URL contains anthropic.com: uses x-api-key header + anthropic-version header
- All other URLs (OpenRouter, etc.): uses Authorization: Bearer header
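The detection rule can be sketched as a small helper. The function name authHeaders is hypothetical, and the anthropic-version value 2023-06-01 is an assumption; the value the service actually sends is not stated here.

```go
package main

import (
	"net/http"
	"strings"
)

// authHeaders returns the authentication headers for the given LLM endpoint,
// choosing the scheme from the URL as described above.
func authHeaders(apiBaseURL, apiKey string) http.Header {
	h := http.Header{}
	if strings.Contains(apiBaseURL, "anthropic.com") {
		h.Set("x-api-key", apiKey)
		h.Set("anthropic-version", "2023-06-01") // assumed version string
	} else {
		h.Set("Authorization", "Bearer "+apiKey)
	}
	return h
}
```

The returned headers can be copied onto an outgoing *http.Request before the analysis call is sent.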
Alert Processing Flow
- Alertmanager sends webhook with one or more alerts
- Webhook secret is validated (401 if invalid)
- Resolved alerts are skipped (if SKIP_RESOLVED=true)
- Per-alert fingerprint cooldown is checked
- Alert is queued (bounded queue of 20, 5 concurrent workers)
- If queue is full, returns 503 (triggers Alertmanager retry)
- Context is gathered in parallel:
  - Prometheus metrics (firing alerts, namespace CPU/memory/restarts, alert-specific queries)
  - K8s events (Warning type, last 20)
  - Pod status (all pods in namespace)
  - Pod logs (only for allowlisted namespaces, redacted, truncated)
- LLM analyzes alert + context
- Result is published to ntfy with priority based on severity
- On failure (analysis or publish), cooldown is cleared to allow retry
Security
- Scratch container image (no shell, no package manager)
- Runs as non-root (UID 65534)
- Read-only root filesystem
- All capabilities dropped
- Webhook authentication required (fail-closed)
- Pod logs only collected from allowlisted namespaces
- Secrets redacted from logs before sending to LLM (passwords, tokens, keys, emails, PEM keys)
- Log output truncated to MAX_LOG_BYTES
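As an illustration of the redaction and truncation points above, the following sketch applies a few regular expressions and then caps the output. The patterns, the [REDACTED] placeholder, and the choice to keep the tail of the log are all assumptions; the analyzer's real rules are not reproduced here.

```go
package main

import "regexp"

// redactor pairs a pattern with its replacement (hypothetical type).
type redactor struct {
	re   *regexp.Regexp
	repl string
}

// Illustrative patterns only; a real deployment would need a broader set.
var redactors = []redactor{
	// PEM private key blocks, including the body between the markers.
	{regexp.MustCompile(`(?s)-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----`), "[REDACTED]"},
	// key=value or key: value pairs whose key looks secret; the key is kept.
	{regexp.MustCompile(`(?i)(password|passwd|token|secret|api[_-]?key)(\s*[=:]\s*)\S+`), "${1}${2}[REDACTED]"},
	// E-mail addresses.
	{regexp.MustCompile(`[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}`), "[REDACTED]"},
}

// redactAndTruncate scrubs log and then keeps at most maxBytes from its end.
// Redaction runs first so that truncation never leaves half of a secret
// that the patterns would no longer recognise.
func redactAndTruncate(log string, maxBytes int) string {
	for _, r := range redactors {
		log = r.re.ReplaceAllString(log, r.repl)
	}
	if len(log) > maxBytes {
		// Assumption: the most recent output is the most useful, so keep the
		// tail. Slicing by bytes may split a multi-byte character.
		log = log[len(log)-maxBytes:]
	}
	return log
}
```

For example, redactAndTruncate("login password=hunter2 by bob@example.com", 2048) returns "login password=[REDACTED] by [REDACTED]".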
CI/CD
GitHub Actions workflow (.github/workflows/build-claude-alert-analyzer.yaml):
- Triggers on push to main when apps/claude-alert-analyzer/src/**, Dockerfile, or the workflow changes
- Builds multi-stage Docker image (golang:1.26-alpine -> scratch)
- Pushes to ghcr.io/madic-creates/claude-alert-analyzer:<short-sha>
- Auto-commits updated image tag to k8s.deployment.yaml
- ArgoCD picks up the change and deploys
Alertmanager Configuration
The claude-analyzer receiver is configured in apps/monitoring/values.enc.yaml:
Testing
CheckMK Analyzer
Receives CheckMK notification webhooks and analyzes Nagios-based monitoring alerts.
GitOps Deployment
The analyzer is deployed in the monitoring namespace via ArgoCD (apps/claude-checkmk-analyzer/).
Prerequisites
- Encrypt secrets with real values:
  Required values: API_KEY, WEBHOOK_SECRET, CHECKMK_API_USER, CHECKMK_API_SECRET, NTFY_PUBLISH_TOKEN
- Populate known_hosts with monitored host keys:
- Configure CheckMK notification rule:
  - Go to Setup > Notifications > Add rule
  - Notification method: Custom script claude-analyzer-notify.sh
  - Parameter 1: Webhook URL (default: http://claude-checkmk-analyzer.monitoring:8080/webhook)
  - Parameter 2: Webhook secret (must match WEBHOOK_SECRET in the secret)