Backup#
For backups I need a solution that backs up the files within a persistent volume. I looked at Velero and at the backup solution included with Longhorn, but neither gave me what I wanted. Longhorn seems to do block-based backups, and with Velero I wasn't able to create file-level backups, although I can't rule out that I missed something.
At the moment I'm evaluating restic as a sidecar container. The sidecar uses crontab to create backups at regular intervals. It mounts the crontab config and the backup script from ConfigMaps and reads the environment variables that configure the backup from a Secret.
Backups with restic provide:
- ✅ ... encryption
- ✅ ... compression
- ✅ ... deduplication
- ✅ ... retention policies
This lets me upload my backups to any storage backend without leaking any data. Deduplication and compression save a lot of storage space, and the included retention policies make it easy to delete old backups.
Script Features:
- Backup any folder to a restic supported storage backend
- Delete old backups (Daily, Weekly, Monthly, Always Keep Last)
- ntfy.sh notification on failure
- prometheus pushgateway metrics
Info
The script runs restic and uploads the backup to the storage backend. Additionally, it deletes old backups. It gets its configuration from environment variables.
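The flow described here can be sketched in a few lines of shell. This is an illustration under stated assumptions, not the script from the repository: the function names `require_env` and `run_backup` are hypothetical, and only the variable names come from this page. restic itself reads `RESTIC_REPOSITORY` and `RESTIC_PASSWORD` from the environment, so they only need to be checked, not passed on.

```shell
#!/bin/sh
# Hypothetical sketch of the backup step; not the author's actual script.

# Fail early if any of the named environment variables is unset or empty.
require_env() {
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing required variable: $name" >&2
      return 1
    fi
  done
}

# Back up RESTIC_SOURCE; restic picks up repository and password from the environment.
run_backup() {
  require_env RESTIC_SOURCE RESTIC_REPOSITORY RESTIC_PASSWORD || return 1
  restic backup "$RESTIC_SOURCE" \
    --host "${RESTIC_HOSTNAME:-$(hostname | cut -d '-' -f1)}"
}
```

A cron entry would then source this file and call `run_backup`, reacting to a non-zero exit status.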
crontab configuration
I won't include the backup script here, because it can change over time. The script is available in my git repository as a ConfigMap. The ConfigMap gets replicated via reflector to all namespaces, so all deployments can use the same script. After changes to the script, however, the pods using it need to be restarted. This can be done by deleting the pods or by rolling out a new deployment.
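One way to trigger that restart is `kubectl rollout restart`, which replaces the pods of a deployment one by one. A minimal sketch, assuming the workloads are Deployments; the helper name `restart_deployments` and the namespace and deployment names in the usage line are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: restart deployments so their pods re-mount the updated script.
# Usage: restart_deployments <namespace> <deployment>...
restart_deployments() {
  namespace="$1"
  shift
  for deployment in "$@"; do
    kubectl -n "$namespace" rollout restart "deployment/$deployment"
  done
}
```

For example, `restart_deployments media jellyfin sonarr` would restart two deployments in the `media` namespace.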
Notifications#
To get notified if a backup fails I'm using ntfy. Ntfy is a simple notification service that can be self-hosted.
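A failure notification can be sent with a single `curl` call, since ntfy accepts a plain HTTP POST with `Title`, `Priority`, and `Tags` headers. The sketch below is an assumption about how the `NTFY_*` variables documented further down could be wired together, not the actual script: the function name `notify_failure`, the `https://` prefix in front of `NTFY_SERVER`, and the reading of the default title as "hostname - Backup failed" are all mine.

```shell
#!/bin/sh
# Hypothetical sketch: publish a failure message to ntfy using the NTFY_* variables.
notify_failure() {
  message="$1"
  # Do nothing unless notifications are explicitly enabled.
  [ "${NTFY_ENABLED:-false}" = "true" ] || return 0
  # NTFY_CREDS is intentionally unquoted: it carries option and value, e.g. "-u user:pass".
  curl -fsS ${NTFY_CREDS:-} \
    -H "Title: ${NTFY_TITLE:-${RESTIC_HOSTNAME:-backup} - Backup failed}" \
    -H "Priority: ${NTFY_PRIO:-4}" \
    -H "Tags: ${NTFY_TAG:-bangbang}" \
    -d "$message" \
    "https://${NTFY_SERVER:-ntfy.sh}/${NTFY_TOPIC:-backup}"
}
```

Called as `run_backup || notify_failure "restic backup failed"`, it stays silent on success and when `NTFY_ENABLED` is not `"true"`.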
Pushgateway Integration#
Additionally, the backup script supports integration with a Prometheus Pushgateway to send custom metrics about the backup process. This enables tracking of backup duration, start time, and status.
Configuration#
To enable metrics pushing to the Pushgateway, the following environment variables should be configured:
PUSHGATEWAY_ENABLED
: Set this to "true" to enable sending metrics to the Pushgateway

PUSHGATEWAY_URL
: Specify the URL of the Pushgateway server where metrics should be sent to
Metrics Published#
Warning
The metrics might change in the future. Currently I'm not really satisfied with them, but maybe that's because I haven't been able to create a good Grafana dashboard with them yet. I would appreciate any help. See issue #3
The script publishes the following metrics to the Pushgateway:
backup_duration_seconds
: The time, in seconds, that the backup process took

backup_start_timestamp
: The timestamp in epoch at which the backup process began

backup_status
: The status of the backup process, with either status="success" or status="failure"
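To make the shape of these metrics concrete, here is a sketch that renders them in the Prometheus text exposition format and pushes them with `curl`. The metric names and the `status` label come from the list above; the function names, the gauge type, the sample value `1` for `backup_status`, and the `job/backup/instance/...` grouping path are assumptions rather than what the script necessarily does.

```shell
#!/bin/sh
# Hypothetical sketch: render and push the three backup metrics.
# Arguments: $1 = start time (epoch seconds), $2 = duration (seconds), $3 = success|failure
render_metrics() {
  cat <<EOF
# TYPE backup_start_timestamp gauge
backup_start_timestamp $1
# TYPE backup_duration_seconds gauge
backup_duration_seconds $2
# TYPE backup_status gauge
backup_status{status="$3"} 1
EOF
}

# POST the rendered metrics to the Pushgateway, grouped by job and instance.
push_metrics() {
  [ "${PUSHGATEWAY_ENABLED:-false}" = "true" ] || return 0
  render_metrics "$@" | curl -fsS --data-binary @- \
    "${PUSHGATEWAY_URL}/metrics/job/backup/instance/${RESTIC_HOSTNAME:-$(hostname)}"
}
```

Keeping the rendering separate from the push makes it easy to inspect the payload locally with `render_metrics 1700000000 42 success` before pointing it at a real Pushgateway.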
Example Environment Configuration:
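A minimal sketch of the two Pushgateway variables as shell exports. The URL is a hypothetical in-cluster address (9091 is the Pushgateway's default port); in a pod these values would normally come from the Secret rather than from export statements.

```shell
# Hypothetical values; adjust the URL to your own Pushgateway service.
export PUSHGATEWAY_ENABLED="true"
export PUSHGATEWAY_URL="http://pushgateway.monitoring.svc.cluster.local:9091"
```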
📝 Environment Variables#
The following environment variables are used to configure the backup script.
Environment Variable | Default | Description |
---|---|---|
RESTIC_SOURCE | Unset | Source directory to back up using Restic |
RESTIC_REPOSITORY | Unset | Destination repository for the backup |
RESTIC_PASSWORD | Unset | Password for encrypting the backup |
RESTIC_HOSTNAME | $(hostname \| cut -d '-' -f1) | Optional. Hostname to use for the backup. Defaults to the part of the pod's hostname before the first dash. Especially useful for pods with host networking. |
AWS_ACCESS_KEY_ID | Unset | Access key ID for authenticating with an S3 compatible storage backend |
AWS_SECRET_ACCESS_KEY | Unset | Secret access key for authenticating with an S3 compatible storage backend |
RESTIC_RETENTION_POLICIES_ENABLED | true | Optional. Enable or disable retention policies |
KEEP_HOURLY | 24 | Optional. Number of hourly backups to retain |
KEEP_DAILY | 7 | Optional. Number of daily backups to keep |
KEEP_WEEKLY | 4 | Optional. Number of weekly backups to maintain |
KEEP_MONTHLY | 12 | Optional. Number of monthly backups to keep. Not implemented yet. |
KEEP_YEARLY | 0 | Optional. Number of yearly backups to keep. Not implemented yet. |
KEEP_LAST | 1 | Optional. Total number of most recent backups to keep, irrespective of time-based intervals |
NTFY_ENABLED | false | Optional. Indicates whether notification via ntfy is enabled. Possible values are "true" or "false" |
NTFY_TITLE | ${RESTIC_HOSTNAME - Backup failed} | Optional. Title of the ntfy notification message. Can be a string or shell command |
NTFY_CREDS | Unset | Optional. Credentials for authenticating with the ntfy notification service. Needs to include the -u option |
NTFY_PRIO | 4 | Optional. Priority level for the ntfy notification. Determines the importance of the notification |
NTFY_TAG | bangbang | Optional. Tags to categorize the ntfy notification, allowing filtering or grouping of messages |
NTFY_SERVER | ntfy.sh | Optional. URL of the ntfy server used for sending notifications |
NTFY_TOPIC | backup | Optional. Topic on the ntfy server where the message will be sent to |
PUSHGATEWAY_ENABLED | false | Optional. Indicates whether sending metrics to the Pushgateway is enabled. Possible values are "true" or "false" |
PUSHGATEWAY_URL | Unset | Optional. URL of the Pushgateway server for sending metrics |
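The retention variables map directly onto flags of `restic forget`. The sketch below shows one way to turn them into an argument list using the defaults from the table; the function name `build_forget_args` is hypothetical, and `KEEP_MONTHLY` and `KEEP_YEARLY` are left out because the table marks them as not implemented yet.

```shell
#!/bin/sh
# Hypothetical sketch: build the "restic forget" arguments from the KEEP_* variables.
# Prints nothing when retention policies are disabled.
build_forget_args() {
  [ "${RESTIC_RETENTION_POLICIES_ENABLED:-true}" = "true" ] || return 0
  echo "forget --prune" \
    "--keep-last ${KEEP_LAST:-1}" \
    "--keep-hourly ${KEEP_HOURLY:-24}" \
    "--keep-daily ${KEEP_DAILY:-7}" \
    "--keep-weekly ${KEEP_WEEKLY:-4}"
}
```

The result is meant to be word-split on purpose, for example `args="$(build_forget_args)"` followed by `[ -n "$args" ] && restic $args`.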
rclone#
At some point I was also evaluating rclone as a sidecar container. But it doesn't support deduplication, and I want my storage costs to be as low as possible. For historical reasons I'm keeping the script that I wrote and mounted in the container.