# Distributed Storage

## Longhorn

Longhorn provides the distributed block storage layer for this cluster. The Helm release, default settings, and supporting resources are managed declaratively in `apps/longhorn/kustomization.yaml`, so all configuration changes should flow through GitOps.
### Accessing the Longhorn UI

- Dashboard: `https://storage.internal.neese-web.de` (published through Traefik with Authelia forward-auth).
- In-cluster service: `http://longhorn-frontend.longhorn.svc.cluster.local`.
- Use the UI for day-to-day administration (volume health, replicas, backups, recurring jobs) and to confirm Git-driven changes after syncs.
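
For reference, publishing the dashboard this way usually amounts to a Traefik `IngressRoute` in front of the `longhorn-frontend` service. The sketch below is only an illustration under assumptions: the middleware and entry-point names (`authelia-forwardauth`, `websecure`) are placeholders, not values taken from this repository.

```yaml
# Illustrative only: route the Longhorn UI host through Traefik with an
# Authelia forward-auth middleware. Middleware/entry-point names are assumed.
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: longhorn-ui
  namespace: longhorn
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`storage.internal.neese-web.de`)
      kind: Rule
      middlewares:
        - name: authelia-forwardauth   # assumed forward-auth middleware name
      services:
        - name: longhorn-frontend      # in-cluster UI service listed above
          port: 80
```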
### Default Storage Classes and PVC Usage

- The `longhorn` StorageClass is marked as the default via a sync-time patch, so workloads can omit `storageClassName` unless an alternative is required.
- The reclaim policy is set to `Retain`: PVCs and underlying volumes are kept after a workload is deleted to prevent accidental data loss; clean up volumes explicitly in the UI or via `kubectl delete volume.longhorn.io/<name>`.
- Typical PVC: see the sketch directly after this list.
- Validate provisioning with `kubectl get pvc` and `kubectl get volumes.longhorn.io -n longhorn`.
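
A minimal sketch of such a claim, assuming a placeholder name, namespace, and size (with `longhorn` as the default class, `storageClassName` could also be omitted):

```yaml
# Example claim; the name, namespace, and requested size are placeholders.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-data
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn   # optional, since longhorn is the default class
  resources:
    requests:
      storage: 5Gi
```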
### Backups, Snapshots, and Automation

- Longhorn stores backups in Wasabi (`s3://k3s-at-home-01@eu-central-2/`); the credentials live in the `longhorn/wasabi-secret` and are managed with `sops`.
- CSI snapshots use the `VolumeSnapshotClass` named `longhorn-vsc-backup`. Create `VolumeSnapshot` objects against that class to trigger Longhorn backups that Velero can consume (see the sketch after this list).
- A recurring job named `trim-filesystem` runs daily at 06:00 and performs a filesystem trim on all volumes assigned to the `default` group. Adjust or add jobs through Git in `apps/longhorn/longhorn-config.enc.yaml`.
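
A `VolumeSnapshot` that routes through that class might look like the following sketch; the snapshot name and source PVC are placeholders:

```yaml
# Illustrative snapshot request; metadata and the source PVC name are placeholders.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: example-data-backup
  namespace: default
spec:
  volumeSnapshotClassName: longhorn-vsc-backup
  source:
    persistentVolumeClaimName: example-data
```

The authoritative recurring-job definition lives in `apps/longhorn/longhorn-config.enc.yaml`; expressed as a Longhorn `RecurringJob` resource, the schedule described above would look roughly like this (field values beyond the name, cron, task, and group are assumptions):

```yaml
# Sketch only; the real definition is managed through Git (sops-encrypted).
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: trim-filesystem
  namespace: longhorn
spec:
  task: filesystem-trim
  cron: "0 6 * * *"   # daily at 06:00
  groups:
    - default
  retain: 0           # trim jobs keep nothing; value assumed
  concurrency: 1      # assumed
```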
### Operational Guidance

- Follow the controlled upgrade/maintenance steps captured in `docs/operations/longhorn-maintenance.md` when performing Longhorn upgrades or other disruptive changes.
- Monitor the Longhorn health dashboards exposed through Grafana (PVC `longhorn-pvc-grafana`) to detect replica rebuilds or disk pressure early.
### Troubleshooting

- Volume still attached to a node: Longhorn enforces exclusive attachment. If a pod restart reports `Volume is already exclusively attached to one node`, wait for the detachment timeout or force-detach the volume (UI → Volume → `Detach`), making sure the old workload is fully terminated first.
- Replica rebuild loops: check node disk usage and engine/replica logs in the UI; if disk capacity is low, cordon the node, migrate workloads, and expand storage.
- Backup failures: verify Wasabi connectivity (Settings → General → `Test backup target`) and confirm the `wasabi-secret` matches the remote credentials.