CloudNativePG VolumeSnapshot Retention Policy Enforcement
I’ve been using CloudNativePG to run Postgres clusters for small and non-production workloads for some time now. It proved to be cost-effective and easy to set up and maintain.
One of the great features of CloudNativePG is that it makes it easy to set up proper backups.
It supports two methods for creating physical base backups: object storage via Barman (BarmanObjectStore) and CSI volume snapshots (VolumeSnapshot). For me the preferred method has been VolumeSnapshot, as I think it should be for any deployment whose volumes are backed by a CSI driver that supports snapshots.
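For context, here's a minimal sketch of how snapshot-based backups are typically enabled; the cluster name example and the VolumeSnapshotClass name csi-snapclass are placeholders for your own environment:

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: example
spec:
  instances: 2
  storage:
    size: 10Gi
  backup:
    volumeSnapshot:
      # VolumeSnapshotClass provided by your CSI driver (placeholder name)
      className: csi-snapclass
---
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: example-daily
spec:
  # CNPG uses a six-field cron expression (seconds first)
  schedule: "0 0 4 * * *"
  method: volumeSnapshot
  cluster:
    name: example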
The retentionPolicy setting determines when backups and WALs should be deleted (e.g. 7d). However, the retention policy isn't enforced for the snapshots when using the VolumeSnapshot method: it's currently only applicable when using the BarmanObjectStore method.
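For reference, the retention policy sits under the cluster's backup configuration; the relevant fragment of the Cluster spec looks like this:

spec:
  backup:
    # honored for BarmanObjectStore backups only; VolumeSnapshot objects are left in place
    retentionPolicy: "7d"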
This means that old snapshots won't be deleted automatically, so we need to do it ourselves. This can be done in many different ways, but the easiest is to simply add a CronJob that will find and delete the expired backups.
Here are the manifests for a cron job that deletes backups older than the configured retention:
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: snapshot-cleanup
rules:
  - apiGroups: ["snapshot.storage.k8s.io"]
    resources: ["volumesnapshots"]
    verbs: ["list", "get", "delete"]
  - apiGroups: ["postgresql.cnpg.io"]
    resources: ["backups"]
    verbs: ["list", "get", "delete"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: snapshot-cleanup
subjects:
  - kind: ServiceAccount
    name: snapshot-cleanup
    namespace: default # change to the namespace where these manifests are applied
roleRef:
  kind: Role
  name: snapshot-cleanup
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: snapshot-cleanup
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: snapshot-cleanup-script
data:
  cleanup.sh: |
    #!/usr/bin/env bash
    set -euo pipefail

    delete_before() {
      local obj_kind=$1
      local cluster=$2
      local interval=$3
      kubectl get "$obj_kind" \
        --selector=cnpg.io/cluster="$cluster" \
        -o go-template \
        --template '{{ range .items }}{{ .metadata.name }} {{ .metadata.creationTimestamp }}{{ "\n" }}{{ end }}' \
        | awk '$2 <= "'$(date --date="now - $interval day" -I'seconds' -u | sed 's/+00:00/Z/')'" { print $1 }' \
        | xargs --no-run-if-empty kubectl delete "$obj_kind"
    }

    delete_before backup "$CNPG_CLUSTER_NAME" "$RETENTION_IN_DAYS"
    delete_before volumesnapshot "$CNPG_CLUSTER_NAME" "$RETENTION_IN_DAYS"
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: snapshot-cleanup
spec:
  # you can set this to some time after the backup is likely to have been completed
  schedule: "0 5 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          serviceAccountName: snapshot-cleanup
          containers:
            - name: snapshot-cleanup
              image: "bitnami/kubectl:1.31"
              env:
                - name: CNPG_CLUSTER_NAME
                  value: example
                - name: RETENTION_IN_DAYS
                  value: "14"
              command:
                - bash
                - /opt/cleanup.sh
              volumeMounts:
                - name: snapshot-cleanup-script
                  mountPath: /opt/cleanup.sh
                  subPath: cleanup.sh
          volumes:
            - name: snapshot-cleanup-script
              configMap:
                name: snapshot-cleanup-script
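To exercise the cleanup without waiting for the schedule, you can trigger the CronJob manually and then check what's left; the commands below assume the resource names used above and a cluster called example:

kubectl create job snapshot-cleanup-manual --from=cronjob/snapshot-cleanup
kubectl wait --for=condition=complete job/snapshot-cleanup-manual --timeout=120s
kubectl get backups,volumesnapshots -l cnpg.io/cluster=example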
You can create a library Helm chart or package the cleanup script as a container image to make it easier to share this code between multiple projects.
Alternatively, you can use a controller like Kubernetes Janitor to delete the objects, or even build a custom controller specifically for this purpose, but I don't think that's worth the effort: I see this as a temporary workaround and expect CNPG to support retention policy enforcement for snapshots out of the box at some point.