This blog is on the life cycle of pvc with a volume populator. Volume populator feature is in alpha in Kubernetes (v1.21 release). Creating a volume with base or seed data is one of the requirements in storage. In Kubernetes 1.12, the DataSource field is added to the PVC spec. The field was implemented as a TypedLocalObjectReference to give flexibility in the future about what objects could be data sources for new volumes. Since then, it supports only two things to be the source of a new volume - existing PVC(clone volume) and snapshots(restore a snapshot). Implementation of these two data sources relies on the CSI plugin to do the actual work. As it can point to any CR in Kubernetes validation on the DataSource field of PVC is relaxed and relies on a new async controller to perform the validation on those data sources and provide feedback to users. A new CRD is introduced to register valid dataSource, and we need to create a resource for individual volume populators to inform that the volume populator is a valid dataSource for a PVC.

In the first part we will see the volume lifecycle without a volume populator. In this process involved components are API server, etcd, CSI external provisioner and CSI controller plugin. We will not go into the CSI node plugin, we will stop as soon as PVC is in bound state. Let's understand the responsibility of some modules present here.

CSI external provisioner

This is a CSI driver sidecar component. It is a control loop on PVC. As clone and restore volumes come under CSI workflow if DataSource is present and it is PVC or snapshot then only it takes some actions for that PVC else it will ignore that PVC.

CSI controller plugin

This is a CSI driver sidecar component provided by the storage provider(SP). It is a gRPC server which implements CSI controller services. CSI external provisioner makes a CreateVolume gRPC call to it once new PVC is created.

  • User creates a pvc and it is written in etcd with the help of API server and the pvc goes to the pending state.
  • CSI external provisioner observes that pvc and makes CreateVolume call to the CSI controller plugin sidecar. On successful response it creates a PV for that PVC and applies bi-directional reference between them.
  • Once bi-directional reference is done, pvc goes to bound state.

Now we will see the volume lifecycle with the data populator. Involved components are API server, etcd, CSI external provisioner, CSI controller plugin and Volume Populator controller. We will not go into the CSI node plugin, we will stop as soon as PVC is in bound state. Let's understand the responsibility of some modules present here.

Volume Populator Controller

It is a control loop on PVC, responsible to manage PVC pointing to a particular dataSource. Namespace in which this controller is deployed is reserved for creating some intermediate PVC and pod to populate the data in the volume. If we create a PVC in that namespace for our application with a valid dataSource it will not work, this control loop will skip that PVC. This controller adds a finalizer to the initial PVC. It is written using lib volume populator. This is an example of a data populator written using lib volume populator. One new CRD Volume Populator is introduced. Like CSI drivers we have to register the volume populator.

  • User creates a PVC with dataSource pointing to a volume populator and it is written in etcd with the help of API server and the PVC goes to the pending state.
  • CSI external provisioner observes that PVC and takes no action as the dataSource is not pointing to a PVC or snapshot.
  • Volume populator controller is another watcher on pvc. It creates a PVC’ which is the same as the PVC created by the user without the dataSource field.
  • As PVC’ does not have a dataSource CSI external provisioner makes a CreateVolume gRPC call to the CSI controller plugin. Once the gRPC call is successful PV’ is created and bi-directional reference applied on PVC’ and PV’. Then PVC’ goes to the bound state.
  • Once PVC’ is in bound state, the volume populator controller creates a pod to populate the data for that PVC’.
  • Once data is populated it patches PVC and PV’ and applies a bi-directional reference between  them. After this PVC will be in bound state and PVC’ will be in lost state. Then the populator pod and PVC’ are deleted by this controller.

As mentioned above in core Kubernetes validation on the DataSource field is relaxed now we can create PVC pointing to any data source. To add validation around it one new asynchronous PVC controller is added. This controller is responsible for posting warning events if PVC is pointing to invalid DataSource. One new CRD of kind VolumePopulator and API version populator.storage.k8s.io/v1alpha1 is introduced to specify a valid volume populator.

Here is one valid data source -

kind: VolumePopulator
apiVersion: populator.storage.k8s.io/v1alpha1
metadata:
  name: valid-populator
sourceKind:
  group: valid.storage.k8s.io
  kind: Valid

Here is an example PVC with valid data source -

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: valid-pvc-datasource
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  dataSource:
    apiGroup: valid.storage.k8s.io
    kind: Valid
    name: valid

Here is an example of PVC with invalid data source -

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: invalid-pvc-datasource
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  dataSource:
    apiGroup: invalid.storage.k8s.io
    kind: Invalid
    name: invalid

Event in invalid data source PVC-

shovan@probot:~$ kubectl get event | grep Warning
58s         Warning   UnrecognizedDataSourceKind   persistentvolumeclaim/invalid-pvc-datasource   The data source for this PVC does not match any registered VolumePopulator
shovan@probot:~$ kubectl describe pvc invalid-pvc-datasource
Events:
  Type     Reason                      Age                  From                                                                    Message
  ----     ------                      ----                 ----                                                                    -------
  Normal   ExternalProvisioning        115s (x2 over 115s)  persistentvolume-controller                                             waiting for a volume to be created, either by external provisioner "k8s.io/minikube-hostpath" or manually created by system administrator
  Normal   Provisioning                114s                 k8s.io/minikube-hostpath_minikube_e7c97d0c-eb52-42bc-af55-d8a167ab3987  External provisioner is provisioning volume for claim "default/invalid-pvc-datasource"
  Normal   ProvisioningSucceeded       113s                 k8s.io/minikube-hostpath_minikube_e7c97d0c-eb52-42bc-af55-d8a167ab3987  Successfully provisioned volume pvc-7df410d5-1efe-4d69-b481-57cbde9f10cd
  Warning  UnrecognizedDataSourceKind  33s (x6 over 114s)   data-source-validator                                                   The data source for this PVC does not match any registered VolumePopulator

Please join our community if you have any feedback or queries on the article or anything related to OpenEBS

Related Article:

A hands on demo of Volume Populator using OpenEBS LVM CSI driver
This blog illustrates a hands on demo to write a volume populator and create a volume with that populator using OpenEBS LVM CSI driver