cStor User Guide - Advanced
This cStor user guide covers advanced topics such as expanding a cStor volume, taking snapshots and clones of a cStor volume, scaling up cStor pools, block device tagging, tuning cStor pools, and tuning cStor volumes:
- Scaling up cStor pools
- Snapshot and Clone of a cStor volume
- Expanding a cStor volume
- Block Device Tagging
- Tuning cStor Pools
- Tuning cStor Volumes
# Scaling cStor pools
Once the cStor storage pools are created, you can scale up your existing cStor pool. To scale up the pool size, you need to edit the CSPC YAML that was used for creation of the CStorPoolCluster.
Scaling up can be done by two methods:
- Adding new nodes (with new disks) to the existing CSPC
- Adding new disks to existing nodes
Note: The dataRaidGroupType: can either be set as stripe or mirror as per your requirement. In the following examples it is configured as stripe.
# Adding new nodes (with new disks) to the existing CSPC
A new node spec needs to be added to the previously deployed CSPC YAML, as sketched below.
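A minimal sketch of such an edit, assuming the default openebs namespace; the CSPC name, node names, and block device names are placeholders and must match your cluster:

```yaml
# Hypothetical names; edit your own CSPC with:
#   kubectl edit cspc cspc-disk-pool -n openebs
apiVersion: cstor.openebs.io/v1
kind: CStorPoolCluster
metadata:
  name: cspc-disk-pool
  namespace: openebs
spec:
  pools:
    - nodeSelector:
        kubernetes.io/hostname: "worker-node-1"
      dataRaidGroups:
        - blockDevices:
            - blockDeviceName: "blockdevice-aaaa"
      poolConfig:
        dataRaidGroupType: "stripe"
    # ...existing pool specs for the other nodes remain unchanged...
    - nodeSelector:                                  # newly added node spec
        kubernetes.io/hostname: "worker-node-4"
      dataRaidGroups:
        - blockDevices:
            - blockDeviceName: "blockdevice-bbbb"
      poolConfig:
        dataRaidGroupType: "stripe"
```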
Now verify the status of the CSPC and CSPI(s) using kubectl get cspc -n openebs and kubectl get cspi -n openebs:
Sample Output:
Sample Output:
As a result, we can see that a new pool has been added, increasing the number of pools to 4.
# Adding new disks to existing nodes
A new blockDeviceName under blockDevices needs to be added to the previously deployed YAML. Execute the following command to edit the CSPC:
Sample YAML:
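A sketch of the edit command and the relevant pool-spec fragment; the CSPC name and block device names are placeholders:

```yaml
# kubectl edit cspc cspc-disk-pool -n openebs
# Fragment of the pool spec for the node that gets the new disk:
- nodeSelector:
    kubernetes.io/hostname: "worker-node-1"
  dataRaidGroups:
    - blockDevices:
        - blockDeviceName: "blockdevice-aaaa"
        - blockDeviceName: "blockdevice-cccc"   # newly added block device
  poolConfig:
    dataRaidGroupType: "stripe"
```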
# Snapshot and Clone of a cStor Volume
An OpenEBS snapshot is a set of reference markers for data at a particular point in time. A snapshot acts as a detailed table of contents, with accessible copies of data that the user can roll back to as required. Snapshots in OpenEBS are instantaneous and are managed through kubectl.
During the installation of OpenEBS, a snapshot-controller and a snapshot-provisioner are set up which assist in taking the snapshots. During snapshot creation, the snapshot-controller creates VolumeSnapshot and VolumeSnapshotData custom resources. The snapshot-provisioner is used to restore a snapshot as a new Persistent Volume (PV) via dynamic provisioning.
# Creating a cStor volume Snapshot
Before proceeding to create a cStor volume snapshot and use it further for restoration, it is necessary to create a VolumeSnapshotClass. Copy the following YAML specification into a file called snapshot_class.yaml.
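A minimal sketch of such a class, assuming the cStor CSI driver name cstor.csi.openebs.io; the class name is illustrative:

```yaml
kind: VolumeSnapshotClass
apiVersion: snapshot.storage.k8s.io/v1
metadata:
  name: csi-cstor-snapshotclass          # illustrative name
  annotations:
    snapshot.storage.kubernetes.io/is-default-class: "true"
driver: cstor.csi.openebs.io             # cStor CSI driver
deletionPolicy: Delete                   # or Retain, see below
```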
The deletion policy can be set as Delete or Retain. When it is set to Retain, the underlying physical snapshot on the storage cluster is retained even when the VolumeSnapshot object is deleted. To apply, execute kubectl apply -f snapshot_class.yaml.
Note: In clusters that only install the v1beta1 version of VolumeSnapshotClass as the supported version (e.g. OpenShift (OCP) 4.5), the following error might be encountered. In such cases, the apiVersion needs to be updated to apiVersion: snapshot.storage.k8s.io/v1beta1.
For creating the snapshot, you need to create a YAML specification and provide the required PVC name in it. The only prerequisite check to be performed is to ensure that there are no stale entries of snapshot and snapshot data before creating a new snapshot. Copy the following YAML specification into a file called snapshot.yaml and run the command shown in the sketch below to create the snapshot.
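A sketch of snapshot.yaml together with the create command; the source PVC name and namespace are placeholders, while cstor-pvc-snap matches the snapshot name referenced later in this guide:

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: cstor-pvc-snap
spec:
  volumeSnapshotClassName: csi-cstor-snapshotclass   # class created above
  source:
    persistentVolumeClaimName: cstor-pvc              # placeholder source PVC
# Create it in the same namespace as the source PVC, for example:
#   kubectl apply -f snapshot.yaml -n <namespace-of-pvc>
```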
To list the snapshots, execute:
Sample Output:
A VolumeSnapshot is analogous to a PVC and is associated with a VolumeSnapshotContent object that represents the actual snapshot. To identify the VolumeSnapshotContent object for the VolumeSnapshot, execute:
Sample Output:
The SnapshotContentName identifies the VolumeSnapshotContent object which serves this snapshot. The Ready To Use parameter indicates that the snapshot has been created successfully and can be used to create a new PVC.
Note: All cStor snapshots should be created in the same namespace as the source PVC.
# Cloning a cStor Snapshot
Once the snapshot is created, you can use it to create a PVC. In order to restore a specific snapshot, you need to create a new PVC that refers to the snapshot. Below is an example of a YAML file that restores and creates a PVC from a snapshot.
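A sketch of such a PVC; the PVC name, StorageClass name, and size are illustrative, and the StorageClass should match that of the source volume:

```yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: restore-cstor-pvc                 # illustrative name
spec:
  storageClassName: cstor-csi-disk        # same StorageClass as the source PVC
  dataSource:
    name: cstor-pvc-snap                  # the VolumeSnapshot created earlier
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi                        # at least the size of the source volume
```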
The dataSource shows that the PVC must be created using a VolumeSnapshot named cstor-pvc-snap as the source of the data. This instructs cStor CSI to create a PVC from the snapshot. Once the PVC is created, it can be attached to a pod and used just like any other PVC.
To verify the creation of the PVC, execute:
Sample Output:
# Expanding a cStor volume
OpenEBS cStor introduces support for expanding a PersistentVolume using the CSI provisioner. Provided cStor is configured to function as a CSI provisioner, you can expand PVs that have been created by the cStor CSI Driver. This feature is supported with Kubernetes versions 1.16 and above.
For expanding a cStor PV, you must ensure the following items are taken care of:
- The StorageClass must support volume expansion. This can be done by editing the StorageClass definition to set allowVolumeExpansion: true.
- To resize a PV, edit the PVC definition and update the spec.resources.requests.storage to reflect the newly desired size, which must be greater than the original size.
- The PV must be attached to a pod for it to be resized. There are two scenarios when resizing a cStor PV:
- If the PV is attached to a pod, cStor CSI driver expands the volume on the storage backend, re-scans the device and resizes the filesystem.
- When attempting to resize an unattached PV, cStor CSI driver expands the volume on the storage backend. Once the PVC is bound to a pod, the driver re-scans the device and resizes the filesystem. Kubernetes then updates the PVC size after the expansion operation has successfully completed.
The example below shows how expanding a cStor volume works. For an already existing StorageClass, you can edit the StorageClass to include the allowVolumeExpansion: true parameter.
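A sketch of a cStor CSI StorageClass with expansion enabled; the class name, CSPC name, and replica count are illustrative:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cstor-csi-disk                    # illustrative name
provisioner: cstor.csi.openebs.io
allowVolumeExpansion: true                # enables PVC expansion
parameters:
  cas-type: cstor
  cstorPoolCluster: cspc-disk-pool        # illustrative CSPC name
  replicaCount: "3"
```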
For example, an application busybox pod is using the below PVC, associated with a PV. To get the status of the pod, execute:
The following is a Sample Output:
To list PVCs, execute:
Sample Output:
To list PVs, execute:
Sample Output:
To resize the PV that has been created from 5Gi to 10Gi, edit the PVC definition and update spec.resources.requests.storage to 10Gi (see the sketch after the list below). It may take a few seconds to update the actual size in the PVC resource; wait for the updated capacity to reflect in the PVC status (pvc.status.capacity.storage). Internally, it is a two-step process for volumes containing a file system:
- Volume expansion
- FileSystem expansion
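A sketch of the relevant part of the PVC after editing; the PVC name, namespace, and StorageClass name are placeholders:

```yaml
# kubectl edit pvc <pvc-name> -n <namespace>
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi                       # changed from 5Gi to 10Gi
  storageClassName: cstor-csi-disk
```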
Now, we can validate that the resize has worked correctly by checking the size of the PVC or PV, or by describing the PVC to get all events.
Sample Output:
Sample Output:
# Block Device Tagging
NDM provides you with the ability to reserve block devices for specific applications by adding tag(s) to your block device(s). This feature can be used by cStor operators to specify the block devices which should be consumed by cStor pools, and conversely to restrict anyone else from using those block devices. This helps in protecting against manual errors by users in specifying the block devices in the CSPC YAML.
- Consider the following block devices in a Kubernetes cluster; they will be used to provision a storage pool. List the labels added to these block devices:
Sample Output:
- Now, to understand how block device tagging works, we will add the openebs.io/block-device-tag=fast label to the block device attached to worker-node-3 (i.e. blockdevice-00439dc464b785256242113bf0ef64b9).
Sample Output:
Now, provision cStor pools using the following CSPC YAML. Note that openebs.io/allowed-bd-tags: is set to cstor,ssd, which ensures the CSPC will be created using block devices that either have the label set to cstor or ssd, or have no such label.
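A sketch of such a CSPC; the CSPC name and the single-disk stripe layout are assumptions, while the node names, block device names, and the openebs.io/allowed-bd-tags annotation follow the scenario described here:

```yaml
apiVersion: cstor.openebs.io/v1
kind: CStorPoolCluster
metadata:
  name: cspc-disk-pool                     # illustrative name
  namespace: openebs
  annotations:
    openebs.io/allowed-bd-tags: cstor,ssd  # only untagged BDs or BDs tagged cstor/ssd are allowed
spec:
  pools:
    - nodeSelector:
        kubernetes.io/hostname: "worker-node-1"
      dataRaidGroups:
        - blockDevices:
            - blockDeviceName: "blockdevice-022674b5f97f06195fe962a7a61fcb64"
      poolConfig:
        dataRaidGroupType: "stripe"
    - nodeSelector:
        kubernetes.io/hostname: "worker-node-2"
      dataRaidGroups:
        - blockDevices:
            - blockDeviceName: "blockdevice-241fb162b8d0eafc640ed89588a832df"
      poolConfig:
        dataRaidGroupType: "stripe"
    - nodeSelector:
        kubernetes.io/hostname: "worker-node-3"
      dataRaidGroups:
        - blockDevices:
            - blockDeviceName: "blockdevice-00439dc464b785256242113bf0ef64b9"
      poolConfig:
        dataRaidGroupType: "stripe"
```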
Apply the above CSPC file so the CSPIs get created, and check the CSPI status.
Sample Output:
Note that CSPI for node worker-node-3 is not created because:
- The CSPC YAML created above has openebs.io/allowed-bd-tags: cstor,ssd in its annotations. This means that the CSPC operator will only consider for provisioning those block devices that either do not have the BD tag openebs.io/block-device-tag on the block device, or have the tag with its value set as cstor or ssd.
- In this case, blockdevice-022674b5f97f06195fe962a7a61fcb64 (on node worker-node-1) and blockdevice-241fb162b8d0eafc640ed89588a832df (on node worker-node-2) do not have the label. Hence, no restrictions are applied on them and they can be used by the CSPC operator for pool provisioning.
- For blockdevice-00439dc464b785256242113bf0ef64b9 (on node worker-node-3), the label openebs.io/block-device-tag has the value fast, but on the CSPC the annotation openebs.io/allowed-bd-tags has the values cstor and ssd. There is no fast keyword present in the annotation value, and hence this block device cannot be used.
NOTE:
- To allow multiple tag values, the BD tag annotation can be written in a comma-separated manner, for example: openebs.io/allowed-bd-tags: cstor,ssd,fast.
- A BD tag can only have one value on the block device CR, for example: openebs.io/block-device-tag: fast. Block devices should not be tagged in a comma-separated format. One of the reasons for this is that the cStor allowed-bd-tags annotation takes comma-separated values, and a value like fast,ssd can never be interpreted as a single word by cStor; hence BDs tagged in the above format cannot be utilised by cStor.
- If any block device mentioned in the CSPC has an empty value for the openebs.io/block-device-tag label, then that block device will not be considered for pool provisioning and other operations. Block devices with an empty tag value are implicitly not allowed by the CSPC operator.
# Tuning cStor Pools
Users can apply the available performance tunings to cStor pools based on their workload. Tuning cStor pool(s) via the CSPC is the recommended way to do it. Below are the tunings that can be applied:
- Resource requests and limits: ensure a high quality of service when applied to the pool manager containers.
- Toleration for pool manager pod: ensures scheduling of pool pods on tainted nodes.
- Priority class: sets the priority levels as required.
- Compression: sets the compression for cStor pools.
- ReadOnly threshold: specifies the read-only threshold for cStor pools.
Example configuration for Resources and Limits:
The following CSPC YAML specifies resources and auxResources that will be applied to all pool manager pods of the CSPC. resources are applied to the cstor-pool container, while auxResources are applied to the sidecar containers, i.e. cstor-pool-mgmt and pool-exporter. In the following CSPC YAML we have only one pool spec (@spec.pools):
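A sketch with illustrative resource values, a placeholder CSPC name, and a placeholder block device name:

```yaml
apiVersion: cstor.openebs.io/v1
kind: CStorPoolCluster
metadata:
  name: cspc-disk-pool
  namespace: openebs
spec:
  resources:                      # applied to the cstor-pool container
    requests: { memory: "2Gi", cpu: "250m" }
    limits: { memory: "4Gi", cpu: "500m" }
  auxResources:                   # applied to the sidecar containers
    requests: { memory: "500Mi", cpu: "100m" }
    limits: { memory: "1Gi", cpu: "200m" }
  pools:
    - nodeSelector:
        kubernetes.io/hostname: "worker-node-1"
      dataRaidGroups:
        - blockDevices:
            - blockDeviceName: "blockdevice-aaaa"
      poolConfig:
        dataRaidGroupType: "stripe"
```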
It is also possible to override the resources and limits for a specific pool. The following CSPC YAML explains how they can be overridden: there are no resources and auxResources specified at the pool level for worker-node-1 and worker-node-2, but they are specified for worker-node-3. In this case, for worker-node-1 and worker-node-2 the resources and auxResources are applied from @spec.resources and @spec.auxResources respectively, while for worker-node-3 they are applied from @spec.pools[2].poolConfig.resources and @spec.pools[2].poolConfig.auxResources respectively.
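A sketch of the override scenario described above; the values, CSPC name, and block device names are illustrative:

```yaml
apiVersion: cstor.openebs.io/v1
kind: CStorPoolCluster
metadata:
  name: cspc-disk-pool
  namespace: openebs
spec:
  resources:
    requests: { memory: "2Gi", cpu: "250m" }
    limits: { memory: "4Gi", cpu: "500m" }
  auxResources:
    requests: { memory: "500Mi", cpu: "100m" }
    limits: { memory: "1Gi", cpu: "200m" }
  pools:
    - nodeSelector:
        kubernetes.io/hostname: "worker-node-1"
      dataRaidGroups:
        - blockDevices:
            - blockDeviceName: "blockdevice-aaaa"
      poolConfig:
        dataRaidGroupType: "stripe"       # inherits @spec.resources / @spec.auxResources
    - nodeSelector:
        kubernetes.io/hostname: "worker-node-2"
      dataRaidGroups:
        - blockDevices:
            - blockDeviceName: "blockdevice-bbbb"
      poolConfig:
        dataRaidGroupType: "stripe"       # inherits @spec.resources / @spec.auxResources
    - nodeSelector:
        kubernetes.io/hostname: "worker-node-3"
      dataRaidGroups:
        - blockDevices:
            - blockDeviceName: "blockdevice-cccc"
      poolConfig:
        dataRaidGroupType: "stripe"
        resources:                        # overrides @spec.resources for this pool
          requests: { memory: "3Gi", cpu: "500m" }
          limits: { memory: "6Gi", cpu: "1" }
        auxResources:                     # overrides @spec.auxResources for this pool
          requests: { memory: "800Mi", cpu: "200m" }
          limits: { memory: "1536Mi", cpu: "400m" }
```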
Example configuration for Tolerations:
Tolerations are applied in a similar manner to resources and auxResources. The following is a sample CSPC YAML that has tolerations specified. For worker-node-1 and worker-node-2, tolerations are applied from @spec.tolerations, while for worker-node-3 they are applied from @spec.pools[2].poolConfig.tolerations.
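A sketch of such a CSPC; the taint keys, CSPC name, and block device names are illustrative:

```yaml
apiVersion: cstor.openebs.io/v1
kind: CStorPoolCluster
metadata:
  name: cspc-disk-pool
  namespace: openebs
spec:
  tolerations:                       # applied to worker-node-1 and worker-node-2 pool pods
    - key: "data-plane-node"
      operator: "Equal"
      value: "true"
      effect: "NoSchedule"
  pools:
    - nodeSelector:
        kubernetes.io/hostname: "worker-node-1"
      dataRaidGroups:
        - blockDevices:
            - blockDeviceName: "blockdevice-aaaa"
      poolConfig:
        dataRaidGroupType: "stripe"
    - nodeSelector:
        kubernetes.io/hostname: "worker-node-2"
      dataRaidGroups:
        - blockDevices:
            - blockDeviceName: "blockdevice-bbbb"
      poolConfig:
        dataRaidGroupType: "stripe"
    - nodeSelector:
        kubernetes.io/hostname: "worker-node-3"
      dataRaidGroups:
        - blockDevices:
            - blockDeviceName: "blockdevice-cccc"
      poolConfig:
        dataRaidGroupType: "stripe"
        tolerations:                 # overrides @spec.tolerations for this pool
          - key: "storage-node"
            operator: "Equal"
            value: "true"
            effect: "NoSchedule"
```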
Example configuration for Priority Class:
Priority classes are also applied in a similar manner to resources and auxResources. The following is a sample CSPC YAML that has a priority class specified. For worker-node-1 and worker-node-2, the priority class is applied from @spec.priorityClassName, while for worker-node-3 it is applied from @spec.pools[2].poolConfig.priorityClassName. Check more info about PriorityClass in the Kubernetes documentation.
Note:
- Priority class needs to be created beforehand. In this case, the high-priority and ultra-priority priority classes should exist.
- The index starts from 0 for the @spec.pools list.
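A sketch of such a CSPC; the CSPC name and block device names are placeholders, and worker-node-2 is elided because its pool spec has the same shape as worker-node-1's:

```yaml
apiVersion: cstor.openebs.io/v1
kind: CStorPoolCluster
metadata:
  name: cspc-disk-pool
  namespace: openebs
spec:
  priorityClassName: high-priority          # applied to worker-node-1 and worker-node-2 pool pods
  pools:
    - nodeSelector:
        kubernetes.io/hostname: "worker-node-1"
      dataRaidGroups:
        - blockDevices:
            - blockDeviceName: "blockdevice-aaaa"
      poolConfig:
        dataRaidGroupType: "stripe"
    # ...worker-node-2 pool spec, identical in shape...
    - nodeSelector:
        kubernetes.io/hostname: "worker-node-3"
      dataRaidGroups:
        - blockDevices:
            - blockDeviceName: "blockdevice-cccc"
      poolConfig:
        dataRaidGroupType: "stripe"
        priorityClassName: ultra-priority   # overrides @spec.priorityClassName for this pool
```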
Example configuration for Compression:
Compression values can be set at the pool level only. There is no override mechanism as there is for tolerations, resources, auxResources, and priorityClassName. The compression value must be one of:
- on
- off
- lzjb
- gzip
- gzip-[1-9]
- zle
- lz4
Note: lz4 is the default compression algorithm used if the compression field is left unspecified on the CSPC. Below is a sample YAML which has compression specified.
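A sketch with a single pool; the CSPC name and block device name are placeholders:

```yaml
apiVersion: cstor.openebs.io/v1
kind: CStorPoolCluster
metadata:
  name: cspc-disk-pool
  namespace: openebs
spec:
  pools:
    - nodeSelector:
        kubernetes.io/hostname: "worker-node-1"
      dataRaidGroups:
        - blockDevices:
            - blockDeviceName: "blockdevice-aaaa"
      poolConfig:
        dataRaidGroupType: "stripe"
        compression: "lz4"            # one of: on, off, lzjb, gzip, gzip-[1-9], zle, lz4
```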
Example configuration for Read Only Threshold:
The RO threshold can be set in a similar manner to compression. ROThresholdLimit is the threshold limit (as a percentage) for pool read-only mode. If ROThresholdLimit (%) of the pool storage is consumed, then the pool is set to read-only. If ROThresholdLimit is set to 100, the entire pool storage will be used. By default, i.e. when unspecified on the CSPC, it is set to 85%. The ROThresholdLimit value must satisfy 0 < ROThresholdLimit <= 100. The following CSPC YAML has the ReadOnly threshold percentage specified.
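A sketch with a single pool, assuming the poolConfig field is named roThresholdLimit (verify against your installed CRD); the CSPC name and block device name are placeholders:

```yaml
apiVersion: cstor.openebs.io/v1
kind: CStorPoolCluster
metadata:
  name: cspc-disk-pool
  namespace: openebs
spec:
  pools:
    - nodeSelector:
        kubernetes.io/hostname: "worker-node-1"
      dataRaidGroups:
        - blockDevices:
            - blockDeviceName: "blockdevice-aaaa"
      poolConfig:
        dataRaidGroupType: "stripe"
        roThresholdLimit: 70          # pool becomes read-only once 70% of its storage is consumed
```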
# Tuning cStor Volumes
Similar to tuning of the cStor pool cluster, there are ways to tune cStor volumes. cStor volumes can be provisioned using different policy configurations. However, a CStorVolumePolicy needs to be created first; it must be created prior to creation of the StorageClass, as the CStorVolumePolicy name needs to be specified in order to provision a cStor volume based on the configured policy. A sample StorageClass YAML that utilises cstorVolumePolicy is given below for reference:
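A sketch assuming illustrative names for the StorageClass, CSPC, and volume policy:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cstor-csi-disk
provisioner: cstor.csi.openebs.io
allowVolumeExpansion: true
parameters:
  cas-type: cstor
  cstorPoolCluster: cspc-disk-pool         # illustrative CSPC name
  replicaCount: "3"
  cstorVolumePolicy: csi-volume-policy     # name of the CStorVolumePolicy created beforehand
```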
If the volume policy is not created before volume provisioning and needs to be modified later, it can be changed by editing the CStorVolumeConfig (CVC) resource on a per-volume basis; the change will be reconciled by the CVC controller onto the respective volume resources. Each PVC creation request creates a CStorVolumeConfig (CVC) resource which can be used to manage the volume, its policies, and any supported operations (like scale up/down), on a per-volume basis. To edit, execute:
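A minimal sketch of the edit, assuming the CVC is named after the corresponding PV, lives in the openebs namespace, and carries the volume policy under spec.policy:

```yaml
# kubectl edit cvc <pv-name> -n openebs
#
# Policy tunables are assumed to appear under spec.policy of the CVC, for example:
spec:
  policy:
    target:
      queueDepth: "32"
```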
Sample Output:
The list of policies that can be configured are as follows:
# Replica Affinity to create a volume replica on a specific pool
For StatefulSet applications, to distribute single-replica volumes on specific cStor pools, we can use replicaAffinity-enabled scheduling. This feature should be used with delayed volume binding, i.e. volumeBindingMode: WaitForFirstConsumer in the StorageClass. When volumeBindingMode is set to WaitForFirstConsumer, the csi-provisioner waits for the scheduler to select a node. The topology of the selected node is then set as the first entry in the preferred list and is used by the volume controller to create the volume replica on the cStor pool scheduled on the preferred node.
The replicaAffinity spec needs to be enabled via the volume policy before provisioning the volume:
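A sketch of such a policy; the policy name is the one assumed in the StorageClass example above:

```yaml
apiVersion: cstor.openebs.io/v1
kind: CStorVolumePolicy
metadata:
  name: csi-volume-policy          # referenced from the StorageClass
  namespace: openebs
spec:
  provision:
    replicaAffinity: true
```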
# Volume Target Pod Affinity
Stateful workloads access the OpenEBS storage volume by connecting to the volume target pod. The target pod affinity policy can be used to co-locate the volume target pod on the same node as the workload. This feature makes use of the Kubernetes pod affinity feature, which depends on pod labels.
For this, labels need to be added to both the application and the volume policy.
Given below is a sample YAML of CStorVolumePolicy having the target-affinity label, using kubernetes.io/hostname as the topologyKey:
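A sketch of such a policy; the policy name and the app namespace are assumptions, while the label key/value follow the text below:

```yaml
apiVersion: cstor.openebs.io/v1
kind: CStorVolumePolicy
metadata:
  name: csi-volume-policy
  namespace: openebs
spec:
  target:
    affinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: openebs.io/target-affinity
                operator: In
                values:
                  - fio-cstor              # label value shared with the app pod
          topologyKey: kubernetes.io/hostname
          namespaces: ["default"]          # namespace of the app pod (assumption)
```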
Set the label configured in the volume policy, openebs.io/target-affinity: fio-cstor, on the app pod; it will be used to find pods, by label, within the domain defined by topologyKey.
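A sketch of where the label goes on the application pod; the pod name, namespace, image, and command are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: fio-cstor                          # illustrative app pod
  namespace: default
  labels:
    openebs.io/target-affinity: fio-cstor  # must match the value in the volume policy
spec:
  containers:
    - name: app
      image: busybox                       # illustrative; any workload image
      command: ["sleep", "3600"]
```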
# Volume Tunable
Performance tunings based on the workload can be set using the volume policy. The list of tunings that can be configured is given below:
- queueDepth: limits the ongoing IO count from the iSCSI client on the node to the cStor target pod. The default value for this parameter is 32.
- luworkers: cStor target IO worker threads; sets the number of threads that work on the QueueDepth queue. The default value for this parameter is 6. On machines with more cores and RAM, this value can be 16, which means 16 threads will be running for each volume.
- zvolWorkers: cStor volume replica IO worker threads; defaults to the number of cores on the machine. On machines with more cores and RAM, this value can be 16.
Given below is a sample YAML that has the above parameters configured.
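A sketch of such a policy; the policy name and values are illustrative, and the exact field casing and value types should be verified against the CStorVolumePolicy CRD of your installed version:

```yaml
apiVersion: cstor.openebs.io/v1
kind: CStorVolumePolicy
metadata:
  name: csi-volume-policy
  namespace: openebs
spec:
  replica:
    zvolWorkers: "4"        # replica IO worker threads
  target:
    luWorkers: 6            # target IO worker threads
    queueDepth: "32"        # limit on ongoing IO count
```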
Note: These Policy tunable configurations can be changed for already provisioned volumes by editing the corresponding volume CStorVolumeConfig resources.
# Memory and CPU Resources QoS
CStorVolumePolicy can also be used to configure the volume target pod's resource requests and limits to ensure QoS. Given below is a sample YAML that configures the target container's resource requests and limits, along with the auxResources configuration for the sidecar containers.
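A sketch with illustrative values; the policy name is an assumption:

```yaml
apiVersion: cstor.openebs.io/v1
kind: CStorVolumePolicy
metadata:
  name: csi-volume-policy
  namespace: openebs
spec:
  target:
    resources:                  # target container
      requests: { memory: "500Mi", cpu: "250m" }
      limits: { memory: "1Gi", cpu: "500m" }
    auxResources:               # sidecar containers of the target pod
      requests: { memory: "250Mi", cpu: "100m" }
      limits: { memory: "500Mi", cpu: "200m" }
```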
To know more about Resource configuration in Kubernetes, click here.
Note: These resource configuration(s) can be changed, for provisioned volumes, by editing the CStorVolumeConfig resource at the per-volume level.
An example of patching an already existing CStorVolumeConfig resource is given below. Create a file, say patch-resources-cvc.yaml, that contains the changes, and apply the patch on the resource.
To apply the patch:
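A sketch of the patch file and the patch command; the values are illustrative, and it is assumed that the CVC carries the policy under spec.policy, is named after the PV, and lives in the openebs namespace:

```yaml
# patch-resources-cvc.yaml
spec:
  policy:
    target:
      resources:
        requests: { memory: "500Mi", cpu: "250m" }
        limits: { memory: "1Gi", cpu: "500m" }
# Apply the patch on the CVC of the volume:
#   kubectl patch cvc <pv-name> -n openebs -p "$(cat patch-resources-cvc.yaml)" --type merge
```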
# Toleration for target pod to ensure scheduling of target pods on tainted nodes
This Kubernetes feature allows users to taint a node. This ensures that no pods are scheduled to it unless a pod explicitly tolerates the taint. It can be used to reserve nodes for specific pods by adding labels to the desired node(s).
One scenario where the above tunable can be used is when all the volume-specific pods, in order to operate flawlessly, have to be scheduled on nodes that are reserved for storage.
Sample YAML:
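A sketch of such a policy; the policy name and the taint key/value are illustrative and must match the taints on your storage nodes:

```yaml
apiVersion: cstor.openebs.io/v1
kind: CStorVolumePolicy
metadata:
  name: csi-volume-policy
  namespace: openebs
spec:
  target:
    tolerations:
      - key: "storage-node"          # illustrative taint key/value
        operator: "Equal"
        value: "true"
        effect: "NoSchedule"
```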
# Priority class for volume target deployment
Priority classes can help in controlling the Kubernetes scheduler's decisions to favor higher-priority pods over lower-priority pods. The Kubernetes scheduler can even preempt lower-priority pods that are running so that pending higher-priority pods can be scheduled. Setting pod priority also prevents lower-priority workloads from impacting critical workloads in the cluster, especially in cases where the cluster starts to reach its resource capacity. To know more about PriorityClasses in Kubernetes, click here.
Note: Priority class needs to be created before volume provisioning.
Given below is a sample CStorVolumePolicy YAML which utilises priority class.
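A sketch of such a policy; the policy name and the priority class name are illustrative, and the priority class must exist before volume provisioning:

```yaml
apiVersion: cstor.openebs.io/v1
kind: CStorVolumePolicy
metadata:
  name: csi-volume-policy
  namespace: openebs
spec:
  target:
    priorityClassName: "storage-critical"   # must be created beforehand
```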