Replicated PV Mayastor Installation on Amazon Elastic Kubernetes Service with Instance Store Volumes
This document provides instructions for installing Replicated PV Mayastor on Amazon Elastic Kubernetes Service (EKS) with instance store volumes. Replicated PV Mayastor is designed to work with these volumes and provides high-performance storage for stateful applications running in a Kubernetes environment.
Using OpenEBS Replicated PV Mayastor in EKS with Instance store volumes addresses many of the limitations associated with the ephemeral nature of local SSDs by introducing a layer of persistent storage management. Here's how OpenEBS helps mitigate these limitations:
Data Persistence through Replication
OpenEBS abstracts the underlying storage (instance store volumes) into a set of persistent volumes. Although local SSDs are inherently ephemeral, OpenEBS can ensure data persistence by replicating the data across multiple worker nodes.
For instance, using OpenEBS with replication (for example, 2 or 3 replicas) ensures that even if one node fails or is terminated, the data exists on other nodes, avoiding data loss.
Without OpenEBS, if a node with Instance store volumes is terminated, all the data is lost. OpenEBS ensures the data is replicated to other nodes, so even if the original node is lost, the data persists elsewhere.
In Amazon Elastic Kubernetes Service (EKS), when worker nodes are provisioned with instance store volumes, the storage provided by these SSDs is ephemeral for the following reasons:
- Instance Store Characteristics
- Physical Attachment to Hardware: In AWS EC2 instances, instance store volumes are physically attached to the underlying hardware (i.e., the host machine running the instance). This configuration provides high-speed access but is inherently non-persistent.
- Data Loss on Instance Stop or Termination: When an EC2 instance is stopped, terminated, or fails, the instance store is automatically wiped, resulting in the loss of any data on those volumes. This is by design for instance store volumes and applies across all AWS services, including EKS.
- Ephemeral Storage by Design
Local SSDs in EC2 instances are designed for temporary storage of data that does not need to persist beyond the instance's lifecycle, such as caches, scratch data, or intermediary results.
In a Kubernetes environment, this storage type is typically used for temporary logs, caches, or data that can be readily recreated. However, instance store volumes are unsuitable for long-term or critical data storage because the data will be lost if the node is replaced or terminated.
Kubernetes Dynamic Scheduling
In EKS (or any Kubernetes environment), worker nodes can be dynamically scaled up or down, and nodes can be replaced when they fail. If an EKS node using Instance store volumes is replaced, the new node will not have access to the data stored on the local SSD of the previous node.
Pod Displacement and Node Termination
In Kubernetes, pods can be scheduled on any node in the cluster. When a node is terminated or fails, pods may be rescheduled on another node. Any data stored on the terminated node’s instance store SSDs is lost, leading to potential data loss unless the data is persisted using an external storage solution like EBS or S3.
Use Case Limitations in EKS
Although local SSDs offer high performance and low-latency storage suitable for temporary data such as logs or caching, they are not designed for persistent storage needs within EKS.
Prerequisites#
Before installing Replicated PV Mayastor, make sure that you meet the following requirements:
Hardware Requirements
Your instance type must meet the requirements defined in the prerequisites.
EKS Nodes
You need to configure launch templates to create node groups that satisfy the hardware, OS, and kernel requirements. When using the synchronous replication feature (N-way mirroring), the number of worker nodes on which IO engine pods are deployed should be no less than the desired replication factor.
Additional Disks
Additional storage disks for nodes can be added during cluster creation using launch templates. Each instance store volume comes in a fixed size, and the number of instance store volumes available depends on the instance type. In this guide, we use the m5d.4xlarge instance type.
Ports
Ensure that the security groups allow the required OpenEBS ports. Refer to the Replicated PV Mayastor Installation Documentation for more details.
Enable Huge Pages
2MiB-sized huge pages must be supported and enabled on the storage nodes, i.e., nodes where IO engine pods are deployed. A minimum of 1024 such pages (2GiB total) must be available exclusively to the IO engine pod on each node. Use Secure Shell (SSH) to connect to each EKS worker node and enable huge pages.
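As a minimal sketch (assuming direct SSH access to the worker nodes), huge pages can be reserved as follows:

```bash
# Reserve 1024 x 2MiB huge pages (2GiB total) on the storage node.
sudo sysctl vm.nr_hugepages=1024

# Persist the setting across reboots.
echo "vm.nr_hugepages = 1024" | sudo tee -a /etc/sysctl.conf

# If kubelet was already running when the pages were reserved, restart it so the
# hugepages resource is advertised to the scheduler.
sudo systemctl restart kubelet

# Verify the reservation.
grep HugePages /proc/meminfo
```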
Kernel Modules
SSH to the EKS worker nodes to load the nvme_tcp kernel module.
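A minimal sketch of loading the module over SSH (the modules-load.d path is the usual systemd convention and is assumed to apply to your AMI):

```bash
# Load the NVMe over TCP kernel module.
sudo modprobe nvme_tcp

# Persist the module across reboots.
echo "nvme_tcp" | sudo tee /etc/modules-load.d/nvme_tcp.conf

# Confirm the module is loaded.
lsmod | grep nvme_tcp
```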
Preparing the Cluster
Refer to the Replicated PV Mayastor Installation Documentation for instructions on preparing the cluster.
Install Replicated PV Mayastor on Amazon EKS#
- Refer to the OpenEBS Installation Documentation to install Replicated PV Mayastor using Helm.
- Refer to the Amazon EKS User Guide to install Amazon EBS CSI driver add-on during the cluster creation.
note
An EBS-backed EKS storage class should be used for etcd and Loki.
- Helm Install Command
Command
Output
Command
Output
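As a hedged sketch, the Helm installation typically looks like the following; the chart repository URL, release name, and namespace follow the OpenEBS documentation and may differ in your environment:

```bash
# Add the OpenEBS Helm repository and install the chart into the openebs namespace.
helm repo add openebs https://openebs.github.io/openebs
helm repo update
helm install openebs --namespace openebs openebs/openebs --create-namespace

# Verify that the Replicated PV Mayastor (io-engine, CSI, agent) pods come up.
kubectl get pods -n openebs
```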
Pools#
The available local SSD disks on worker nodes can be viewed by using the kubectl-mayastor plugin.
Command
Output
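For example (the node name here reuses the one referenced later in this guide, and the namespace flag may need adjusting for your installation):

```bash
# List the block devices (instance store NVMe disks) visible on a worker node.
kubectl mayastor get block-devices ip-10-0-1-222.ec2.internal -n openebs
```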
It is highly recommended to specify the disk using a unique device link that remains unaltered across node reboots. Examples of such device links are: by-path or by-id.
Sample pool.yaml
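A minimal sketch of such a pool definition is shown below; the DiskPool apiVersion depends on the installed OpenEBS release, and the node name and by-id device link are placeholders to be replaced with your own values:

```bash
# pool.yaml sketch applied with kubectl; replace the node name and device link.
cat <<EOF | kubectl apply -f -
apiVersion: "openebs.io/v1beta2"   # check the DiskPool apiVersion for your release
kind: DiskPool
metadata:
  name: pool-on-node-3
  namespace: openebs
spec:
  node: ip-10-0-1-222.ec2.internal
  disks: ["/dev/disk/by-id/nvme-Amazon_EC2_NVMe_Instance_Storage_EXAMPLE"]
EOF
```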
Available Disk Pools
Command
Output
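The created pools and their state can be listed, for example, with:

```bash
# List DiskPool resources via the CRD and via the plugin.
kubectl get diskpools -n openebs
kubectl mayastor get pools
```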
Configuration#
- Refer to the Replicated PV Mayastor Configuration Documentation for instructions regarding StorageClass creation.
Replicated PV Mayastor dynamically provisions PersistentVolumes (PVs) based on the StorageClass definitions you create. The parameters of a definition set the characteristics and behaviour of its associated PVs. We have created a StorageClass with three replicas, as shown below.
Command
Output
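A minimal sketch of a three-replica StorageClass (named mayastor-3 to match the PVC example in the next section) using the Mayastor CSI provisioner:

```bash
# StorageClass with three synchronous replicas served over NVMe-oF TCP.
cat <<EOF | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: mayastor-3
parameters:
  protocol: nvmf
  repl: "3"
provisioner: io.openebs.csi-mayastor
EOF
```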
Deployment#
Refer to the Deploy an Application Documentation for instructions regarding PVC creation and deploying an application.
If all verification steps in the preceding stages were satisfied, then Replicated PV Mayastor has been successfully deployed within the cluster. In order to verify basic functionality, we will now dynamically provision a Persistent Volume based on a Replicated PV Mayastor StorageClass.
Use kubectl to create a PVC based on a StorageClass that you created. In the example shown below, we will consider that StorageClass to have been named "mayastor-3". Replace the value of the field "storageClassName" with the name of your own Replicated PV Mayastor-based StorageClass.
Command
Output
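A minimal PVC sketch bound to that StorageClass; the 10Gi size is an assumption:

```bash
# PVC consumed by the MongoDB example below.
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mongo-data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: mayastor-3
  resources:
    requests:
      storage: 10Gi
EOF
```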
- We have created a MongoDB application, as shown below, using the PVC mongo-data.
Command
Output
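A minimal sketch of such a MongoDB Deployment mounting the mongo-data PVC; the image tag and single-replica layout are assumptions:

```bash
# Simple MongoDB Deployment backed by the Replicated PV Mayastor volume.
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mongo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mongo
  template:
    metadata:
      labels:
        app: mongo
    spec:
      containers:
        - name: mongo
          image: mongo:6.0
          ports:
            - containerPort: 27017
          volumeMounts:
            - name: data
              mountPath: /data/db
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: mongo-data
EOF
```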
Random data in Mongo
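For example, a few test documents can be inserted through mongosh (the deployment name, database, and collection follow the sketch above and are assumptions):

```bash
# Insert 100 random documents so there is data to verify after a failover.
kubectl exec -it deploy/mongo -- mongosh --eval '
  for (let i = 0; i < 100; i++) {
    db.test.insertOne({ index: i, value: Math.random() });
  }
  print(db.test.countDocuments());
'
```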
Node Failure Scenario#
EKS worker nodes are part of managed node groups. If a node fails during a reboot or for any other reason, a new node is created with a new instance store disk. In that case, you have to recreate the pool with a new name. Once the new pool is created, Replicated PV Mayastor takes care of rebuilding the volume with the replicated data.
important
When a node is replaced with a new one, all node labels and huge page configurations are removed. You have to apply these configurations again on the new node.
Example
In the example below, the node ip-10-0-1-222.ec2.internal has failed and a new node with new disks has been provisioned. This caused pool-on-node-3 to go into an unknown state, and the Replicated PV Mayastor volume is degraded because one of its replicas is down.
Command
Output
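The failed pool and the degraded volume can be observed, for example, with:

```bash
# Inspect pool and volume health after the node failure.
kubectl mayastor get pools
kubectl mayastor get volumes
kubectl get diskpools -n openebs
```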
We have reconfigured the node labels and huge pages and loaded the nvme_tcp module on the new node. A new pool named pool-on-node-4 has also been created on it.
Command
Output
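As a sketch, re-enabling the new node as a storage node involves re-applying the io-engine node label and then creating a DiskPool with a new name (the same manifest shape as the earlier pool.yaml sketch). The node name placeholder and the openebs.io/engine=mayastor label convention are assumptions to verify against your installation:

```bash
# Re-label the replacement node so the io-engine DaemonSet schedules onto it,
# then create pool-on-node-4 using the same DiskPool manifest shape as before.
kubectl label node <new-node-name> openebs.io/engine=mayastor
```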
Once the pool is created, the degraded volume comes back online after the rebuild completes.
Command
Output
Command
Output
Command
Output
- Replicated PV Mayastor Rebuild History
Command
Output
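The rebuild history can be queried with the plugin, for example (the volume UUID is a placeholder):

```bash
# Show the rebuild history for the replicated volume.
kubectl mayastor get rebuild-history <volume-uuid>
```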
Meanwhile, Kubernetes rescheduled the pod to the next available node because the node where it was running had failed. The data is still available without any issues.
Command
Output
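For example, the previously inserted documents can be counted again from the rescheduled pod (the deployment and collection names follow the earlier sketch):

```bash
# Confirm the test data survived the node failure and pod rescheduling.
kubectl exec -it deploy/mongo -- mongosh --eval 'print(db.test.countDocuments())'
```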