Troubleshooting - Replicated Storage (a.k.a. Replicated Engine or Mayastor)
Logs#
The correct set of log files to collect depends on the nature of the problem. If unsure, it is best to collect log files for all Replicated Storage containers. In nearly every case, the logs of all of the control-plane component pods will be needed:
- csi-controller
- core-agent
- rest
- msp-operator
List all Replicated Storage Pods
Example Output
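A minimal way to list the pods; the `openebs` namespace is assumed here (older deployments may use `mayastor`):

```shell
# List all Replicated Storage (Mayastor) pods with the nodes they run on
kubectl get pods -n openebs -o wide
```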
Replicated Storage Pod Log File#
Replicated Storage containers form the data plane of a Replicated Storage deployment. The cluster schedules one Replicated Storage container instance on each node that has been configured as a storage node. This log file is most useful when troubleshooting I/O errors; however, provisioning and management operations might also fail because of a problem on a storage node.
Example obtaining Replicated Storage's Log
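A sketch of fetching the data-plane log from one storage node; the pod and container names below are illustrative, so substitute the names reported by `kubectl get pods -n openebs`:

```shell
# Fetch the data-plane (io-engine) log from the pod on the affected storage node;
# "openebs-io-engine-abcde" is a placeholder pod name
kubectl logs -n openebs openebs-io-engine-abcde -c io-engine
```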
CSI Agent Pod Log File#
If experiencing problems with (un)mounting a volume on an application node, this log file can be useful. Generally, all worker nodes in the cluster will be configured to schedule a Replicated Storage CSI agent pod, so it's best to identify which specific node is experiencing the issue and inspect the log file only for that node.
Example obtaining Replicated Storage CSI driver's Log
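Since the CSI node plugin runs as a DaemonSet (one pod per worker node), first find the pod scheduled on the affected node; the pod and container names below are illustrative:

```shell
# Find the CSI node-plugin pod running on the affected worker node
kubectl get pods -n openebs -o wide | grep csi-node

# Fetch its log; "openebs-csi-node-xyz12" is a placeholder pod name
kubectl logs -n openebs openebs-csi-node-xyz12 -c csi-node
```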
CSI Sidecars#
These containers implement the CSI spec for Kubernetes and run within the same pods as the csi-controller and mayastor-csi (node plugin) containers. Whilst they are not part of Replicated Storage's code, they can contain useful information when a Replicated Storage CSI controller/node plugin fails to register with the Kubernetes cluster.
Example obtaining CSI Controller Container Logs
Example obtaining CSI Node Container Log
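A sketch of pulling the sidecar logs; the sidecar container names shown (`csi-provisioner`, `csi-driver-registrar`) are typical for CSI deployments but may differ by chart version, and the `<...>` pod names must be filled in from `kubectl get pods -n openebs`:

```shell
# Controller-side sidecar log (runs alongside csi-controller)
kubectl logs -n openebs <csi-controller-pod> -c csi-provisioner

# Node-side sidecar log (runs alongside the node plugin on each worker)
kubectl logs -n openebs <csi-node-pod> -c csi-driver-registrar
```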
Coredumps#
A coredump is a snapshot of process memory combined with auxiliary information (PID, state of registers, etc.) saved to a file. It is used for post-mortem analysis and is generated automatically by the operating system in case of a severe, unrecoverable error (e.g. memory corruption) causing the process to panic. Using a coredump for problem analysis requires deep knowledge of program internals and is usually done only by developers. However, there is one very useful piece of information that users can retrieve from it, and this information alone can often identify the root cause of the problem: the stack (backtrace), a record of the last action the program was performing at the time it crashed. Here we describe how to get it. The steps shown apply specifically to Ubuntu; other Linux distributions might employ variations.
We rely on systemd-coredump, which saves and manages coredumps on the system; the coredumpctl utility, which is part of the same package; and the gdb debugger.
Install systemd-coredump and gdb
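On Ubuntu/Debian this is a straightforward package install; package names may differ on other distributions:

```shell
# Install the coredump handler/tooling and the debugger
sudo apt-get update
sudo apt-get install -y systemd-coredump gdb
```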
If installed correctly then the global core pattern will be set so that all generated coredumps will be piped to the systemd-coredump binary.
Verify Coredump Configuration
Example Output
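The kernel's core pattern can be inspected directly; when systemd-coredump is installed correctly, the output should show cores being piped to the systemd-coredump binary (the exact argument list varies between systemd versions):

```shell
# Verify that coredumps are piped to systemd-coredump
cat /proc/sys/kernel/core_pattern
# Expected output resembles:
# |/lib/systemd/systemd-coredump %P %u %g %s %t %c %h
```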
List Coredumps
Example Output
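Listing the coredumps known to the system is a single command:

```shell
# Show all coredumps recorded by systemd-coredump, newest last
coredumpctl list
```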
If there is a new coredump from the Replicated Storage container, the coredump alone is not very useful. GDB needs access to the binary of the crashed process in order to print at least some information in the backtrace. For that, we need to copy the contents of the container's filesystem to the host.
Get ID of the Replicated Storage Container
Example Output
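With a Docker-based container runtime the ID can be found as shown below; on containerd or CRI-O nodes use `crictl ps` instead. The `io-engine` name pattern is an assumption for current releases (older deployments may use `mayastor`):

```shell
# Find the container ID of the Replicated Storage data-plane container
docker ps | grep io-engine
```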
Copy Relevant Parts of the Container's fs
Now we can start GDB. Don't use the coredumpctl command to start the debugger: it invokes GDB with an invalid path to the debugged binary, so stack unwinding fails for Rust functions. First, we extract the compressed coredump.
Find Location of the Compressed Coredump
Example Output
Extract the Coredump
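The `Storage` line of `coredumpctl info` shows where the compressed core lives; cores are typically compressed with zstd (or lz4 on older systems), so decompress with the matching tool. The file name below is a placeholder:

```shell
# Locate the compressed coredump on disk
coredumpctl info --no-pager | grep Storage

# Decompress it (use unlz4 instead if the file ends in .lz4)
unzstd /var/lib/systemd/coredump/core.io-engine.<...>.zst -o /tmp/core
```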
Open Coredump in GDB
Example Output
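A sketch of opening the extracted core against the binary copied from the container; the binary path under `/tmp/rootdir` is an assumption, and can be located with `find /tmp/rootdir -name io-engine` if it differs:

```shell
# Open the coredump together with the crashed binary
gdb -c /tmp/core /tmp/rootdir/bin/io-engine
```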
Once in GDB we need to set a sysroot so that GDB knows where to find the binary for the debugged program.
Set sysroot in GDB
Example Output
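The sysroot must point at the directory holding the copied container filesystem (`/tmp/rootdir` in these examples); it can be set interactively or passed on the command line:

```shell
# Inside an interactive GDB session:
#   (gdb) set sysroot /tmp/rootdir
# Or equivalently, pass it at startup:
gdb -ex 'set sysroot /tmp/rootdir' -c /tmp/core /tmp/rootdir/bin/io-engine
```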
After that we can print backtrace(s).
Obtain Backtraces for all Threads in GDB
Example Output
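The `thread apply all bt` GDB command prints a backtrace for every thread; capturing it non-interactively is convenient when attaching the output to a support ticket (paths as in the previous examples):

```shell
# Inside GDB:
#   (gdb) thread apply all bt
# Or capture all backtraces in one batch invocation:
gdb --batch -ex 'set sysroot /tmp/rootdir' -ex 'thread apply all bt' \
    -c /tmp/core /tmp/rootdir/bin/io-engine > /tmp/backtrace.txt
```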
Diskpool Behaviour#
The below behaviour may be encountered while upgrading from older releases to Replicated Storage 2.4 or later.
Get Dsp#
Running kubectl get dsp -n openebs could result in an error if the v1alpha1 or v1beta1 schema is still present in the discovery cache. To resolve this, run kubectl get diskpools.openebs.io -n openebs. After this, the kubectl discovery cache will be updated with the v1beta2 object for dsp.
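The workaround command, using the CRD's full name rather than the `dsp` short name:

```shell
# Querying by the full CRD name refreshes the discovery cache
kubectl get diskpools.openebs.io -n openebs
```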
Create API#
When creating a Disk Pool with kubectl create -f dsp.yaml, you might encounter an error related to v1alpha1 or v1beta1 CR definitions. To resolve this, ensure your CR definition is updated to v1beta2 in the YAML file (for example, apiVersion: openebs.io/v1beta2).
note
You can validate the schema changes by executing kubectl get crd diskpools.openebs.io.
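A minimal sketch of a dsp.yaml using the v1beta2 schema; the pool name, node name, and disk device are placeholders for your environment:

```yaml
# Illustrative DiskPool CR with the v1beta2 apiVersion
apiVersion: openebs.io/v1beta2
kind: DiskPool
metadata:
  name: pool-on-node-1
  namespace: openebs
spec:
  node: node-1
  disks: ["/dev/sdb"]
```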
Known Limitations#
Volume and Pool Capacity Expansion#
Once provisioned, neither Replicated Storage Disk Pools nor Replicated Storage Volumes can be resized. A Replicated Storage Pool can have only a single block device as a member. Replicated Storage Volumes are exclusively thick-provisioned.
Snapshots and Clones#
Replicated Storage currently supports provisioning snapshots and clones on volumes with only one replica.
Volumes are "Highly Durable" but without multipathing are not "Highly Available"#
Replicated Storage Volumes can be configured (or subsequently re-configured) to be composed of 2 or more "children" or "replicas", causing synchronously mirrored copies of the volume's data to be maintained on more than one worker node and Disk Pool. This contributes additional "durability" at the persistence layer, ensuring that viable copies of a volume's data remain even if a Disk Pool device is lost.
A Replicated Storage volume is currently accessible to an application only via a single target instance (NVMe-oF) of a single Replicated Storage pod. However, if that Replicated Storage pod ceases to run (through the loss of the worker node on which it's scheduled, execution failure, CrashLoopBackOff, etc.), the HA switch-over module detects the failure and moves the target to a healthy worker node to ensure I/O continuity.
Known Issues#
Installation Issues#
An IO engine pod restarts unexpectedly with exit code 132 whilst mounting a PVC#
The Mayastor process has been sent the SIGILL signal as the result of attempting to execute an illegal instruction. This indicates that the host node's CPU does not satisfy the prerequisite instruction set level for Replicated Storage (SSE4.2 on x86-64).
Deploying Replicated Storage on RKE and Fedora CoreOS#
In addition to ensuring that the general prerequisites for installation are met, it is necessary to add the following directory mapping to the services_kubelet->extra_binds section of the cluster's cluster.yml file.
If this is not done, CSI socket paths won't match expected values and the Replicated Storage CSI driver registration process will fail, resulting in the inability to provision Replicated Storage volumes on the cluster.
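A sketch of the required cluster.yml fragment; the mapping below reflects RKE's practice of prefixing kubelet host paths with /opt/rke, but should be verified against your RKE version:

```yaml
# cluster.yml (fragment)
services:
  kubelet:
    extra_binds:
      - /opt/rke/var/lib/kubelet/plugins:/var/lib/kubelet/plugins
```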
Other Issues#
Replicated Storage pod may restart if a pool disk is inaccessible#
If the disk device used by a Replicated Storage pool becomes inaccessible or enters the offline state, the hosting Replicated Storage pod may panic. A fix for this behaviour is under investigation.
Lengthy worker node reboot times#
Rebooting a node that runs applications mounting Replicated Storage volumes can take tens of minutes. The reason is the long default NVMe controller loss timeout (ctrl_loss_tmo). The solution is to follow Kubernetes best practice and cordon the node, ensuring there aren't any application pods running on it before the reboot. The ioTimeout StorageClass parameter can be used to fine-tune the timeout.
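A sketch of a StorageClass setting ioTimeout; the class name and parameter values are illustrative choices, not recommendations:

```yaml
# Example StorageClass with a tuned I/O timeout (seconds)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: mayastor-tuned-timeout
provisioner: io.openebs.csi-mayastor
parameters:
  repl: "2"
  protocol: nvmf
  ioTimeout: "60"
```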
Node restarts on scheduling an application#
Deploying an application pod on a worker node which hosts Replicated Storage and the Prometheus exporter can cause that node to restart.
The issue originated from a kernel bug: once the nexus disconnects, the entries under /host/sys/class/hwmon/ should be removed, which does not happen in this case. (The issue was fixed via this kernel patch.)
Workaround
Use kernel version 5.13 or later if deploying Replicated Storage in conjunction with the Prometheus metrics exporter.
Unable to mount xfs File System#
The volume is created, but the xfs filesystem fails to mount.
Workaround
If you are trying to use xfs volumes and the cluster node hosts are running a kernel version less than 5.10, you may encounter a mount failure of the filesystem. This is due to the incompatibility of newer xfsprogs options. In order to alleviate this issue, it is recommended to upgrade the host node kernel version to 5.10 or higher.
io-engine Fails to Start Due to IOVA Allocation Error#
When the io-engine fails to start with the error message couldn't allocate memory due to IOVA exceeding limits of current DMA mask, it is likely that the host node has IOMMU enabled.
Workaround
Configure the io-engine to use physical address (PA) mode for DMA by setting the following Helm parameter during Replicated PV Mayastor installation:
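A sketch of the Helm invocation; the exact parameter path depends on the chart and version in use, so the `mayastor.io_engine.envcontext` key below is an assumption that should be checked against your chart's values:

```shell
# Force the io-engine's DPDK environment into physical-address (PA) DMA mode;
# the --set key path is an assumption and may differ by chart version
helm install openebs openebs/openebs -n openebs \
  --set mayastor.io_engine.envcontext=iova-mode=pa
```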