# Troubleshooting OpenEBS - Overview
## General guidelines for troubleshooting

- Search for similar issues mentioned on this page as well as in the following troubleshooting sections.
- Contact the OpenEBS Community for support.
- Search for similar issues in the OpenEBS GitHub repository.
- Search for reported issues on StackOverflow under the OpenEBS tag.
## Kubernetes related

- Kubernetes node reboots because of increase in memory consumed by Kubelet
- Application and OpenEBS pods terminate/restart under heavy I/O load
## Others

- Nodes in the cluster reboot frequently, almost every day, in openSUSE CaaS
## Kubernetes related
### Kubernetes node reboots because of increase in memory consumed by Kubelet

It is sometimes observed that iscsiadm fails continuously, retrying rapidly, and for some reason this causes the memory consumption of kubelet to grow until the node goes out of memory and needs to be rebooted. Errors of the following type can be observed in journalctl and in the cstor-istgt container.
**journalctl logs**

**cstor-istgt container logs**
**Troubleshooting**
The high memory consumption of kubelet is mainly caused by the following.

Three modules are involved: cstor-istgt, kubelet, and the iSCSI initiator (iscsiadm). kubelet runs the iscsiadm command to perform discovery against cstor-istgt. If there is any delay in receiving the response to the discovery opcode (either due to the network or due to a delay in processing on the target side), iscsiadm retries a few times and then gets into an infinite loop, dumping error messages as below:
kubelet keeps reading this error output and accumulating memory. More details can be seen here.
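To observe the initiator-side behavior directly, you can run the same SendTargets discovery that kubelet issues through iscsiadm; a minimal sketch, where `<target-service-ip>` is a placeholder for your cStor target portal:

```shell
# Run the SendTargets discovery that kubelet performs via iscsiadm.
# <target-service-ip> is a placeholder for the cstor-istgt portal;
# a hung or repeatedly failing response here matches the loop above.
iscsiadm -m discovery -t sendtargets -p <target-service-ip>:3260
```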
**Workaround**

Restart the corresponding istgt (target) pod to stop the runaway memory consumption.
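A minimal sketch of the restart, assuming the default `openebs` namespace and that cStor target pod names contain `target` (adjust both for your cluster):

```shell
# Find the cStor target pod that serves the affected volume.
kubectl get pods -n openebs | grep target

# Delete the pod; its Deployment recreates it automatically.
kubectl delete pod <target-pod-name> -n openebs
```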
### Application and OpenEBS pods terminate/restart under heavy I/O load

This is caused by a lack of resources on the Kubernetes nodes, which causes pods to be evicted under load as the node becomes unresponsive. The pods transition from the Running state to the Unknown state, followed by Terminating, before restarting again.
**Troubleshooting**
The above cause can be confirmed from the `kubectl describe pod` output, which displays the termination reason as `NodeControllerEviction`. You can get more information from the kube-controller-manager.log on the Kubernetes master.
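A sketch of the checks, with the pod name and namespace as placeholders (the controller-manager log location varies by distribution):

```shell
# Look for the termination reason in the pod's status and events.
kubectl describe pod <pod-name> -n <namespace>

# On the Kubernetes master, search the controller manager log for
# eviction activity.
grep -i evict /var/log/kube-controller-manager.log
```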
**Workaround:**

You can resolve this issue by upgrading the Kubernetes cluster infrastructure resources (memory, CPU).
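Before scaling up, it can help to confirm resource pressure on the nodes; a sketch, assuming the metrics-server addon is installed for `kubectl top`:

```shell
# Node conditions such as MemoryPressure indicate resource exhaustion.
kubectl describe node <node-name>

# Requires the metrics-server addon; shows current CPU/memory usage.
kubectl top nodes
```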
## Others
### Nodes in the cluster reboot frequently, almost every day, in openSUSE CaaS

The cluster was set up using RKE on openSUSE CaaS (MicroOS) with the Cilium CNI plugin. OpenEBS was installed, a PVC was created and allocated to a fio job/busybox, and an FIO test was run against it. Nodes in the cluster were then observed to restart on a scheduled basis.
**Troubleshooting**
Check the journalctl logs on each node and see whether similar entries are observed. The following log snippets show the corresponding logs from three nodes.
**Node 1:**

**Node 2:**

**Node 3:**
You can confirm whether the cause of the reboots is a transactional update by using the outputs of the following commands.
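A sketch of such checks on openSUSE MicroOS, where automatic updates run through the transactional-update systemd timer and reboots are coordinated by rebootmgr:

```shell
# Check whether automatic transactional updates are scheduled.
systemctl status transactional-update.timer

# Review what the last transactional-update run did.
journalctl -u transactional-update.service

# rebootmgr decides when the node reboots after an update.
rebootmgrctl status
```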
**Workaround:**

There are two possible approaches.
**Approach 1:**

Do the following on each node to stop the transactional updates. This is the preferred approach.
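A minimal sketch of the disable step, assuming the reboots are driven by the standard MicroOS transactional-update timer and rebootmgr units:

```shell
# Stop and disable automatic update runs and the reboot manager so
# that updates no longer trigger scheduled node reboots.
systemctl disable --now transactional-update.timer
systemctl disable --now rebootmgr.service
```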
**Approach 2:**

Set the reboot timer schedule at a different time on each node, i.e., staggered at various intervals of the day, so that only one node reboots at a time.
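A sketch using rebootmgr maintenance windows, with example times and a one-hour duration (the window is a systemd calendar expression followed by a duration; run the matching line on each node):

```shell
# Give each node its own maintenance window so reboots never overlap.
rebootmgrctl set-window "*-*-* 03:00:00" 1h   # node 1
rebootmgrctl set-window "*-*-* 04:00:00" 1h   # node 2
rebootmgrctl set-window "*-*-* 05:00:00" 1h   # node 3
```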
## See Also

- Troubleshooting Install
- Troubleshooting Uninstall
- Troubleshooting NDM
- Troubleshooting Jiva
- Troubleshooting cStor
- Troubleshooting Local PV
- Troubleshooting Mayastor
- FAQs
- Seek support or help
- Latest release notes