- Search for similar issues mentioned on this page as well as in the following troubleshooting sections.
- Contact the OpenEBS Community for support.
- Search for similar issues on the OpenEBS GitHub repository.
- Search for any reported issues on StackOverflow under the OpenEBS tag.
It is sometimes observed that iscsiadm fails continuously and retries rapidly, and for some reason this causes kubelet's memory consumption to grow until the node goes out of memory and needs to be rebooted. Errors of the following type can be observed in journalctl and in the cstor-istgt container.
cstor-istgt container logs
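To collect these logs, commands along these lines can be used (the `openebs` namespace and the pod names below are placeholders; adjust them for your cluster):

```shell
# Kubelet-side errors on the affected node; the iscsiadm retry loop shows up here.
journalctl -u kubelet --since "1 hour ago" | grep -i iscsiadm

# Target-side errors from the cstor-istgt container of the volume's target pod.
kubectl -n openebs get pods | grep <pv-name>
kubectl -n openebs logs <cstor-target-pod> -c cstor-istgt --tail=200
```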
The high memory consumption of kubelet is mainly caused by the following.
Three modules are involved: cstor-istgt, kubelet, and the iSCSI initiator (iscsiadm). kubelet runs the iscsiadm command to perform discovery on cstor-istgt. If there is any delay in receiving the response to the discovery opcode (either due to the network or due to delayed processing on the target side), iscsiadm retries a few times and then gets into an infinite loop, dumping error messages as below:
kubelet keeps reading these responses, and its memory usage keeps accumulating. More details can be seen here.
Restart the corresponding istgt pod to stop the memory growth.
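As a sketch, the restart can be done by deleting the target pod so that its controller recreates it (pod name and the `openebs` namespace are placeholders):

```shell
# Find the cStor target pod backing the affected volume.
kubectl -n openebs get pods | grep <pv-name>

# Deleting the pod forces its controller to recreate it,
# clearing the accumulated state on the istgt side.
kubectl -n openebs delete pod <cstor-target-pod>
```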
This is caused by a lack of resources on the Kubernetes nodes, which causes pods to be evicted under load as the node becomes unresponsive. The pods transition from the Running state to the Unknown state, followed by Terminating, before restarting again.
The above cause can be confirmed from the output of kubectl describe pod, which displays the termination reason as NodeControllerEviction. You can get more information from the kube-controller-manager.log on the Kubernetes master.
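For example, the termination reason and the controller's eviction records can be checked with commands along these lines (the pod name, namespace, and log path are placeholders that vary by cluster setup):

```shell
# Termination reason on the evicted pod; look for NodeControllerEviction.
kubectl describe pod <pod-name> -n <namespace> | grep -i -A2 reason

# On the Kubernetes master, the controller-manager log records the eviction.
grep -i evict /var/log/kube-controller-manager.log | tail -20
```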
You can resolve this issue by upgrading the Kubernetes cluster's infrastructure resources (memory, CPU).

Others
The cluster was set up using RKE on openSUSE CaaS MicroOS with Cilium as the CNI plugin. OpenEBS was installed, a PVC was created and attached to a fio job/busybox pod, and a fio test was run on it. The nodes in the cluster were observed getting rebooted on a scheduled basis.
Check the journalctl logs of each node for similar entries. The following log snippets show the corresponding logs of three nodes.
You can confirm whether the cause of the reboot is a transactional update using the outputs of the commands below.
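On openSUSE MicroOS, a sketch of such checks (the unit name may vary by release):

```shell
# Was the last reboot preceded by a transactional update?
journalctl -u transactional-update --no-pager | tail -20

# Is the update timer active, and when does it fire next?
systemctl status transactional-update.timer
systemctl list-timers transactional-update.timer
```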
There are two possible solutions.
Do the following on each node to stop the transactional update.
This is the preferred approach.
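A minimal sketch, assuming the standard transactional-update.timer systemd unit is what triggers the scheduled updates:

```shell
# Disable the scheduled transactional update (and the reboot it triggers).
systemctl disable --now transactional-update.timer

# Verify that nothing is scheduled anymore.
systemctl list-timers | grep -i transactional || echo "timer disabled"
```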
Set the reboot timer schedule at a different time on each node, i.e., staggered at various intervals of the day, so that only one node gets rebooted at a time.
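One way to stagger the schedule is a systemd drop-in overriding the timer's OnCalendar on each node, e.g. node 1 at 01:00, node 2 at 03:00, and so on (the times and the transactional-update.timer unit name are assumptions to adapt to your setup):

```shell
# On each node, override the timer with a node-specific time slot.
mkdir -p /etc/systemd/system/transactional-update.timer.d
cat > /etc/systemd/system/transactional-update.timer.d/stagger.conf <<'EOF'
[Timer]
# An empty OnCalendar= clears the default schedule; the next line sets this node's slot.
OnCalendar=
OnCalendar=*-*-* 01:00:00
EOF
systemctl daemon-reload
systemctl restart transactional-update.timer
```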