Best Practices for Deleting Pods Stuck in Terminating State
Sometimes you may need to force-delete an OpenShift pod to quickly remove an unresponsive or problematic pod from the system. The result can be a mixed experience, especially for users new to OpenShift. On one hand, force deletion is a quick way to remove a stuck or unresponsive pod, allowing the system to recover and redeploy the necessary workloads. On the other hand, it bypasses the usual graceful shutdown process, which can cause data loss or service disruption and can make the platform feel unstable or unpredictable. If the pod is part of a larger application or service, the ripple effects can reach other components and degrade the overall user experience further.
Problem
You try to delete a pod, but the deletion takes a long time and the pod stays in the Terminating state:
$ oc get pod -owide
NAME                   READY   STATUS        RESTARTS   AGE   NODE
dc-main-56f4b97d87     1/1     Running       0          20h   node-1
ccs-post-install-job   0/1     Completed     0          18h   node-2
asset-files-api-787    0/1     Terminating   0          7d    node-3
In this example, if you check the YAML output of the asset-files-api-787 pod (oc get pod asset-files-api-787 -o yaml), the metadata.finalizers section is missing its value, or the status section displays an error.
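To find stuck pods and look at the fields mentioned above without reading the full YAML, the following Bash sketch may help. It assumes that STATUS is the third column of oc get pod output (as in the listing above) and that the pod is in the current project; terminating_pods and show_pod_finalizers are hypothetical helper names, not oc features.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of oc); a sketch assuming Bash.

# Print the names of pods whose STATUS column reads "Terminating".
# Expects `oc get pod` or `oc get pod -owide` output on stdin;
# STATUS is the third column in both layouts. The header is skipped.
terminating_pods() {
  awk 'NR > 1 && $3 == "Terminating" { print $1 }'
}

# Print a pod's deletionTimestamp (set once deletion was requested)
# on one line and its finalizers on the next.
# $1 is the pod name; the current project is used.
show_pod_finalizers() {
  oc get pod "$1" \
    -o jsonpath='{.metadata.deletionTimestamp}{"\n"}{.metadata.finalizers}{"\n"}'
}
```

For example, run oc get pod -owide | terminating_pods, then show_pod_finalizers asset-files-api-787. An empty second line means no finalizers are set on the pod.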
Root Cause
A pod can hang in the Terminating state because a finalizer has not yet completed or because the running resource (its containers) has not exited. Normally both conclude on their own, since their purpose is to clean up after the pod.
Immediate (forced) deletion does not wait for confirmation that the running resource has been terminated, so the resource may continue to run on the node indefinitely. An overview of the Kubernetes architecture helps explain why. A deletion request goes to the API server. On the worker node, the kubelet agent manages the container workloads and uses CRI-O to communicate with the lower-level runtime that runs the container processes. Normally the kubelet stops the containers through CRI-O and then confirms the deletion. When force is used, OpenShift deletes the pod object from the API server immediately, without waiting for the kubelet and CRI-O to complete their operations.
It is advisable to check which node the pod and its containers are running on (oc get pod -o wide) and to confirm that the containers on that node have actually terminated.
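The node check described above can be scripted along these lines. This is a sketch under assumptions: a Bash shell, cluster-admin rights for oc debug node, and containers carrying the io.kubernetes.pod.name label that the kubelet normally sets; pod_node and containers_on_node are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; a sketch assuming Bash and cluster-admin rights.

# Print the node the pod was scheduled on (spec.nodeName).
pod_node() {
  oc get pod "$1" -o jsonpath='{.spec.nodeName}'
}

# List all containers, including exited ones, that CRI-O still tracks
# for the pod on its node. In the output, check the STATE column:
# Exited means the container stopped; Running means it has not.
containers_on_node() {
  local pod="$1" node
  node=$(pod_node "$pod") || return 1
  oc debug "node/$node" -- chroot /host \
    crictl ps -a --label "io.kubernetes.pod.name=$pod"
}
```

Run containers_on_node asset-files-api-787 before any force deletion, while oc get pod can still resolve the node name.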
Resolution
1. Use one of the following workarounds to delete a Terminating pod at the OpenShift level:
a) Remove the finalizers (if the pod is stuck on finalizers):
$ oc patch pod <pod name> -p '{"metadata":{"finalizers":null}}'
For example:
$ oc patch pod asset-files-api-787 -p '{"metadata":{"finalizers":null}}'
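b) Try adding --grace-period=0 to oc delete. Note that without --force, the client treats --grace-period=0 as a one-second grace period, so this is still a short graceful deletion:
$ oc delete pod <pod name> --grace-period=0
For example:
$ oc delete pod asset-files-api-787 --grace-period=0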
c) Or add --force along with --grace-period=0:
$ oc delete pod <pod name> --grace-period=0 --force
For example:
$ oc delete pod asset-files-api-787 --grace-period=0 --force
Warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
Pod "asset-files-api-787" force deleted.
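The three workarounds above escalate from least to most disruptive. A sketch of trying them in that order, stopping as soon as the pod object is gone, might look like the following. It assumes Bash and an oc client that supports oc wait --for=delete; delete_stuck_pod and WAIT_SECONDS are hypothetical names, and the 30-second default is arbitrary.

```shell
#!/usr/bin/env bash
# Hypothetical helper; a sketch assuming Bash and `oc wait` support.
# WAIT_SECONDS is an arbitrary per-step timeout, not an oc setting.
WAIT_SECONDS=${WAIT_SECONDS:-30}

# Try the workarounds from least to most disruptive, returning as soon
# as `oc wait` reports that the pod object has been deleted.
delete_stuck_pod() {
  local pod="$1"

  # First: clear finalizers so the API server can finish the deletion.
  oc patch pod "$pod" -p '{"metadata":{"finalizers":null}}'
  oc wait --for=delete "pod/$pod" --timeout="${WAIT_SECONDS}s" && return 0

  # Second: request deletion with no grace period (without --force,
  # the client turns 0 into a 1-second grace period).
  oc delete pod "$pod" --grace-period=0 --wait=false
  oc wait --for=delete "pod/$pod" --timeout="${WAIT_SECONDS}s" && return 0

  # Last resort: force-delete the API object. The containers may keep
  # running on the node, so follow up with the CRI-O check (step 2).
  oc delete pod "$pod" --grace-period=0 --force
}
```

For example: delete_stuck_pod asset-files-api-787. Waiting between steps avoids reaching for --force when clearing the finalizers alone would have been enough.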
2. If --force was used, clean up the corresponding resource at the CRI-O level:
a) On the respective node, check for any pod sandbox (POD ID) left over from the deleted pod:
$ oc debug node/<node name> -- chroot /host /bin/bash -c 'crictl pods | \
grep <pod name>'
For example:
$ oc debug node/node-3 -- chroot /host /bin/bash -c 'crictl pods | \
grep asset-files-api-787'
POD ID          CREATED          STATE   NAME                  NAMESPACE
f84dd361f8dc5   17 minutes ago   Ready   asset-files-api-787   zen
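A bare grep also matches pod names that merely contain the search string (for example a hypothetical asset-files-api-7870), so an exact-field match is slightly safer. A sketch, assuming Bash and awk on the workstation and crictl pods output with the POD ID in the first column, as above; leftover_pod_ids is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper; a sketch assuming Bash and awk.
# Reads `crictl pods` output on stdin, skips the header line, and prints
# the POD ID (first column) of each sandbox that has a field exactly
# equal to $1.
leftover_pod_ids() {
  awk -v pod="$1" 'NR > 1 { for (i = 2; i <= NF; i++) if ($i == pod) { print $1; break } }'
}
```

For example: oc debug node/node-3 -- chroot /host crictl pods | leftover_pod_ids asset-files-api-787. The filtering runs locally; only crictl runs on the node.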
b) If a residual POD ID is found on the node, restart the crio service as a corrective measure. After the restart, verify that crio has returned to a fully operational (active) state. This helps keep the node’s container runtime environment stable.
$ ssh core@<node name>
$ sudo -i
# systemctl restart crio
# systemctl status crio
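After the restart, a short wait loop can confirm that crio is active before you re-check for the leftover sandbox. A sketch, to be run as root on the node; wait_for_crio, its 2-second poll interval, and its 60-second default timeout are hypothetical choices.

```shell
#!/usr/bin/env bash
# Hypothetical helper; run as root on the node after `systemctl restart crio`.
# Polls every 2 seconds until systemd reports crio as active, or gives up
# after $1 seconds (default 60) and returns non-zero.
wait_for_crio() {
  local timeout="${1:-60}" waited=0
  until systemctl is-active --quiet crio; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "crio is not active after ${timeout}s" >&2
      return 1
    fi
    sleep 2
    waited=$((waited + 2))
  done
}
```

For example: wait_for_crio 60 && crictl pods --name asset-files-api-787. If the residual POD ID no longer appears, the cleanup worked.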
Force-deleting an OpenShift pod is a powerful tool, but use it with caution and with an understanding of the potential consequences.