Kubernetes Error Codes: CrashLoopBackOff

{{text-cta}}

DevOps professionals value Kubernetes for its functionality in containerized environments, but those benefits come with complexity. Correctly configuring your Kubernetes infrastructure can be challenging, and crashes and other issues at the pod level can be common. 

Kubernetes generates unique error codes when certain problems arise. These codes, which are stored via Kubernetes’ logging systems, give administrators critical context around pod failures. Additionally, Kubernetes can write these issues to locations accessible to monitoring tools or dashboards, making it easier to investigate fatal events after they occur. You might know these codes as termination messages or exit codes. In the latter case, Kubernetes records an exit code between 0 and 255, with each value corresponding to a different internal or external issue. 

Error codes make it much easier to debug application-level problems and avoid them moving forward. Because they tell you exactly where to probe, debugging is both faster and more precise. One error code worth highlighting is `CrashLoopBackOff`, which signifies faulty pod-level behavior. Learning about this code and why it’s needed can help you better understand how your Kubernetes deployment is functioning. This article will explain the `CrashLoopBackOff` problem and how you can solve it. 

What Is CrashLoopBackOff? 

`CrashLoopBackOff` is a status that appears when a single pod crashes repeatedly and cyclically: the pod crashes and goes offline, restarts, and crashes again. 

Why does this happen? A container without a long-running command or predefined purpose might exit immediately upon starting; because Kubernetes’ job is to maintain the desired state, it simply restarts the pod, which then exits again. This often stems from misconfiguration. Alternatively, continual application crashes within a container or Kubernetes deployment errors can trigger the error. The status appears like this in the output of a `kubectl get pods` command:


NAME                                    READY   STATUS             RESTARTS   AGE
pod-crashloopbackoff-7f7c556bf5-9vc89   1/2     CrashLoopBackOff   35         2h

Your pod definition configuration files can play a big role here. It’s relatively easy to misconfigure specification fields related to resource limits, commands, ports, or images. For example, you might accidentally tell two containers in the same pod to bind the same port, which isn’t possible because they share a network namespace. In another case, ordering your definitions incorrectly can cause problems if an earlier step fails. Updates to Kubernetes or your container images can also trigger `CrashLoopBackOff`. Even missing runtime dependencies, such as secrets, might cause hiccups if your pods rely on API tokens for authentication.
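As an illustration, here is a minimal, hypothetical pod manifest that will end up in `CrashLoopBackOff`: the container’s command finishes almost immediately, so the container exits, the kubelet restarts it, and the cycle repeats. The pod name and image tag are made up for the example.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: busy-crasher            # hypothetical name for illustration
spec:
  containers:
    - name: main
      image: busybox:1.36
      # This command runs for about a second and then exits. Because the
      # pod's restartPolicy defaults to Always, the kubelet restarts the
      # container each time, producing a crash loop.
      command: ["sh", "-c", "echo 'starting up'; sleep 1"]
```

Giving the container a long-running process (a server, or a command that blocks) is what keeps a pod like this out of the loop.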

`CrashLoopBackOff` is thus useful for highlighting cases where pods crash, restart, and crash repeatedly. The error message describes a pod in an unsteady state, which means it’s important to troubleshoot and fix this problem. Following are the steps to do so:

Step One: Getting Your Pods

Before you begin solving the error, it’s useful to gather some information. Check your pods via the `kubectl get pods` command, which will tell you exactly how many times your pod restarted. This helps highlight the severity and the longevity of the crash loop(s) at hand and will reveal which pods need remediation. 
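If you have many pods, you can filter the `STATUS` column to surface only the crash-looping ones. The sketch below runs the filter against a captured sample of `kubectl get pods` output; in a live cluster you would pipe the real command through the same `awk` invocation.

```shell
# Sample `kubectl get pods` output saved to a file for this sketch;
# against a live cluster: kubectl get pods | awk 'NR > 1 && $3 == "CrashLoopBackOff"'
cat > pods.txt <<'EOF'
NAME                                    READY   STATUS             RESTARTS   AGE
pod-crashloopbackoff-7f7c556bf5-9vc89   1/2     CrashLoopBackOff   35         2h
healthy-web-6d4cf56db6-bj7tq            1/1     Running            0          3h
EOF

# Print the name and restart count of every crash-looping pod.
awk 'NR > 1 && $3 == "CrashLoopBackOff" {print $1, "restarts:", $4}' pods.txt
```

The restart count printed here is the same signal described above: a high, climbing number points at a long-lived crash loop.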

Step Two: Describing Your Problem Pods

You’ll want to fetch information about your problem pod and start digging into its runtime conditions, including those surrounding its failure. Executing the `kubectl describe pod [pod name]` command (inserting your pod name without the brackets) will produce a lengthy output in your CLI:


Name:               pod-crashloopbackoff-7f7c556bf5-9vc89
Namespace:          dev-k8sbot-test-pods
Priority:           0
PriorityClassName:  <none>
Node:               gke-gar-3-pool-1-9781becc-bdb3/10.128.15.216
Start Time:         Tue, 12 Feb 2019 15:11:54 -0800
Labels:             app=pod-crashloopbackoff
                    pod-template-hash=3937112691
Annotations:        <none>
Status:             Running
IP:                 10.44.46.8
Controlled By:      ReplicaSet/pod-crashloopbackoff-7f7c556bf5
Containers:
  im-crashing:
    Container ID:  docker://a3ba2841f39414390b6cbd85fe94932a0f50c2698e68c34d52a5b23cfe73094c
    Image:         ubuntu:18.04
    Image ID:      docker-pullable://ubuntu@sha256:7a47ccc3bbe8a451b500d2b53104868b46d60ee8f5b35a24b41a86077c650210
    Port:          8080/TCP
    Host Port:     0/TCP
    Command:
      /bin/bash
      -ec
      echo 'hello, there...'; sleep 1; echo 'hello, there...'; sleep 1; echo 'exiting with status 0'; exit 1;
    State:          Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 12 Feb 2019 15:12:47 -0800
      Finished:     Tue, 12 Feb 2019 15:12:49 -0800
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 12 Feb 2019 15:12:28 -0800
      Finished:     Tue, 12 Feb 2019 15:12:30 -0800
    Ready:          False
    Restart Count:  2
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-csrjs (ro)
  good-container:
    Container ID:   docker://00d634023be399358d9496d557e2cb7501cc5c52ac360d5809c74d4ca3a3b96c
    Image:          gcr.io/google_containers/echoserver:1.0
    Image ID:       docker-pullable://gcr.io/google_containers/echoserver@sha256:6240c350bb622e33473b07ece769b78087f4a96b01f4851eab99a4088567cb76
    Port:           8080/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Tue, 12 Feb 2019 15:12:27 -0800
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-csrjs (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  default-token-csrjs:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-csrjs
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s

You’ll also see key pod events attached within that same output. It’s broken up here for readability:


Events:
  Type     Reason     Age               From                                     Message
  ----     ------     ----              ----                                     -------
  Normal   Scheduled  56s               default-scheduler                        Successfully assigned dev-k8sbot-test-pods/pod-crashloopbackoff-7f7c556bf5-9vc89 to gke-gar-3-pool-1-9781becc-bdb3
  Normal   Pulling    50s               kubelet, gke-gar-3-pool-1-9781becc-bdb3  pulling image "gcr.io/google_containers/echoserver:1.0"
  Normal   Created    24s               kubelet, gke-gar-3-pool-1-9781becc-bdb3  Created container
  Normal   Pulled     24s               kubelet, gke-gar-3-pool-1-9781becc-bdb3  Successfully pulled image "gcr.io/google_containers/echoserver:1.0"
  Normal   Started    23s               kubelet, gke-gar-3-pool-1-9781becc-bdb3  Started container
  Normal   Pulling    4s (x3 over 55s)  kubelet, gke-gar-3-pool-1-9781becc-bdb3  pulling image "ubuntu:18.04"
  Normal   Created    3s (x3 over 50s)  kubelet, gke-gar-3-pool-1-9781becc-bdb3  Created container
  Normal   Started    3s (x3 over 50s)  kubelet, gke-gar-3-pool-1-9781becc-bdb3  Started container
  Normal   Pulled     3s (x3 over 51s)  kubelet, gke-gar-3-pool-1-9781becc-bdb3  Successfully pulled image "ubuntu:18.04"
  Warning  BackOff    1s (x2 over 19s)  kubelet, gke-gar-3-pool-1-9781becc-bdb3  Back-off restarting failed container

These events are important because they reveal how Kubernetes is behaving. Search specifically for anything related to `BackOff`, since it signals crashing containers and failed restarts. This `BackOff` state doesn’t occur right away, however: the kubelet waits an exponentially increasing delay between restart attempts, so the event may not be logged until a container has already been restarted several times. This indicates that containers are exiting in a faulty fashion and that pods aren’t running as they should be. The event warning message will likely confirm this by displaying `Back-off restarting failed container`. Getting your pods repeatedly will also show the `RESTARTS` counter increasing. 
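The delay between restarts grows exponentially: it starts at roughly ten seconds, doubles on each failed attempt, and is capped at five minutes. A quick shell sketch of that schedule:

```shell
# Sketch of the kubelet's restart back-off schedule: the delay starts at
# 10s, doubles on each failed attempt, and is capped at 300s (5 minutes).
delay=10
for attempt in 1 2 3 4 5 6 7; do
  echo "attempt $attempt: wait ${delay}s before restarting"
  delay=$(( delay * 2 ))
  [ "$delay" -gt 300 ] && delay=300
done
```

This is why a crash-looping pod can sit in `CrashLoopBackOff` for minutes at a time between restart attempts once the cap is reached.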

Gathering these details is essential to sound troubleshooting; otherwise, you’re just navigating randomly through your system, which is time-consuming and far less targeted than it could be. 

{{text-cta}}

Step Three: Checking Logs

*Log checking* is a fantastic way to perform a retrospective analysis of your Kubernetes deployment. These records are organized and human-readable. You can do this easily with the following command:


kubectl logs [pod-name]

Your resulting output may reveal that your pod is exiting, which is a hallmark sign of the `CrashLoopBackOff` condition. If the failing container has already been restarted, add the `--previous` flag to view the logs from its last run. Kubernetes also associates the exit event with a numerical exit code, which acts as a status and gives you clues as to why a container is experiencing crash loop issues. 

For example, an exit status of 0 means that the container exited normally. Exit codes 1 through 128 indicate the container exited because of an internal error in the application itself, while codes 129 through 255 mean it was terminated by an external signal (the value is 128 plus the signal number). The internal errors are the situations you’re *most* interested in: looking back to the reasons behind `CrashLoopBackOff`, the configurations you make within Kubernetes and its internal dependencies are highly impactful. 
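You can see where these ranges come from using the shell itself: a process that exits on its own reports whatever code it chose, while a process killed by a signal reports 128 plus the signal number.

```shell
# An application-level failure: the process chooses its own exit code.
bash -c 'exit 42' || status=$?
echo "application exit: $status"   # prints 42

# Termination by an external signal: exit status is 128 + signal number.
bash -c 'kill -TERM $$' || status=$?
echo "signal exit: $status"        # prints 143 (128 + 15 for SIGTERM)
```

A container log showing exit code 137, for instance, points at a `SIGKILL` (128 + 9), commonly the result of an out-of-memory kill rather than an application bug.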

It’s also possible to beam your logs to an external tool for inspection. This might offer visualizations that are clearer, better organized, and easier to understand. Logs can tell you exactly *where* a problem took place and at what time, plus make it easier to draw connections between crashes and infrastructure states. 

Step Four: Checking the Liveness Probe

Finally, a failing `liveness` probe can cause crashes when successful statuses aren’t returned. You’ll have to use the `kubectl describe pod` command again and scan its events for the `Liveness probe failed` message. When that message appears, the odds are good that your `liveness` probe is misconfigured or that your application isn’t responding to it. 

This probe is key for one critical reason: the kubelet uses it to decide when a container should be killed and restarted. A misconfigured probe can mark healthy containers as failed, causing the kubelet to restart them in a loop.
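For reference, a liveness probe is declared per container in the pod spec. In this illustrative snippet, the kubelet sends an HTTP GET to `/healthz` on port 8080; after three consecutive failures, the container is killed and restarted, which, repeated, becomes a crash loop. The path and timing values here are hypothetical examples, not recommendations.

```yaml
containers:
  - name: web
    image: gcr.io/google_containers/echoserver:1.0
    ports:
      - containerPort: 8080
    livenessProbe:
      httpGet:
        path: /healthz          # hypothetical health endpoint
        port: 8080
      initialDelaySeconds: 5    # give the app time to start before probing
      periodSeconds: 10         # probe every 10 seconds
      failureThreshold: 3       # 3 consecutive failures => restart the container
```

If your application starts slowly, a too-small `initialDelaySeconds` alone can produce exactly the restart loop described above.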

Conclusion

While `CrashLoopBackOff` isn’t something you’d necessarily want to see, the error isn’t earth-shattering. You can follow the above steps to drill down into your deployment and uncover the root of your container issues. From there, you can make adjustments to your configuration files and take corrective action. 

Surprisingly, the `CrashLoopBackOff` status isn’t always negative; it can be harnessed for monitoring purposes. By setting your `restartPolicy` to `Always`, you ensure that failed containers are restarted and that logs and other information are promptly collected when failure strikes, so you won’t be searching in the dark for answers.
