I am getting CrashLoopBackOff Error for 1/4 pods, please guide me on how to troubleshoot this issue.
$ kubectl get pod -n cog-prod01 -o wide
slotmachine-1688723297-5vlht 1/1 Running 0 21h 100.96.6.15 ip-172-21-61-42.compute.internal
slotmachine-1688723297-6plr9 1/1 Running 0 16h 100.96.13.16 ip-172-21-54-247.compute.internal
slotmachine-1688723297-k995t 1/1 Running 0 16h 100.96.11.186 ip-172-21-60-180.compute.internal
slotmachine-1688723297-sk8bn 0/1 CrashLoopBackOff 8 19m 100.96.2.72 ip-172-21-56-148.compute.internal

Kubelet logs on the node:
admin@ip-172-21-56-148:~$ journalctl -u kubelet -f
Jan 07 02:44:36 ip-172-21-56-148 kubelet[1568]: W0107 02:44:36.351880 1568 helpers.go:793] eviction manager: no observation found for eviction signal allocatableNodeFs.available
Jan 07 02:44:46 ip-172-21-56-148 kubelet[1568]: W0107 02:44:46.372270 1568 helpers.go:793] eviction manager: no observation found for eviction signal allocatableNodeFs.available
Jan 07 02:44:46 ip-172-21-56-148 kubelet[1568]: I0107 02:44:46.443776 1568 kuberuntime_manager.go:463] Container {Name:slotmachine Image:gt/slotmachine:develop.6590.b3a.2866 Command:[] Args:[] WorkingDir: Ports:[{Name:slotmachine HostPort:0 ContainerPort:9192 Protocol:TCP HostIP:}] EnvFrom:[{Prefix: ConfigMapRef:&ConfigMapEnvSource{LocalObjectReference:LocalObjectReference{Name:global,},Optional:nil,} SecretRef:nil}] Env:[] Resources:{Limits:map[] Requests:map[cpu:{i:{value:200 scale:-3} d:{Dec:<nil>} s:200m Format:DecimalSI} memory:{i:{value:5 scale:9} d:{Dec:<nil>} s:5G Format:DecimalSI}]} VolumeMounts:[{Name:slotmachine-logs ReadOnly:false MountPath:/var/log/slotmachine SubPath:} {Name:default-token-9bxjf ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath:}] LivenessProbe:nil ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:nil Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
Jan 07 02:44:46 ip-172-21-56-148 kubelet[1568]: I0107 02:44:46.443851 1568 kuberuntime_manager.go:747] checking backoff for container "slotmachine" in pod "slotmachine-1688723297-sk8bn_cog-prod01(2bc8665e-30f5-11ea-a92d-024aeca0bafc)"
Jan 07 02:44:46 ip-172-21-56-148 kubelet[1568]: I0107 02:44:46.592800 1568 kubelet.go:1917] SyncLoop (PLEG): "slotmachine-1688723297-sk8bn_cog-prod01(2bc8665e-30f5-11ea-a92d-024aeca0bafc)", event: &pleg.PodLifecycleEvent{ID:"2bc8665e-30f5-11ea-a92d-024aeca0bafc", Type:"ContainerStarted", Data:"5b2868d22c3e5453e57a58cba78cea4979a7da9a0864be2f29049d47d19fa41b"}
Jan 07 02:44:56 ip-172-21-56-148 kubelet[1568]: W0107 02:44:56.409374 1568 helpers.go:793] eviction manager: no observation found for eviction signal allocatableNodeFs.available
Jan 07 02:45:00 ip-172-21-56-148 kubelet[1568]: I0107 02:45:00.669027 1568 kubelet.go:1917] SyncLoop (PLEG): "slotmachine-1688723297-sk8bn_cog-prod01(2bc8665e-30f5-11ea-a92d-024aeca0bafc)", event: &pleg.PodLifecycleEvent{ID:"2bc8665e-30f5-11ea-a92d-024aeca0bafc", Type:"ContainerDied", Data:"5b2868d22c3e5453e57a58cba78cea4979a7da9a0864be2f29049d47d19fa41b"}
Jan 07 02:45:00 ip-172-21-56-148 kubelet[1568]: I0107 02:45:00.971547 1568 kuberuntime_manager.go:463] Container {Name:slotmachine Image:gt/slotmachine:develop.6590.b3aa.2866 Command:[] Args:[] WorkingDir: Ports:[{Name:slotmachine HostPort:0 ContainerPort:9192 Protocol:TCP HostIP:}] EnvFrom:[{Prefix: ConfigMapRef:&ConfigMapEnvSource{LocalObjectReference:LocalObjectReference{Name:global,},Optional:nil,} SecretRef:nil}] Env:[] Resources:{Limits:map[] Requests:map[cpu:{i:{value:200 scale:-3} d:{Dec:<nil>} s:200m Format:DecimalSI} memory:{i:{value:5 scale:9} d:{Dec:<nil>} s:5G Format:DecimalSI}]} VolumeMounts:[{Name:slotmachine-logs ReadOnly:false MountPath:/var/log/slotmachine SubPath:} {Name:default-token-9bxjf ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath:}] LivenessProbe:nil ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:nil Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
Jan 07 02:45:00 ip-172-21-56-148 kubelet[1568]: I0107 02:45:00.971640 1568 kuberuntime_manager.go:747] checking backoff for container "slotmachine" in pod "slotmachine-1688723297-sk8bn_cog-prod01(2bc8665e-30f5-11ea-a92d-024aeca0bafc)"
Jan 07 02:45:00 ip-172-21-56-148 kubelet[1568]: I0107 02:45:00.971770 1568 kuberuntime_manager.go:757] Back-off 5m0s restarting failed container=slotmachine pod=slotmachine-1688723297-sk8bn_cog-prod01(2bc8665e-30f5-11ea-a92d-024aeca0bafc)
Jan 07 02:45:00 ip-172-21-56-148 kubelet[1568]: E0107 02:45:00.971805 1568 pod_workers.go:182] Error syncing pod 2bc8665e-30f5-11ea-a92d-024aeca0bafc ("slotmachine-1688723297-sk8bn_cog-prod01(2bc8665e-30f5-11ea-a92d-024aeca0bafc)"), skipping: failed to "StartContainer" for "slotmachine" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=slotmachine pod=slotmachine-1688723297-sk8bn_cog-prod01(2bc8665e-30f5-11ea-a92d-024aeca0bafc)"
Jan 07 02:45:06 ip-172-21-56-148 kubelet[1568]: W0107 02:45:06.447068 1568 helpers.go:793] eviction manager: no observation found for eviction signal allocatableNodeFs.available
Jan 07 02:45:12 ip-172-21-56-148 kubelet[1568]: I0107 02:45:12.149685 1568 status_manager.go:418] Status for pod "2bc8665e-30f5-11ea-a92d-024aeca0bafc" is up-to-date; skipping
Jan 07 02:45:12 ip-172-21-56-148 kubelet[1568]: I0107 02:45:12.443951 1568 kuberuntime_manager.go:463] Container {Name:slotmachine Image:gt/slotmachine:develop.6590.b35a.2866 Command:[] Args:[] WorkingDir: Ports:[{Name:slotmachine HostPort:0 ContainerPort:9192 Protocol:TCP HostIP:}] EnvFrom:[{Prefix: ConfigMapRef:&ConfigMapEnvSource{LocalObjectReference:LocalObjectReference{Name:global,},Optional:nil,} SecretRef:nil}] Env:[] Resources:{Limits:map[] Requests:map[cpu:{i:{value:200 scale:-3} d:{Dec:<nil>} s:200m Format:DecimalSI} memory:{i:{value:5 scale:9} d:{Dec:<nil>} s:5G Format:DecimalSI}]} VolumeMounts:[{Name:slotmachine-logs ReadOnly:false MountPath:/var/log/slotmachine SubPath:} {Name:default-token-9bxjf ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath:}] LivenessProbe:nil ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:nil Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
Jan 07 02:45:12 ip-172-21-56-148 kubelet[1568]: I0107 02:45:12.444070 1568 kuberuntime_manager.go:747] checking backoff for container "slotmachine" in pod "slotmachine-1688723297-sk8bn_cog-prod01(2bc8665e-30f5-11ea-a92d-024aeca0bafc)"
Jan 07 02:45:12 ip-172-21-56-148 kubelet[1568]: I0107 02:45:12.444198 1568 kuberuntime_manager.go:757] Back-off 5m0s restarting failed container=slotmachine pod=slotmachine-1688723297-sk8bn_cog-prod01(2bc8665e-30f5-11ea-a92d-024aeca0bafc)
Jan 07 02:45:12 ip-172-21-56-148 kubelet[1568]: E0107 02:45:12.444238 1568 pod_workers.go:182] Error syncing pod 2bc8665e-30f5-11ea-a92d-024aeca0bafc ("slotmachine-1688723297-sk8bn_cog-prod01(2bc8665e-30f5-11ea-a92d-024aeca0bafc)"), skipping: failed to "StartContainer" for "slotmachine" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=slotmachine pod=slotmachine-1688723297-sk8bn_cog-prod01(2bc8665e-30f5-11ea-a92d-024aeca0bafc)"
Jan 07 02:45:13 ip-172-21-56-148 kubelet[1568]: I0107 02:45:13.938976 1568 qos_container_manager_linux.go:286] [ContainerManager]: Updated QoS cgroup configuration
Jan 07 02:45:16 ip-172-21-56-148 kubelet[1568]: W0107 02:45:16.464693 1568 helpers.go:793] eviction manager: no observation found for eviction signal allocatableNodeFs.available
admin@ip-172-21-43-86:~$ kubectl describe po -n cog-prod01 slotmachine-1688723297-sk8bn
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
27m 27m 1 default-scheduler Normal Scheduled Successfully assigned slotmachine-1688723297-sk8bn to ip-172-21-56-148.compute.internal
27m 27m 1 kubelet, ip-172-21-56-148.compute.internal Normal SuccessfulMountVolume MountVolume.SetUp succeeded for volume "slotmachine-logs"
27m 27m 1 kubelet, ip-172-21-56-148.compute.internal Normal SuccessfulMountVolume MountVolume.SetUp succeeded for volume "default-token-9bxjf"
27m 4m 10 kubelet, ip-172-21-56-148.compute.internal spec.containers{slotmachine} Normal Pulled Container image "gt/slotmachine:develop.6590.xxxx.2866" already present on machine
27m 4m 10 kubelet, ip-172-21-56-148.compute.internal spec.containers{slotmachine} Normal Created Created container
27m 4m 10 kubelet, ip-172-21-56-148.compute.internal spec.containers{slotmachine} Normal Started Started container
27m 11s 113 kubelet, ip-172-21-56-148.compute.internal spec.containers{slotmachine} Warning BackOff Back-off restarting failed container
27m 11s 113 kubelet, ip-172-21-56-148.compute.internal Warning FailedSync Error syncing pod

Note: I checked disk space, CPU, and memory on the node running that pod, and they are fine. According to the pod logs, it cannot connect to the config service, but the other 3 pods can connect to that service, so I'm not able to figure out what is wrong here!
admin@ip-172-21-43-86:~$ kubectl logs -n cog-prod01 slotmachine-1688723297-sk8bn
03:01:02.104 [main] INFO org.springframework.cloud.config.client.ConfigServicePropertySourceLocator - Fetching config from server at: http://configservice:8888
03:01:05.344 [main] WARN org.springframework.cloud.config.client.ConfigServicePropertySourceLocator - Could not locate PropertySource: I/O error on GET request for "http://configservice:8888/slotmachine/cog,cog-prod01": No route to host (Host unreachable); nested exception is java.net.NoRouteToHostException: No route to host (Host unreachable)
03:01:05.381 [main] INFO org.springframework.boot.context.embedded.AnnotationConfigEmbeddedWebApplicationContext - Refreshing org.springframework.boot.context.embedded.AnnotationConfigEmbeddedWebApplicationContext@77eca502: startup date [Tue Jan 07 03:01:05 UTC 2020]; parent: org.springframework.context.annotation.AnnotationConfigApplicationContext@4fb0f2b9

Not enough capacity may be available on the node or nodes, so the scheduler is not able to place your 4th pod. You can check this with kubectl describe nodes. For a detailed explanation, have a look at my answer to GKE Insufficient CPU for small Node.js app pods.
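A rough sketch of that check (the node name is taken from the pod listing above; the exact output layout varies by Kubernetes version):

$ kubectl describe node ip-172-21-56-148.compute.internal
# Compare the "Allocatable" values with the "Allocated resources" section near
# the end of the output: the slotmachine container requests 200m CPU and 5G
# memory, so if total requests on the node are already close to Allocatable,
# the node is overcommitted and the scheduler will not place more pods there.

$ kubectl describe nodes | grep -A 6 "Allocated resources"
# Quick view of requests/limits across all nodes at once.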
Check if kube-proxy is working properly on your nodes. The "No route to host" error from only this one pod, while the other 3 pods can reach configservice, suggests a networking problem on node ip-172-21-56-148 rather than in the application itself.
Here is a guide on debugging Kube Proxy
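A rough sketch of what those checks could look like (the pod name below is a placeholder; whether kube-proxy runs as a DaemonSet pod or a systemd unit depends on how the cluster was provisioned, and configservice is assumed to live in the cog-prod01 namespace since the app reaches it by its bare service name):

# kube-proxy as a DaemonSet: find the instance on the affected node and read its logs
$ kubectl get pods -n kube-system -o wide | grep kube-proxy
$ kubectl logs -n kube-system kube-proxy-xxxxx            # placeholder pod name

# kube-proxy as a systemd unit: check it directly on the node
admin@ip-172-21-56-148:~$ sudo systemctl status kube-proxy
admin@ip-172-21-56-148:~$ journalctl -u kube-proxy --since "1 hour ago"

# Confirm the node has iptables rules for the configservice ClusterIP and can reach it
$ kubectl get svc -n cog-prod01 configservice             # note the ClusterIP
admin@ip-172-21-56-148:~$ sudo iptables-save | grep configservice
admin@ip-172-21-56-148:~$ curl -v http://<configservice-cluster-ip>:8888/   # substitute the ClusterIP

If the rules are missing or the curl fails only on this node, restarting kube-proxy there and comparing its logs with a healthy node is a reasonable next step.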