We are using Airflow to schedule Spark job on Kubernetes. Recently, I have encountered a scenario where:
What happened due to this is that airflow created another POD for running the same job which resulted in race condition.
Airflow is using Kubernetes Python client's read_namespaced_pod API()
def read_pod(self, pod):
"""Read POD information"""
try:
return self._client.read_namespaced_pod(pod.metadata.name, pod.metadata.namespace)
except BaseHTTPError as e:
raise AirflowException(
'There was an error reading the kubernetes API: {}'.format(e)
)
I believe read_namespaced_pod() calls Kubernetes API. In order to investigate this further, I would like to like check logs of Kubernetes API server.
Can you please share steps to check what is happening on Kubernetes side ?
Note: Kubernetes version is 1.18 and Airflow version is 1.10.10.
Answering the question from the perspective of logs/troubleshooting:
I believe read_namespaced_pod() calls Kubernetes API. In order to investigate this further, I would like to like check logs of Kubernetes API server.
Yes, you are correct, this function calls the Kubernetes API. You can check the logs of Kubernetes API server by running:
$ kubectl logs -n kube-system KUBERNETES_API_SERVER_POD_NAMEI would also consider checking the kube-controller-manager:
$ kubectl logs -n kube-system KUBERNETES_CONTROLLER_MANAGER_POD_NAMEThe example output of it:
I0413 12:33:12.840270 1 event.go:291] "Event occurred" object="default/nginx-6799fc88d8" kind="ReplicaSet" apiVersion="apps/v1" type="Normal" reason="SuccessfulCreate" message="Created pod: nginx-6799fc88d8-kchp7"A side note!
Above commands will work assuming that your
kubernetes-apiserverandkubernetes-controller-managerPodis visible to you
Can you please share steps to check what is happening on Kubernetes side ?
This question targets the basics of troubleshooting/logs checking.
For that you can use following commands (and the ones mentioned earlier):
$ kubectl get RESOURCE RESOURCE_NAME:$ kubectl get pod airflow-pod-name also you can add -o yaml for more information
$ kubectl describe RESOURCE RESOURCE_NAME:$ kubectl describe pod airflow-pod-name$ kubectl logs POD_NAME:$ kubectl logs airflow-pod-nameAdditional resources: