I have successfully created a Spark cluster on Kubernetes with 1 master and 2 worker pods. It runs Spark v2.4.3 with Java 8 and Scala 2.11.12, on kubectl v1.16.0 and minikube v1.4.0.
For details, kubectl get pods shows this -
NAME READY STATUS RESTARTS AGE
spark-master-fcfd55d7d-qrpsw 1/1 Running 0 66m
spark-worker-686bd57b5d-6s9zb 1/1 Running 0 65m
spark-worker-686bd57b5d-wrqrd 1/1 Running 0 65m

I am also able to run built-in Spark applications such as pyspark and spark-shell by exec-ing into the master pod -
kubectl exec -it spark-master-fcfd55d7d-qrpsw -- spark-shell

Since I already have a working environment, I am trying to run my own Spark job on it in the same way, but it is not working. The spark-submit command looks like this -
#!/usr/bin/env bash
spark-submit \
--class com.cloudian.spark.main.RequestInfoLogStreamer \
/Users/atekade/IdeaProjects/scala-spark-streaming/target/scala-2.11/scala-spark-streaming_2.11-1.0.jar

The .sh script is then submitted to the master pod -
kubectl exec spark-master-fcfd55d7d-qrpsw /Users/atekade/IdeaProjects/scala-spark-streaming/logstreamer.sh

But this gives me an error -
OCI runtime exec failed: exec failed: container_linux.go:345: starting container process caused "exec: \"/Users/atekade/IdeaProjects/scala-spark-streaming/logstreamer.sh\": stat /Users/atekade/IdeaProjects/scala-spark-streaming/logstreamer.sh: no such file or directory": unknown
command terminated with exit code 126

What am I doing wrong here? My intention is to get the work done by these master and worker nodes.
As you can read from the error:
OCI runtime exec failed: exec failed: container_linux.go:345: starting container process caused "exec: \"/Users/atekade/IdeaProjects/scala-spark-streaming/logstreamer.sh\": stat /Users/atekade/IdeaProjects/scala-spark-streaming/logstreamer.sh: no such file or directory": unknown command terminated with exit code 126
What interests us the most is the part /Users/atekade/IdeaProjects/scala-spark-streaming/logstreamer.sh: no such file or directory, which means the pod is unable to locate the logstreamer.sh file.
The logstreamer.sh script needs to be uploaded to the spark-master pod, and the scala-spark-streaming_2.11-1.0.jar needs to be there as well; the paths in your commands point at your local machine, not at the container's filesystem.
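A minimal sketch of that upload with kubectl cp, assuming /tmp is writable in your image (the /tmp target paths are my assumption, use whatever directory your image provides):

# Copy the job script and the application jar from your machine into the master pod
kubectl cp /Users/atekade/IdeaProjects/scala-spark-streaming/logstreamer.sh \
  spark-master-fcfd55d7d-qrpsw:/tmp/logstreamer.sh
kubectl cp /Users/atekade/IdeaProjects/scala-spark-streaming/target/scala-2.11/scala-spark-streaming_2.11-1.0.jar \
  spark-master-fcfd55d7d-qrpsw:/tmp/scala-spark-streaming_2.11-1.0.jar

# Execute the script inside the pod, referencing the in-pod path
kubectl exec -it spark-master-fcfd55d7d-qrpsw -- bash /tmp/logstreamer.sh

Note that the jar path inside logstreamer.sh must also be changed to the in-pod location (/tmp/scala-spark-streaming_2.11-1.0.jar here), because the /Users/... path only exists on your local machine.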
You can configure a PersistentVolume for storage; this is useful because if your pod is ever rescheduled, any data that was not stored on a PV will be lost.
Here is a link to the minikube documentation for Persistent Volumes.
You can also use different Storage Classes.
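For example, a minimal hostPath PersistentVolume and matching claim on minikube could look like the sketch below (the names, the 1Gi size, the manual storage class, and the /data/spark-pv path are all assumptions for illustration):

# Sketch only: create a hostPath PV and a claim that binds to it
# (on minikube, data under /data persists across VM restarts)
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: spark-pv               # hypothetical name
spec:
  storageClassName: manual
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /data/spark-pv
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: spark-pvc              # hypothetical name
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
EOF

You would then mount spark-pvc into the spark-master deployment (a volumes entry plus a volumeMounts entry) and keep the script and jar on that mount, so they survive a pod reschedule.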