This repository was archived by the owner on Jan 9, 2020. It is now read-only.

Run example failed with: Executor cannot find driver pod #202

@hustcat

I ran the SparkPi example by following the instructions here:

# spark-submit \
>   --deploy-mode cluster \
>   --class org.apache.spark.examples.SparkPi \
>   --master k8s://http://localhost:8080 \
>   --kubernetes-namespace dbyin \
>   --conf spark.executor.instances=5 \
>   --conf spark.app.name=spark-pi \
>   --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:v2.1.0-kubernetes-0.1.0-rc1 \
>   --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:v2.1.0-kubernetes-0.1.0-rc1 \
>   examples/jars/spark-examples_2.11-2.1.0-k8s-0.1.0-SNAPSHOT.jar
...

2017-03-24 11:32:42 INFO  LoggingPodStatusWatcher:54 - Application status for spark-pi-1490326341602 (phase: Running)
2017-03-24 11:32:43 INFO  LoggingPodStatusWatcher:54 - State changed, new state: 
         pod name: spark-pi-1490326341602
         namespace: dbyin
         labels: spark-app-id -> spark-pi-1490326341602, spark-app-name -> spark-pi, spark-driver -> spark-pi-1490326341602
         pod uid: 7e15865a-1042-11e7-9180-6c0b84be1f24
         creation time: 2017-03-24T03:32:22Z
         service account name: default
         volumes: spark-submission-secret-volume, default-token-rd4n0
         node name: 100.x.x.x
         start time: 2017-03-24T03:32:22Z
         container images: kubespark/spark-driver:v2.1.0-kubernetes-0.1.0-rc1
         phase: Running
         status: [ContainerStatus(containerID=docker://3dace9a96700be04a49c7f011050494de35bdb6d5b2f6c3d2f09177149cc5849, image=kubespark/spark-driver:v2.1.0-kubernetes-0.1.0-rc1, imageID=docker://sha256:b5dbf02827567eed33fac7f94ca99fd408076db872f49280c2615837068775df, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=true, restartCount=0, state=ContainerState(running=ContainerStateRunning(startedAt=2017-03-24T03:32:37Z, additionalProperties={}), terminated=null, waiting=null, additionalProperties={}), additionalProperties={})]
2017-03-24 11:32:43 INFO  Client:54 - Driver pod successfully created in Kubernetes cluster.
2017-03-24 11:32:43 INFO  Client:54 - Driver service created successfully in Kubernetes.
2017-03-24 11:32:43 INFO  Client:54 - Driver endpoints ready to receive application submission
2017-03-24 11:32:43 WARN  NodePortUrisDriverServiceManager:66 - Submitting application details, application secret, Kubernetes credentials, and local jars to the cluster over an insecure connection. You should configure SSL to secure this step.
2017-03-24 11:32:43 INFO  Client:54 - Submitting local resources to driver pod for application spark-pi-1490326341602 ...
2017-03-24 11:32:43 INFO  Client:54 - Successfully submitted local resources and driver configuration to driver pod.
2017-03-24 11:32:43 INFO  Client:54 - Finished submitting application to Kubernetes.
2017-03-24 11:32:43 INFO  KubernetesResourceCleaner:54 - Deleting 1 registered Kubernetes resources...
2017-03-24 11:32:43 INFO  KubernetesResourceCleaner:54 - Deleted 1 registered Kubernetes resources.
2017-03-24 11:32:43 INFO  Client:54 - Waiting for application spark-pi-1490326341602 to finish...
2017-03-24 11:32:43 INFO  LoggingPodStatusWatcher:54 - Application status for spark-pi-1490326341602 (phase: Running)
2017-03-24 11:32:44 INFO  LoggingPodStatusWatcher:54 - Application status for spark-pi-1490326341602 (phase: Running)
2017-03-24 11:32:45 INFO  LoggingPodStatusWatcher:54 - Application status for spark-pi-1490326341602 (phase: Running)
2017-03-24 11:32:46 INFO  LoggingPodStatusWatcher:54 - Application status for spark-pi-1490326341602 (phase: Running)
2017-03-24 11:32:47 INFO  LoggingPodStatusWatcher:54 - State changed, new state: 
         pod name: spark-pi-1490326341602
         namespace: dbyin
         labels: spark-app-id -> spark-pi-1490326341602, spark-app-name -> spark-pi, spark-driver -> spark-pi-1490326341602
         pod uid: 7e15865a-1042-11e7-9180-6c0b84be1f24
         creation time: 2017-03-24T03:32:22Z
         service account name: default
         volumes: spark-submission-secret-volume, default-token-rd4n0
         node name: 100.x.x.x
         start time: 2017-03-24T03:32:22Z
         container images: kubespark/spark-driver:v2.1.0-kubernetes-0.1.0-rc1
         phase: Failed
         status: [ContainerStatus(containerID=docker://3dace9a96700be04a49c7f011050494de35bdb6d5b2f6c3d2f09177149cc5849, image=kubespark/spark-driver:v2.1.0-kubernetes-0.1.0-rc1, imageID=docker://sha256:b5dbf02827567eed33fac7f94ca99fd408076db872f49280c2615837068775df, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=ContainerStateTerminated(containerID=docker://3dace9a96700be04a49c7f011050494de35bdb6d5b2f6c3d2f09177149cc5849, exitCode=1, finishedAt=2017-03-24T03:32:46Z, message=null, reason=Error, signal=null, startedAt=2017-03-24T03:32:37Z, additionalProperties={}), waiting=null, additionalProperties={}), additionalProperties={})]
2017-03-24 11:32:47 INFO  Client:54 - Application spark-pi-1490326341602 finished.
2017-03-24 11:32:47 INFO  ShutdownHookManager:54 - Shutdown hook called

The driver pod's error log:

...
2017-03-24 03:32:45 INFO  ServerConnector:266 - Started ServerConnector@4409e975{HTTP/1.1}{0.0.0.0:4040}
2017-03-24 03:32:45 INFO  Server:379 - Started @1500ms
2017-03-24 03:32:45 INFO  Utils:54 - Successfully started service 'SparkUI' on port 4040.
2017-03-24 03:32:45 INFO  SparkUI:54 - Bound SparkUI to 0.0.0.0, and started at http://192.168.16.30:4040
2017-03-24 03:32:45 INFO  SparkContext:54 - Added JAR /tmp/spark-10597f49-5dc5-4b6b-a8c3-49c7ae111ee7/spark-examples_2.11-2.1.0-k8s-0.1.0-SNAPSHOT.jar at spark://192.168.16.30:7078/jars/spark-examples_2.11-2.1.0-k8s-0.1.0-SNAPSHOT.jar with timestamp 1490326365327
2017-03-24 03:32:45 ERROR KubernetesClusterSchedulerBackend:91 - Executor cannot find driver pod.
io.fabric8.kubernetes.client.KubernetesClientException: An error has occurred.
        at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:61)
        at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:52)
        at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:199)
        at org.apache.spark.scheduler.cluster.kubernetes.KubernetesClusterSchedulerBackend.liftedTree1$1(KubernetesClusterSchedulerBackend.scala:84)
        at org.apache.spark.scheduler.cluster.kubernetes.KubernetesClusterSchedulerBackend.<init>(KubernetesClusterSchedulerBackend.scala:82)
        at org.apache.spark.scheduler.cluster.kubernetes.KubernetesClusterManager.createSchedulerBackend(KubernetesClusterManager.scala:34)
        at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2554)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:501)
        at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2313)
        at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:868)
        at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:860)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:860)
        at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:31)
        at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
Caused by: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
        at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
        at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1949)
        at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:302)
        at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:296)
        at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1514)
        at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:216)
        at sun.security.ssl.Handshaker.processLoop(Handshaker.java:1026)
        at sun.security.ssl.Handshaker.process_record(Handshaker.java:961)
        at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1062)
        at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1375)
        at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1403)
        at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1387)
        at okhttp3.internal.connection.RealConnection.connectTls(RealConnection.java:267)
        at okhttp3.internal.connection.RealConnection.establishProtocol(RealConnection.java:237)
        at okhttp3.internal.connection.RealConnection.connect(RealConnection.java:148)
        at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:186)
        at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:121)
        at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:100)
        at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
        at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
        at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
        at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:120)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
        at io.fabric8.kubernetes.client.utils.HttpClientUtils$2.intercept(HttpClientUtils.java:93)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
        at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:179)
        at okhttp3.RealCall.execute(RealCall.java:63)
        at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:237)
        at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:232)
        at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:228)
        at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(BaseOperation.java:711)
        at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:192)
        ... 12 more
Caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
        at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:387)
        at sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:292)
        at sun.security.validator.Validator.validate(Validator.java:260)
        at sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:324)
        at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:229)
        at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:124)
        at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1496)
        ... 46 more
Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
        at sun.security.provider.certpath.SunCertPathBuilder.build(SunCertPathBuilder.java:141)
        at sun.security.provider.certpath.SunCertPathBuilder.engineBuild(SunCertPathBuilder.java:126)
        at java.security.cert.CertPathBuilder.build(CertPathBuilder.java:280)
        at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:382)
        ... 52 more
2017-03-24 03:32:45 ERROR SparkContext:91 - Error initializing SparkContext.
org.apache.spark.SparkException: Executor cannot find driver pod
        at org.apache.spark.scheduler.cluster.kubernetes.KubernetesClusterSchedulerBackend.liftedTree1$1(KubernetesClusterSchedulerBackend.scala:88)
        at org.apache.spark.scheduler.cluster.kubernetes.KubernetesClusterSchedulerBackend.<init>(KubernetesClusterSchedulerBackend.scala:82)
        at org.apache.spark.scheduler.cluster.kubernetes.KubernetesClusterManager.createSchedulerBackend(KubernetesClusterManager.scala:34)
        at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2554)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:501)
        at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2313)
        at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:868)
        at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:860)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:860)
        at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:31)
        at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: An error has occurred.
        at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:61)
        at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:52)
        at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:199)
        at org.apache.spark.scheduler.cluster.kubernetes.KubernetesClusterSchedulerBackend.liftedTree1$1(KubernetesClusterSchedulerBackend.scala:84)
        ... 11 more
Caused by: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
        at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
        at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1949)
        at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:302)
        at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:296)
        at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1514)
        at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:216)
        at sun.security.ssl.Handshaker.processLoop(Handshaker.java:1026)
        at sun.security.ssl.Handshaker.process_record(Handshaker.java:961)
        at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1062)
        at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1375)
        at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1403)
        at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1387)
        at okhttp3.internal.connection.RealConnection.connectTls(RealConnection.java:267)
        at okhttp3.internal.connection.RealConnection.establishProtocol(RealConnection.java:237)
        at okhttp3.internal.connection.RealConnection.connect(RealConnection.java:148)
        at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:186)
        at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:121)
        at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:100)
        at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
        at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
        at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
        at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:120)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
        at io.fabric8.kubernetes.client.utils.HttpClientUtils$2.intercept(HttpClientUtils.java:93)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
        at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:179)
        at okhttp3.RealCall.execute(RealCall.java:63)
        at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:237)
        at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:232)
        at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:228)
        at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(BaseOperation.java:711)
        at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:192)
        ... 12 more
Caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
        at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:387)
        at sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:292)
        at sun.security.validator.Validator.validate(Validator.java:260)
        at sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:324)
        at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:229)
        at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:124)
        at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1496)
        ... 46 more
Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
        at sun.security.provider.certpath.SunCertPathBuilder.build(SunCertPathBuilder.java:141)
        at sun.security.provider.certpath.SunCertPathBuilder.engineBuild(SunCertPathBuilder.java:126)
        at java.security.cert.CertPathBuilder.build(CertPathBuilder.java:280)
        at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:382)
        ... 52 more
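
The trace above nests several "Caused by:" sections, and the last one is the innermost cause: the JVM in the driver pod could not build a trusted certificate path for the server it connected to (presumably the Kubernetes API server). As a rough aid for reading long traces like this one, here is a minimal Python sketch that pulls out the last "Caused by:" line. The names `CAUSED_BY`, `root_cause`, and `sample` are hypothetical helpers written for this illustration, not part of Spark or the fabric8 client, and `sample` is an abbreviated copy of the exception lines from the log.

```python
import re

# Matches lines such as "Caused by: some.pkg.SomeException: message text".
# The message part is optional.
CAUSED_BY = re.compile(r"^Caused by: (?P<exc>[\w.$]+)(?::\s*(?P<msg>.*))?$")


def root_cause(log_text):
    """Return (exception_class, message) for the last 'Caused by:' line, or None."""
    last = None
    for line in log_text.splitlines():
        match = CAUSED_BY.match(line.strip())
        if match:
            last = (match.group("exc"), match.group("msg") or "")
    return last


# Abbreviated exception lines from the driver pod log (stack frames omitted).
sample = """\
org.apache.spark.SparkException: Executor cannot find driver pod
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: An error has occurred.
Caused by: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed
Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
"""

print(root_cause(sample))
```

With the full log saved to a file, `root_cause(open(path).read())` would report the same innermost exception, since the log prints the same chain twice.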
