This repository was archived by the owner on Jan 9, 2020. It is now read-only.

Better error message for failure to connect to Nodes #91


Description

@foxish

I was running on a cluster where the firewall did not allow connections to the nodes. I would have expected an error message saying that the files could not be uploaded.

Instead, the client keeps reporting that the pod is running and then fails with an unrelated timeout:

2017-02-07 22:34:49 INFO  LoggingPodStatusWatcher:54 - Application status for foxish-1486535647246 (phase: Running)
2017-02-07 22:34:50 INFO  LoggingPodStatusWatcher:54 - Application status for foxish-1486535647246 (phase: Running)
2017-02-07 22:34:51 INFO  LoggingPodStatusWatcher:54 - Application status for foxish-1486535647246 (phase: Running)
2017-02-07 22:34:52 INFO  LoggingPodStatusWatcher:54 - Application status for foxish-1486535647246 (phase: Running)
2017-02-07 22:34:53 INFO  LoggingPodStatusWatcher:54 - Application status for foxish-1486535647246 (phase: Running)
2017-02-07 22:34:54 INFO  LoggingPodStatusWatcher:54 - Application status for foxish-1486535647246 (phase: Running)
2017-02-07 22:34:55 INFO  LoggingPodStatusWatcher:54 - Application status for foxish-1486535647246 (phase: Running)
2017-02-07 22:34:56 INFO  LoggingPodStatusWatcher:54 - Application status for foxish-1486535647246 (phase: Running)
2017-02-07 22:34:57 INFO  LoggingPodStatusWatcher:54 - Application status for foxish-1486535647246 (phase: Running)
2017-02-07 22:34:58 INFO  LoggingPodStatusWatcher:54 - Application status for foxish-1486535647246 (phase: Running)
2017-02-07 22:34:59 INFO  LoggingPodStatusWatcher:54 - Application status for foxish-1486535647246 (phase: Running)
2017-02-07 22:35:00 INFO  LoggingPodStatusWatcher:54 - Application status for foxish-1486535647246 (phase: Running)
2017-02-07 22:35:01 INFO  LoggingPodStatusWatcher:54 - Application status for foxish-1486535647246 (phase: Running)
2017-02-07 22:35:02 INFO  LoggingPodStatusWatcher:54 - Application status for foxish-1486535647246 (phase: Running)
2017-02-07 22:35:03 INFO  LoggingPodStatusWatcher:54 - Application status for foxish-1486535647246 (phase: Running)
2017-02-07 22:35:04 INFO  LoggingPodStatusWatcher:54 - Application status for foxish-1486535647246 (phase: Running)
2017-02-07 22:35:05 INFO  LoggingPodStatusWatcher:54 - Application status for foxish-1486535647246 (phase: Running)
2017-02-07 22:35:06 INFO  LoggingPodStatusWatcher:54 - Application status for foxish-1486535647246 (phase: Running)
2017-02-07 22:35:07 INFO  LoggingPodStatusWatcher:54 - Application status for foxish-1486535647246 (phase: Running)
2017-02-07 22:35:08 INFO  LoggingPodStatusWatcher:54 - Application status for foxish-1486535647246 (phase: Running)
2017-02-07 22:35:09 INFO  LoggingPodStatusWatcher:54 - Application status for foxish-1486535647246 (phase: Running)
2017-02-07 22:35:10 ERROR Client:91 - The driver pod with name foxish-1486535647246 in namespace default was not ready in 60 seconds.
Latest phase from the pod is: Running
The pod had no final message.

Driver container last state: Running
Driver container started at: 2017-02-08T06:34:10Z
java.util.concurrent.TimeoutException: Timeout waiting for task.
	at org.spark_project.guava.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:276)
	at org.spark_project.guava.util.concurrent.AbstractFuture.get(AbstractFuture.java:96)
	at org.apache.spark.deploy.kubernetes.Client$$anonfun$run$6$$anonfun$apply$5$$anonfun$apply$7.apply(Client.scala:189)
	at org.apache.spark.deploy.kubernetes.Client$$anonfun$run$6$$anonfun$apply$5$$anonfun$apply$7.apply(Client.scala:148)
	at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2530)
	at org.apache.spark.deploy.kubernetes.Client$$anonfun$run$6$$anonfun$apply$5.apply(Client.scala:148)
	at org.apache.spark.deploy.kubernetes.Client$$anonfun$run$6$$anonfun$apply$5.apply(Client.scala:133)
	at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2530)
	at org.apache.spark.deploy.kubernetes.Client$$anonfun$run$6.apply(Client.scala:133)
	at org.apache.spark.deploy.kubernetes.Client$$anonfun$run$6.apply(Client.scala:105)
	at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2530)
	at org.apache.spark.deploy.kubernetes.Client.run(Client.scala:105)
	at org.apache.spark.deploy.kubernetes.Client$.main(Client.scala:682)
	at org.apache.spark.deploy.kubernetes.Client.main(Client.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:750)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:178)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:203)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:117)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
2017-02-07 22:35:10 INFO  LoggingPodStatusWatcher:54 - Application status for foxish-1486535647246 (phase: Running)
Exception in thread "OkHttp https://104.154.43.148/api/v1/namespaces/default/pods?labelSelector=spark-app-id%3Dfoxish-1486535647246,spark-app-name%3Dfoxish,spark-driver%3Dfoxish-1486535647246&resourceVersion=3440763&watch=true WebSocket" java.lang.NullPointerException
	at org.spark_project.guava.base.Preconditions.checkNotNull(Preconditions.java:191)
	at org.spark_project.guava.util.concurrent.AbstractFuture.setException(AbstractFuture.java:201)
	at org.spark_project.guava.util.concurrent.SettableFuture.setException(SettableFuture.java:68)
	at org.apache.spark.deploy.kubernetes.Client$DriverPodWatcher.onClose(Client.scala:459)
	at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onClose(WatchConnectionManager.java:259)
	at okhttp3.internal.ws.RealWebSocket.peerClose(RealWebSocket.java:197)
	at okhttp3.internal.ws.RealWebSocket.access$200(RealWebSocket.java:38)
	at okhttp3.internal.ws.RealWebSocket$1$2.execute(RealWebSocket.java:84)
	at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Exception in thread "main" org.apache.spark.SparkException: The driver pod with name foxish-1486535647246 in namespace default was not ready in 60 seconds.
Latest phase from the pod is: Running
The pod had no final message.

Driver container last state: Running
Driver container started at: 2017-02-08T06:34:10Z
	at org.apache.spark.deploy.kubernetes.Client$$anonfun$run$6$$anonfun$apply$5$$anonfun$apply$7.apply(Client.scala:196)
	at org.apache.spark.deploy.kubernetes.Client$$anonfun$run$6$$anonfun$apply$5$$anonfun$apply$7.apply(Client.scala:148)
	at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2530)
	at org.apache.spark.deploy.kubernetes.Client$$anonfun$run$6$$anonfun$apply$5.apply(Client.scala:148)
	at org.apache.spark.deploy.kubernetes.Client$$anonfun$run$6$$anonfun$apply$5.apply(Client.scala:133)
	at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2530)
	at org.apache.spark.deploy.kubernetes.Client$$anonfun$run$6.apply(Client.scala:133)
	at org.apache.spark.deploy.kubernetes.Client$$anonfun$run$6.apply(Client.scala:105)
	at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2530)
	at org.apache.spark.deploy.kubernetes.Client.run(Client.scala:105)
	at org.apache.spark.deploy.kubernetes.Client$.main(Client.scala:682)
	at org.apache.spark.deploy.kubernetes.Client.main(Client.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:750)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:178)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:203)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:117)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.util.concurrent.TimeoutException: Timeout waiting for task.
	at org.spark_project.guava.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:276)
	at org.spark_project.guava.util.concurrent.AbstractFuture.get(AbstractFuture.java:96)
	at org.apache.spark.deploy.kubernetes.Client$$anonfun$run$6$$anonfun$apply$5$$anonfun$apply$7.apply(Client.scala:189)
	... 20 more
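One way to surface a clearer error would be a pre-flight connectivity check from the submission client to the node port that receives the uploaded files, failing fast with an explicit message instead of waiting out the generic 60-second pod-readiness timeout. A minimal sketch (the object and method names here are hypothetical, not part of the actual Client):

```scala
import java.net.{InetSocketAddress, Socket}
import scala.util.Try

// Hypothetical pre-flight check: attempt a TCP connection to the node
// port that will receive the driver files. If the firewall blocks it,
// fail immediately with a message naming the real cause, rather than
// letting the upload silently time out while the pod shows "Running".
object NodeConnectivityCheck {
  def canReach(host: String, port: Int, timeoutMs: Int = 3000): Boolean =
    Try {
      val socket = new Socket()
      try socket.connect(new InetSocketAddress(host, port), timeoutMs)
      finally socket.close()
    }.isSuccess

  def requireReachable(host: String, port: Int): Unit =
    if (!canReach(host, port)) {
      throw new RuntimeException(
        s"Cannot connect to node $host:$port from the submission client. " +
        "Check that your firewall allows connections to the cluster nodes; " +
        "the application files cannot be uploaded otherwise.")
    }
}
```

Running this check before starting the upload would turn the confusing `TimeoutException: Timeout waiting for task` above into an actionable firewall message.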
