This repository was archived by the owner on Jan 9, 2020. It is now read-only.

Block spark-submit call until job is complete #46

@ash211


When running spark-submit in YARN cluster mode, the spark-submit script keeps running until the Spark job completes, printing the application status once per second:

2017-01-25 01:28:28,346 INFO  [main] yarn.Client (Logging.scala:logInfo(54)) - Application report for application_1485242805299_0010 (state: RUNNING)
2017-01-25 01:28:29,348 INFO  [main] yarn.Client (Logging.scala:logInfo(54)) - Application report for application_1485242805299_0010 (state: RUNNING)
2017-01-25 01:28:30,350 INFO  [main] yarn.Client (Logging.scala:logInfo(54)) - Application report for application_1485242805299_0010 (state: RUNNING)
2017-01-25 01:28:31,352 INFO  [main] yarn.Client (Logging.scala:logInfo(54)) - Application report for application_1485242805299_0010 (state: RUNNING)
2017-01-25 01:28:32,355 INFO  [main] yarn.Client (Logging.scala:logInfo(54)) - Application report for application_1485242805299_0010 (state: RUNNING)
2017-01-25 01:28:33,357 INFO  [main] yarn.Client (Logging.scala:logInfo(54)) - Application report for application_1485242805299_0010 (state: RUNNING)
2017-01-25 01:28:34,362 INFO  [main] yarn.Client (Logging.scala:logInfo(54)) - Application report for application_1485242805299_0010 (state: RUNNING)
2017-01-25 01:28:35,364 INFO  [main] yarn.Client (Logging.scala:logInfo(54)) - Application report for application_1485242805299_0010 (state: RUNNING)
2017-01-25 01:28:36,366 INFO  [main] yarn.Client (Logging.scala:logInfo(54)) - Application report for application_1485242805299_0010 (state: RUNNING)
2017-01-25 01:28:37,368 INFO  [main] yarn.Client (Logging.scala:logInfo(54)) - Application report for application_1485242805299_0010 (state: RUNNING)
2017-01-25 01:28:38,370 INFO  [main] yarn.Client (Logging.scala:logInfo(54)) - Application report for application_1485242805299_0010 (state: RUNNING)
2017-01-25 01:28:39,372 INFO  [main] yarn.Client (Logging.scala:logInfo(54)) - Application report for application_1485242805299_0010 (state: RUNNING)

We should have spark-submit in k8s cluster mode (the only mode currently supported) do the same: block the call and poll the pod's status until the pod terminates.
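To make the proposed behavior concrete, here is a minimal sketch of a blocking poll loop. It is not the project's implementation: `get_phase` is a hypothetical callback standing in for a real API lookup of the driver pod's phase, and the report text is only modeled on the YARN output above. The one thing taken from Kubernetes itself is that `Succeeded` and `Failed` are the terminal pod phases.

```python
import time
from typing import Callable

# Kubernetes pod phases after which the pod will not run again.
TERMINAL_PHASES = {"Succeeded", "Failed"}


def wait_for_pod(get_phase: Callable[[], str],
                 interval_s: float = 1.0,
                 report: Callable[[str], None] = print,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Block until get_phase() returns a terminal phase, reporting each poll.

    get_phase is a hypothetical callback (an assumption of this sketch)
    that returns the driver pod's current phase as a string.
    """
    while True:
        phase = get_phase()
        # Report wording is illustrative, loosely mirroring yarn.Client.
        report(f"Application report for driver pod (phase: {phase})")
        if phase in TERMINAL_PHASES:
            return phase
        sleep(interval_s)
```

For example, `wait_for_pod(lambda: next(phases), interval_s=0.0)` with `phases = iter(["Pending", "Running", "Succeeded"])` reports three times and returns the final phase. Passing `report` and `sleep` as parameters is only a convenience so the loop can be exercised without a cluster.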

The blocking call seems required to match the YARN feature set. As a possible extension, we could stream the driver logs instead of polling the status every second, following the example linked below. I for one would find that a great usability improvement over YARN cluster mode's behavior.

https://github.com/fabric8io/kubernetes-client/blob/master/kubernetes-examples/src/main/java/io/fabric8/kubernetes/examples/PodLogExample.java
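As a rough illustration of the log-streaming idea, the sketch below follows a pod's log by shelling out to `kubectl logs -f` and yielding each line until the command exits. This is an assumption made for illustration and a stand-in for the fabric8 client's log watching shown in the linked example, not a port of it. The helper names `kubectl_logs_cmd` and `stream_lines` are hypothetical.

```python
import subprocess
from typing import Iterator, List


def kubectl_logs_cmd(pod: str, namespace: str = "default") -> List[str]:
    """Build a `kubectl logs -f` command that follows a pod's log."""
    return ["kubectl", "logs", "-f", pod, "--namespace", namespace]


def stream_lines(cmd: List[str]) -> Iterator[str]:
    """Run cmd and yield its stdout line by line until it exits."""
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        assert proc.stdout is not None
        for line in proc.stdout:
            yield line.rstrip("\n")
    # Leaving the with-block waits for the process, so returncode is set.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd)
```

A caller could then write `for line in stream_lines(kubectl_logs_cmd("my-driver-pod")): print(line)`, where the pod name is a placeholder. The loop blocks for as long as the log is being followed, so it would also cover the blocking requirement, provided the follow ends when the driver container exits.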

P.S. As a side note, I'm interested in making this call blocking so I can more accurately benchmark the same job on YARN vs. Kubernetes by running `time spark-submit ...` against both clusters.
