Block spark-submit call until job is complete #46
Description
When running spark-submit in YARN cluster mode, the spark-submit script stays running until the Spark job completes, printing out the application status every second until it eventually finishes:
2017-01-25 01:28:28,346 INFO [main] yarn.Client (Logging.scala:logInfo(54)) - Application report for application_1485242805299_0010 (state: RUNNING)
2017-01-25 01:28:29,348 INFO [main] yarn.Client (Logging.scala:logInfo(54)) - Application report for application_1485242805299_0010 (state: RUNNING)
2017-01-25 01:28:30,350 INFO [main] yarn.Client (Logging.scala:logInfo(54)) - Application report for application_1485242805299_0010 (state: RUNNING)
2017-01-25 01:28:31,352 INFO [main] yarn.Client (Logging.scala:logInfo(54)) - Application report for application_1485242805299_0010 (state: RUNNING)
2017-01-25 01:28:32,355 INFO [main] yarn.Client (Logging.scala:logInfo(54)) - Application report for application_1485242805299_0010 (state: RUNNING)
2017-01-25 01:28:33,357 INFO [main] yarn.Client (Logging.scala:logInfo(54)) - Application report for application_1485242805299_0010 (state: RUNNING)
2017-01-25 01:28:34,362 INFO [main] yarn.Client (Logging.scala:logInfo(54)) - Application report for application_1485242805299_0010 (state: RUNNING)
2017-01-25 01:28:35,364 INFO [main] yarn.Client (Logging.scala:logInfo(54)) - Application report for application_1485242805299_0010 (state: RUNNING)
2017-01-25 01:28:36,366 INFO [main] yarn.Client (Logging.scala:logInfo(54)) - Application report for application_1485242805299_0010 (state: RUNNING)
2017-01-25 01:28:37,368 INFO [main] yarn.Client (Logging.scala:logInfo(54)) - Application report for application_1485242805299_0010 (state: RUNNING)
2017-01-25 01:28:38,370 INFO [main] yarn.Client (Logging.scala:logInfo(54)) - Application report for application_1485242805299_0010 (state: RUNNING)
2017-01-25 01:28:39,372 INFO [main] yarn.Client (Logging.scala:logInfo(54)) - Application report for application_1485242805299_0010 (state: RUNNING)
We should have spark-submit in k8s cluster mode (currently the only supported mode) do the same: block the call and poll for pod status until the pod terminates.
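To make the proposed behavior concrete, here is a minimal sketch of the block-and-poll loop, written in Python for brevity rather than as the actual spark-submit change. The names `wait_for_pod`, `get_phase`, `sleep`, and `report` are hypothetical; `get_phase` stands in for whatever call fetches the driver pod's phase from the Kubernetes API. The only Kubernetes facts assumed are the standard pod phases, of which Succeeded and Failed are terminal.

```python
import time
from typing import Callable

# Kubernetes pod phases that mean the pod has finished running.
TERMINAL_PHASES = {"Succeeded", "Failed"}


def wait_for_pod(
    get_phase: Callable[[], str],
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    report: Callable[[str], None] = print,
) -> str:
    """Block until the pod reaches a terminal phase and return that phase.

    get_phase is a hypothetical callable that returns the current pod phase
    (for example "Pending", "Running", "Succeeded" or "Failed").
    """
    while True:
        phase = get_phase()
        # Report every poll, similar to the YARN client's per-second report.
        report(f"Pod status (phase: {phase})")
        if phase in TERMINAL_PHASES:
            return phase
        sleep(interval_s)


if __name__ == "__main__":
    # Simulated phase sequence in place of a real Kubernetes API call.
    phases = iter(["Pending", "Running", "Running", "Succeeded"])
    final_phase = wait_for_pod(lambda: next(phases), interval_s=0.0)
    print("final phase:", final_phase)
```

Passing the sleep and reporting functions in as parameters keeps the loop easy to test without a cluster; a real implementation would presumably also want a timeout and handling for the Unknown phase.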
The blocking call seems required to match the YARN feature set. As a possible extension, we could stream the driver logs instead of polling the status every second, as in the example below. I for one would find that a great usability improvement over YARN cluster mode's behavior.
P.S. As a side note, I'm interested in making this call blocking so I can benchmark the same job on YARN vs. Kubernetes more accurately by running time spark-submit ... against both clusters.