
Conversation

@holdenk
Contributor

@holdenk holdenk commented Sep 25, 2015

While this is likely not a huge issue for real production systems, it can be a problem for test systems that set up a Spark Context, tear it down, and then stand up another Spark Context with a different master (e.g. some tests in local mode and some in yarn mode). Discovered during work on spark-testing-base against Spark 1.4.1, but the logic that triggers it appears to be present in master as well (see the SparkHadoopUtil object). A valid workaround for users encountering this issue is to fork a separate JVM, but that can be heavyweight.

```
[info] SampleMiniClusterTest:
[info] Exception encountered when attempting to run a suite with class name: com.holdenkarau.spark.testing.SampleMiniClusterTest *** ABORTED ***
[info] java.lang.ClassCastException: org.apache.spark.deploy.SparkHadoopUtil cannot be cast to org.apache.spark.deploy.yarn.YarnSparkHadoopUtil
[info] at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$.get(YarnSparkHadoopUtil.scala:163)
[info] at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:257)
[info] at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:561)
[info] at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:115)
[info] at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57)
[info] at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:141)
[info] at org.apache.spark.SparkContext.<init>(SparkContext.scala:497)
[info] at com.holdenkarau.spark.testing.SharedMiniCluster$class.setup(SharedMiniCluster.scala:186)
[info] at com.holdenkarau.spark.testing.SampleMiniClusterTest.setup(SampleMiniClusterTest.scala:26)
[info] at com.holdenkarau.spark.testing.SharedMiniCluster$class.beforeAll(SharedMiniCluster.scala:103)
```
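
As a point of reference, the fork workaround can be expressed in sbt; a minimal sketch, assuming an sbt 0.13-era build.sbt (not part of this patch):

```scala
// build.sbt: fork a separate JVM for tests so per-JVM state such as the
// SPARK_YARN_MODE flag cannot leak between test suites.
fork in Test := true
```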

@SparkQA

SparkQA commented Sep 25, 2015

Test build #42993 has finished for PR 8911 at commit 1915e7d.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

LGTM. Only trivial things to suggest, like a space after 'try'.

@SparkQA

SparkQA commented Sep 25, 2015

Test build #43031 has started for PR 8911 at commit f97ec06.

@vanzin
Contributor

vanzin commented Sep 25, 2015

There's code in several places that sets SPARK_YARN_MODE, but no code that unsets it. So, to follow your example, if you start a context with yarn-client and later start another one in standalone mode, the latter will use the YARN version of the utils class.

Other than that, looks sane. This is another piece of code that will need some serious thought when trying to fix the "Spark doesn't allow multiple contexts" issue, though.
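
For context, a simplified sketch of the selection logic being discussed, paraphrased rather than copied from the Spark source (names and structure are approximate):

```scala
import org.apache.spark.deploy.SparkHadoopUtil

object HadoopUtilSelectionSketch {
  // The flag can arrive as a JVM system property or an environment variable;
  // Boolean.parseBoolean(null) is false, so "unset" means non-YARN mode.
  private def yarnMode: Boolean =
    java.lang.Boolean.parseBoolean(
      System.getProperty("SPARK_YARN_MODE", System.getenv("SPARK_YARN_MODE")))

  // Because callers set the flag but nothing unsets it, a context started
  // later in the same JVM with a different master can end up with the
  // wrong variant of the utils class.
  def get: SparkHadoopUtil =
    if (yarnMode) {
      // Loaded reflectively so spark-core does not depend on the yarn module.
      Class.forName("org.apache.spark.deploy.yarn.YarnSparkHadoopUtil")
        .newInstance().asInstanceOf[SparkHadoopUtil]
    } else {
      new SparkHadoopUtil
    }
}
```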

@holdenk
Contributor Author

holdenk commented Sep 25, 2015

@vanzin so in my own code (where I do switch between yarn and non-yarn mode) I clear SPARK_YARN_MODE, as done in the test.

I could update SparkContext to explicitly clear SPARK_YARN_MODE if it's being launched with a non-yarn master, if you think that would be helpful for people?
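
A hedged sketch of that test-side workaround; master strings and app names are placeholders, and it assumes Spark 1.x where the flag lives in a JVM system property:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SwitchMastersExample {
  def main(args: Array[String]): Unit = {
    // A yarn-client context followed by a local-mode context in the same JVM;
    // running the YARN half of course needs a reachable cluster.
    val yarnSc = new SparkContext(
      new SparkConf().setMaster("yarn-client").setAppName("yarn-phase"))
    yarnSc.stop()

    // Manual workaround: clear the flag set during the YARN run so the next,
    // non-YARN context does not pick up the YARN utils.
    System.clearProperty("SPARK_YARN_MODE")

    val localSc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("local-phase"))
    localSc.stop()
  }
}
```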

@vanzin
Contributor

vanzin commented Sep 25, 2015

Yeah, having the Spark code clean up after itself is easier because it means people don't have to remember to do it, and it doesn't need to be documented.

@holdenk
Contributor Author

holdenk commented Sep 25, 2015

Makes sense. Do you think I should put that change in SparkContext (on startup of a non-yarn client, or on stop of any client) or in the yarn client's stop code?

@vanzin
Contributor

vanzin commented Sep 25, 2015

I did a cursory search for where it is set, and I think the places that need to be changed are SparkContext.stop() and YARN's Client.scala.

Doing it during SparkContext creation is an option, but feels a little weird; in that case I'd rather set it to 1 or 0 (or some other boolean value) to indicate whether it's running in YARN mode, but that would be a much bigger change.
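
A minimal sketch of what that cleanup might look like, assuming the flag is carried as a JVM system property; the helper name is hypothetical and the merged patch may differ in detail:

```scala
object YarnModeCleanup {
  // Hypothetical helper; in Spark itself the equivalent call would sit at the
  // end of SparkContext.stop() and in the YARN Client's shutdown path, so a
  // context created afterwards with a non-YARN master gets the plain
  // SparkHadoopUtil rather than the YARN variant.
  def clearYarnMode(): Unit = {
    System.clearProperty("SPARK_YARN_MODE")
  }
}
```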

@holdenk
Contributor Author

holdenk commented Sep 25, 2015

@vanzin so it seems like doing it in SparkContext shutdown should be sufficient for all cases?

@vanzin
Contributor

vanzin commented Sep 25, 2015

I don't think so. Client.scala, for better or for worse, is still a public API. So you can submit a yarn-cluster job by calling Client.scala directly, and that would leave SPARK_YARN_MODE set.

@holdenk
Contributor Author

holdenk commented Sep 25, 2015

Ah, that makes sense; I guess I forgot the Client was a public API.

@holdenk holdenk changed the title [SPARK-10812][YARN][WIP] Spark hadoop util support switching to yarn [SPARK-10812][YARN] Spark hadoop util support switching to yarn Sep 25, 2015
Contributor

To be super paranoid, I'd do this before the previous line.

@vanzin
Contributor

vanzin commented Sep 25, 2015

LGTM aside from two minor things.

Contributor

While we're here, comparing ...getClass === classOf[...] would be better IMO. Fewer magic strings.
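
For illustration, a hedged sketch of what such an assertion could look like in a ScalaTest suite (the suite name is hypothetical and the actual test in this patch may differ):

```scala
import org.apache.spark.deploy.SparkHadoopUtil
import org.apache.spark.deploy.yarn.YarnSparkHadoopUtil
import org.scalatest.FunSuite

class SparkHadoopUtilSelectionSuite extends FunSuite {
  test("yarn mode selects the YARN SparkHadoopUtil") {
    // Compare classes directly rather than class-name strings: no magic
    // strings, and a rename is caught at compile time.
    assert(SparkHadoopUtil.get.getClass === classOf[YarnSparkHadoopUtil])
  }
}
```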

@SparkQA

SparkQA commented Sep 26, 2015

Test build #43035 has finished for PR 8911 at commit d9ca925.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@holdenk
Contributor Author

holdenk commented Sep 27, 2015

@vanzin updated with the suggested changes :)

@vanzin
Contributor

vanzin commented Sep 28, 2015

LGTM, merging to master. Thanks!

@asfgit asfgit closed this in d8d50ed Sep 28, 2015
asfgit pushed a commit that referenced this pull request Oct 22, 2015
Author: Holden Karau <[email protected]>

Closes #8911 from holdenk/SPARK-10812-spark-hadoop-util-support-switching-to-yarn.

(cherry picked from commit d8d50ed)
@stevenmanton

I'm running into this issue when running tests using pytest on pyspark with version 1.4.1. Is there a workaround I can use in pyspark in the meantime before we're able to upgrade to 1.5.2/1.6 to benefit from this fix?

@holdenk
Contributor Author

holdenk commented Dec 10, 2015

@stevenmanton that question probably belongs more on the user list - but I'd say maybe just don't use yarn mode for your tests.

@stevenmanton

Thanks @holdenk. It ended up being a simple fix. I'll follow up with the mailing list for any other questions.

ashangit pushed a commit to ashangit/spark that referenced this pull request Oct 19, 2016
Author: Holden Karau <[email protected]>

Closes apache#8911 from holdenk/SPARK-10812-spark-hadoop-util-support-switching-to-yarn.