Conversation

@sun-rui
Contributor

@sun-rui sun-rui commented Jun 10, 2015

This PR enables SparkR to dynamically ship the SparkR binary package to the AM node in YARN cluster mode, thus it is no longer required that the SparkR package be installed on each worker node.

This PR uses the JDK jar tool to package the SparkR package, because jar should be available on both Linux and Windows wherever the JDK is installed.

This PR does not address the R worker involved in the RDD API; that will be addressed in a separate JIRA issue.

This PR does not address SBT build. SparkR installation and packaging by SBT will be addressed in a separate JIRA issue.

R/install-dev.bat is not tested. @shivaram, could you help test it?

@SparkQA

SparkQA commented Jun 10, 2015

Test build #34593 has finished for PR 6743 at commit 528f30e.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented Jun 10, 2015

Why is this needed if the only thing it is using is the DataFrame API?

@rxin
Contributor

rxin commented Jun 10, 2015

Ah, I see. This is only for the AM.

@SparkQA

SparkQA commented Jun 11, 2015

Test build #34663 has finished for PR 6743 at commit 8d2a8df.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

Contributor

Could you add a comment here as to why we have this '#sparkr'? I believe this is to get the archive to unzip to a symlink named sparkr?

@sun-rui
Contributor Author

sun-rui commented Jun 12, 2015

Yes, this assigns a symbolic link name, so we can refer to the shipped package via the logical name instead of the specific archive file name.
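
For illustration, here is a minimal Scala sketch (the URI value is hypothetical) of how the # fragment on a distributed-cache URI names the symlink that YARN creates for the localized archive:

```scala
import java.net.URI

object ArchiveFragment {
  def main(args: Array[String]): Unit = {
    // YARN localizes the archive into the container's working directory and
    // links it under the name given by the URI fragment, so the job can refer
    // to ./sparkr no matter what the archive file itself is called.
    val uri = new URI("sparkr.zip#sparkr") // hypothetical archive URI
    val linkName = Option(uri.getFragment)
      .getOrElse(new java.io.File(uri.getPath).getName)
    println(s"archive ${uri.getPath} is visible as ./$linkName")
  }
}
```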

@shivaram
Contributor

Thanks @sun-rui -- I didn't get a chance to test this on Windows (or on a YARN cluster) yet. I will try to do it this weekend.

Also, it looks like there is a merge conflict. Could you resolve that?

@SparkQA

SparkQA commented Jun 12, 2015

Test build #34779 has finished for PR 6743 at commit 77055af.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@sun-rui
Contributor Author

sun-rui commented Jun 13, 2015

rebased

@SparkQA

SparkQA commented Jun 13, 2015

Test build #34834 has finished for PR 6743 at commit c8ed3d2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

@sun-rui -- this should be jar.exe instead of jar. The other thing is that jar.exe is only available in the JDK and not in the JRE, so sometimes it may not be on the PATH. There are a couple of options for things we can do here:

  1. We can use %JAVA_HOME%\bin\jar.exe -- this might be safer, as users need to set JAVA_HOME for the compilation to work correctly.
  2. Rtools [1] by default installs a zip utility [2] as zip.exe. At least on my machine, running zip.exe -r sparkr.zip SparkR seems to work.

[1] http://cran.r-project.org/bin/windows/Rtools/
[2] http://www.info-zip.org/
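
As a rough illustration of option 1 -- a sketch under the assumption that JAVA_HOME is set, not the actual install-dev.bat change -- resolving the jar tool under JAVA_HOME instead of relying on the PATH could look like this:

```scala
import java.io.File
import scala.sys.process._

object PackageSparkR {
  def main(args: Array[String]): Unit = {
    // jar ships with the JDK (not the JRE), so resolve it under JAVA_HOME
    // instead of assuming it is on the PATH.
    val javaHome = sys.env.getOrElse("JAVA_HOME", sys.error("JAVA_HOME must be set"))
    val isWindows = sys.props("os.name").toLowerCase.contains("windows")
    val jarTool = new File(new File(javaHome, "bin"), if (isWindows) "jar.exe" else "jar")
    // "cf" creates an archive with the given file name from the SparkR directory.
    val exitCode = Seq(jarTool.getAbsolutePath, "cf", "sparkr.zip", "SparkR").!
    sys.exit(exitCode)
  }
}
```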

@SparkQA

SparkQA commented Jun 14, 2015

Test build #34886 has finished for PR 6743 at commit 2a8f7e5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 14, 2015

Test build #34887 has finished for PR 6743 at commit e57c0f4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shivaram
Contributor

@sun-rui Thanks for the update. I just tested this on a YARN cluster and things seem to work correctly for the use case where we create a data frame from a file (i.e., read.df).

However, the createDataFrame code path still requires the RRDD to pick up worker.R from the zip file. Do we have a separate JIRA for that?

cc @davies

@sun-rui
Contributor Author

sun-rui commented Jun 15, 2015

I originally planned to have a separate JIRA issue for adding shipping of the SparkR package for the RDD APIs. But if this is still required by the DataFrame API, I can do it in this PR.
I just wonder why all tests pass. Are there not enough test cases for createDataFrame()?

@shivaram
Contributor

The tests pass because the createDataFrame call only fails in YARN cluster mode and we don't have tests for SparkR in the YarnSuite (actually this deserves a new JIRA).

BTW, the change needed for createDataFrame to work should be minimal -- I think we can just set the variable SPARKR_PACKAGE_DIR for all modes and then set the default value of sparkRLibDir [1] to Sys.getenv("SPARKR_PACKAGE_DIR").

[1] https://github.com/apache/spark/blob/master/R/pkg/R/sparkR.R#L103
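
A minimal sketch of that fallback (the helper and its names are hypothetical, mirroring the variable names in the comment above):

```scala
object SparkRPackageDir {
  // Fall back to SPARKR_PACKAGE_DIR, which the launcher would export in
  // every deploy mode, whenever no explicit library directory is supplied.
  def resolve(explicitDir: Option[String]): String =
    explicitDir
      .orElse(sys.env.get("SPARKR_PACKAGE_DIR"))
      .getOrElse(sys.error("SparkR package location is not set"))

  def main(args: Array[String]): Unit =
    println(resolve(args.headOption))
}
```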

@tgravescs
Contributor

I'm curious: have you looked at ways of shipping R itself with the job, or are you relying on R being installed on all the YARN nodes? It should be possible with the distributed cache; just wondering if anyone has done it or looked at making it automatic.

@SparkQA

SparkQA commented Jun 29, 2015

Test build #35989 has finished for PR 6743 at commit 0925e2a.

  • This patch fails Scala style tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@sun-rui
Contributor Author

sun-rui commented Jun 29, 2015

@tgravescs, I think the problem with shipping R itself is that the R executable is platform-specific. Also, it may require OS-specific installation before running R (not sure). pySpark also does not ship Python itself.

@SparkQA

SparkQA commented Jun 29, 2015

Test build #35990 has finished for PR 6743 at commit 86b3fa9.

  • This patch fails Scala style tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@sun-rui
Contributor Author

sun-rui commented Jun 29, 2015

Added support for shipping the SparkR package for the R workers required by the RDD APIs. Tested createDataFrame() by creating a DataFrame from an R list.

Removed the sparkRLibDir parameter of sparkR.init(). The SparkR package location is now determined on each worker node according to the deployment mode (this allows a node-specific SPARK_HOME); see the sketch below. Not sure if there is a better solution. A rough scan of the pySpark code does not tell me how pySpark locates pyspark.zip in the various deployment modes. @davies, could you help give me a hint and review this patch?

Next, I'd like to refactor this code to align with SPARK-5479 (which moves YARN-specific code from SparkSubmit to deploy/yarn). @shivaram, do you think I should refactor the code in this patch or do it in a new JIRA?
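
For illustration, a hypothetical sketch of the per-mode lookup described above (not the actual patch code; "sparkr" is the symlink name given by the "#sparkr" fragment discussed earlier):

```scala
object SparkRLocation {
  // In YARN cluster mode the package is reached through the "sparkr" symlink
  // created for the localized archive; in other modes it lives in the
  // node-local installation under SPARK_HOME.
  def sparkRPackagePath(master: String, deployMode: String): String =
    if (master.startsWith("yarn") && deployMode == "cluster") "sparkr"
    else sys.env.getOrElse("SPARK_HOME", ".") + "/R/lib"

  def main(args: Array[String]): Unit =
    println(sparkRPackagePath("yarn", "cluster"))
}
```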

@sun-rui
Contributor Author

sun-rui commented Jun 29, 2015

rebased

@SparkQA

SparkQA commented Jun 29, 2015

Test build #35994 has finished for PR 6743 at commit c6ef550.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shivaram
Contributor

Thanks @sun-rui for the update. @davies can probably confirm, but AFAIK the PySpark location is picked up at [1] by looking at the JAR file path / Spark home.

I'll take a closer look at the code later today, but in terms of refactoring I'd say let's do it in a separate JIRA.

[1] https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala#L31
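
Roughly, the idea at [1] -- a hedged paraphrase, not the exact code; the py4j archive name below is illustrative -- is to derive the search path from SPARK_HOME:

```scala
import java.io.File

object PySparkPath {
  // Build the PySpark search path from SPARK_HOME; empty if it is unset.
  def sparkPythonPath: String =
    sys.env.get("SPARK_HOME").toSeq.flatMap { home =>
      Seq(
        Seq(home, "python").mkString(File.separator),
        Seq(home, "python", "lib", "py4j-src.zip").mkString(File.separator)
      )
    }.mkString(File.pathSeparator)

  def main(args: Array[String]): Unit = println(sparkPythonPath)
}
```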

@davies
Contributor

davies commented Jun 29, 2015

@sun-rui Unfortunately, I also don't know much about how PySpark runs on YARN. Could you add some unit tests for YARN mode? (Just follow the Python ones.)

@sun-rui
Contributor Author

sun-rui commented Jun 30, 2015

@shivaram, yes, I saw that function, but found it confusing that it does not consider the YARN mode case.
@davies, it seems that unit tests for pySpark in YARN modes were added in https://github.com/apache/spark/pull/6360/files. Need confirmation.

@shivaram
Contributor

cc @andrewor14, who probably knows something about the YARN tests

@SparkQA

SparkQA commented Jul 8, 2015

Test build #36801 has finished for PR 6743 at commit c644746.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shivaram
Contributor

shivaram commented Jul 8, 2015

Jenkins, retest this please

@SparkQA

SparkQA commented Jul 8, 2015

Test build #36804 has finished for PR 6743 at commit c644746.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14
Contributor

@sun-rui could you rebase this? There are conflicts now. @shivaram does the R side look good to you? If so, feel free to merge it.

@shivaram
Contributor

Yeah, the R code looks fine to me. However, since this changes some fundamental things about how we locate the package, @sun-rui it would be great if you could confirm that the following scenarios work correctly (with just a local master or a standalone master is fine):

  1. Using the shell by running bin/sparkR
  2. Running a script using bin/spark-submit or bin/sparkR
  3. Launching SparkR from RStudio

If things look good, let's merge this once it's rebased and tested. We can do more testing while it's in the tree before 1.5.

@SparkQA

SparkQA commented Jul 13, 2015

Test build #37127 has finished for PR 6743 at commit ca63c86.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sun-rui
Contributor Author

sun-rui commented Jul 13, 2015

@shivaram, tests done. Also tested YARN cluster, yarn-client, and standalone modes, plus createDataFrame() in yarn-client mode.

@shivaram
Contributor

Thanks @sun-rui - LGTM. Merging this

sun-rui pushed a commit that referenced this pull request Jul 13, 2015
This PR enables SparkR to dynamically ship the SparkR binary package to the AM node in YARN cluster mode, thus it is no longer required that the SparkR package be installed on each worker node.

This PR uses the JDK jar tool to package the SparkR package, because jar should be available on both Linux and Windows wherever the JDK is installed.

This PR does not address the R worker involved in the RDD API; that will be addressed in a separate JIRA issue.

This PR does not address SBT build. SparkR installation and packaging by SBT will be addressed in a separate JIRA issue.

R/install-dev.bat is not tested. shivaram, could you help test it?

Author: Sun Rui <[email protected]>

Closes #6743 from sun-rui/SPARK-6797 and squashes the following commits:

ca63c86 [Sun Rui] Adjust MimaExcludes after rebase.
7313374 [Sun Rui] Fix unit test errors.
72695fb [Sun Rui] Fix unit test failures.
193882f [Sun Rui] Fix Mima test error.
fe25a33 [Sun Rui] Fix Mima test error.
35ecfa3 [Sun Rui] Fix comments.
c38a005 [Sun Rui] Unzipped SparkR binary package is still required for standalone and Mesos modes.
b05340c [Sun Rui] Fix scala style.
2ca5048 [Sun Rui] Fix comments.
1acefd1 [Sun Rui] Fix scala style.
0aa1e97 [Sun Rui] Fix scala style.
41d4f17 [Sun Rui] Add support for locating SparkR package for R workers required by RDD APIs.
49ff948 [Sun Rui] Invoke jar.exe with full path in install-dev.bat.
7b916c5 [Sun Rui] Use 'rem' consistently.
3bed438 [Sun Rui] Add a comment.
681afb0 [Sun Rui] Fix a bug that RRunner does not handle client deployment modes.
cedfbe2 [Sun Rui] [SPARK-6797][SPARKR] Add support for YARN cluster mode.
@shivaram
Contributor

@sun-rui Could you close this PR? It looks like GitHub PRs are not being closed due to an infrastructure issue: https://issues.apache.org/jira/browse/INFRA-9988

@sun-rui
Contributor Author

sun-rui commented Jul 14, 2015

@shivaram, closing the PR.

@sun-rui sun-rui closed this Jul 14, 2015