Conversation

brkyvz (Contributor) commented Jul 1, 2015

@shivaram @cafreeman Could you please help me test this out? Exposing and running rPackageBuilder from inside the shell works, but for some reason I can't get it to work during Spark Submit; it just starts relaunching Spark Submit.

For testing, you may use the R branch with sbt-spark-package. You can call `spPackage`, and then pass the jar using `--jars`.
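For anyone reproducing the test setup described above, the flow might look roughly like this. This is a sketch only: the jar path and script name are hypothetical, and the exact artifact location depends on the sbt-spark-package plugin's configuration.

```shell
# Inside the R package project, build the package jar with the
# sbt-spark-package plugin task mentioned above (task name from the plugin):
sbt spPackage

# Then hand the resulting jar to spark-submit via --jars so the R package
# bundled in it can be picked up (jar and script names are illustrative):
spark-submit \
  --jars target/scala-2.10/my-spark-package_2.10-0.1.0.jar \
  my_script.R
```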

SparkQA commented Jul 1, 2015

Test build #36218 has finished for PR 7139 at commit 8810beb.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

shivaram (Contributor) commented Jul 1, 2015

@brkyvz Thanks for sending out this PR. It's looking good. I had a couple of high-level points:

  1. It might be good to point out in some of the error messages how the JAR should be structured for this to work. I know that it works out of the box with the SBT plugin, but it would be good to explain this for users who aren't using the plugin.
  2. Does this also install the package on all the executors? It's not important right now with the DataFrame API, but some of the work we do in the future will run R code on the executors.

SparkQA commented Jul 1, 2015

Test build #36294 has finished for PR 7139 at commit 0226768.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

shivaram (Contributor) commented Jul 1, 2015

@JoshRosen @shaneknapp -- So in this case the SparkR unit tests failed, but the AmpLabJenkins message says Test Passed? Do you know what could cause this?

brkyvz (Contributor, Author) commented Jul 1, 2015

@shivaram Thanks for the feedback. I'll add more error messages regarding structuring. Regarding (2),
right now this doesn't install anything on the executors, but the procedure should be as simple as running the same command once the executors receive the jars.

shaneknapp (Contributor) commented

@shivaram -- I'm looking into the bash monstrosities to see if it's somehow missing an exit code. To that end, I'm setting up some test builds on our staging server. I'll report back once I figure it out.

shaneknapp (Contributor) commented

jenkins, test this please

SparkQA commented Jul 3, 2015

Test build #36447 has finished for PR 7139 at commit 0226768.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

shaneknapp (Contributor) commented

jenkins, test this please

SparkQA commented Jul 6, 2015

Test build #36588 has finished for PR 7139 at commit 0226768.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jul 6, 2015

Test build #36604 has finished for PR 7139 at commit bb751ce.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

brkyvz (Contributor, Author) commented Jul 9, 2015

@shivaram @cafreeman I believe this is ready. I added unit and end-to-end tests.

SparkQA commented Jul 9, 2015

Test build #36898 has finished for PR 7139 at commit eff5ba1.

  • This patch fails RAT tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

brkyvz changed the title from [WIP][SPARK-8313] R Spark packages support to [SPARK-8313] R Spark packages support on Jul 9, 2015
SparkQA commented Jul 9, 2015

Test build #36910 has finished for PR 7139 at commit d867756.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

shivaram (Contributor) commented Jul 9, 2015

Thanks @brkyvz for the update. @sun-rui Could you also take a look at this ?

brkyvz (Contributor, Author) commented Jul 9, 2015

retest this please

brkyvz (Contributor, Author) commented Jul 9, 2015

The test is failing because Jenkins won't allow me to install an R package into $SPARK_HOME/R/lib. I'll have to update the code somehow. One question: is there a way to disable some tests for people who don't have R? Because we call R CMD INSTALL directly, these tests will fail for anyone without R who runs them.

SparkQA commented Jul 9, 2015

Test build #36945 has finished for PR 7139 at commit d867756.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

shivaram (Contributor) commented Jul 9, 2015

So one thing is that we don't need to write to $SPARK_HOME/R/lib. If you don't pass in the -l argument, the package gets installed to the user's home directory, which should be writable.

And regarding the R unit tests, I don't think the SparkR tests get run by mvn test; they are explicitly invoked in our Jenkins scripts. One thing we could do is check whether R is installed and skip the tests if it isn't?
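The "skip the tests if R is missing" idea suggested above could be sketched like this. This is illustrative only, not Spark's actual build scripting; the echoed messages and the commented-out test entry point are placeholders.

```shell
#!/usr/bin/env bash
# Detect whether the R binary is on PATH; run the SparkR tests only if so.
# 'command -v' is the portable way to probe for an executable.
if command -v R >/dev/null 2>&1; then
  echo "R found, running SparkR tests"
  # ./R/run-tests.sh   # hypothetical test entry point would go here
else
  echo "R not found, skipping SparkR tests"
fi
```

The same `command -v` probe could gate the Scala-side tests via an `assume()` so they report as skipped rather than failed.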

cafreeman commented

@shivaram The weird thing about R CMD INSTALL in this case is that if you don't pass -l, it actually installs to the first location on your .libPaths() list, which COULD still be $SPARK_HOME/R/lib depending on when this function gets processed.

If I remember right, there's something that ensures SparkR is the last library that gets loaded, correct? If that's the case, then R CMD INSTALL would probably default to R's standard package install location, since that's all that would exist on .libPaths().

shivaram (Contributor) commented

@cafreeman That function only gets executed when we put shell.R or general.R in R_PROFILE_USER (we do this in our spark-submit launchers). If you just run plain R, it shouldn't put SparkR on the path at all.

FWIW, making SparkR the last package loaded is done at https://github.com/apache/spark/blob/master/R/pkg/inst/profile/shell.R#L25

shivaram (Contributor) commented Aug 4, 2015

Thanks @brkyvz for the update. I did one pass over the code and mostly had minor comments. I think the idea of building a zip file at the end, just before we launch, is pretty good. BTW, there is some code to create the zip on Windows at

rem Zip the SparkR package so that it can be distributed to worker nodes on YARN

that can also be cleaned up now.

BTW, I'll also try to test this on Windows and see how far I get. My guess is that if we can make the name of the R binary configurable, we should be able to get this working on Windows, but we can do that in a follow-up PR too.

SparkQA commented Aug 4, 2015

Test build #39617 has finished for PR 7139 at commit 4258ffe.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

shivaram (Contributor) commented Aug 4, 2015

@brkyvz The jenkins failure message seems to be

ERROR: cannot cd to directory '/home/jenkins/workspace/SparkPullRequestBuilder@2/R/lib

from https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/39617/artifact/core/target/unit-tests.log

Any ideas why we can't use that directory? Are the directories read-only by default, or something like that?

brkyvz (Contributor, Author) commented Aug 4, 2015

I guess the folder is read-only, which makes sense because it's part of the test infrastructure (what if someone added a test that ran rm -rf?). I don't know of a workaround :( Another possibility is that the folder doesn't exist, i.e. there is no R/lib. Let me try a workaround for the second option. If it's the first, I can't think of any options.
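The two possibilities above (missing directory vs. read-only directory) can be told apart with a quick check like this. It's a diagnostic sketch, not anything from the PR; SPARK_HOME defaults to the current directory purely for illustration.

```shell
#!/usr/bin/env bash
# Distinguish "R/lib doesn't exist" from "R/lib exists but isn't writable".
dir="${SPARK_HOME:-.}/R/lib"
if [ ! -d "$dir" ]; then
  echo "$dir does not exist"
elif [ ! -w "$dir" ]; then
  echo "$dir exists but is not writable"
else
  echo "$dir is writable"
fi
```

Running something like this inside the Jenkins job would have pinpointed which failure mode was in play.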

brkyvz (Contributor, Author) commented Aug 4, 2015

If R/lib doesn't exist, though, that means SparkR doesn't exist either, which means the test will fail anyway :/ Maybe I could just remove that test? Or add a SparkR test instead? Are there any that use spark-submit?

brkyvz (Contributor, Author) commented Aug 4, 2015

Hi @shivaram. Addressed your comments. Regarding the failing test: I disabled it for now. Maybe a better solution would be to include a test jar, similar to what you already have for test_includeJAR.R, and run the test on the SparkR side instead of through Spark Submit.

SparkQA commented Aug 4, 2015

Test build #39666 has finished for PR 7139 at commit 6603d0d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

shivaram (Contributor) commented Aug 4, 2015

@brkyvz I think I figured out the problem with Jenkins -- since Jenkins uses SBT, it doesn't build the SparkR package at the beginning along with the rest of the artifacts; it only builds SparkR while running the SparkR unit tests. So what ends up happening is that R/lib doesn't exist while the core unit tests are running.

I think the right thing to do here is to get Jenkins to build SparkR along with the other components. @JoshRosen and @yu-iskw were discussing a similar issue in #7883. For now, can we just comment out the test and open a JIRA to un-comment it once we fix this?

Contributor (inline review comment):

Just checking -- do we need this import?

Contributor Author (inline reply):

It's needed for the ProcessBuilder below; it takes a Java Collection.

brkyvz (Contributor, Author) commented Aug 4, 2015

@shivaram Removed the unused imports. Created a JIRA (SPARK-9603) for re-enabling the test, and added it as a TODO in the test.

shivaram (Contributor) commented Aug 4, 2015

Thanks @brkyvz -- changes LGTM. Can you check whether @andrewor14 wants to take another look at the SparkSubmit changes?

brkyvz (Contributor, Author) commented Aug 4, 2015

jenkins retest this please

SparkQA commented Aug 4, 2015

Test build #39729 has finished for PR 7139 at commit 0de384f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

shivaram (Contributor) commented Aug 4, 2015

Jenkins, retest this please

SparkQA commented Aug 4, 2015

Test build #212 has finished for PR 7139 at commit 0de384f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Aug 4, 2015

Test build #39738 has finished for PR 7139 at commit 0de384f.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Aug 4, 2015

Test build #39747 has finished for PR 7139 at commit 0de384f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

pwendell (Contributor) commented Aug 5, 2015

I took a look at the spark submit changes and they LGTM.

brkyvz (Contributor, Author) commented Aug 5, 2015

Thanks @pwendell. @shivaram, can you merge this please?

pwendell (Contributor) commented Aug 5, 2015

@shivaram maybe you can merge? I looked at the spark submit stuff but overall it was a very small part of the changes.

shivaram (Contributor) commented Aug 5, 2015

Yep - I'm out now, but I'll get back to my computer and merge.

asfgit pushed a commit that referenced this pull request Aug 5, 2015
shivaram cafreeman Could you please help me in testing this out? Exposing and running `rPackageBuilder` from inside the shell works, but for some reason, I can't get it to work during Spark Submit. It just starts relaunching Spark Submit.

For testing, you may use the R branch with [sbt-spark-package](https://github.com/databricks/sbt-spark-package). You can call spPackage, and then pass the jar using `--jars`.

Author: Burak Yavuz <[email protected]>

Closes #7139 from brkyvz/r-submit and squashes the following commits:

0de384f [Burak Yavuz] remove unused imports 2
d253708 [Burak Yavuz] removed unused imports
6603d0d [Burak Yavuz] addressed comments
4258ffe [Burak Yavuz] merged master
ddfcc06 [Burak Yavuz] added zipping test
3a1be7d [Burak Yavuz] don't zip
77995df [Burak Yavuz] fix URI
ac45527 [Burak Yavuz] added zipping of all libs
e6bf7b0 [Burak Yavuz] add println ignores
1bc5554 [Burak Yavuz] add assumes for tests
9778e03 [Burak Yavuz] addressed comments
b42b300 [Burak Yavuz] merged master
ffd134e [Burak Yavuz] Merge branch 'master' of github.com:apache/spark into r-submit
d867756 [Burak Yavuz] add apache header
eff5ba1 [Burak Yavuz] ready for review
8838edb [Burak Yavuz] Merge branch 'master' of github.com:apache/spark into r-submit
e5b5a06 [Burak Yavuz] added doc
bb751ce [Burak Yavuz] fix null bug
0226768 [Burak Yavuz] fixed issues
8810beb [Burak Yavuz] R packages support

(cherry picked from commit c9a4c36)
Signed-off-by: Shivaram Venkataraman <[email protected]>
asfgit closed this in c9a4c36 on Aug 5, 2015
shivaram (Contributor) commented Aug 5, 2015

BTW @brkyvz, can you merge the R changes in https://github.com/databricks/sbt-spark-package as well?

brkyvz (Contributor, Author) commented Aug 5, 2015

Thanks @shivaram. Will do; I'll add the suggested format to the spark-package command-line tool as well.

brkyvz deleted the r-submit branch on February 3, 2019