-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-8313] R Spark packages support #7139
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #36218 has finished for PR 7139 at commit
|
|
@brkyvz Thanks for sending out this PR. Its looking good. I had a couple of high level points
|
|
Test build #36294 has finished for PR 7139 at commit
|
|
@JoshRosen @shaneknapp -- So in this case the SparkR unit tests failed but the AmpLabJenkins message says Test Passed ? Do you know what could cause this ? |
|
@shivaram Thanks for the feedback. I'll add more error messages regarding structuring. Regarding (1), |
|
@shivaram -- i'm looking in to the bash monstrosities and seeing if it's somehow missing an exit code... to this end i'm setting up some test builds on our staging server. i'll report back once i figure it out. |
|
jenkins, test this please |
|
Test build #36447 has finished for PR 7139 at commit
|
|
jenkins, test this please |
|
Test build #36588 has finished for PR 7139 at commit
|
|
Test build #36604 has finished for PR 7139 at commit
|
|
@shivaram @cafreeman I believe this is ready. I added unit, and end to end tests. |
|
Test build #36898 has finished for PR 7139 at commit
|
|
Test build #36910 has finished for PR 7139 at commit
|
|
retest this please |
|
The test is failing, because Jenkins is not allowing me to install an R package to |
|
Test build #36945 has finished for PR 7139 at commit
|
|
So one thing is that we don't need to write a And regarding the R unit tests, I don't think the SparkR tests get run by maven by |
|
@shivaram The weird thing about R CMD INSTALL in this case is that if you don't pass If I remember right, there's something that ensures that SparkR is the last library that gets loaded, correct? If that's the case, then R CMD INSTALL would probably default to the standard package install location for R since that's all that would exist on |
|
@cafreeman That function only gets executed when we put FWIW the SparkR being the last package loaded is done at https://github.com/apache/spark/blob/master/R/pkg/inst/profile/shell.R#L25 |
|
Thanks @brkyvz for the update. I did one pass over the code and mostly had minor comments. I think the idea of building a zip file at the end just before we launch is pretty good. BTW there is some code to create the zip in Windows at Line 29 in 3b0e444
BTW I'll also try to test this on Windows and see how far I get. My guess is that if we can make the name of the R binary configurable we should be able to get this working on windows, but we can do that in a follow up PR too |
|
Test build #39617 has finished for PR 7139 at commit
|
|
@brkyvz The jenkins failure message seems to be Any ideas why we can't use that directory ? Are the directories read only by default or something like that ? |
|
I guess the folder is read-only. Which makes sense, because it is part of the test infrastructure (what if someone added a test that ran |
|
If |
|
Hi @shivaram. Addressed your comments. Regarding the failing test... I disabled it for now. Maybe a better solution will be to include a test jar similar to what you already have for |
|
Test build #39666 has finished for PR 7139 at commit
|
|
@brkyvz I think I figured out the problem with Jenkins -- Since Jenkins uses SBT it doesn't build the SparkR package at the beginning along with the rest of the artifacts and only builds SparkR while running SparkR unit tests. So what ends up happening is that R/lib doesn't exist when the core unit tests are running. I think the right thing to do here is to get Jenkins to build SparkR along with other components. @JoshRosen and @yu-iskw were discussing a similar issue in #7883. For now can we just comment out the test and open a JIRA to un-comment it once we fix this ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just checking -- do we need this import ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for the Process Builder below. It takes a Java Collection
|
@shivaram removed the unused imports. Created a JIRA (SPARK-9603) for re-enabling the test, and added it as a TODO to the test |
|
Thanks @brkyvz -- Changes LGTM. Can you check if @andrewor14 wants to take another look at the SparkSubmit changes ? |
|
jenkins retest this please |
|
Test build #39729 has finished for PR 7139 at commit
|
|
Jenkins, retest this please |
|
Test build #212 has finished for PR 7139 at commit
|
|
Test build #39738 has finished for PR 7139 at commit
|
|
Test build #39747 has finished for PR 7139 at commit
|
|
I took a look at the spark submit changes and they LGTM. |
|
@shivaram maybe you can merge? I looked at the spark submit stuff but overall it was a very small part of the changes. |
|
Yep - I'm out now, but will get back to my computer and merge |
shivaram cafreeman Could you please help me in testing this out? Exposing and running `rPackageBuilder` from inside the shell works, but for some reason, I can't get it to work during Spark Submit. It just starts relaunching Spark Submit. For testing, you may use the R branch with [sbt-spark-package](https://github.com/databricks/sbt-spark-package). You can call spPackage, and then pass the jar using `--jars`. Author: Burak Yavuz <[email protected]> Closes #7139 from brkyvz/r-submit and squashes the following commits: 0de384f [Burak Yavuz] remove unused imports 2 d253708 [Burak Yavuz] removed unused imports 6603d0d [Burak Yavuz] addressed comments 4258ffe [Burak Yavuz] merged master ddfcc06 [Burak Yavuz] added zipping test 3a1be7d [Burak Yavuz] don't zip 77995df [Burak Yavuz] fix URI ac45527 [Burak Yavuz] added zipping of all libs e6bf7b0 [Burak Yavuz] add println ignores 1bc5554 [Burak Yavuz] add assumes for tests 9778e03 [Burak Yavuz] addressed comments b42b300 [Burak Yavuz] merged master ffd134e [Burak Yavuz] Merge branch 'master' of github.com:apache/spark into r-submit d867756 [Burak Yavuz] add apache header eff5ba1 [Burak Yavuz] ready for review 8838edb [Burak Yavuz] Merge branch 'master' of github.com:apache/spark into r-submit e5b5a06 [Burak Yavuz] added doc bb751ce [Burak Yavuz] fix null bug 0226768 [Burak Yavuz] fixed issues 8810beb [Burak Yavuz] R packages support (cherry picked from commit c9a4c36) Signed-off-by: Shivaram Venkataraman <[email protected]>
|
BTW @brkyvz can you merge the R changes in the https://github.com/databricks/sbt-spark-package as well ? |
|
thanks @shivaram. Will do, I'll add the suggested format to the spark-package command line tool as well. |
@shivaram @cafreeman Could you please help me in testing this out? Exposing and running
rPackageBuilderfrom inside the shell works, but for some reason, I can't get it to work during Spark Submit. It just starts relaunching Spark Submit.For testing, you may use the R branch with sbt-spark-package. You can call spPackage, and then pass the jar using
--jars.