[SPARK-18590][SPARKR] build R source package when making distribution #16014
Changes from all commits: b3e4d21, e579b23, afacbf3, 6ef26fe
**R/pkg/.Rbuildignore**

```diff
@@ -1,5 +1,8 @@
 ^.*\.Rproj$
 ^\.Rproj\.user$
 ^\.lintr$
+^cran-comments\.md$
+^NEWS\.md$
+^README\.Rmd$
 ^src-native$
 ^html$
```
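`.Rbuildignore` entries are Perl-style regular expressions matched against paths relative to the package root, so `R CMD build` will now drop these CRAN-submission helper files from the source tarball. A quick standalone sanity check of the new patterns (a sketch — `grep -E` stands in for R's own matching, and the file names piped in are just examples):

```shell
# Which candidate files would the new .Rbuildignore patterns exclude
# from the built SparkR source package? grep -E prints only the matches.
patterns='^cran-comments\.md$|^NEWS\.md$|^README\.Rmd$'
printf '%s\n' cran-comments.md NEWS.md README.Rmd R/DataFrame.R |
  grep -E "$patterns"
# prints: cran-comments.md, NEWS.md, README.Rmd (R/DataFrame.R survives)
```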
**R/pkg/DESCRIPTION**

```diff
@@ -1,28 +1,27 @@
 Package: SparkR
 Type: Package
-Title: R Frontend for Apache Spark
 Version: 2.1.0
-Date: 2016-11-06
+Title: R Frontend for Apache Spark
+Description: The SparkR package provides an R Frontend for Apache Spark.
 Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),
                     email = "[email protected]"),
              person("Xiangrui", "Meng", role = "aut",
                     email = "[email protected]"),
              person("Felix", "Cheung", role = "aut",
                     email = "[email protected]"),
              person(family = "The Apache Software Foundation", role = c("aut", "cph")))
+License: Apache License (== 2.0)
 URL: http://www.apache.org/ http://spark.apache.org/
 BugReports: http://spark.apache.org/contributing.html
 Depends:
     R (>= 3.0),
     methods
 Suggests:
+    knitr,
+    rmarkdown,
     testthat,
     e1071,
-    survival,
-    knitr,
-    rmarkdown
-Description: The SparkR package provides an R frontend for Apache Spark.
-License: Apache License (== 2.0)
+    survival
 Collate:
     'schema.R'
     'generics.R'
```
**R/pkg/NAMESPACE**

```diff
@@ -3,7 +3,7 @@
 importFrom("methods", "setGeneric", "setMethod", "setOldClass")
 importFrom("methods", "is", "new", "signature", "show")
 importFrom("stats", "gaussian", "setNames")
-importFrom("utils", "download.file", "object.size", "packageVersion", "untar")
+importFrom("utils", "download.file", "object.size", "packageVersion", "tail", "untar")

 # Disable native libraries till we figure out how to package it
 # See SPARKR-7839
```
**dev/create-release/release-build.sh**

```diff
@@ -150,7 +150,7 @@ if [[ "$1" == "package" ]]; then
   NAME=$1
   FLAGS=$2
   ZINC_PORT=$3
-  BUILD_PIP_PACKAGE=$4
+  BUILD_PACKAGE=$4
   cp -r spark spark-$SPARK_VERSION-bin-$NAME

   cd spark-$SPARK_VERSION-bin-$NAME
@@ -172,11 +172,30 @@ if [[ "$1" == "package" ]]; then
   MVN_HOME=`$MVN -version 2>&1 | grep 'Maven home' | awk '{print $NF}'`

-  if [ -z "$BUILD_PIP_PACKAGE" ]; then
-    echo "Creating distribution without PIP package"
+  if [ -z "$BUILD_PACKAGE" ]; then
+    echo "Creating distribution without PIP/R package"
     ./dev/make-distribution.sh --name $NAME --mvn $MVN_HOME/bin/mvn --tgz $FLAGS \
       -DzincPort=$ZINC_PORT 2>&1 > ../binary-release-$NAME.log
     cd ..
+  elif [[ "$BUILD_PACKAGE" == "withr" ]]; then
+    echo "Creating distribution with R package"
+    ./dev/make-distribution.sh --name $NAME --mvn $MVN_HOME/bin/mvn --tgz --r $FLAGS \
+      -DzincPort=$ZINC_PORT 2>&1 > ../binary-release-$NAME.log
+    cd ..
+
+    echo "Copying and signing R source package"
+    R_DIST_NAME=SparkR_$SPARK_VERSION.tar.gz
+    cp spark-$SPARK_VERSION-bin-$NAME/R/$R_DIST_NAME .
+
+    echo $GPG_PASSPHRASE | $GPG --passphrase-fd 0 --armour \
+      --output $R_DIST_NAME.asc \
+      --detach-sig $R_DIST_NAME
+    echo $GPG_PASSPHRASE | $GPG --passphrase-fd 0 --print-md \
+      MD5 $R_DIST_NAME > \
+      $R_DIST_NAME.md5
+    echo $GPG_PASSPHRASE | $GPG --passphrase-fd 0 --print-md \
+      SHA512 $R_DIST_NAME > \
+      $R_DIST_NAME.sha
   else
     echo "Creating distribution with PIP package"
     ./dev/make-distribution.sh --name $NAME --mvn $MVN_HOME/bin/mvn --tgz --pip $FLAGS \
```

**Contributor** (on `R_DIST_NAME=SparkR_$SPARK_VERSION.tar.gz`): Just to clarify, this is the tgz that we will upload to CRAN, right?
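The signing step above needs a GPG keyring and passphrase, but the checksum part can be sketched standalone. A minimal sketch, assuming coreutils `sha512sum` as a stand-in for `gpg --print-md SHA512` and a dummy artifact in place of the real SparkR tarball:

```shell
# Produce and verify a SHA-512 digest for a release artifact, analogous
# to the $R_DIST_NAME.sha step above. The artifact here is a dummy file.
cd "$(mktemp -d)"
R_DIST_NAME=SparkR_2.1.0.tar.gz
echo "dummy package contents" > "$R_DIST_NAME"
sha512sum "$R_DIST_NAME" > "$R_DIST_NAME.sha"
sha512sum -c "$R_DIST_NAME.sha"
# prints "SparkR_2.1.0.tar.gz: OK"
```

Note the real script uses `gpg --print-md`, whose digest file format differs from coreutils' `hash  filename` layout; this only illustrates the generate-then-verify flow.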
@@ -222,7 +241,7 @@ if [[ "$1" == "package" ]]; then | |
| make_binary_release "hadoop2.6" "-Phadoop-2.6 $FLAGS" "3035" & | ||
| make_binary_release "hadoop2.7" "-Phadoop-2.7 $FLAGS" "3036" "withpip" & | ||
| make_binary_release "hadoop2.4-without-hive" "-Psparkr -Phadoop-2.4 -Pyarn -Pmesos" "3037" & | ||
| make_binary_release "without-hadoop" "-Psparkr -Phadoop-provided -Pyarn -Pmesos" "3038" & | ||
| make_binary_release "without-hadoop" "-Psparkr -Phadoop-provided -Pyarn -Pmesos" "3038" "withr" & | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Any specific reason to use the
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It was mostly to use a "separate profile" from "withpip" Running this Also the Spark jar, while loaded and called into during that process, will not be packaged into the resulting R source package, so I thought it didn't matter which build profile we would run this in.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think it sounds fine. I was waiting to see if @rxin (or @JoshRosen ?) would take a look because I have not reviewed changes to this file before. Let me take another closer look and then we can merge it to branch-2.1 -- We'll see what happens to the RC process after that |
||
| wait | ||
| rm -rf spark-$SPARK_VERSION-bin-*/ | ||
|
|
||
|
|
||
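The optional fourth argument threaded through `make_binary_release` acts as a three-way switch between plain, pip, and R builds. A minimal standalone sketch of that dispatch (`describe_release` is a hypothetical stand-in for the real function, which also runs make-distribution.sh):

```shell
# Dispatch sketch: the optional 4th positional argument selects which
# extra package, if any, is built alongside the binary distribution.
describe_release() {
  BUILD_PACKAGE=$4
  if [ -z "$BUILD_PACKAGE" ]; then
    echo "Creating distribution without PIP/R package"
  elif [ "$BUILD_PACKAGE" = "withr" ]; then
    echo "Creating distribution with R package"
  else
    echo "Creating distribution with PIP package"
  fi
}
describe_release "hadoop2.6"      "-Phadoop-2.6"      "3035"
describe_release "hadoop2.7"      "-Phadoop-2.7"      "3036" "withpip"
describe_release "without-hadoop" "-Phadoop-provided" "3038" "withr"
# prints the no-package, PIP, and R messages, in that order
```

Because only one invocation passes "withr", exactly one of the parallel builds produces (and signs) the R source package.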
**dev/make-distribution.sh**

```diff
@@ -34,14 +34,15 @@ DISTDIR="$SPARK_HOME/dist"
 MAKE_TGZ=false
 MAKE_PIP=false
+MAKE_R=false
 NAME=none
 MVN="$SPARK_HOME/build/mvn"

 function exit_with_usage {
   echo "make-distribution.sh - tool for making binary distributions of Spark"
   echo ""
   echo "usage:"
-  cl_options="[--name] [--tgz] [--pip] [--mvn <mvn-command>]"
+  cl_options="[--name] [--tgz] [--pip] [--r] [--mvn <mvn-command>]"
   echo "make-distribution.sh $cl_options <maven build options>"
   echo "See Spark's \"Building Spark\" doc for correct Maven options."
   echo ""
@@ -71,6 +72,9 @@ while (( "$#" )); do
   --pip)
     MAKE_PIP=true
     ;;
+  --r)
+    MAKE_R=true
+    ;;
```
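The new `--r` flag slots into the script's existing while/case option loop. In isolation, that parsing pattern looks roughly like this (the command line set up via `set --` is just an example):

```shell
# Option-parsing sketch: --pip / --r toggle the corresponding booleans,
# mirroring the loop in make-distribution.sh; unknown args fall through.
MAKE_PIP=false
MAKE_R=false
set -- --name test --tgz --r -Phadoop-2.7   # example command line
while [ "$#" -gt 0 ]; do
  case $1 in
    --pip) MAKE_PIP=true ;;
    --r)   MAKE_R=true ;;
  esac
  shift
done
echo "MAKE_PIP=$MAKE_PIP MAKE_R=$MAKE_R"
# prints "MAKE_PIP=false MAKE_R=true"
```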
(Discussion on an outdated diff, around `make_binary_release "hadoop2.3" "-Phadoop-2.3 $FLAGS" "3033" &`:)

It's a little awkward that we use check-cran.sh to build and install the package. I think it points to the fact that we could refactor the scripts more, but that can be done in a future PR.

I agree. It is somewhat debatable whether we should run R CMD check in make-distribution.sh, but I feel there are gaps in what we check in Jenkins, so it is worthwhile to repeat that here. For everything else it's just convenient to call R from here. We could factor out the R environment setup and have a separate install.sh (possibly replacing install-dev.sh, since this does more with the source package). What do you think?

Yeah, longer term that sounds like a good idea.

Isn't this already done by install-dev.sh? I'm a bit confused as to why we need to call install again.

This is as mentioned above: apparently the output is different with R CMD INSTALL versus what devtools is doing. I'll dig through the content and list them here.

So I did the diff. Here are the new files in the output of make-distribution in the master branch with this change vs. 2.0.0. Files added: … Files removed: a bunch of HTML files starting from …

So it looks like we lost the knitted HTML files in the SparkR package with this change. FWIW this may not be bad, as the HTML files are not usually used locally and are only used for the website; I think the docs creation part of the build should pick that up. (Verifying that now.)