2 changes: 1 addition & 1 deletion R/CRAN_RELEASE.md
@@ -7,7 +7,7 @@ To release SparkR as a package to CRAN, we would use the `devtools` package. Ple

First, check that the `Version:` field in the `pkg/DESCRIPTION` file is updated. Also, check for stale files not under source control.

Note that while `check-cran.sh` is running `R CMD check`, it is doing so with `--no-manual --no-vignettes`, which skips a few vignettes or PDF checks - therefore it will be preferred to run `R CMD check` on the source package built manually before uploading a release.
Note that while `run-tests.sh` runs `check-cran.sh` (which runs `R CMD check`), it does so with `--no-manual --no-vignettes`, which skips a few vignette and PDF checks - therefore it is preferred to run `R CMD check` on the source package built manually before uploading a release. Also note that for the CRAN checks of PDF vignettes to succeed, the `qpdf` tool must be installed (to install it, e.g. `yum -q -y install qpdf`).

To upload a release, we would need to update the `cran-comments.md`. This should generally contain the results from running the `check-cran.sh` script along with comments on status of all `WARNING` (should not be any) or `NOTE`. As a part of `check-cran.sh` and the release process, the vignettes is build - make sure `SPARK_HOME` is set and Spark jars are accessible.
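A minimal sketch of the manual pre-release check recommended above (assumptions: run from the R/ directory, R and qpdf on the PATH, and the tarball name varies with the version):

```sh
# Vignettes run Spark code, so SPARK_HOME must point at a built Spark
R CMD build pkg
# Full check, including the PDF manual and vignette checks that
# --no-manual --no-vignettes would skip
R CMD check --as-cran SparkR_*.tar.gz
```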

19 changes: 18 additions & 1 deletion R/check-cran.sh
@@ -34,8 +34,9 @@ if [ ! -z "$R_HOME" ]
fi
R_SCRIPT_PATH="$(dirname $(which R))"
fi
echo "USING R_HOME = $R_HOME"
echo "Using R_SCRIPT_PATH = ${R_SCRIPT_PATH}"

# Install the package (this is required for code in vignettes to run when building it later)
# Build the latest docs, but not vignettes, which is built with the package next
$FWDIR/create-docs.sh

@@ -82,4 +83,20 @@ else
# This will run tests and/or build vignettes, and require SPARK_HOME
SPARK_HOME="${SPARK_HOME}" "$R_SCRIPT_PATH/"R CMD check $CRAN_CHECK_OPTIONS SparkR_"$VERSION".tar.gz
fi

# Install source package to get it to generate vignettes rds files, etc.
if [ -n "$CLEAN_INSTALL" ]
Contributor:

Isn't this already done by `install-dev.sh`? I'm a bit confused as to why we need to call install again.

Member Author (@felixcheung, Nov 28, 2016):

This is as mentioned above:

> include in the official Spark binary distributions SparkR installed from this source package instead (which would have the help/vignettes rds needed for those to work when the SparkR package is loaded in R, whereas the earlier approach with devtools does not)
>
> R CMD INSTALL on the source package (this is the only way to generate the doc/vignettes rds files correctly, not in step #1)
> (the output of this step is what we package into the Spark dist and sparkr.zip)

Apparently the output is different with R CMD INSTALL versus what devtools is doing. I'll dig through the content and list them here.
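For illustration, a hedged sketch of the step being described (the tarball name assumes the 2.1.0 version in this PR's DESCRIPTION):

```sh
mkdir -p lib
R CMD INSTALL SparkR_2.1.0.tar.gz --library=lib
# The rds metadata below is generated only when installing from the source package
ls lib/SparkR/Meta/vignette.rds
```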

Contributor:

So I did the diff. Here are the new files in the output of make-distribution on the master branch with this change vs. 2.0.0.

Files added:

- R/lib/SparkR/Meta/vignette.rds
- /R/lib/SparkR/doc/
- /R/lib/SparkR/doc/index.html
- /R/lib/SparkR/doc/sparkr-vignettes.R
- /R/lib/SparkR/doc/sparkr-vignettes.Rmd
- /R/lib/SparkR/doc/sparkr-vignettes.html

Files removed: A bunch of HTML files starting from

/R/lib/SparkR/html/AFTSurvivalRegressionModel-class.html
...
/R/lib/SparkR/html/year.html
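For reference, one hypothetical way to produce such a file-level comparison between two extracted distributions (the paths are illustrative, not from the PR):

```sh
# Compare the R library trees of two unpacked Spark distributions
diff <(cd spark-2.0.0-bin-hadoop2.7/R/lib && find . | sort) \
     <(cd spark-2.1.0-SNAPSHOT-bin-hadoop2.7/R/lib && find . | sort)
```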

Contributor:

So it looks like we lost the knitted HTML files in the SparkR package with this change. FWIW this may not be bad, as the HTML files are not usually used locally, only for the website, and I think the docs creation part of the build should pick that up. (Verifying that now.)

then
  echo "Removing lib path and installing from source package"
  LIB_DIR="$FWDIR/lib"
  rm -rf $LIB_DIR
  mkdir -p $LIB_DIR
  "$R_SCRIPT_PATH/"R CMD INSTALL SparkR_"$VERSION".tar.gz --library=$LIB_DIR

  # Zip the SparkR package so that it can be distributed to worker nodes on YARN
  pushd $LIB_DIR > /dev/null
  jar cfM "$LIB_DIR/sparkr.zip" SparkR
  popd > /dev/null
fi

popd > /dev/null
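A quick hedged sanity check of the archive produced above (assumes unzip is available and the script has been run, so lib/sparkr.zip exists):

```sh
# Entries should all sit under a top-level SparkR/ directory
unzip -l lib/sparkr.zip
```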
2 changes: 1 addition & 1 deletion R/install-dev.sh
@@ -46,7 +46,7 @@ if [ ! -z "$R_HOME" ]
fi
R_SCRIPT_PATH="$(dirname $(which R))"
fi
echo "USING R_HOME = $R_HOME"
echo "Using R_SCRIPT_PATH = ${R_SCRIPT_PATH}"

# Generate Rd files if devtools is installed
"$R_SCRIPT_PATH/"Rscript -e ' if("devtools" %in% rownames(installed.packages())) { library(devtools); devtools::document(pkg="./pkg", roclets=c("rd")) }'
3 changes: 3 additions & 0 deletions R/pkg/.Rbuildignore
@@ -1,5 +1,8 @@
^.*\.Rproj$
^\.Rproj\.user$
^\.lintr$
^cran-comments\.md$
^NEWS\.md$
^README\.Rmd$
^src-native$
^html$
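These entries are regexes matched against file paths when `R CMD build` assembles the source package. A hedged way to confirm the new exclusions took effect (run from R/ after building the package; tarball name varies):

```sh
# grep finding a match means an excluded file leaked into the tarball
tar tzf SparkR_*.tar.gz | grep -E 'cran-comments\.md|NEWS\.md|README\.Rmd' \
  && echo "unexpected files in tarball" || echo "exclusions OK"
```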
13 changes: 6 additions & 7 deletions R/pkg/DESCRIPTION
@@ -1,28 +1,27 @@
Package: SparkR
Type: Package
Title: R Frontend for Apache Spark
Version: 2.1.0
Date: 2016-11-06
Title: R Frontend for Apache Spark
Description: The SparkR package provides an R Frontend for Apache Spark.
Member Author (@felixcheung, Nov 25, 2016):

this is removed - I tried but haven't found a way to update this automatically (I guess this could be in the release-tag script, though). But more importantly, it seems like many (most?) packages do not have this in their DESCRIPTION file.

In any case, the release date is stamped when releasing to CRAN.
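If it were ever automated in a release-tag script as suggested, a hypothetical one-liner (not part of this PR) might be:

```sh
# Stamp the Date field with the release date at tag time
sed -i "s/^Date:.*/Date: $(date +%Y-%m-%d)/" R/pkg/DESCRIPTION
```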

Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),
email = "[email protected]"),
person("Xiangrui", "Meng", role = "aut",
email = "[email protected]"),
person("Felix", "Cheung", role = "aut",
email = "[email protected]"),
person(family = "The Apache Software Foundation", role = c("aut", "cph")))
License: Apache License (== 2.0)
URL: http://www.apache.org/ http://spark.apache.org/
BugReports: http://spark.apache.org/contributing.html
Depends:
R (>= 3.0),
methods
Suggests:
knitr,
rmarkdown,
testthat,
e1071,
survival,
knitr,
rmarkdown
Description: The SparkR package provides an R frontend for Apache Spark.
License: Apache License (== 2.0)
survival
Collate:
'schema.R'
'generics.R'
2 changes: 1 addition & 1 deletion R/pkg/NAMESPACE
@@ -3,7 +3,7 @@
importFrom("methods", "setGeneric", "setMethod", "setOldClass")
importFrom("methods", "is", "new", "signature", "show")
importFrom("stats", "gaussian", "setNames")
importFrom("utils", "download.file", "object.size", "packageVersion", "untar")
importFrom("utils", "download.file", "object.size", "packageVersion", "tail", "untar")
Member Author (@felixcheung, Nov 25, 2016):

This was a regression from a recent commit. check-cran.sh actually flags this by appending to an existing NOTE, but we only check the number of NOTEs (which is still 1), so this went in undetected.
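A hedged sketch of a stricter guard that would catch this class of regression by matching NOTE content rather than just counting NOTEs (the log path is the standard `R CMD check` output directory; wiring this into check-cran.sh is an assumption):

```sh
# "no visible global function definition" is the NOTE text R CMD check
# emits for calls to functions missing from importFrom() in NAMESPACE
if grep -q "no visible global function definition" SparkR.Rcheck/00check.log; then
  echo "undeclared import detected"
  exit 1
fi
```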


# Disable native libraries till we figure out how to package it
# See SPARKR-7839
27 changes: 23 additions & 4 deletions dev/create-release/release-build.sh
@@ -150,7 +150,7 @@ if [[ "$1" == "package" ]]; then
NAME=$1
FLAGS=$2
ZINC_PORT=$3
BUILD_PIP_PACKAGE=$4
BUILD_PACKAGE=$4
cp -r spark spark-$SPARK_VERSION-bin-$NAME

cd spark-$SPARK_VERSION-bin-$NAME
@@ -172,11 +172,30 @@ if [[ "$1" == "package" ]]; then
MVN_HOME=`$MVN -version 2>&1 | grep 'Maven home' | awk '{print $NF}'`


if [ -z "$BUILD_PIP_PACKAGE" ]; then
echo "Creating distribution without PIP package"
if [ -z "$BUILD_PACKAGE" ]; then
echo "Creating distribution without PIP/R package"
./dev/make-distribution.sh --name $NAME --mvn $MVN_HOME/bin/mvn --tgz $FLAGS \
-DzincPort=$ZINC_PORT 2>&1 > ../binary-release-$NAME.log
cd ..
elif [[ "$BUILD_PACKAGE" == "withr" ]]; then
echo "Creating distribution with R package"
./dev/make-distribution.sh --name $NAME --mvn $MVN_HOME/bin/mvn --tgz --r $FLAGS \
-DzincPort=$ZINC_PORT 2>&1 > ../binary-release-$NAME.log
cd ..

echo "Copying and signing R source package"
R_DIST_NAME=SparkR_$SPARK_VERSION.tar.gz
Contributor:
Just to clarify, this is the tgz that we will upload to CRAN, right?
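For context, a hedged example of how the detached signature produced below could be verified by a consumer (file names assume Spark 2.1.0):

```sh
# Verify the detached ASCII-armoured signature against the source package
gpg --verify SparkR_2.1.0.tar.gz.asc SparkR_2.1.0.tar.gz
```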

  cp spark-$SPARK_VERSION-bin-$NAME/R/$R_DIST_NAME .

  echo $GPG_PASSPHRASE | $GPG --passphrase-fd 0 --armour \
    --output $R_DIST_NAME.asc \
    --detach-sig $R_DIST_NAME
  echo $GPG_PASSPHRASE | $GPG --passphrase-fd 0 --print-md \
    MD5 $R_DIST_NAME > \
    $R_DIST_NAME.md5
  echo $GPG_PASSPHRASE | $GPG --passphrase-fd 0 --print-md \
    SHA512 $R_DIST_NAME > \
    $R_DIST_NAME.sha
else
echo "Creating distribution with PIP package"
./dev/make-distribution.sh --name $NAME --mvn $MVN_HOME/bin/mvn --tgz --pip $FLAGS \
@@ -222,7 +241,7 @@ if [[ "$1" == "package" ]]; then
make_binary_release "hadoop2.6" "-Phadoop-2.6 $FLAGS" "3035" &
make_binary_release "hadoop2.7" "-Phadoop-2.7 $FLAGS" "3036" "withpip" &
make_binary_release "hadoop2.4-without-hive" "-Psparkr -Phadoop-2.4 -Pyarn -Pmesos" "3037" &
make_binary_release "without-hadoop" "-Psparkr -Phadoop-provided -Pyarn -Pmesos" "3038" &
make_binary_release "without-hadoop" "-Psparkr -Phadoop-provided -Pyarn -Pmesos" "3038" "withr" &
Contributor:
Any specific reason to use the without-hadoop build for the R package? Just wondering if this will affect users in any fashion.

Member Author (@felixcheung, Dec 6, 2016):

It was mostly to use a "separate profile" from "withpip".

Running R CMD build here would run some Spark code (mainly in vignettes, since we turn off tests in R CMD check), but nothing that depends on the file system etc.

Also, the Spark jar, while loaded and called into during that process, will not be packaged into the resulting R source package, so I thought it didn't matter which build profile we ran this in.
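One hedged way to confirm that claim against the built artifact (the tarball name is assumed):

```sh
# Expect no jar entries inside the R source package
tar tzf SparkR_2.1.0.tar.gz | grep -i '\.jar$' || echo "no jars packaged"
```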

Member Author (@felixcheung):

@shivaram what do you think about this?

I'd like to merge this to branch-2.1 to see if we could make it into 2.1.0, if at all possible.

Contributor:

I think it sounds fine. I was waiting to see if @rxin (or @JoshRosen?) would take a look, because I have not reviewed changes to this file before. Let me take another closer look and then we can merge it to branch-2.1 -- we'll see what happens to the RC process after that.

wait
rm -rf spark-$SPARK_VERSION-bin-*/

25 changes: 21 additions & 4 deletions dev/make-distribution.sh
@@ -34,14 +34,15 @@ DISTDIR="$SPARK_HOME/dist"

MAKE_TGZ=false
MAKE_PIP=false
MAKE_R=false
NAME=none
MVN="$SPARK_HOME/build/mvn"

function exit_with_usage {
echo "make-distribution.sh - tool for making binary distributions of Spark"
echo ""
echo "usage:"
cl_options="[--name] [--tgz] [--pip] [--mvn <mvn-command>]"
cl_options="[--name] [--tgz] [--pip] [--r] [--mvn <mvn-command>]"
echo "make-distribution.sh $cl_options <maven build options>"
echo "See Spark's \"Building Spark\" doc for correct Maven options."
echo ""
@@ -71,6 +72,9 @@ while (( "$#" )); do
--pip)
MAKE_PIP=true
;;
--r)
MAKE_R=true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW if you want this to get picked up by the official release building procedure, we also need to edit release-build.sh [1]. Can you coordinate this with @rxin?

[1]

make_binary_release "hadoop2.3" "-Phadoop-2.3 $FLAGS" "3033" &

;;
--mvn)
MVN="$2"
shift
@@ -208,11 +212,24 @@ cp -r "$SPARK_HOME/data" "$DISTDIR"
# Make pip package
if [ "$MAKE_PIP" == "true" ]; then
echo "Building python distribution package"
cd $SPARK_HOME/python
pushd "$SPARK_HOME/python" > /dev/null
python setup.py sdist
cd ..
popd > /dev/null
else
echo "Skipping building python distribution package"
fi

# Make R package - this is used for both CRAN release and packing R layout into distribution
if [ "$MAKE_R" == "true" ]; then
echo "Building R source package"
pushd "$SPARK_HOME/R" > /dev/null
# Build source package and run full checks
# Install source package to get it to generate vignettes, etc.
# Do not source the check-cran.sh - it should be run from where it is for it to set SPARK_HOME
NO_TESTS=1 CLEAN_INSTALL=1 "$SPARK_HOME/"R/check-cran.sh
Contributor:
It's a little awkward that we use check-cran.sh to build and install the package. I think it points to the fact that we can refactor the scripts more, but that can be done in a future PR.

Member Author (@felixcheung):

I agree. I think it is somewhat debatable whether we should run R CMD check in make-distribution.sh - but I feel there are gaps in what we check in Jenkins, so it is worthwhile to repeat that here.

For everything else it's just convenient to call R from here. We could factor out the R environment stuff and have a separate install.sh (possibly replacing install-dev.sh, since this does more with the source package). What do you think?

Contributor:
Yeah, longer term that sounds like a good idea.

popd > /dev/null
else
echo "Skipping creating pip installable PySpark"
echo "Skipping building R source package"
fi

# Copy other things