
Commit 2d6d2f2

resolve conflicts

2 parents: bed0310 + 8743898

File tree: 1,889 files changed (+86,992 / -27,512 lines)

.github/PULL_REQUEST_TEMPLATE

Lines changed: 1 addition & 3 deletions
@@ -2,11 +2,9 @@
 
 (Please fill in changes proposed in this fix)
 
-
 ## How was this patch tested?
 
 (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
-
-
 (If this patch involves UI changes, please attach a screenshot; otherwise, remove this)
 
+Please review http://spark.apache.org/contributing.html before opening a pull request.

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -24,6 +24,7 @@
 R-unit-tests.log
 R/unit-tests.out
 R/cran-check.out
+R/pkg/vignettes/sparkr-vignettes.html
 build/*.jar
 build/apache-maven*
 build/scala*
@@ -56,6 +57,8 @@ project/plugins/project/build.properties
 project/plugins/src_managed/
 project/plugins/target/
 python/lib/pyspark.zip
+python/deps
+python/pyspark/python
 reports/
 scalastyle-on-compile.generated.xml
 scalastyle-output.xml

CONTRIBUTING.md

Lines changed: 2 additions & 2 deletions
@@ -1,12 +1,12 @@
 ## Contributing to Spark
 
 *Before opening a pull request*, review the
-[Contributing to Spark wiki](https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark).
+[Contributing to Spark guide](http://spark.apache.org/contributing.html).
 It lists steps that are required before creating a PR. In particular, consider:
 
 - Is the change important and ready enough to ask the community to spend time reviewing?
 - Have you searched for existing, related JIRAs and pull requests?
-- Is this a new feature that can stand alone as a package on http://spark-packages.org ?
+- Is this a new feature that can stand alone as a [third party project](http://spark.apache.org/third-party-projects.html) ?
 - Is the change being proposed clearly explained and motivated?
 
 When you contribute code, you affirm that the contribution is your original work and that you

LICENSE

Lines changed: 1 addition & 1 deletion
@@ -263,7 +263,7 @@ The text of each license is also included at licenses/LICENSE-[project].txt.
 (New BSD license) Protocol Buffer Java API (org.spark-project.protobuf:protobuf-java:2.4.1-shaded - http://code.google.com/p/protobuf)
 (The BSD License) Fortran to Java ARPACK (net.sourceforge.f2j:arpack_combined_all:0.1 - http://f2j.sourceforge.net)
 (The BSD License) xmlenc Library (xmlenc:xmlenc:0.52 - http://xmlenc.sourceforge.net)
-(The New BSD License) Py4J (net.sf.py4j:py4j:0.10.3 - http://py4j.sourceforge.net/)
+(The New BSD License) Py4J (net.sf.py4j:py4j:0.10.4 - http://py4j.sourceforge.net/)
 (Two-clause BSD-style license) JUnit-Interface (com.novocode:junit-interface:0.10 - http://github.com/szeiger/junit-interface/)
 (BSD licence) sbt and sbt-launch-lib.bash
 (BSD 3 Clause) d3.min.js (https://github.com/mbostock/d3/blob/master/LICENSE)

NOTICE

Lines changed: 0 additions & 3 deletions
@@ -421,9 +421,6 @@ Copyright (c) 2011, Terrence Parr.
 This product includes/uses ASM (http://asm.ow2.org/),
 Copyright (c) 2000-2007 INRIA, France Telecom.
 
-This product includes/uses org.json (http://www.json.org/java/index.html),
-Copyright (c) 2002 JSON.org
-
 This product includes/uses JLine (http://jline.sourceforge.net/),
 Copyright (c) 2002-2006, Marc Prud'hommeaux <[email protected]>.
 

R/CRAN_RELEASE.md

Lines changed: 91 additions & 0 deletions
@@ -0,0 +1,91 @@
+# SparkR CRAN Release
+
+To release SparkR as a package to CRAN, we use the `devtools` package. Please work with the
+`[email protected]` community and the R package maintainer on this.
+
+### Release
+
+First, check that the `Version:` field in the `pkg/DESCRIPTION` file is updated. Also, check for stale files not under source control.
+
+Note that while `run-tests.sh` runs `check-cran.sh` (which runs `R CMD check`), it does so with `--no-manual --no-vignettes`, which skips a few vignette and PDF checks; it is therefore preferable to run `R CMD check` on the manually built source package before uploading a release. Also note that for the CRAN check of PDF vignettes to succeed, the `qpdf` tool must be installed (e.g. `yum -q -y install qpdf`).
+
+To upload a release, we need to update `cran-comments.md`. This should generally contain the results from running the `check-cran.sh` script, along with comments on the status of any `WARNING` (there should not be any) or `NOTE` entries. As part of `check-cran.sh` and the release process, the vignettes are built; make sure `SPARK_HOME` is set and the Spark jars are accessible.
+
+Once everything is in place, run in R under the `SPARK_HOME/R` directory:
+
+```R
+paths <- .libPaths(); .libPaths(c("lib", paths)); Sys.setenv(SPARK_HOME=tools::file_path_as_absolute("..")); devtools::release(); .libPaths(paths)
+```
+
+For more information, please refer to http://r-pkgs.had.co.nz/release.html#release-check
+
+### Testing: build package manually
+
+To build the package manually, for example to inspect the resulting `.tar.gz` file content, we also use the `devtools` package.
+
+The source package is what gets released to CRAN; CRAN then builds platform-specific binary packages from it.
+
+#### Build source package
+
+To build the source package locally without releasing to CRAN, run in R under the `SPARK_HOME/R` directory:
+
+```R
+paths <- .libPaths(); .libPaths(c("lib", paths)); Sys.setenv(SPARK_HOME=tools::file_path_as_absolute("..")); devtools::build("pkg"); .libPaths(paths)
+```
+
+(http://r-pkgs.had.co.nz/vignettes.html#vignette-workflow-2)
+
+Similarly, the source package is also created by `check-cran.sh` with `R CMD build pkg`.
+
+For example, this should be the content of the source package:
+
+```sh
+DESCRIPTION R inst tests
+NAMESPACE build man vignettes
+
+inst/doc/
+sparkr-vignettes.html
+sparkr-vignettes.Rmd
+sparkr-vignettes.Rman
+
+build/
+vignette.rds
+
+man/
+*.Rd files...
+
+vignettes/
+sparkr-vignettes.Rmd
+```
+
+#### Test source package
+
+To install, run this:
+
+```sh
+R CMD INSTALL SparkR_2.1.0.tar.gz
+```
+
+Replace "2.1.0" with the version of SparkR being released.
+
+This command installs SparkR to the default libPaths. Once that is done, you should be able to start R and run:
+
+```R
+library(SparkR)
+vignette("sparkr-vignettes", package="SparkR")
+```
+
+#### Build binary package
+
+To build the binary package locally, run in R under the `SPARK_HOME/R` directory:
+
+```R
+paths <- .libPaths(); .libPaths(c("lib", paths)); Sys.setenv(SPARK_HOME=tools::file_path_as_absolute("..")); devtools::build("pkg", binary = TRUE); .libPaths(paths)
+```
+
+For example, this should be the content of the binary package:
+
+```sh
+DESCRIPTION Meta R html tests
+INDEX NAMESPACE help profile worker
+```
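Taken together, the new document describes a build, check, and install loop. Below is a minimal sketch of that loop as a plain shell session; it is an illustration, not part of the committed file, and it assumes Spark has been built so the jars exist, `qpdf` is available, and the version in `pkg/DESCRIPTION` is 2.1.0.

```sh
# Hedged sketch of the manual release checks described above.
cd "$SPARK_HOME/R"

# Build the source package (check-cran.sh does the same via `R CMD build pkg`).
R CMD build pkg

# Full CRAN check, including the manual/vignette checks that run-tests.sh
# skips; the PDF check needs `qpdf` installed.
R CMD check --as-cran SparkR_2.1.0.tar.gz

# Install the result and confirm the vignette is reachable.
R CMD INSTALL SparkR_2.1.0.tar.gz
Rscript -e 'vignette("sparkr-vignettes", package = "SparkR")'
```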

R/README.md

Lines changed: 5 additions & 5 deletions
@@ -6,7 +6,7 @@ SparkR is an R package that provides a light-weight frontend to use Spark from R
 
 Libraries of sparkR need to be created in `$SPARK_HOME/R/lib`. This can be done by running the script `$SPARK_HOME/R/install-dev.sh`.
 By default the above script uses the system-wide installation of R. However, this can be changed to any user-installed location of R by setting the environment variable `R_HOME` to the full path of the base directory where R is installed, before running the install-dev.sh script.
-Example: 
+Example:
 ```bash
 # where /home/username/R is where R is installed and /home/username/R/bin contains the files R and RScript
 export R_HOME=/home/username/R
@@ -46,19 +46,19 @@ Sys.setenv(SPARK_HOME="/Users/username/spark")
 # This line loads SparkR from the installed directory
 .libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
 library(SparkR)
-sc <- sparkR.init(master="local")
+sparkR.session()
 ```
 
 #### Making changes to SparkR
 
-The [instructions](https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark) for making contributions to Spark also apply to SparkR.
+The [instructions](http://spark.apache.org/contributing.html) for making contributions to Spark also apply to SparkR.
 If you only make R file changes (i.e. no Scala changes) then you can just re-install the R package using `R/install-dev.sh` and test your changes.
 Once you have made your changes, please include unit tests for them and run existing unit tests using the `R/run-tests.sh` script as described below.
- 
+
 #### Generating documentation
 
 The SparkR documentation (Rd files and HTML files) is not a part of the source repository. To generate it you can run the script `R/create-docs.sh`. This script uses `devtools` and `knitr` to generate the docs, and these packages need to be installed on the machine before using the script. Also, you may need to install these [prerequisites](https://github.com/apache/spark/tree/master/docs#prerequisites). See also `R/DOCUMENTATION.md`.
- 
+
 ### Examples, Unit tests
 
 SparkR comes with several sample programs in the `examples/src/main/r` directory.
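For R-only changes, the workflow this README describes reduces to the two scripts it names. A minimal sketch of that development loop, run from the Spark source root (an illustration, not part of the diff):

```sh
# Hedged sketch of the R-only development loop (no Scala changes).
./R/install-dev.sh   # rebuild and install the SparkR package into R/lib
./R/run-tests.sh     # run the existing SparkR unit tests
```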

R/WINDOWS.md

Lines changed: 11 additions & 1 deletion
@@ -4,13 +4,23 @@ To build SparkR on Windows, the following steps are required
 
 1. Install R (>= 3.1) and [Rtools](http://cran.r-project.org/bin/windows/Rtools/). Make sure to
 include Rtools and R in `PATH`.
+
 2. Install
 [JDK7](http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html) and set
 `JAVA_HOME` in the system environment variables.
+
 3. Download and install [Maven](http://maven.apache.org/download.html). Also include the `bin`
 directory in Maven in `PATH`.
+
 4. Set `MAVEN_OPTS` as described in [Building Spark](http://spark.apache.org/docs/latest/building-spark.html).
-5. Open a command shell (`cmd`) in the Spark directory and run `mvn -DskipTests -Psparkr package`
+
+5. Open a command shell (`cmd`) in the Spark directory and build Spark with [Maven](http://spark.apache.org/docs/latest/building-spark.html#building-with-buildmvn), including the `-Psparkr` profile to build the R package. For example, to use the default Hadoop versions you can run
+
+    ```bash
+    mvn.cmd -DskipTests -Psparkr package
+    ```
+
+    `.\build\mvn` is a shell script, so `mvn.cmd` should be used directly on Windows.
 
 ## Unit tests
 
R/check-cran.sh

Lines changed: 44 additions & 6 deletions
@@ -34,13 +34,30 @@ if [ ! -z "$R_HOME" ]
   fi
   R_SCRIPT_PATH="$(dirname $(which R))"
 fi
-echo "USING R_HOME = $R_HOME"
+echo "Using R_SCRIPT_PATH = ${R_SCRIPT_PATH}"
 
-# Build the latest docs
+# Install the package (this is required for code in vignettes to run when building it later)
+# Build the latest docs, but not vignettes, which is built with the package next
 $FWDIR/create-docs.sh
 
-# Build a zip file containing the source package
-"$R_SCRIPT_PATH/"R CMD build $FWDIR/pkg
+# Build source package with vignettes
+SPARK_HOME="$(cd "${FWDIR}"/..; pwd)"
+. "${SPARK_HOME}"/bin/load-spark-env.sh
+if [ -f "${SPARK_HOME}/RELEASE" ]; then
+  SPARK_JARS_DIR="${SPARK_HOME}/jars"
+else
+  SPARK_JARS_DIR="${SPARK_HOME}/assembly/target/scala-$SPARK_SCALA_VERSION/jars"
+fi
+
+if [ -d "$SPARK_JARS_DIR" ]; then
+  # Build a zip file containing the source package with vignettes
+  SPARK_HOME="${SPARK_HOME}" "$R_SCRIPT_PATH/"R CMD build $FWDIR/pkg
+
+  find pkg/vignettes/. -not -name '.' -not -name '*.Rmd' -not -name '*.md' -not -name '*.pdf' -not -name '*.html' -delete
+else
+  echo "Error Spark JARs not found in $SPARK_HOME"
+  exit 1
+fi
 
 # Run check as-cran.
 VERSION=`grep Version $FWDIR/pkg/DESCRIPTION | awk '{print $NF}'`
@@ -54,11 +71,32 @@ fi
 
 if [ -n "$NO_MANUAL" ]
 then
-  CRAN_CHECK_OPTIONS=$CRAN_CHECK_OPTIONS" --no-manual"
+  CRAN_CHECK_OPTIONS=$CRAN_CHECK_OPTIONS" --no-manual --no-vignettes"
 fi
 
 echo "Running CRAN check with $CRAN_CHECK_OPTIONS options"
 
-"$R_SCRIPT_PATH/"R CMD check $CRAN_CHECK_OPTIONS SparkR_"$VERSION".tar.gz
+if [ -n "$NO_TESTS" ] && [ -n "$NO_MANUAL" ]
+then
+  "$R_SCRIPT_PATH/"R CMD check $CRAN_CHECK_OPTIONS SparkR_"$VERSION".tar.gz
+else
+  # This will run tests and/or build vignettes, and require SPARK_HOME
+  SPARK_HOME="${SPARK_HOME}" "$R_SCRIPT_PATH/"R CMD check $CRAN_CHECK_OPTIONS SparkR_"$VERSION".tar.gz
+fi
+
+# Install source package to get it to generate vignettes rds files, etc.
+if [ -n "$CLEAN_INSTALL" ]
+then
+  echo "Removing lib path and installing from source package"
+  LIB_DIR="$FWDIR/lib"
+  rm -rf $LIB_DIR
+  mkdir -p $LIB_DIR
+  "$R_SCRIPT_PATH/"R CMD INSTALL SparkR_"$VERSION".tar.gz --library=$LIB_DIR
+
+  # Zip the SparkR package so that it can be distributed to worker nodes on YARN
+  pushd $LIB_DIR > /dev/null
+  jar cfM "$LIB_DIR/sparkr.zip" SparkR
+  popd > /dev/null
+fi
 
 popd > /dev/null
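The new branches are driven by environment variables that the script tests with `[ -n ... ]`, so any non-empty value enables them. A hedged sketch of two invocations (the variable names come from the diff; the values `1` are arbitrary):

```sh
# Skip tests, manual and vignettes: takes the plain `R CMD check` branch,
# which does not need SPARK_HOME at check time.
NO_TESTS=1 NO_MANUAL=1 ./R/check-cran.sh

# Full check plus clean install: regenerates the vignette rds files and
# rebuilds lib/sparkr.zip for distribution to YARN workers; requires the
# built Spark jars to be present.
CLEAN_INSTALL=1 ./R/check-cran.sh
```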

R/create-docs.sh

Lines changed: 12 additions & 3 deletions
@@ -17,17 +17,26 @@
 # limitations under the License.
 #
 
-# Script to create API docs for SparkR
-# This requires `devtools` and `knitr` to be installed on the machine.
+# Script to create API docs and vignettes for SparkR
+# This requires `devtools`, `knitr` and `rmarkdown` to be installed on the machine.
 
-# After running this script the html docs can be found in 
+# After running this script the html docs can be found in
 # $SPARK_HOME/R/pkg/html
+# The vignettes can be found in
+# $SPARK_HOME/R/pkg/vignettes/sparkr_vignettes.html
 
 set -o pipefail
 set -e
 
 # Figure out where the script is
 export FWDIR="$(cd "`dirname "$0"`"; pwd)"
+export SPARK_HOME="$(cd "`dirname "$0"`"/..; pwd)"
+
+# Required for setting SPARK_SCALA_VERSION
+. "${SPARK_HOME}"/bin/load-spark-env.sh
+
+echo "Using Scala $SPARK_SCALA_VERSION"
+
 pushd $FWDIR
 
 # Install the package (this will also generate the Rd files)
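Per the updated header comment, the script now needs `rmarkdown` in addition to `devtools` and `knitr`. A sketch of preparing and running it; the mirror URL is an arbitrary choice for illustration, not from the diff:

```sh
# One-time setup of the R packages the header comment lists.
Rscript -e 'install.packages(c("devtools", "knitr", "rmarkdown"), repos = "https://cloud.r-project.org")'

# Generate the API docs (R/pkg/html) and the vignette (R/pkg/vignettes).
./R/create-docs.sh
```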
