
Commit a439899

Author: Nick Pentreath
Merge branch 'master' into SPARK-14489-als-nan
2 parents: 9ff2a0a + 6585479

2,092 files changed (+94,271 / -34,224 lines)


.github/PULL_REQUEST_TEMPLATE

Lines changed: 1 addition & 3 deletions

```diff
@@ -2,11 +2,9 @@
 
 (Please fill in changes proposed in this fix)
 
-
 ## How was this patch tested?
 
 (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
-
-
 (If this patch involves UI changes, please attach a screenshot; otherwise, remove this)
 
+Please review https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark before opening a pull request.
```

.gitignore

Lines changed: 9 additions & 0 deletions

```diff
@@ -17,11 +17,14 @@
 .idea/
 .idea_modules/
 .project
+.pydevproject
 .scala_dependencies
 .settings
 /lib/
 R-unit-tests.log
 R/unit-tests.out
+R/cran-check.out
+R/pkg/vignettes/sparkr-vignettes.html
 build/*.jar
 build/apache-maven*
 build/scala*
@@ -54,6 +57,8 @@ project/plugins/project/build.properties
 project/plugins/src_managed/
 project/plugins/target/
 python/lib/pyspark.zip
+python/deps
+python/pyspark/python
 reports/
 scalastyle-on-compile.generated.xml
 scalastyle-output.xml
@@ -78,3 +83,7 @@ spark-warehouse/
 .RData
 .RHistory
 .Rhistory
+*.Rproj
+*.Rproj.*
+
+.Rproj.user
```

.travis.yml

Lines changed: 1 addition & 1 deletion

```diff
@@ -44,7 +44,7 @@ notifications:
 # 5. Run maven install before running lint-java.
 install:
   - export MAVEN_SKIP_RC=1
-  - build/mvn -T 4 -q -DskipTests -Pyarn -Phadoop-2.3 -Pkinesis-asl -Phive -Phive-thriftserver install
+  - build/mvn -T 4 -q -DskipTests -Pmesos -Pyarn -Phadoop-2.3 -Pkinesis-asl -Phive -Phive-thriftserver install
 
 # 6. Run lint-java.
 script:
```

CONTRIBUTING.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -6,7 +6,7 @@ It lists steps that are required before creating a PR. In particular, consider:
 
 - Is the change important and ready enough to ask the community to spend time reviewing?
 - Have you searched for existing, related JIRAs and pull requests?
-- Is this a new feature that can stand alone as a package on http://spark-packages.org ?
+- Is this a new feature that can stand alone as a [third party project](https://cwiki.apache.org/confluence/display/SPARK/Third+Party+Projects) ?
 - Is the change being proposed clearly explained and motivated?
 
 When you contribute code, you affirm that the contribution is your original work and that you
```

LICENSE

Lines changed: 1 addition & 1 deletion

```diff
@@ -263,7 +263,7 @@ The text of each license is also included at licenses/LICENSE-[project].txt.
 (New BSD license) Protocol Buffer Java API (org.spark-project.protobuf:protobuf-java:2.4.1-shaded - http://code.google.com/p/protobuf)
 (The BSD License) Fortran to Java ARPACK (net.sourceforge.f2j:arpack_combined_all:0.1 - http://f2j.sourceforge.net)
 (The BSD License) xmlenc Library (xmlenc:xmlenc:0.52 - http://xmlenc.sourceforge.net)
-(The New BSD License) Py4J (net.sf.py4j:py4j:0.10.1 - http://py4j.sourceforge.net/)
+(The New BSD License) Py4J (net.sf.py4j:py4j:0.10.4 - http://py4j.sourceforge.net/)
 (Two-clause BSD-style license) JUnit-Interface (com.novocode:junit-interface:0.10 - http://github.com/szeiger/junit-interface/)
 (BSD licence) sbt and sbt-launch-lib.bash
 (BSD 3 Clause) d3.min.js (https://github.com/mbostock/d3/blob/master/LICENSE)
```

NOTICE

Lines changed: 0 additions & 3 deletions

```diff
@@ -421,9 +421,6 @@ Copyright (c) 2011, Terrence Parr.
 This product includes/uses ASM (http://asm.ow2.org/),
 Copyright (c) 2000-2007 INRIA, France Telecom.
 
-This product includes/uses org.json (http://www.json.org/java/index.html),
-Copyright (c) 2002 JSON.org
-
 This product includes/uses JLine (http://jline.sourceforge.net/),
 Copyright (c) 2002-2006, Marc Prud'hommeaux <[email protected]>.
 
```

R/.gitignore

Lines changed: 2 additions & 0 deletions

```diff
@@ -4,3 +4,5 @@
 lib
 pkg/man
 pkg/html
+SparkR.Rcheck/
+SparkR_*.tar.gz
```
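
A quick way to sanity-check new ignore rules like these is `git check-ignore` in a throwaway repository. This is a sketch, not part of the commit; the file names are hypothetical examples.

```shell
# Exercise the new R ignore patterns in a scratch repo.
tmp=$(mktemp -d)
cd "$tmp"
git init -q .
printf 'SparkR.Rcheck/\nSparkR_*.tar.gz\n' > .gitignore
git check-ignore -q SparkR_2.1.0.tar.gz && echo "tarball ignored"
git check-ignore -q README.md || echo "README.md still tracked"
```

`check-ignore -q` exits 0 when a path matches an ignore pattern, so it composes cleanly with `&&`/`||`.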

R/CRAN_RELEASE.md

Lines changed: 91 additions & 0 deletions (new file)

# SparkR CRAN Release

To release SparkR as a package to CRAN, we would use the `devtools` package. Please work with the `[email protected]` community and the R package maintainer on this.

### Release

First, check that the `Version:` field in the `pkg/DESCRIPTION` file is updated. Also, check for stale files not under source control.

Note that while `check-cran.sh` runs `R CMD check`, it does so with `--no-manual --no-vignettes`, which skips a few vignette and PDF checks; it is therefore preferable to run `R CMD check` on the manually built source package before uploading a release.

To upload a release, we need to update `cran-comments.md`. This should generally contain the results from running the `check-cran.sh` script along with comments on the status of any `WARNING` (there should not be any) or `NOTE` items. As part of `check-cran.sh` and the release process, the vignettes are built, so make sure `SPARK_HOME` is set and the Spark jars are accessible.

Once everything is in place, run in R under the `SPARK_HOME/R` directory:

```R
paths <- .libPaths(); .libPaths(c("lib", paths)); Sys.setenv(SPARK_HOME=tools::file_path_as_absolute("..")); devtools::release(); .libPaths(paths)
```

For more information please refer to http://r-pkgs.had.co.nz/release.html#release-check

### Testing: build package manually

To build the package manually, for example to inspect the resulting `.tar.gz` file content, we would also use the `devtools` package.

The source package is what gets released to CRAN; CRAN then builds platform-specific binary packages from it.

#### Build source package

To build the source package locally without releasing to CRAN, run in R under the `SPARK_HOME/R` directory:

```R
paths <- .libPaths(); .libPaths(c("lib", paths)); Sys.setenv(SPARK_HOME=tools::file_path_as_absolute("..")); devtools::build("pkg"); .libPaths(paths)
```

(http://r-pkgs.had.co.nz/vignettes.html#vignette-workflow-2)

Similarly, the source package is also created by `check-cran.sh` with `R CMD build pkg`.

For example, this should be the content of the source package:

```sh
DESCRIPTION  R      inst   tests
NAMESPACE    build  man    vignettes

inst/doc/
  sparkr-vignettes.html
  sparkr-vignettes.Rmd
  sparkr-vignettes.Rman

build/
  vignette.rds

man/
  *.Rd files...

vignettes/
  sparkr-vignettes.Rmd
```
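
That layout can be inspected without unpacking via `tar -tzf`. A minimal sketch follows, using a stand-in tarball built on the fly (a real release tarball comes from `devtools::build("pkg")`; the file names here are illustrative only):

```shell
# Build a stand-in source tarball and list its contents, as one would for
# the real SparkR_<version>.tar.gz produced by devtools::build("pkg").
mkdir -p SparkR/man SparkR/vignettes SparkR/build
touch SparkR/DESCRIPTION SparkR/NAMESPACE SparkR/vignettes/sparkr-vignettes.Rmd
tar -czf SparkR_2.1.0.tar.gz SparkR
tar -tzf SparkR_2.1.0.tar.gz | sort
```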
60+
61+
#### Test source package
62+
63+
To install, run this:
64+
65+
```sh
66+
R CMD INSTALL SparkR_2.1.0.tar.gz
67+
```
68+
69+
With "2.1.0" replaced with the version of SparkR.
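
Since the tarball name tracks the `Version:` field of `pkg/DESCRIPTION`, the exact file name can be derived rather than typed by hand. A sketch, using a stand-in `DESCRIPTION` so it is self-contained (the version and paths are assumptions, not taken from the repository):

```shell
# Derive the tarball name from the Version: field of a DESCRIPTION file.
# A stand-in pkg/DESCRIPTION is created here for illustration.
mkdir -p pkg
printf 'Package: SparkR\nVersion: 2.1.0\n' > pkg/DESCRIPTION
VERSION=$(awk -F': ' '/^Version:/ {print $2}' pkg/DESCRIPTION)
echo "R CMD INSTALL SparkR_${VERSION}.tar.gz"
# → R CMD INSTALL SparkR_2.1.0.tar.gz
```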
This command installs SparkR to the default libPaths. Once that is done, you should be able to start R and run:

```R
library(SparkR)
vignette("sparkr-vignettes", package="SparkR")
```

#### Build binary package

To build the binary package locally, run in R under the `SPARK_HOME/R` directory:

```R
paths <- .libPaths(); .libPaths(c("lib", paths)); Sys.setenv(SPARK_HOME=tools::file_path_as_absolute("..")); devtools::build("pkg", binary = TRUE); .libPaths(paths)
```

For example, this should be the content of the binary package:

```sh
DESCRIPTION  Meta       R     html     tests
INDEX        NAMESPACE  help  profile  worker
```

R/README.md

Lines changed: 4 additions & 4 deletions

````diff
@@ -6,7 +6,7 @@ SparkR is an R package that provides a light-weight frontend to use Spark from R
 
 Libraries of sparkR need to be created in `$SPARK_HOME/R/lib`. This can be done by running the script `$SPARK_HOME/R/install-dev.sh`.
 By default the above script uses the system wide installation of R. However, this can be changed to any user installed location of R by setting the environment variable `R_HOME` the full path of the base directory where R is installed, before running install-dev.sh script.
-Example: 
+Example:
 ```bash
 # where /home/username/R is where R is installed and /home/username/R/bin contains the files R and RScript
 export R_HOME=/home/username/R
````
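
The `R_HOME` override described above can be sketched as a small parameter-expansion idiom. This mirrors the documented behavior but is not the actual `install-dev.sh` logic, and the install path is a hypothetical example:

```shell
# If R_HOME is set, use the R binary under it; otherwise fall back to
# whatever `R` is on PATH. The path below is a hypothetical example.
R_HOME=/home/username/R
R_SCRIPT="${R_HOME:+$R_HOME/bin/}R"
echo "$R_SCRIPT"
# → /home/username/R/bin/R
```

Unsetting `R_HOME` before the expansion would leave `R_SCRIPT` as plain `R`, the PATH fallback.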
````diff
@@ -46,19 +46,19 @@ Sys.setenv(SPARK_HOME="/Users/username/spark")
 # This line loads SparkR from the installed directory
 .libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
 library(SparkR)
-sc <- sparkR.init(master="local")
+sparkR.session()
 ```
 
 #### Making changes to SparkR
 
 The [instructions](https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark) for making contributions to Spark also apply to SparkR.
 If you only make R file changes (i.e. no Scala changes) then you can just re-install the R package using `R/install-dev.sh` and test your changes.
 Once you have made your changes, please include unit tests for them and run existing unit tests using the `R/run-tests.sh` script as described below.
- 
+
 #### Generating documentation
 
 The SparkR documentation (Rd files and HTML files) are not a part of the source repository. To generate them you can run the script `R/create-docs.sh`. This script uses `devtools` and `knitr` to generate the docs and these packages need to be installed on the machine before using the script. Also, you may need to install these [prerequisites](https://github.com/apache/spark/tree/master/docs#prerequisites). See also, `R/DOCUMENTATION.md`
- 
+
 ### Examples, Unit tests
 
 SparkR comes with several sample programs in the `examples/src/main/r` directory.
````

R/WINDOWS.md

Lines changed: 11 additions & 1 deletion

````diff
@@ -4,13 +4,23 @@ To build SparkR on Windows, the following steps are required
 
 1. Install R (>= 3.1) and [Rtools](http://cran.r-project.org/bin/windows/Rtools/). Make sure to
 include Rtools and R in `PATH`.
+
 2. Install
 [JDK7](http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html) and set
 `JAVA_HOME` in the system environment variables.
+
 3. Download and install [Maven](http://maven.apache.org/download.html). Also include the `bin`
 directory in Maven in `PATH`.
+
 4. Set `MAVEN_OPTS` as described in [Building Spark](http://spark.apache.org/docs/latest/building-spark.html).
-5. Open a command shell (`cmd`) in the Spark directory and run `mvn -DskipTests -Psparkr package`
+
+5. Open a command shell (`cmd`) in the Spark directory and build Spark with [Maven](http://spark.apache.org/docs/latest/building-spark.html#building-with-buildmvn) and include the `-Psparkr` profile to build the R package. For example to use the default Hadoop versions you can run
+
+```bash
+mvn.cmd -DskipTests -Psparkr package
+```
+
+`.\build\mvn` is a shell script so `mvn.cmd` should be used directly on Windows.
 
 ## Unit tests
 
````
