
Commit 1f472ff

Merge remote-tracking branch 'upstream/master' into wip-spark-package-settings
2 parents: 97ea3a4 + a425a37

2,379 files changed: +101,091 −39,786 lines


.gitignore

Lines changed: 7 additions & 0 deletions

@@ -17,11 +17,13 @@
 .idea/
 .idea_modules/
 .project
+.pydevproject
 .scala_dependencies
 .settings
 /lib/
 R-unit-tests.log
 R/unit-tests.out
+R/cran-check.out
 build/*.jar
 build/apache-maven*
 build/scala*
@@ -77,3 +79,8 @@ spark-warehouse/
 # For R session data
 .RData
 .RHistory
+.Rhistory
+*.Rproj
+*.Rproj.*
+
+.Rproj.user

.travis.yml

Lines changed: 1 addition & 1 deletion

@@ -44,7 +44,7 @@ notifications:
 # 5. Run maven install before running lint-java.
 install:
   - export MAVEN_SKIP_RC=1
-  - build/mvn -T 4 -q -DskipTests -Pyarn -Phadoop-2.3 -Pkinesis-asl -Phive -Phive-thriftserver install
+  - build/mvn -T 4 -q -DskipTests -Pmesos -Pyarn -Phadoop-2.3 -Pkinesis-asl -Phive -Phive-thriftserver install
 
 # 6. Run lint-java.
 script:

CONTRIBUTING.md

Lines changed: 1 addition & 1 deletion

@@ -6,7 +6,7 @@ It lists steps that are required before creating a PR. In particular, consider:
 
 - Is the change important and ready enough to ask the community to spend time reviewing?
 - Have you searched for existing, related JIRAs and pull requests?
-- Is this a new feature that can stand alone as a package on http://spark-packages.org ?
+- Is this a new feature that can stand alone as a [third party project](https://cwiki.apache.org/confluence/display/SPARK/Third+Party+Projects) ?
 - Is the change being proposed clearly explained and motivated?
 
 When you contribute code, you affirm that the contribution is your original work and that you

LICENSE

Lines changed: 2 additions & 1 deletion

@@ -263,7 +263,7 @@ The text of each license is also included at licenses/LICENSE-[project].txt.
 (New BSD license) Protocol Buffer Java API (org.spark-project.protobuf:protobuf-java:2.4.1-shaded - http://code.google.com/p/protobuf)
 (The BSD License) Fortran to Java ARPACK (net.sourceforge.f2j:arpack_combined_all:0.1 - http://f2j.sourceforge.net)
 (The BSD License) xmlenc Library (xmlenc:xmlenc:0.52 - http://xmlenc.sourceforge.net)
-(The New BSD License) Py4J (net.sf.py4j:py4j:0.9.2 - http://py4j.sourceforge.net/)
+(The New BSD License) Py4J (net.sf.py4j:py4j:0.10.3 - http://py4j.sourceforge.net/)
 (Two-clause BSD-style license) JUnit-Interface (com.novocode:junit-interface:0.10 - http://github.com/szeiger/junit-interface/)
 (BSD licence) sbt and sbt-launch-lib.bash
 (BSD 3 Clause) d3.min.js (https://github.com/mbostock/d3/blob/master/LICENSE)
@@ -296,3 +296,4 @@ The text of each license is also included at licenses/LICENSE-[project].txt.
 (MIT License) blockUI (http://jquery.malsup.com/block/)
 (MIT License) RowsGroup (http://datatables.net/license/mit)
 (MIT License) jsonFormatter (http://www.jqueryscript.net/other/jQuery-Plugin-For-Pretty-JSON-Formatting-jsonFormatter.html)
+(MIT License) modernizr (https://github.com/Modernizr/Modernizr/blob/master/LICENSE)

NOTICE

Lines changed: 1 addition & 1 deletion

@@ -1,5 +1,5 @@
 Apache Spark
-Copyright 2014 The Apache Software Foundation.
+Copyright 2014 and onwards The Apache Software Foundation.
 
 This product includes software developed at
 The Apache Software Foundation (http://www.apache.org/).

R/.gitignore

Lines changed: 2 additions & 0 deletions

@@ -4,3 +4,5 @@
 lib
 pkg/man
 pkg/html
+SparkR.Rcheck/
+SparkR_*.tar.gz

R/DOCUMENTATION.md

Lines changed: 6 additions & 6 deletions

@@ -1,12 +1,12 @@
 # SparkR Documentation
 
-SparkR documentation is generated using in-source comments annotated using using
-`roxygen2`. After making changes to the documentation, to generate man pages,
+SparkR documentation is generated by using in-source comments and annotated by using
+[`roxygen2`](https://cran.r-project.org/web/packages/roxygen2/index.html). After making changes to the documentation and generating man pages,
 you can run the following from an R console in the SparkR home directory
-
-    library(devtools)
-    devtools::document(pkg="./pkg", roclets=c("rd"))
-
+```R
+library(devtools)
+devtools::document(pkg="./pkg", roclets=c("rd"))
+```
 You can verify if your changes are good by running
 
     R CMD check pkg/

R/README.md

Lines changed: 18 additions & 14 deletions

@@ -1,12 +1,13 @@
 # R on Spark
 
 SparkR is an R package that provides a light-weight frontend to use Spark from R.
+
 ### Installing sparkR
 
 Libraries of sparkR need to be created in `$SPARK_HOME/R/lib`. This can be done by running the script `$SPARK_HOME/R/install-dev.sh`.
 By default the above script uses the system wide installation of R. However, this can be changed to any user installed location of R by setting the environment variable `R_HOME` the full path of the base directory where R is installed, before running install-dev.sh script.
 Example:
-```
+```bash
 # where /home/username/R is where R is installed and /home/username/R/bin contains the files R and RScript
 export R_HOME=/home/username/R
 ./install-dev.sh
@@ -17,8 +18,9 @@ export R_HOME=/home/username/R
 #### Build Spark
 
 Build Spark with [Maven](http://spark.apache.org/docs/latest/building-spark.html#building-with-buildmvn) and include the `-Psparkr` profile to build the R package. For example to use the default Hadoop versions you can run
-```
-build/mvn -DskipTests -Psparkr package
+
+```bash
+build/mvn -DskipTests -Psparkr package
 ```
 
 #### Running sparkR
@@ -37,8 +39,8 @@ To set other options like driver memory, executor memory etc. you can pass in th
 
 #### Using SparkR from RStudio
 
-If you wish to use SparkR from RStudio or other R frontends you will need to set some environment variables which point SparkR to your Spark installation. For example
-```
+If you wish to use SparkR from RStudio or other R frontends you will need to set some environment variables which point SparkR to your Spark installation. For example
+```R
 # Set this to where Spark is installed
 Sys.setenv(SPARK_HOME="/Users/username/spark")
 # This line loads SparkR from the installed directory
@@ -55,23 +57,25 @@ Once you have made your changes, please include unit tests for them and run exis
 
 #### Generating documentation
 
-The SparkR documentation (Rd files and HTML files) are not a part of the source repository. To generate them you can run the script `R/create-docs.sh`. This script uses `devtools` and `knitr` to generate the docs and these packages need to be installed on the machine before using the script.
+The SparkR documentation (Rd files and HTML files) are not a part of the source repository. To generate them you can run the script `R/create-docs.sh`. This script uses `devtools` and `knitr` to generate the docs and these packages need to be installed on the machine before using the script. Also, you may need to install these [prerequisites](https://github.com/apache/spark/tree/master/docs#prerequisites). See also, `R/DOCUMENTATION.md`
 
 ### Examples, Unit tests
 
 SparkR comes with several sample programs in the `examples/src/main/r` directory.
 To run one of them, use `./bin/spark-submit <filename> <args>`. For example:
-
-    ./bin/spark-submit examples/src/main/r/dataframe.R
-
-You can also run the unit-tests for SparkR by running (you need to install the [testthat](http://cran.r-project.org/web/packages/testthat/index.html) package first):
-
-    R -e 'install.packages("testthat", repos="http://cran.us.r-project.org")'
-    ./R/run-tests.sh
+```bash
+./bin/spark-submit examples/src/main/r/dataframe.R
+```
+You can also run the unit tests for SparkR by running. You need to install the [testthat](http://cran.r-project.org/web/packages/testthat/index.html) package first:
+```bash
+R -e 'install.packages("testthat", repos="http://cran.us.r-project.org")'
+./R/run-tests.sh
+```
 
 ### Running on YARN
+
 The `./bin/spark-submit` can also be used to submit jobs to YARN clusters. You will need to set YARN conf dir before doing so. For example on CDH you can run
-```
+```bash
 export YARN_CONF_DIR=/etc/hadoop/conf
 ./bin/spark-submit --master yarn examples/src/main/r/dataframe.R
 ```
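
For readers skimming the diff, the updated README amounts to the following end-to-end workflow. This is a sketch assembled from the commands shown in the hunks above (it assumes a Unix-like shell and that you start from the Spark source root), not text from the commit itself:

```bash
# Optional: point SparkR's build at a user-installed R instead of the system one
export R_HOME=/home/username/R

# Build the SparkR libraries into $SPARK_HOME/R/lib
./R/install-dev.sh

# Build Spark itself with the R profile enabled
build/mvn -DskipTests -Psparkr package

# Run one of the bundled R sample programs
./bin/spark-submit examples/src/main/r/dataframe.R

# Install testthat, then run the SparkR unit tests
R -e 'install.packages("testthat", repos="http://cran.us.r-project.org")'
./R/run-tests.sh
```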

R/WINDOWS.md

Lines changed: 31 additions & 1 deletion

@@ -4,10 +4,40 @@ To build SparkR on Windows, the following steps are required
 
 1. Install R (>= 3.1) and [Rtools](http://cran.r-project.org/bin/windows/Rtools/). Make sure to
 include Rtools and R in `PATH`.
+
 2. Install
 [JDK7](http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html) and set
 `JAVA_HOME` in the system environment variables.
+
 3. Download and install [Maven](http://maven.apache.org/download.html). Also include the `bin`
 directory in Maven in `PATH`.
+
 4. Set `MAVEN_OPTS` as described in [Building Spark](http://spark.apache.org/docs/latest/building-spark.html).
-5. Open a command shell (`cmd`) in the Spark directory and run `mvn -DskipTests -Psparkr package`
+
+5. Open a command shell (`cmd`) in the Spark directory and build Spark with [Maven](http://spark.apache.org/docs/latest/building-spark.html#building-with-buildmvn) and include the `-Psparkr` profile to build the R package. For example to use the default Hadoop versions you can run
+
+```bash
+mvn.cmd -DskipTests -Psparkr package
+```
+
+`.\build\mvn` is a shell script so `mvn.cmd` should be used directly on Windows.
+
+## Unit tests
+
+To run the SparkR unit tests on Windows, the following steps are required —assuming you are in the Spark root directory and do not have Apache Hadoop installed already:
+
+1. Create a folder to download Hadoop related files for Windows. For example, `cd ..` and `mkdir hadoop`.
+
+2. Download the relevant Hadoop bin package from [steveloughran/winutils](https://github.com/steveloughran/winutils). While these are not official ASF artifacts, they are built from the ASF release git hashes by a Hadoop PMC member on a dedicated Windows VM. For further reading, consult [Windows Problems on the Hadoop wiki](https://wiki.apache.org/hadoop/WindowsProblems).
+
+3. Install the files into `hadoop\bin`; make sure that `winutils.exe` and `hadoop.dll` are present.
+
+4. Set the environment variable `HADOOP_HOME` to the full path to the newly created `hadoop` directory.
+
+5. Run unit tests for SparkR by running the command below. You need to install the [testthat](http://cran.r-project.org/web/packages/testthat/index.html) package first:
+
+```
+R -e "install.packages('testthat', repos='http://cran.us.r-project.org')"
+.\bin\spark-submit2.cmd --conf spark.hadoop.fs.default.name="file:///" R\pkg\tests\run-all.R
+```
+

R/check-cran.sh

Lines changed: 64 additions & 0 deletions

@@ -0,0 +1,64 @@
+#!/bin/bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+set -o pipefail
+set -e
+
+FWDIR="$(cd `dirname $0`; pwd)"
+pushd $FWDIR > /dev/null
+
+if [ ! -z "$R_HOME" ]
+then
+  R_SCRIPT_PATH="$R_HOME/bin"
+else
+  # if system wide R_HOME is not found, then exit
+  if [ ! `command -v R` ]; then
+    echo "Cannot find 'R_HOME'. Please specify 'R_HOME' or make sure R is properly installed."
+    exit 1
+  fi
+  R_SCRIPT_PATH="$(dirname $(which R))"
+fi
+echo "USING R_HOME = $R_HOME"
+
+# Build the latest docs
+$FWDIR/create-docs.sh
+
+# Build a zip file containing the source package
+"$R_SCRIPT_PATH/"R CMD build $FWDIR/pkg
+
+# Run check as-cran.
+VERSION=`grep Version $FWDIR/pkg/DESCRIPTION | awk '{print $NF}'`
+
+CRAN_CHECK_OPTIONS="--as-cran"
+
+if [ -n "$NO_TESTS" ]
+then
+  CRAN_CHECK_OPTIONS=$CRAN_CHECK_OPTIONS" --no-tests"
+fi
+
+if [ -n "$NO_MANUAL" ]
+then
+  CRAN_CHECK_OPTIONS=$CRAN_CHECK_OPTIONS" --no-manual"
+fi
+
+echo "Running CRAN check with $CRAN_CHECK_OPTIONS options"
+
+"$R_SCRIPT_PATH/"R CMD check $CRAN_CHECK_OPTIONS SparkR_"$VERSION".tar.gz
+
+popd > /dev/null
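
As the script shows, `R/check-cran.sh` is driven entirely by environment variables: `R_HOME` selects which R installation to use, while `NO_TESTS` and `NO_MANUAL` map to the `--no-tests` and `--no-manual` options of `R CMD check`. A minimal usage sketch follows; the invocations below are illustrative, not part of the commit:

```bash
# Full CRAN-style check (--as-cran), using the R found on PATH
./R/check-cran.sh

# Skip running tests and building the manual for faster iteration
NO_TESTS=1 NO_MANUAL=1 ./R/check-cran.sh

# Point the script at a specific R installation
R_HOME=/home/username/R ./R/check-cran.sh
```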
