Commit cc7804f

Merge branch 'master' into minor-3
2 parents 762f58b + 87706eb commit cc7804f

211 files changed, +1809 −618 lines


LICENSE

Lines changed: 1 addition & 0 deletions
@@ -296,3 +296,4 @@ The text of each license is also included at licenses/LICENSE-[project].txt.
 (MIT License) blockUI (http://jquery.malsup.com/block/)
 (MIT License) RowsGroup (http://datatables.net/license/mit)
 (MIT License) jsonFormatter (http://www.jqueryscript.net/other/jQuery-Plugin-For-Pretty-JSON-Formatting-jsonFormatter.html)
+(MIT License) modernizr (https://github.com/Modernizr/Modernizr/blob/master/LICENSE)

NOTICE

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 Apache Spark
-Copyright 2014 The Apache Software Foundation.
+Copyright 2014 and onwards The Apache Software Foundation.

 This product includes software developed at
 The Apache Software Foundation (http://www.apache.org/).

R/DOCUMENTATION.md

Lines changed: 6 additions & 6 deletions
@@ -1,12 +1,12 @@
 # SparkR Documentation

-SparkR documentation is generated using in-source comments annotated using using
-`roxygen2`. After making changes to the documentation, to generate man pages,
+SparkR documentation is generated by using in-source comments and annotated by using
+[`roxygen2`](https://cran.r-project.org/web/packages/roxygen2/index.html). After making changes to the documentation and generating man pages,
 you can run the following from an R console in the SparkR home directory
-
-library(devtools)
-devtools::document(pkg="./pkg", roclets=c("rd"))
-
+```R
+library(devtools)
+devtools::document(pkg="./pkg", roclets=c("rd"))
+```
 You can verify if your changes are good by running

 R CMD check pkg/

R/README.md

Lines changed: 14 additions & 16 deletions
@@ -7,8 +7,7 @@ SparkR is an R package that provides a light-weight frontend to use Spark from R
 Libraries of sparkR need to be created in `$SPARK_HOME/R/lib`. This can be done by running the script `$SPARK_HOME/R/install-dev.sh`.
 By default the above script uses the system wide installation of R. However, this can be changed to any user installed location of R by setting the environment variable `R_HOME` the full path of the base directory where R is installed, before running install-dev.sh script.
 Example:
-
-```
+```bash
 # where /home/username/R is where R is installed and /home/username/R/bin contains the files R and RScript
 export R_HOME=/home/username/R
 ./install-dev.sh
@@ -20,8 +19,8 @@ export R_HOME=/home/username/R

 Build Spark with [Maven](http://spark.apache.org/docs/latest/building-spark.html#building-with-buildmvn) and include the `-Psparkr` profile to build the R package. For example to use the default Hadoop versions you can run

-```
-build/mvn -DskipTests -Psparkr package
+```bash
+build/mvn -DskipTests -Psparkr package
 ```

 #### Running sparkR
@@ -40,9 +39,8 @@ To set other options like driver memory, executor memory etc. you can pass in th

 #### Using SparkR from RStudio

-If you wish to use SparkR from RStudio or other R frontends you will need to set some environment variables which point SparkR to your Spark installation. For example
-
-```
+If you wish to use SparkR from RStudio or other R frontends you will need to set some environment variables which point SparkR to your Spark installation. For example
+```R
 # Set this to where Spark is installed
 Sys.setenv(SPARK_HOME="/Users/username/spark")
 # This line loads SparkR from the installed directory
@@ -59,25 +57,25 @@ Once you have made your changes, please include unit tests for them and run exis

 #### Generating documentation

-The SparkR documentation (Rd files and HTML files) are not a part of the source repository. To generate them you can run the script `R/create-docs.sh`. This script uses `devtools` and `knitr` to generate the docs and these packages need to be installed on the machine before using the script.
+The SparkR documentation (Rd files and HTML files) are not a part of the source repository. To generate them you can run the script `R/create-docs.sh`. This script uses `devtools` and `knitr` to generate the docs and these packages need to be installed on the machine before using the script. Also, you may need to install these [prerequisites](https://github.com/apache/spark/tree/master/docs#prerequisites). See also, `R/DOCUMENTATION.md`

 ### Examples, Unit tests

 SparkR comes with several sample programs in the `examples/src/main/r` directory.
 To run one of them, use `./bin/spark-submit <filename> <args>`. For example:
-
-./bin/spark-submit examples/src/main/r/dataframe.R
-
+```bash
+./bin/spark-submit examples/src/main/r/dataframe.R
+```
 You can also run the unit tests for SparkR by running. You need to install the [testthat](http://cran.r-project.org/web/packages/testthat/index.html) package first:
-
-R -e 'install.packages("testthat", repos="http://cran.us.r-project.org")'
-./R/run-tests.sh
+```bash
+R -e 'install.packages("testthat", repos="http://cran.us.r-project.org")'
+./R/run-tests.sh
+```

 ### Running on YARN

 The `./bin/spark-submit` can also be used to submit jobs to YARN clusters. You will need to set YARN conf dir before doing so. For example on CDH you can run
-
-```
+```bash
 export YARN_CONF_DIR=/etc/hadoop/conf
 ./bin/spark-submit --master yarn examples/src/main/r/dataframe.R
 ```

R/pkg/R/column.R

Lines changed: 35 additions & 1 deletion
@@ -57,7 +57,7 @@ operators <- list(
   "^" = "pow"
 )
 column_functions1 <- c("asc", "desc", "isNaN", "isNull", "isNotNull")
-column_functions2 <- c("like", "rlike", "startsWith", "endsWith", "getField", "getItem", "contains")
+column_functions2 <- c("like", "rlike", "getField", "getItem", "contains")

 createOperator <- function(op) {
   setMethod(op,
@@ -151,6 +151,40 @@ setMethod("substr", signature(x = "Column"),
             column(jc)
           })

+#' startsWith
+#'
+#' Determines if entries of x start with string (entries of) prefix respectively,
+#' where strings are recycled to common lengths.
+#'
+#' @rdname startsWith
+#' @name startsWith
+#' @family colum_func
+#'
+#' @param x vector of character string whose “starts” are considered
+#' @param prefix character vector (often of length one)
+setMethod("startsWith", signature(x = "Column"),
+          function(x, prefix) {
+            jc <- callJMethod(x@jc, "startsWith", as.vector(prefix))
+            column(jc)
+          })
+
+#' endsWith
+#'
+#' Determines if entries of x end with string (entries of) suffix respectively,
+#' where strings are recycled to common lengths.
+#'
+#' @rdname endsWith
+#' @name endsWith
+#' @family colum_func
+#'
+#' @param x vector of character string whose “ends” are considered
+#' @param suffix character vector (often of length one)
+setMethod("endsWith", signature(x = "Column"),
+          function(x, suffix) {
+            jc <- callJMethod(x@jc, "endsWith", as.vector(suffix))
+            column(jc)
+          })
+
 #' between
 #'
 #' Test if the column is between the lower bound and upper bound, inclusive.
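
The two new Column methods mirror base R's `startsWith(x, prefix)` and `endsWith(x, suffix)`. A minimal usage sketch, assuming an initialized SparkR session and a DataFrame `df` with a character `name` column (object names here are illustrative; compare the tests further down):

```R
library(SparkR)

# Each call builds a boolean Column that Spark evaluates on the JVM side;
# nothing is computed in R until the result is collected.
starts_m <- select(df, startsWith(df$name, "M"))
ends_el  <- select(df, endsWith(df$name, "el"))

first(starts_m)[[1]]  # TRUE when the first row's name is e.g. "Michael"
```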

R/pkg/R/generics.R

Lines changed: 2 additions & 2 deletions
@@ -695,7 +695,7 @@ setGeneric("desc", function(x) { standardGeneric("desc") })

 #' @rdname column
 #' @export
-setGeneric("endsWith", function(x, ...) { standardGeneric("endsWith") })
+setGeneric("endsWith", function(x, suffix) { standardGeneric("endsWith") })

 #' @rdname column
 #' @export
@@ -727,7 +727,7 @@ setGeneric("rlike", function(x, ...) { standardGeneric("rlike") })

 #' @rdname column
 #' @export
-setGeneric("startsWith", function(x, ...) { standardGeneric("startsWith") })
+setGeneric("startsWith", function(x, prefix) { standardGeneric("startsWith") })

 #' @rdname column
 #' @export

R/pkg/R/utils.R

Lines changed: 1 addition & 1 deletion
@@ -489,7 +489,7 @@ processClosure <- function(node, oldEnv, defVars, checkedFuncs, newEnv) {
 # checkedFunc An environment of function objects examined during cleanClosure. It can be
 #             considered as a "name"-to-"list of functions" mapping.
 # return value
-#   a new version of func that has an correct environment (closure).
+#   a new version of func that has a correct environment (closure).
 cleanClosure <- function(func, checkedFuncs = new.env()) {
   if (is.function(func)) {
     newEnv <- new.env(parent = .GlobalEnv)

R/pkg/inst/tests/testthat/test_sparkSQL.R

Lines changed: 7 additions & 0 deletions
@@ -1136,7 +1136,14 @@ test_that("string operators", {
   df <- read.json(jsonPath)
   expect_equal(count(where(df, like(df$name, "A%"))), 1)
   expect_equal(count(where(df, startsWith(df$name, "A"))), 1)
+  expect_true(first(select(df, startsWith(df$name, "M")))[[1]])
+  expect_false(first(select(df, startsWith(df$name, "m")))[[1]])
+  expect_true(first(select(df, endsWith(df$name, "el")))[[1]])
   expect_equal(first(select(df, substr(df$name, 1, 2)))[[1]], "Mi")
+  if (as.numeric(R.version$major) >= 3 && as.numeric(R.version$minor) >= 3) {
+    expect_true(startsWith("Hello World", "Hello"))
+    expect_false(endsWith("Hello World", "a"))
+  }
   expect_equal(collect(select(df, cast(df$age, "string")))[[2, 1]], "30")
   expect_equal(collect(select(df, concat(df$name, lit(":"), df$age)))[[2, 1]], "Andy:30")
   expect_equal(collect(select(df, concat_ws(":", df$name)))[[2, 1]], "Andy")
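
The `R.version` guard above exists because base R only gained `startsWith()` and `endsWith()` in R 3.3.0; the guarded lines verify that plain character vectors still reach the base implementations once SparkR's generics are attached. A quick interactive check, for illustration:

```R
# Base R added startsWith()/endsWith() in 3.3.0, hence the version guard.
exists("startsWith", envir = baseenv())  # TRUE on R >= 3.3.0, FALSE before
```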

build/spark-build-info

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# This script generates the build info for spark and places it into the spark-version-info.properties file.
+# Arguments:
+#   build_tgt_directory - The target directory where properties file would be created. [./core/target/extra-resources]
+#   spark_version - The current version of spark
+
+RESOURCE_DIR="$1"
+mkdir -p "$RESOURCE_DIR"
+SPARK_BUILD_INFO="${RESOURCE_DIR}"/spark-version-info.properties
+
+echo_build_properties() {
+  echo version=$1
+  echo user=$USER
+  echo revision=$(git rev-parse HEAD)
+  echo branch=$(git rev-parse --abbrev-ref HEAD)
+  echo date=$(date -u +%Y-%m-%dT%H:%M:%SZ)
+  echo url=$(git config --get remote.origin.url)
+}
+
+echo_build_properties $2 > "$SPARK_BUILD_INFO"
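
For illustration, a hypothetical invocation of the new script and the file it writes; the target directory, version string, and every metadata value below are assumptions, not output recorded in this commit:

```bash
# Hypothetical: write core/target/extra-resources/spark-version-info.properties
build/spark-build-info ./core/target/extra-resources 2.0.0-SNAPSHOT

# The generated file holds one key=value pair per line, e.g. (values illustrative):
#   version=2.0.0-SNAPSHOT
#   user=jenkins
#   revision=<output of git rev-parse HEAD>
#   branch=master
#   date=2016-06-06T00:00:00Z
#   url=https://github.com/apache/spark.git
```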

common/unsafe/src/main/java/org/apache/spark/unsafe/memory/MemoryBlock.java

Lines changed: 1 addition & 1 deletion
@@ -51,6 +51,6 @@ public long size() {
   * Creates a memory block pointing to the memory used by the long array.
   */
  public static MemoryBlock fromLongArray(final long[] array) {
-    return new MemoryBlock(array, Platform.LONG_ARRAY_OFFSET, array.length * 8);
+    return new MemoryBlock(array, Platform.LONG_ARRAY_OFFSET, array.length * 8L);
  }
 }
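
The appended `L` matters because `array.length * 8` is evaluated in 32-bit `int` arithmetic before being widened to `long`, so for arrays longer than 2^31 / 8 = 268,435,456 elements the product silently wraps around; `8L` forces the multiplication into 64-bit `long`. The same 32-bit ceiling can be demonstrated from R, whose integers are also 32-bit (R yields NA with a warning where Java would wrap):

```R
.Machine$integer.max  # 2147483647, i.e. 2^31 - 1
268435455L * 8L       # 2147483640: still fits in a 32-bit integer
268435456L * 8L       # NA, with an integer-overflow warning
268435456 * 8         # 2147483648: fine in 64-bit double arithmetic
```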
