Commit 3d7a611

Merge remote-tracking branch 'origin/branch-2.4' into fastdateformat-micros

2 parents: 915a755 + 7459353
File tree: 643 files changed (+12,769 / −3,740 lines)

.github/workflows/branch-2.4.yml

Lines changed: 74 additions & 0 deletions

@@ -0,0 +1,74 @@
+name: branch-2.4
+
+on:
+  push:
+    branches:
+    - branch-2.4
+  pull_request:
+    branches:
+    - branch-2.4
+
+jobs:
+  build:
+
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        scala: [ '2.11', '2.12' ]
+        hadoop: [ 'hadoop-2.6', 'hadoop-2.7' ]
+    name: Build Spark with Scala ${{ matrix.scala }} / Hadoop ${{ matrix.hadoop }}
+
+    steps:
+    - uses: actions/checkout@master
+    # We split caches because GitHub Action Cache has a 400MB-size limit.
+    - uses: actions/cache@v1
+      with:
+        path: ~/.m2/repository/com
+        key: ${{ matrix.scala }}-${{ matrix.hadoop }}-maven-com-${{ hashFiles('**/pom.xml') }}
+        restore-keys: |
+          ${{ matrix.scala }}-${{ matrix.hadoop }}-maven-com-
+    - uses: actions/cache@v1
+      with:
+        path: ~/.m2/repository/org
+        key: ${{ matrix.scala }}-${{ matrix.hadoop }}-maven-org-${{ hashFiles('**/pom.xml') }}
+        restore-keys: |
+          ${{ matrix.scala }}-${{ matrix.hadoop }}-maven-org-
+    - name: Set up JDK 8
+      uses: actions/setup-java@v1
+      with:
+        java-version: '1.8'
+    - name: Change to Scala ${{ matrix.scala }}
+      run: |
+        dev/change-scala-version.sh ${{ matrix.scala }}
+    - name: Build with Maven
+      run: |
+        export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m -Dorg.slf4j.simpleLogger.defaultLogLevel=WARN"
+        export MAVEN_CLI_OPTS="--no-transfer-progress"
+        ./build/mvn $MAVEN_CLI_OPTS -DskipTests -Pyarn -Pmesos -Pkubernetes -Phive -Phive-thriftserver -Pscala-${{ matrix.scala }} -P${{ matrix.hadoop }} -Phadoop-cloud install
+        rm -rf ~/.m2/repository/org/apache/spark
+
+
+  lint:
+    runs-on: ubuntu-latest
+    name: Linters
+    steps:
+    - uses: actions/checkout@master
+    - uses: actions/setup-java@v1
+      with:
+        java-version: '1.8'
+    - uses: actions/setup-python@v1
+      with:
+        python-version: '3.7'
+        architecture: 'x64'
+    - name: Scala
+      run: ./dev/lint-scala
+    - name: Java
+      run: ./dev/lint-java
+    - name: Python
+      run: |
+        pip install flake8 sphinx numpy
+        ./dev/lint-python
+    - name: License
+      run: ./dev/check-license
+    - name: Dependencies
+      run: ./dev/test-dependencies.sh

.gitignore

Lines changed: 1 addition & 0 deletions

@@ -61,6 +61,7 @@ project/plugins/project/build.properties
 project/plugins/src_managed/
 project/plugins/target/
 python/lib/pyspark.zip
+python/.eggs/
 python/deps
 python/test_coverage/coverage_data
 python/test_coverage/htmlcov

LICENSE

Lines changed: 1 addition & 1 deletion

@@ -243,7 +243,7 @@ MIT License
 core/src/main/resources/org/apache/spark/ui/static/dagre-d3.min.js
 core/src/main/resources/org/apache/spark/ui/static/*dataTables*
 core/src/main/resources/org/apache/spark/ui/static/graphlib-dot.min.js
-ore/src/main/resources/org/apache/spark/ui/static/jquery*
+core/src/main/resources/org/apache/spark/ui/static/jquery*
 core/src/main/resources/org/apache/spark/ui/static/sorttable.js
 docs/js/vendor/anchor.min.js
 docs/js/vendor/jquery*

LICENSE-binary

Lines changed: 1 addition & 1 deletion

@@ -305,7 +305,6 @@ com.google.code.gson:gson
 com.google.inject:guice
 com.google.inject.extensions:guice-servlet
 com.twitter:parquet-hadoop-bundle
-commons-beanutils:commons-beanutils-core
 commons-cli:commons-cli
 commons-dbcp:commons-dbcp
 commons-io:commons-io
@@ -468,6 +467,7 @@ Common Development and Distribution License (CDDL) 1.1
 ------------------------------------------------------

 javax.annotation:javax.annotation-api https://jcp.org/en/jsr/detail?id=250
+javax.el:javax.el-api https://javaee.github.io/uel-ri/
 javax.servlet:javax.servlet-api https://javaee.github.io/servlet-spec/
 javax.transaction:jta http://www.oracle.com/technetwork/java/index.html
 javax.ws.rs:javax.ws.rs-api https://github.com/jax-rs

R/pkg/DESCRIPTION

Lines changed: 5 additions & 5 deletions

@@ -1,8 +1,8 @@
 Package: SparkR
 Type: Package
-Version: 2.4.1
-Title: R Frontend for Apache Spark
-Description: Provides an R Frontend for Apache Spark.
+Version: 2.4.5
+Title: R Front End for 'Apache Spark'
+Description: Provides an R Front end for 'Apache Spark' <https://spark.apache.org>.
 Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),
                     email = "[email protected]"),
              person("Xiangrui", "Meng", role = "aut",
@@ -11,8 +11,8 @@ Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),
                     email = "[email protected]"),
              person(family = "The Apache Software Foundation", role = c("aut", "cph")))
 License: Apache License (== 2.0)
-URL: http://www.apache.org/ http://spark.apache.org/
-BugReports: http://spark.apache.org/contributing.html
+URL: https://www.apache.org/ https://spark.apache.org/
+BugReports: https://spark.apache.org/contributing.html
 SystemRequirements: Java (== 8)
 Depends:
     R (>= 3.0),

R/pkg/R/SQLContext.R

Lines changed: 2 additions & 1 deletion

@@ -655,7 +655,8 @@ loadDF <- function(x = NULL, ...) {
 #'
 #' @param url JDBC database url of the form \code{jdbc:subprotocol:subname}
 #' @param tableName the name of the table in the external database
-#' @param partitionColumn the name of a column of integral type that will be used for partitioning
+#' @param partitionColumn the name of a column of numeric, date, or timestamp type
+#'                        that will be used for partitioning.
 #' @param lowerBound the minimum value of \code{partitionColumn} used to decide partition stride
 #' @param upperBound the maximum value of \code{partitionColumn} used to decide partition stride
 #' @param numPartitions the number of partitions, This, along with \code{lowerBound} (inclusive),
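
For context, a minimal SparkR sketch of how the partitioning parameters documented above are typically combined in read.jdbc; the JDBC URL, table, column, and credentials below are placeholders, not part of this commit:

    sparkR.session()
    # Reads are split into numPartitions parallel queries over the range
    # [lowerBound, upperBound] of the (numeric, date, or timestamp) partitionColumn.
    df <- read.jdbc("jdbc:postgresql://localhost/shop", "orders",
                    partitionColumn = "order_id", lowerBound = 1, upperBound = 1000000,
                    numPartitions = 10, user = "reader", password = "secret")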

R/pkg/R/context.R

Lines changed: 4 additions & 3 deletions

@@ -297,7 +297,7 @@ broadcastRDD <- function(sc, object) {
 #' Set the checkpoint directory
 #'
 #' Set the directory under which RDDs are going to be checkpointed. The
-#' directory must be a HDFS path if running on a cluster.
+#' directory must be an HDFS path if running on a cluster.
 #'
 #' @param sc Spark Context to use
 #' @param dirName Directory path
@@ -321,7 +321,8 @@ setCheckpointDirSC <- function(sc, dirName) {
 #'
 #' A directory can be given if the recursive option is set to true.
 #' Currently directories are only supported for Hadoop-supported filesystems.
-#' Refer Hadoop-supported filesystems at \url{https://wiki.apache.org/hadoop/HCFS}.
+#' Refer Hadoop-supported filesystems at
+#' \url{https://cwiki.apache.org/confluence/display/HADOOP2/HCFS}.
 #'
 #' Note: A path can be added only once. Subsequent additions of the same path are ignored.
 #'
@@ -441,7 +442,7 @@ setLogLevel <- function(level) {
 #' Set checkpoint directory
 #'
 #' Set the directory under which SparkDataFrame are going to be checkpointed. The directory must be
-#' a HDFS path if running on a cluster.
+#' an HDFS path if running on a cluster.
 #'
 #' @rdname setCheckpointDir
 #' @param directory Directory path to checkpoint to
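
To illustrate the checkpoint-directory API whose docs are touched above, a small hedged SparkR sketch; the HDFS path is a placeholder and is only required when running on a cluster:

    sparkR.session()
    setCheckpointDir("hdfs:///tmp/spark-checkpoints")   # a local path also works in local mode
    df <- createDataFrame(mtcars)
    df2 <- checkpoint(df)   # persists df under the checkpoint dir and truncates its lineage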

R/pkg/R/functions.R

Lines changed: 1 addition & 1 deletion

@@ -3337,7 +3337,7 @@ setMethod("size",

 #' @details
 #' \code{slice}: Returns an array containing all the elements in x from the index start
-#' (or starting from the end if start is negative) with the specified length.
+#' (array indices start at 1, or from the end if start is negative) with the specified length.
 #'
 #' @rdname column_collection_functions
 #' @param start an index indicating the first element occurring in the result.
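
A hedged SparkR sketch of the 1-based indexing that the reworded doc calls out; the column name and array values are made up for illustration:

    df <- createDataFrame(data.frame(id = 1))
    df$arr <- create_array(lit(10), lit(20), lit(30), lit(40))
    # slice(arr, 2, 2) keeps two elements starting at index 2 (1-based): 20, 30
    head(select(df, slice(df$arr, 2, 2)))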

R/pkg/tests/fulltests/test_streaming.R

Lines changed: 1 addition & 0 deletions

@@ -127,6 +127,7 @@ test_that("Specify a schema by using a DDL-formatted string when reading", {
   expect_false(awaitTermination(q, 5 * 1000))
   callJMethod(q@ssq, "processAllAvailable")
   expect_equal(head(sql("SELECT count(*) FROM people3"))[[1]], 3)
+  stopQuery(q)

   expect_error(read.stream(path = parquetPath, schema = "name stri"),
                "DataType stri is not supported.")

R/pkg/vignettes/sparkr-vignettes.Rmd

Lines changed: 14 additions & 0 deletions

@@ -57,6 +57,20 @@ First, let's load and attach the package.
 library(SparkR)
 ```

+```{r, include=FALSE}
+# disable eval if java version not supported
+override_eval <- tryCatch(!is.numeric(SparkR:::checkJavaVersion()),
+                          error = function(e) { TRUE },
+                          warning = function(e) { TRUE })
+
+if (override_eval) {
+  opts_hooks$set(eval = function(options) {
+    options$eval = FALSE
+    options
+  })
+}
+```
+
 `SparkSession` is the entry point into SparkR which connects your R program to a Spark cluster. You can create a `SparkSession` using `sparkR.session` and pass in options such as the application name, any Spark packages depended on, etc.

 We use default settings in which it runs in local mode. It auto downloads Spark package in the background if no previous installation is found. For more details about setup, see [Spark Session](#SetupSparkSession).
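
As a side note on the `sparkR.session` options mentioned in the vignette text above, a minimal hedged sketch; the application name, config value, and Spark package coordinates are illustrative only:

    sparkR.session(appName = "sparkr-vignette-demo",
                   sparkConfig = list(spark.driver.memory = "2g"),
                   sparkPackages = "org.apache.spark:spark-avro_2.11:2.4.5")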
