Closed
109 commits
208b902
[SPARK-7566][SQL] Add type to HiveContext.analyzer
smola May 13, 2015
df9b94a
[SPARK-7482] [SPARKR] Rename some DataFrame API methods in SparkR to …
May 13, 2015
98195c3
[SPARK-7526] [SPARKR] Specify ip of RBackend, MonitorServer and RRDD …
Sephiroth-Lin May 13, 2015
50c7270
[SPARK-6568] spark-shell.cmd --jars option does not accept the jar th…
tsudukim May 13, 2015
10c546e
[SPARK-7599] [SQL] Don't restrict customized output committers to be …
liancheng May 13, 2015
b061bd5
[SQL] In InsertIntoFSBasedRelation.insert, log cause before abort job…
yhuai May 13, 2015
aa6ba3f
[MINOR] [SQL] Removes debugging println
liancheng May 13, 2015
0da254f
[SPARK-6734] [SQL] Add UDTF.close support in Generate
chenghao-intel May 13, 2015
bec938f
[SPARK-7589] [STREAMING] [WEBUI] Make "Input Rate" in the Streaming p…
zsxwing May 13, 2015
7ff16e8
[SPARK-7567] [SQL] Migrating Parquet data source to FSBasedRelation
liancheng May 13, 2015
213a6f3
[SPARK-7551][DataFrame] support backticks for DataFrame attribute res…
cloud-fan May 13, 2015
e676fc0
[MINOR] Avoid passing the PermGenSize option to IBM JVMs.
tellison May 13, 2015
3cd9ad2
[MINOR] Enhance SizeEstimator to detect IBM compressed refs and s390 …
tellison May 13, 2015
51030b8
[MINOR] [CORE] Accept alternative mesos unsatisfied link error in test.
tellison May 13, 2015
5db18ba
[SPARK-7593] [ML] Python Api for ml.feature.Bucketizer
brkyvz May 13, 2015
61e05fc
[SPARK-7545] [MLLIB] Added check in Bernoulli Naive Bayes to make sur…
leahmcguire May 13, 2015
df2fb13
[SPARK-7382] [MLLIB] Feature Parity in PySpark for ml.classification
brkyvz May 13, 2015
59250fe
[SPARK-7303] [SQL] push down project if possible when the child is sort
scwf May 13, 2015
e683182
[SQL] Move some classes into packages that are more appropriate.
rxin May 13, 2015
f6e1838
[SPARK-7608] Clean up old state in RDDOperationGraphListener
May 13, 2015
f88ac70
[SPARK-7399] Spark compilation error for scala 2.11
May 13, 2015
4440341
[SPARK-7464] DAG visualization: highlight the same RDDs on hover
May 13, 2015
aa18378
[SPARK-7502] DAG visualization: gracefully handle removed stages
May 13, 2015
bb6dec3
[STREAMING] [MINOR] Keep streaming.UIUtils private
May 13, 2015
61d1e87
[SPARK-7356] [STREAMING] Fix flakey tests in FlumePollingStreamSuite …
harishreedharan May 13, 2015
73bed40
[SPARK-7081] Faster sort-based shuffle path using binary processing c…
JoshRosen May 14, 2015
59aaa1d
[SPARK-7601] [SQL] Support Insert into JDBC Datasource
gvramana May 14, 2015
bce00da
[SPARK-6752] [STREAMING] [REVISED] Allow StreamingContext to be recre…
tdas May 14, 2015
32e27df
[HOTFIX] Bug in merge script
pwendell May 14, 2015
728af88
[HOTFIX] Use 'new Job' in fsBasedParquet.scala
zsxwing May 14, 2015
3113da9
[HOT FIX #6125] Do not wait for all stages to start rendering
May 14, 2015
d5f18de
[SPARK-7612] [MLLIB] update NB training to use mllib's BLAS
mengxr May 14, 2015
d3db2fd
[SPARK-7620] [ML] [MLLIB] Removed calling size, length in while condi…
May 14, 2015
13e652b
[SPARK-7595] [SQL] Window will cause resolve failed with self join
Sephiroth-Lin May 14, 2015
1b8625f
[SPARK-7407] [MLLIB] use uid + name to identify parameters
mengxr May 14, 2015
c1080b6
[SPARK-7568] [ML] ml.LogisticRegression doesn't output the right pred…
May 14, 2015
7fb715d
[SPARK-7249] Updated Hadoop dependencies due to inconsistency in the …
FavioVazquez May 14, 2015
f2cd00b
[SQL][minor] rename apply for QueryPlanner
cloud-fan May 14, 2015
5d7d4f8
[SPARK-7278] [PySpark] DateType should find datetime.datetime acceptable
ksonj May 14, 2015
11a1a13
Make SPARK prefix a variable
tedyu May 14, 2015
93dbb3a
[SPARK-7598] [DEPLOY] Add aliveWorkers metrics in Master
twilightgod May 14, 2015
57ed16c
[SPARK-7643] [UI] use the correct size in RDDPage for storage info an…
mengxr May 14, 2015
0a317c1
[SPARK-7649] [STREAMING] [WEBUI] Use window.localStorage to store the…
zsxwing May 14, 2015
b208f99
[SPARK-7645] [STREAMING] [WEBUI] Show milliseconds in the UI if the b…
zsxwing May 14, 2015
723853e
[SPARK-7648] [MLLIB] Add weights and intercept to GLM wrappers in spa…
mengxr May 15, 2015
48fc38f
[SPARK-7619] [PYTHON] fix docstring signature
mengxr May 15, 2015
6d0633e
[SPARK-7548] [SQL] Add explode function for DataFrames
marmbrus May 15, 2015
f9705d4
[SPARK-7098][SQL] Make the WHERE clause with timestamp show consisten…
viirya May 15, 2015
e8f0e01
[SQL] When creating partitioned table scan, explicitly create UnionRDD.
yhuai May 15, 2015
7da33ce
[HOTFIX] Add workaround for SPARK-7660 to fix JavaAPISuite failures.
JoshRosen May 15, 2015
daf4ae7
[CORE] Remove unreachable Heartbeat message from Worker
kanzhang May 15, 2015
cf842d4
[SPARK-7650] [STREAMING] [WEBUI] Move streaming css and js files to t…
zsxwing May 15, 2015
9476148
[SPARK-6258] [MLLIB] GaussianMixture Python API parity check
yanboliang May 15, 2015
fdf5bba
[SPARK-7591] [SQL] Partitioning support API tweaks
liancheng May 15, 2015
c64ff80
[SPARK-7503] [YARN] Resources in .sparkStaging directory can't be cle…
sarutak May 15, 2015
f96b85a
[SPARK-7668] [MLLIB] Preserve isTransposed property for Matrix after …
viirya May 15, 2015
8f4aaba
[SPARK-7651] [MLLIB] [PYSPARK] GMM predict, predictSoft should raise …
FlytxtRnD May 15, 2015
b1b9d58
[SPARK-7233] [CORE] Detect REPL mode once
preeze May 15, 2015
270d4b5
[CORE] Protect additional test vars from early GC
tellison May 15, 2015
8ab1450
[SPARK-5412] [DEPLOY] Cannot bind Master to a specific hostname as pe…
srowen May 15, 2015
ad92af9
[SPARK-7664] [WEBUI] DAG visualization: Fix incorrect link paths of DAG.
sarutak May 15, 2015
8e3822a
[SPARK-7504] [YARN] NullPointerException when initializing SparkConte…
zzvara May 15, 2015
9b6cf28
[SPARK-7296] Add timeline visualization for stages in the UI.
sarutak May 15, 2015
50da9e8
[SPARK-7226] [SPARKR] Support math functions in R DataFrame
hqzizania May 15, 2015
6e77105
[SPARK-7677] [STREAMING] Add Kafka modules to the 2.11 build.
dragos May 15, 2015
c869633
[SPARK-7556] [ML] [DOC] Add user guide for spark.ml Binarizer, includ…
viirya May 15, 2015
e745456
[SPARK-7676] Bug fix and cleanup of stage timeline view
kayousterhout May 16, 2015
2c04c8a
[SPARK-7563] OutputCommitCoordinator.stop() should only run on the dr…
JoshRosen May 16, 2015
cc12a86
[SPARK-7575] [ML] [DOC] Example code for OneVsRest
May 16, 2015
adfd366
[SPARK-7073] [SQL] [PySpark] Clean up SQL data type hierarchy in Python
May 16, 2015
d7b6994
[SPARK-7543] [SQL] [PySpark] split dataframe.py into multiple files
May 16, 2015
deb4113
[SPARK-7473] [MLLIB] Add reservoir sample in RandomForest
May 16, 2015
578bfee
[SPARK-7654][SQL] DataFrameReader and DataFrameWriter for input/outpu…
rxin May 16, 2015
d41ae43
[SPARK-7671] Fix wrong URLs in MLlib Data Types Documentation
FavioVazquez May 16, 2015
1fd3381
[SPARK-4556] [BUILD] binary distribution assembly can't run in local …
srowen May 16, 2015
0ac8b01
[SPARK-7672] [CORE] Use int conversion in translating kryoserializer.…
nishkamravi2 May 16, 2015
47e7ffe
[SPARK-7655][Core][SQL] Remove 'scala.concurrent.ExecutionContext.Imp…
zsxwing May 16, 2015
ce63912
[HOTFIX] [SQL] Fixes DataFrameWriter.mode(String)
liancheng May 16, 2015
1b4e710
[BUILD] update jblas dependency version to 1.2.4
mtbrandy May 16, 2015
161d0b4
[SPARK-7654][MLlib] Migrate MLlib to the DataFrame reader/writer API.
rxin May 16, 2015
3b6ef2c
[SPARK-7655][Core] Deserializing value should not hold the TaskSchedu…
zsxwing May 17, 2015
517eb37
[SPARK-7654][SQL] Move JDBC into DataFrame's reader/writer interface.
rxin May 17, 2015
ba4f8ca
[MINOR] [SQL] Removes an unreachable case clause
liancheng May 17, 2015
1a7b9ce
[MINOR] Add 1.3, 1.3.1 to master branch EC2 scripts
shivaram May 17, 2015
edf09ea
[SQL] [MINOR] Skip unresolved expression for InConversion
scwf May 17, 2015
3399055
[SPARK-7447] [SQL] Don't re-merge Parquet schema when the relation is…
viirya May 17, 2015
5021766
[SPARK-7669] Builds against Hadoop 2.6+ get inconsistent curator depend…
steveloughran May 17, 2015
f2cc6b5
[SPARK-7660] Wrap SnappyOutputStream to work around snappy-java bug
JoshRosen May 17, 2015
5645628
[SPARK-7686] [SQL] DescribeCommand is assigned wrong output attribute…
JoshRosen May 17, 2015
2ca60ac
[SPARK-7491] [SQL] Allow configuration of classloader isolation for hive
marmbrus May 17, 2015
ca4257a
[SPARK-6514] [SPARK-5960] [SPARK-6656] [SPARK-7679] [STREAMING] [KINE…
tdas May 17, 2015
2f22424
[SQL] [MINOR] use catalyst type converter in ScalaUdf
cloud-fan May 17, 2015
ff71d34
[SPARK-7693][Core] Remove "import scala.concurrent.ExecutionContext.I…
zsxwing May 18, 2015
775e6f9
[SPARK-7694] [MLLIB] Use getOrElse for getting the threshold of LR model
coderxiang May 18, 2015
e32c0f6
[SPARK-7299][SQL] Set precision and scale for Decimal according to JD…
viirya May 18, 2015
1ecfac6
[SPARK-6657] [PYSPARK] Fix doc warnings
mengxr May 18, 2015
814b3da
[SPARK-7272] [MLLIB] User guide for PMML model export
selvinsource May 18, 2015
563bfcc
[SPARK-7627] [SPARK-7472] DAG visualization: style skipped stages
May 18, 2015
e1ac2a9
[SPARK-6888] [SQL] Make the jdbc driver handling user-definable
rtreffer May 18, 2015
010a1c2
[SPARK-7570] [SQL] Ignores _temporary during partition discovery
liancheng May 18, 2015
56ede88
[SQL] [MINOR] [THIS] use private for internal field in ScalaUdf
cloud-fan May 18, 2015
9c7e802
[SPARK-7380] [MLLIB] pipeline stages should be copyable in Python
mengxr May 18, 2015
aa31e43
[SPARK-2883] [SQL] ORC data source for Spark SQL
zhzhan May 18, 2015
fc2480e
[SPARK-7631] [SQL] treenode argString should not print children
scwf May 18, 2015
103c863
[SPARK-7269] [SQL] Incorrect analysis for aggregation(use semanticEqu…
cloud-fan May 18, 2015
530397b
[SPARK-7567] [SQL] [follow-up] Use a new flag to set output committer…
yhuai May 18, 2015
9dadf01
[SPARK-7673] [SQL] WIP: HadoopFsRelation and ParquetRelation2 perform…
liancheng May 18, 2015
32fbd29
[SPARK-6216] [PYSPARK] check python version of worker with driver
May 18, 2015
0b6f503
[SPARK-7658] [STREAMING] [WEBUI] Update the mouse behaviors for the t…
zsxwing May 18, 2015
2 changes: 1 addition & 1 deletion LICENSE
@@ -861,7 +861,7 @@ The following components are provided under a BSD-style license. See project lin

(BSD 3 Clause) core (com.github.fommil.netlib:core:1.1.2 - https://github.com/fommil/netlib-java/core)
(BSD 3 Clause) JPMML-Model (org.jpmml:pmml-model:1.1.15 - https://github.com/jpmml/jpmml-model)
(BSD 3-clause style license) jblas (org.jblas:jblas:1.2.3 - http://jblas.org/)
(BSD 3-clause style license) jblas (org.jblas:jblas:1.2.4 - http://jblas.org/)
(BSD License) AntLR Parser Generator (antlr:antlr:2.7.7 - http://www.antlr.org/)
(BSD License) Javolution (javolution:javolution:5.5.1 - http://javolution.org)
(BSD licence) ANTLR ST4 4.0.4 (org.antlr:ST4:4.0.4 - http://www.stringtemplate.org)
29 changes: 27 additions & 2 deletions R/pkg/NAMESPACE
Expand Up @@ -37,7 +37,7 @@ exportMethods("arrange",
"registerTempTable",
"rename",
"repartition",
"sampleDF",
"sample",
"sample_frac",
"saveAsParquetFile",
"saveAsTable",
@@ -53,38 +53,62 @@ exportMethods("arrange",
"unpersist",
"where",
"withColumn",
"withColumnRenamed")
"withColumnRenamed",
"write.df")

exportClasses("Column")

exportMethods("abs",
"acos",
"alias",
"approxCountDistinct",
"asc",
"asin",
"atan",
"atan2",
"avg",
"cast",
"cbrt",
"ceiling",
"contains",
"cos",
"cosh",
"countDistinct",
"desc",
"endsWith",
"exp",
"expm1",
"floor",
"getField",
"getItem",
"hypot",
"isNotNull",
"isNull",
"last",
"like",
"log",
"log10",
"log1p",
"lower",
"max",
"mean",
"min",
"n",
"n_distinct",
"rint",
"rlike",
"sign",
"sin",
"sinh",
"sqrt",
"startsWith",
"substr",
"sum",
"sumDistinct",
"tan",
"tanh",
"toDegrees",
"toRadians",
"upper")

exportClasses("GroupedData")
@@ -101,6 +125,7 @@ export("cacheTable",
"jsonFile",
"loadDF",
"parquetFile",
"read.df",
"sql",
"table",
"tableNames",
35 changes: 22 additions & 13 deletions R/pkg/R/DataFrame.R
@@ -294,8 +294,8 @@ setMethod("registerTempTable",
#'\dontrun{
#' sc <- sparkR.init()
#' sqlCtx <- sparkRSQL.init(sc)
#' df <- loadDF(sqlCtx, path, "parquet")
#' df2 <- loadDF(sqlCtx, path2, "parquet")
#' df <- read.df(sqlCtx, path, "parquet")
#' df2 <- read.df(sqlCtx, path2, "parquet")
#' registerTempTable(df, "table1")
#' insertInto(df2, "table1", overwrite = TRUE)
#'}
@@ -473,14 +473,14 @@ setMethod("distinct",
dataFrame(sdf)
})

#' SampleDF
#' Sample
#'
#' Return a sampled subset of this DataFrame using a random seed.
#'
#' @param x A SparkSQL DataFrame
#' @param withReplacement Sampling with replacement or not
#' @param fraction The (rough) sample target fraction
#' @rdname sampleDF
#' @rdname sample
#' @aliases sample_frac
#' @export
#' @examples
@@ -489,10 +489,10 @@ setMethod("distinct",
#' sqlCtx <- sparkRSQL.init(sc)
#' path <- "path/to/file.json"
#' df <- jsonFile(sqlCtx, path)
#' collect(sampleDF(df, FALSE, 0.5))
#' collect(sampleDF(df, TRUE, 0.5))
#' collect(sample(df, FALSE, 0.5))
#' collect(sample(df, TRUE, 0.5))
#'}
setMethod("sampleDF",
setMethod("sample",
# TODO : Figure out how to send integer as java.lang.Long to JVM so
# we can send seed as an argument through callJMethod
signature(x = "DataFrame", withReplacement = "logical",
@@ -503,13 +503,13 @@ setMethod("sampleDF",
dataFrame(sdf)
})

#' @rdname sampleDF
#' @aliases sampleDF
#' @rdname sample
#' @aliases sample
setMethod("sample_frac",
signature(x = "DataFrame", withReplacement = "logical",
fraction = "numeric"),
function(x, withReplacement, fraction) {
sampleDF(x, withReplacement, fraction)
sample(x, withReplacement, fraction)
})

#' Count
@@ -1303,17 +1303,17 @@ setMethod("except",
#' @param source A name for external data source
#' @param mode One of 'append', 'overwrite', 'error', 'ignore'
#'
#' @rdname saveAsTable
#' @rdname write.df
#' @export
#' @examples
#'\dontrun{
#' sc <- sparkR.init()
#' sqlCtx <- sparkRSQL.init(sc)
#' path <- "path/to/file.json"
#' df <- jsonFile(sqlCtx, path)
#' saveAsTable(df, "myfile")
#' write.df(df, "myfile", "parquet", "overwrite")
#' }
setMethod("saveDF",
setMethod("write.df",
signature(df = "DataFrame", path = 'character', source = 'character',
mode = 'character'),
function(df, path = NULL, source = NULL, mode = "append", ...){
@@ -1334,6 +1334,15 @@ setMethod("saveDF",
callJMethod(df@sdf, "save", source, jmode, options)
})

#' @rdname write.df
#' @aliases saveDF
#' @export
setMethod("saveDF",
signature(df = "DataFrame", path = 'character', source = 'character',
mode = 'character'),
function(df, path = NULL, source = NULL, mode = "append", ...){
write.df(df, path, source, mode, ...)
})

#' saveAsTable
#'
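The DataFrame.R hunks above rename sampleDF to sample (sample_frac now just delegates to it) and introduce write.df, keeping saveDF as a forwarding alias. A minimal usage sketch of the renamed API, assuming a working SparkR session; the context objects and paths are placeholders mirroring the roxygen examples, not part of this diff:

# Sketch only: sc, sqlCtx, and the paths are stand-ins.
sc <- sparkR.init()
sqlCtx <- sparkRSQL.init(sc)
df <- jsonFile(sqlCtx, "path/to/file.json")

half <- sample(df, FALSE, 0.5)        # formerly sampleDF
also <- sample_frac(df, FALSE, 0.5)   # unchanged name, now calls sample()

write.df(half, "path/to/out", "parquet", "overwrite")   # new writer entry point
saveDF(also, "path/to/out2", "parquet", "overwrite")    # old name, forwards to write.df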
4 changes: 2 additions & 2 deletions R/pkg/R/RDD.R
@@ -927,7 +927,7 @@ setMethod("takeSample", signature(x = "RDD", withReplacement = "logical",
MAXINT)))))

# TODO(zongheng): investigate if this call is an in-place shuffle?
sample(samples)[1:total]
base::sample(samples)[1:total]
})

# Creates tuples of the elements in this RDD by applying a function.
@@ -996,7 +996,7 @@ setMethod("coalesce",
if (shuffle || numPartitions > SparkR:::numPartitions(x)) {
func <- function(partIndex, part) {
set.seed(partIndex) # partIndex as seed
start <- as.integer(sample(numPartitions, 1) - 1)
start <- as.integer(base::sample(numPartitions, 1) - 1)
lapply(seq_along(part),
function(i) {
pos <- (start + i) %% numPartitions
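The RDD.R change above namespace-qualifies the shuffling calls because SparkR now defines its own sample generic for DataFrames; inside package code, base::sample ensures the base R sampler is used regardless of that generic. Illustration only, plain R with no Spark objects involved:

elems <- 1:10
shuffled <- base::sample(elems)               # full shuffle, base R semantics
start <- as.integer(base::sample(5, 1) - 1)   # pick a 0-based starting partition, as in coalesce()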
13 changes: 10 additions & 3 deletions R/pkg/R/SQLContext.R
@@ -421,7 +421,7 @@ clearCache <- function(sqlCtx) {
#' \dontrun{
#' sc <- sparkR.init()
#' sqlCtx <- sparkRSQL.init(sc)
#' df <- loadDF(sqlCtx, path, "parquet")
#' df <- read.df(sqlCtx, path, "parquet")
#' registerTempTable(df, "table")
#' dropTempTable(sqlCtx, "table")
#' }
@@ -450,10 +450,10 @@ dropTempTable <- function(sqlCtx, tableName) {
#'\dontrun{
#' sc <- sparkR.init()
#' sqlCtx <- sparkRSQL.init(sc)
#' df <- load(sqlCtx, "path/to/file.json", source = "json")
#' df <- read.df(sqlCtx, "path/to/file.json", source = "json")
#' }

loadDF <- function(sqlCtx, path = NULL, source = NULL, ...) {
read.df <- function(sqlCtx, path = NULL, source = NULL, ...) {
options <- varargsToEnv(...)
if (!is.null(path)) {
options[['path']] <- path
@@ -462,6 +462,13 @@ loadDF <- function(sqlCtx, path = NULL, source = NULL, ...) {
dataFrame(sdf)
}

#' @aliases loadDF
#' @export

loadDF <- function(sqlCtx, path = NULL, source = NULL, ...) {
read.df(sqlCtx, path, source, ...)
}

#' Create an external table
#'
#' Creates an external table based on the dataset in a data source,
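SQLContext.R renames loadDF to read.df and keeps loadDF as a thin wrapper. A short sketch, assuming the same placeholder context and JSON path used in the roxygen examples:

# Sketch only: the context and path are stand-ins.
sqlCtx <- sparkRSQL.init(sparkR.init())
df  <- read.df(sqlCtx, "path/to/file.json", source = "json")   # new reader entry point
df2 <- loadDF(sqlCtx, "path/to/file.json", source = "json")    # old name, forwards to read.df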
36 changes: 33 additions & 3 deletions R/pkg/R/column.R
@@ -55,12 +55,17 @@ operators <- list(
"+" = "plus", "-" = "minus", "*" = "multiply", "/" = "divide", "%%" = "mod",
"==" = "equalTo", ">" = "gt", "<" = "lt", "!=" = "notEqual", "<=" = "leq", ">=" = "geq",
# we can not override `&&` and `||`, so use `&` and `|` instead
"&" = "and", "|" = "or" #, "!" = "unary_$bang"
"&" = "and", "|" = "or", #, "!" = "unary_$bang"
"^" = "pow"
)
column_functions1 <- c("asc", "desc", "isNull", "isNotNull")
column_functions2 <- c("like", "rlike", "startsWith", "endsWith", "getField", "getItem", "contains")
functions <- c("min", "max", "sum", "avg", "mean", "count", "abs", "sqrt",
"first", "last", "lower", "upper", "sumDistinct")
"first", "last", "lower", "upper", "sumDistinct",
"acos", "asin", "atan", "cbrt", "ceiling", "cos", "cosh", "exp",
"expm1", "floor", "log", "log10", "log1p", "rint", "sign",
"sin", "sinh", "tan", "tanh", "toDegrees", "toRadians")
binary_mathfunctions<- c("atan2", "hypot")

createOperator <- function(op) {
setMethod(op,
@@ -76,7 +81,11 @@ createOperator <- function(op) {
if (class(e2) == "Column") {
e2 <- e2@jc
}
callJMethod(e1@jc, operators[[op]], e2)
if (op == "^") {
jc <- callJStatic("org.apache.spark.sql.functions", operators[[op]], e1@jc, e2)
} else {
callJMethod(e1@jc, operators[[op]], e2)
}
}
column(jc)
})
@@ -106,11 +115,29 @@ createStaticFunction <- function(name) {
setMethod(name,
signature(x = "Column"),
function(x) {
if (name == "ceiling") {
name <- "ceil"
}
if (name == "sign") {
name <- "signum"
}
jc <- callJStatic("org.apache.spark.sql.functions", name, x@jc)
column(jc)
})
}

createBinaryMathfunctions <- function(name) {
setMethod(name,
signature(y = "Column"),
function(y, x) {
if (class(x) == "Column") {
x <- x@jc
}
jc <- callJStatic("org.apache.spark.sql.functions", name, y@jc, x)
column(jc)
})
}

createMethods <- function() {
for (op in names(operators)) {
createOperator(op)
@@ -124,6 +151,9 @@ createMethods <- function() {
for (x in functions) {
createStaticFunction(x)
}
for (name in binary_mathfunctions) {
createBinaryMathfunctions(name)
}
}

createMethods()
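column.R wires the newly exported math helpers onto Column: unary functions dispatch to org.apache.spark.sql.functions (with ceiling mapped to ceil and sign to signum), atan2 and hypot become two-argument math functions, and "^" is routed to functions.pow. A hedged sketch of how these read on a Column; the data frame and its numeric columns x and y are invented for illustration:

# Sketch only: sqlCtx and the toy data are stand-ins.
sqlCtx <- sparkRSQL.init(sparkR.init())
df <- createDataFrame(sqlCtx, data.frame(x = c(1.2, 2.7), y = c(3.0, 4.5)))

ceiling(df$x)            # dispatched to functions.ceil on the JVM side
sign(df$y)               # dispatched to functions.signum
atan2(df$y, df$x)        # binary math function taking two Columns
hypot(df$y, df$x)
pow2 <- df$x ^ 2         # "^" now routed to functions.pow
head(select(df, pow2))   # materialize the derived column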