Closed
383 commits
fb0d608
[SPARK-17849][SQL] Fix NPE problem when using grouping sets
Nov 5, 2016
9a87c31
[SPARK-17964][SPARKR] Enable SparkR with Mesos client mode and cluste…
susanxhuynh Nov 5, 2016
15d3926
[MINOR][DOCUMENTATION] Fix some minor descriptions in functions consi…
HyukjinKwon Nov 6, 2016
23ce0d1
[SPARK-18276][ML] ML models should copy the training summary and set …
sethah Nov 6, 2016
340f09d
[SPARK-17854][SQL] rand/randn allows null/long as input seed
HyukjinKwon Nov 6, 2016
b89d055
[SPARK-18210][ML] Pipeline.copy does not create an instance with the …
wojtek-szymanski Nov 6, 2016
556a3b7
[SPARK-18269][SQL] CSV datasource should read null properly when sche…
HyukjinKwon Nov 7, 2016
46b2e49
[SPARK-18173][SQL] data source tables should support truncating parti…
cloud-fan Nov 7, 2016
07ac3f0
[SPARK-18167][SQL] Disable flaky hive partition pruning test.
rxin Nov 7, 2016
9db06c4
[SPARK-18296][SQL] Use consistent naming for expression test suites
rxin Nov 7, 2016
57626a5
[SPARK-16904][SQL] Removal of Hive Built-in Hash Functions and TestHi…
gatorsmile Nov 7, 2016
a814eea
[SPARK-18125][SQL] Fix a compilation error in codegen due to splitExp…
viirya Nov 7, 2016
daa975f
[SPARK-18291][SPARKR][ML] SparkR glm predict should output original l…
yanboliang Nov 7, 2016
b06c23d
[SPARK-18283][STRUCTURED STREAMING][KAFKA] Added test to check whethe…
tdas Nov 7, 2016
0d95662
[SPARK-17108][SQL] Fix BIGINT and INT comparison failure in spark sql
weiqingy Nov 7, 2016
8f0ea01
[SPARK-14914][CORE] Fix Resource not closed after using, mostly for u…
HyukjinKwon Nov 7, 2016
19cf208
[SPARK-17490][SQL] Optimize SerializeFromObject() for a primitive array
kiszk Nov 7, 2016
3a710b9
[SPARK-18236] Reduce duplicate objects in Spark UI and HistoryServer
JoshRosen Nov 8, 2016
3eda057
[SPARK-18295][SQL] Make to_json function null safe (matching it to fr…
HyukjinKwon Nov 8, 2016
9b0593d
[SPARK-18086] Add support for Hive session vars.
rdblue Nov 8, 2016
c1a0c66
[SPARK-18261][STRUCTURED STREAMING] Add statistics to MemorySink for …
lw-lin Nov 8, 2016
1da64e1
[SPARK-18217][SQL] Disallow creating permanent views based on tempora…
gatorsmile Nov 8, 2016
6f36971
[SPARK-16575][CORE] partition calculation mismatch with sc.binaryFiles
fidato13 Nov 8, 2016
47731e1
[SPARK-18207][SQL] Fix a compilation error due to HashExpression.doGe…
kiszk Nov 8, 2016
c291bd2
[SPARK-18137][SQL] Fix RewriteDistinctAggregates UnresolvedException …
Nov 8, 2016
ee2e741
[SPARK-13770][DOCUMENTATION][ML] Document the ML feature Interaction
Nov 8, 2016
b1033fb
[MINOR][DOC] Unify example marks
zhengruifeng Nov 8, 2016
344dcad
[SPARK-17868][SQL] Do not use bitmasks during parsing and analysis of…
jiangxb1987 Nov 8, 2016
73feaa3
[SPARK-18346][SQL] TRUNCATE TABLE should fail if no partition is matc…
cloud-fan Nov 8, 2016
9c41969
[SPARK-18191][CORE] Port RDD API to use commit protocol
jiangxb1987 Nov 8, 2016
245e5a2
[SPARK-18357] Fix yarn files/archive broken issue andd unit tests
kishorvpatil Nov 8, 2016
26e1c53
[SPARK-17748][ML] Minor cleanups to one-pass linear regression with e…
jkbradley Nov 8, 2016
b6de0c9
[SPARK-18280][CORE] Fix potential deadlock in `StandaloneSchedulerBac…
zsxwing Nov 8, 2016
6f7ecb0
[SPARK-18342] Make rename failures fatal in HDFSBackedStateStore
brkyvz Nov 8, 2016
55964c1
[SPARK-18239][SPARKR] Gradient Boosted Tree for R
felixcheung Nov 9, 2016
4afa39e
[SPARK-18333][SQL] Revert hacks in parquet and orc reader to support …
ericl Nov 9, 2016
b9192bb
[SPARK-18368] Fix regexp_replace with task serialization.
rdblue Nov 9, 2016
e256392
[SPARK-17659][SQL] Partitioned View is Not Supported By SHOW CREATE T…
gatorsmile Nov 9, 2016
02c5325
[SPARK-18292][SQL] LogicalPlanToSQLSuite should not use resource depe…
dongjoon-hyun Nov 9, 2016
205e6d5
[SPARK-18338][SQL][TEST-MAVEN] Fix test case initialization order und…
liancheng Nov 9, 2016
06a13ec
[SPARK-16808][CORE] History Server main page does not honor APPLICATI…
vijoshi Nov 9, 2016
4763661
Revert "[SPARK-18368] Fix regexp_replace with task serialization."
yhuai Nov 9, 2016
d4028de
[SPARK-18368][SQL] Fix regexp replace when serialized
rdblue Nov 9, 2016
d8b81f7
[SPARK-18370][SQL] Add table information to InsertIntoHadoopFsRelatio…
hvanhovell Nov 9, 2016
64fbdf1
[SPARK-18191][CORE][FOLLOWUP] Call `setConf` if `OutputFormat` is `Co…
jiangxb1987 Nov 9, 2016
3f62e1b
[SPARK-17829][SQL] Stable format for offset log
Nov 9, 2016
6021c95
[SPARK-18147][SQL] do not fail for very complex aggregator result type
cloud-fan Nov 10, 2016
cc86fcd
[MINOR][PYSPARK] Improve error message when running PySpark with diff…
viirya Nov 10, 2016
96a5910
[SPARK-18268][ML][MLLIB] ALS fail with better message if ratings is e…
techaddict Nov 10, 2016
22a9d06
[SPARK-14914][CORE] Fix Resource not closed after using, for unit tes…
wangmiao1981 Nov 10, 2016
16eaad9
[SPARK-18262][BUILD][SQL] JSON.org license is now CatX
srowen Nov 10, 2016
b533fa2
[SPARK-17993][SQL] Fix Parquet log output redirection
Nov 10, 2016
2f7461f
[SPARK-17990][SPARK-18302][SQL] correct several partition related beh…
cloud-fan Nov 10, 2016
e0deee1
[SPARK-18403][SQL] Temporarily disable flaky ObjectHashAggregateSuite
liancheng Nov 10, 2016
a335634
[SPARK-18185] Fix all forms of INSERT / OVERWRITE TABLE for Datasourc…
ericl Nov 11, 2016
5ddf694
[SPARK-18401][SPARKR][ML] SparkR random forest should support output …
yanboliang Nov 11, 2016
4f15d94
[SPARK-13331] AES support for over-the-wire encryption
Nov 11, 2016
a531fe1
[SPARK-17843][WEB UI] Indicate event logs pending for processing on h…
vijoshi Nov 11, 2016
d42bb7c
[SPARK-17982][SQL] SQLBuilder should wrap the generated SQL with pare…
dongjoon-hyun Nov 11, 2016
6e95325
[SPARK-18387][SQL] Add serialization to checkEvaluation.
rdblue Nov 11, 2016
ba23f76
[SPARK-18264][SPARKR] build vignettes with package, update vignettes …
felixcheung Nov 11, 2016
46b2550
[SPARK-18060][ML] Avoid unnecessary computation for MLOR
sethah Nov 12, 2016
3af8945
[SPARK-16759][CORE] Add a configuration property to pass caller conte…
weiqingy Nov 12, 2016
bc41d99
[SPARK-18375][SPARK-18383][BUILD][CORE] Upgrade netty to 4.0.42.Final
witgo Nov 12, 2016
22cb3a0
[SPARK-14077][ML][FOLLOW-UP] Minor refactor and cleanup for NaiveBayes
yanboliang Nov 12, 2016
1386fd2
[SPARK-18418] Fix flags for make_binary_release for hadoop profile
holdenk Nov 12, 2016
b91a51b
[SPARK-18426][STRUCTURED STREAMING] Python Documentation Fix for Stru…
Nov 14, 2016
07be232
[SPARK-18412][SPARKR][ML] Fix exception for some SparkR ML algorithms…
yanboliang Nov 14, 2016
f95b124
[SPARK-18382][WEBUI] "run at null:-1" in UI when no file/line info in…
srowen Nov 14, 2016
ae6cddb
[SPARK-18166][MLLIB] Fix Poisson GLM bug due to wrong requirement of …
actuaryzhang Nov 14, 2016
637a0bb
[SPARK-18396][HISTORYSERVER] Duration" column makes search result con…
WangTaoTheTonic Nov 14, 2016
9d07cee
[SPARK-18432][DOC] Changed HDFS default block size from 64MB to 128MB
moomindani Nov 14, 2016
bdfe60a
[SPARK-18416][STRUCTURED STREAMING] Fixed temp file leak in state store
tdas Nov 14, 2016
89d1fa5
[SPARK-17510][STREAMING][KAFKA] config max rate on a per-partition basis
koeninger Nov 14, 2016
7593445
[SPARK-11496][GRAPHX][FOLLOWUP] Add param checking for runParallelPer…
zhengruifeng Nov 14, 2016
bd85603
[SPARK-17348][SQL] Incorrect results from subquery transformation
nsyca Nov 14, 2016
c071878
[SPARK-18124] Observed delay based Event Time Watermarks
marmbrus Nov 15, 2016
c31def1
[SPARK-18428][DOC] Update docs for GraphX
zhengruifeng Nov 15, 2016
86430cc
[SPARK-18430][SQL] Fixed Exception Messages when Hitting an Invocatio…
gatorsmile Nov 15, 2016
d89bfc9
[SPARK-18232][MESOS] Support CNI
Nov 15, 2016
33be4da
[SPARK-18427][DOC] Update docs of mllib.KMeans
zhengruifeng Nov 15, 2016
f14ae49
[SPARK-18300][SQL] Do not apply foldable propagation with expand as a…
hvanhovell Nov 15, 2016
745ab8b
[SPARK-18379][SQL] Make the parallelism of parallelPartitionDiscovery…
Nov 15, 2016
6f9e598
[SPARK-13027][STREAMING] Added batch time as a parameter to updateSta…
Nov 15, 2016
2afdaa9
[SPARK-18337] Complete mode memory sinks should be able to recover fr…
brkyvz Nov 15, 2016
5bcb9a7
[SPARK-18417][YARN] Define 'spark.yarn.am.port' in yarn config object
weiqingy Nov 15, 2016
1ae4652
[SPARK-18440][STRUCTURED STREAMING] Pass correct query execution to F…
tdas Nov 15, 2016
503378f
[SPARK-18423][STREAMING] ReceiverTracker should close checkpoint dir …
HyukjinKwon Nov 15, 2016
3ce057d
[SPARK-17732][SQL] ALTER TABLE DROP PARTITION should support comparators
dongjoon-hyun Nov 15, 2016
4b35d13
[SPARK-18300][SQL] Fix scala 2.10 build for FoldablePropagation
hvanhovell Nov 16, 2016
4ac9759
[SPARK-18377][SQL] warehouse path should be a static conf
cloud-fan Nov 16, 2016
95eb06b
[SPARK-18438][SPARKR][ML] spark.mlp should support RFormula.
yanboliang Nov 16, 2016
74f5c21
[SPARK-18433][SQL] Improve DataSource option keys to be more case-ins…
dongjoon-hyun Nov 16, 2016
3e01f12
[DOC][MINOR] Kafka doc: breakup into lines
lw-lin Nov 16, 2016
43a2689
[SPARK-18400][STREAMING] NPE when resharding Kinesis Stream
srowen Nov 16, 2016
e614577
[SPARK-18410][STREAMING] Add structured kafka example
uncleGen Nov 16, 2016
241e04b
[MINOR][DOC] Fix typos in the 'configuration', 'monitoring' and 'sql-…
weiqingy Nov 16, 2016
c68f1a3
[SPARK-18434][ML] Add missing ParamValidations for ML algos
zhengruifeng Nov 16, 2016
a75e3fe
[SPARK-18446][ML][DOCS] Add links to API docs for ML algos
zhengruifeng Nov 16, 2016
7569cf6
[SPARK-18420][BUILD] Fix the errors caused by lint check in Java
Nov 16, 2016
608ecc5
[SPARK-18415][SQL] Weird Plan Output when CTE used in RunnableCommand
gatorsmile Nov 16, 2016
0048ce7
[SPARK-18459][SPARK-18460][STRUCTUREDSTREAMING] Rename triggerId to b…
tdas Nov 16, 2016
bb6cdfd
[SPARK-18461][DOCS][STRUCTUREDSTREAMING] Added more information about…
tdas Nov 16, 2016
a36a76a
[SPARK-1267][SPARK-18129] Allow PySpark to be pip installed
holdenk Nov 16, 2016
2ca8ae9
[SPARK-18186] Migrate HiveUDAFFunction to TypedImperativeAggregate fo…
liancheng Nov 16, 2016
5558998
[YARN][DOC] Increasing NodeManager's heap size with External Shuffle …
Devian-ua Nov 16, 2016
170eeb3
[SPARK-18442][SQL] Fix nullability of WrapOption.
ueshin Nov 17, 2016
07b3f04
[SPARK-18464][SQL] support old table which doesn't store schema in me…
cloud-fan Nov 17, 2016
a3cac7b
[YARN][DOC] Remove non-Yarn specific configurations from running-on-y…
weiqingy Nov 17, 2016
49b6f45
[SPARK-18365][DOCS] Improve Sample Method Documentation
bllchmbrs Nov 17, 2016
de77c67
[SPARK-17462][MLLIB]use VersionUtils to parse Spark version strings
Nov 17, 2016
cdaf4ce
[SPARK-18480][DOCS] Fix wrong links for ML guide docs
zhengruifeng Nov 17, 2016
b0aa1aa
[SPARK-18490][SQL] duplication nodename extrainfo for ShuffleExchange
Nov 17, 2016
ce13c26
[SPARK-18360][SQL] default table path of tables in default database s…
cloud-fan Nov 18, 2016
d9dd979
[SPARK-18462] Fix ClassCastException in SparkListenerDriverAccumUpdat…
JoshRosen Nov 18, 2016
6c6aba2
refine connect and read code
yinxusen Nov 18, 2016
51baca2
[SPARK-18187][SQL] CompactibleFileStreamLog should not use "compactIn…
Nov 18, 2016
795e9fc
[SPARK-18457][SQL] ORC and other columnar formats using HiveShim read…
aray Nov 18, 2016
5fe0cde
refine test suite of arrow vectors
yinxusen Nov 18, 2016
a6ff7da
merge with bryan
yinxusen Nov 18, 2016
40d59ff
[SPARK-18422][CORE] Fix wholeTextFiles test to pass on Windows in Jav…
HyukjinKwon Nov 18, 2016
974de14
update supported types
yinxusen Nov 18, 2016
e5f5c29
[SPARK-18477][SS] Enable interrupts for HDFS in HDFSMetadataLog
zsxwing Nov 19, 2016
6f7ff75
[SPARK-18505][SQL] Simplify AnalyzeColumnCommand
rxin Nov 19, 2016
2a40de4
[SPARK-18497][SS] Make ForeachSink support watermark
zsxwing Nov 19, 2016
db9fb9b
[SPARK-18448][CORE] SparkSession should implement java.lang.AutoClose…
srowen Nov 19, 2016
d5b1d5f
[SPARK-18445][BUILD][DOCS] Fix the markdown for `Note:`/`NOTE:`/`Note…
HyukjinKwon Nov 19, 2016
8b1e108
[SPARK-18353][CORE] spark.rpc.askTimeout defalut value is not 120s
srowen Nov 19, 2016
ded5fef
[SPARK-18448][CORE] Fix @since 2.1.0 on new SparkSession.close() method
srowen Nov 19, 2016
ea77c81
[SPARK-17062][MESOS] add conf option to mesos dispatcher
skonto Nov 20, 2016
856e004
[SPARK-18456][ML][FOLLOWUP] Use matrix abstraction for coefficients i…
sethah Nov 20, 2016
d93b655
[SPARK-18458][CORE] Fix signed integer overflow problem at an express…
kiszk Nov 20, 2016
bce9a03
[SPARK-18508][SQL] Fix documentation error for DateDiff
rxin Nov 20, 2016
a64f25d
[SQL] Fix documentation for Concat and ConcatWs
rxin Nov 20, 2016
7ca7a63
[SPARK-15214][SQL] Code-generation for Generate
hvanhovell Nov 20, 2016
c528812
[SPARK-3359][BUILD][DOCS] Print examples and disable group and tparam…
HyukjinKwon Nov 20, 2016
6659ae5
Fix Mesos build break for Scala 2.10.
rxin Nov 20, 2016
b625a36
[HOTFIX][SQL] Fix DDLSuite failure.
rxin Nov 21, 2016
6585479
[SPARK-18467][SQL] Extracts method for preparing arguments from Stati…
ueshin Nov 21, 2016
e811fbf
[SPARK-18282][ML][PYSPARK] Add python clustering summaries for GMM an…
sethah Nov 21, 2016
9f262ae
[SPARK-18398][SQL] Fix nullabilities of MapObjects and ExternalMapToC…
ueshin Nov 21, 2016
07beb5d
[SPARK-18413][SQL] Add `maxConnections` JDBCOption
dongjoon-hyun Nov 21, 2016
7017687
[SPARK-18361][PYSPARK] Expose RDD localCheckpoint in PySpark
Nov 21, 2016
ddd02f5
[SPARK-18517][SQL] DROP TABLE IF EXISTS should not warn for non-exist…
dongjoon-hyun Nov 21, 2016
a2d4647
[SPARK-17765][SQL] Support for writing out user-defined type in ORC d…
HyukjinKwon Nov 21, 2016
97a8239
[SPARK-18493] Add missing python APIs: withWatermark and checkpoint t…
brkyvz Nov 22, 2016
ebeb083
[SPARK-18425][STRUCTURED STREAMING][TESTS] Test `CompactibleFileStrea…
lw-lin Nov 22, 2016
acb9715
[SPARK-18444][SPARKR] SparkR running in yarn-cluster mode should not …
yanboliang Nov 22, 2016
4922f9c
[SPARK-18514][DOCS] Fix the markdown for `Note:`/`NOTE:`/`Note that` …
HyukjinKwon Nov 22, 2016
933a654
[SPARK-18447][DOCS] Fix the markdown for `Note:`/`NOTE:`/`Note that` …
HyukjinKwon Nov 22, 2016
bb152cd
[SPARK-18519][SQL] map type can not be used in EqualTo
cloud-fan Nov 22, 2016
45ea46b
[SPARK-18504][SQL] Scalar subquery with extra group by columns return…
nsyca Nov 22, 2016
702cd40
[SPARK-18507][SQL] HiveExternalCatalog.listPartitions should only cal…
cloud-fan Nov 22, 2016
bdc8153
[SPARK-18465] Add 'IF EXISTS' clause to 'UNCACHE' to not throw except…
brkyvz Nov 22, 2016
016bc62
refine test
yinxusen Nov 22, 2016
2fd101b
[SPARK-18373][SPARK-18529][SS][KAFKA] Make failOnDataLoss=false work …
zsxwing Nov 22, 2016
9c42d4a
[SPARK-16803][SQL] SaveAsTable does not work when target table is a H…
gatorsmile Nov 22, 2016
39a1d30
[SPARK-18533] Raise correct error upon specification of schema for da…
dilipbiswal Nov 22, 2016
d0212eb
[SPARK-18530][SS][KAFKA] Change Kafka timestamp column type to Timest…
zsxwing Nov 23, 2016
982b82e
[SPARK-18501][ML][SPARKR] Fix spark.glm errors when fitting on collin…
yanboliang Nov 23, 2016
2559fb4
[SPARK-18179][SQL] Throws analysis exception with a proper message fo…
HyukjinKwon Nov 23, 2016
7e0cd1d
[SPARK-18073][DOCS][WIP] Migrate wiki to spark.apache.org web site
srowen Nov 23, 2016
85235ed
[SPARK-18545][SQL] Verify number of hive client RPCs in PartitionedTa…
ericl Nov 23, 2016
84284e8
[SPARK-18053][SQL] compare unsafe and safe complex-type values correctly
cloud-fan Nov 23, 2016
9785ed4
[SPARK-18557] Downgrade confusing memory leak warning message
rxin Nov 23, 2016
70ad07a
[SPARK-18522][SQL] Explicit contract for column stats serialization
rxin Nov 23, 2016
f129ebc
[SPARK-18050][SQL] do not create default database if it already exists
cloud-fan Nov 23, 2016
0d1bf2b
[SPARK-18510] Fix data corruption from inferred partition column data…
brkyvz Nov 23, 2016
223fa21
[SPARK-18510][SQL] Follow up to address comments in #15951
zsxwing Nov 24, 2016
2dfabec
[SPARK-18520][ML] Add missing setXXXCol methods for BisectingKMeansMo…
zhengruifeng Nov 24, 2016
a367d5f
[SPARK-18578][SQL] Full outer join in correlated subquery returns inc…
nsyca Nov 24, 2016
f58a8aa
[SPARK-18575][WEB] Keep same style: adjust the position of driver log…
uncleGen Nov 25, 2016
f42db0c
[SPARK-18119][SPARK-CORE] Namenode safemode check is only performed o…
Nov 25, 2016
51b1c15
[SPARK-3359][BUILD][DOCS] More changes to resolve javadoc 8 errors th…
HyukjinKwon Nov 25, 2016
5ecdc7c
[SPARK-18559][SQL] Fix HLL++ with small relative error
wzhfy Nov 25, 2016
445d4d9
[SPARK-18356][ML] Improve MLKmeans Performance
ZakariaHili Nov 25, 2016
fb07bbe
[SPARK-18413][SQL][FOLLOW-UP] Use `numPartitions` instead of `maxConn…
dongjoon-hyun Nov 25, 2016
e2fb9fd
[SPARK-18436][SQL] isin causing SQL syntax error with JDBC
jiangxb1987 Nov 25, 2016
a88329d
[SPARK-18583][SQL] Fix nullability of InputFileName.
ueshin Nov 26, 2016
c4a7eef
[SPARK-18481][ML] ML 2.1 QA: Remove deprecated methods for ML
yanboliang Nov 26, 2016
f4a98e4
[WIP][SQL][DOC] Fix incorrect `code` tag
weiqingy Nov 26, 2016
9c03c56
[SPARK-17251][SQL] Improve `OuterReference` to be `NamedExpression`
dongjoon-hyun Nov 26, 2016
07f32c2
[SPARK-18594][SQL] Name Validation of Databases/Tables
gatorsmile Nov 28, 2016
fc2c13b
[SPARK-18482][SQL] make sure Spark can access the table metadata crea…
cloud-fan Nov 28, 2016
8714162
[SPARK-18585][SQL] Use `ev.isNull = "false"` if possible for Janino t…
ueshin Nov 28, 2016
454b804
[SPARK-18604][SQL] Make sure CollapseWindow returns the attributes in…
hvanhovell Nov 28, 2016
f075cd9
[SPARK-18118][SQL] fix a compilation error due to nested JavaBeans
kiszk Nov 28, 2016
70dfdcb
[SPARK-18118][SQL] fix a compilation error due to nested JavaBeans\nR…
hvanhovell Nov 28, 2016
9f273c5
[SPARK-17783][SQL] Hide Credentials in CREATE and DESC FORMATTED/EXTE…
gatorsmile Nov 28, 2016
38e2982
[SPARK-18597][SQL] Do not push-down join conditions to the right side…
hvanhovell Nov 28, 2016
d31ff9b
[SPARK-17732][SQL] Revert ALTER TABLE DROP PARTITION should support c…
cloud-fan Nov 28, 2016
237c3b9
[SPARK-18535][UI][YARN] Redact sensitive information from Spark logs …
markgrover Nov 28, 2016
eba7277
[SPARK-18602] Set the version of org.codehaus.janino:commons-compiler…
yhuai Nov 28, 2016
1856428
[SQL][MINOR] DESC should use 'Catalog' as partition provider
cloud-fan Nov 28, 2016
0f5f52a
[SPARK-16282][SQL] Implement percentile SQL function.
jiangxb1987 Nov 28, 2016
ad67993
[SPARK-17680][SQL][TEST] Added test cases for InMemoryRelation
kiszk Nov 28, 2016
8b1609b
[SPARK-18117][CORE] Add test for TaskSetBlacklist
squito Nov 28, 2016
05f7c6f
[SPARK-18408][ML] API Improvements for LSH
Nov 28, 2016
2e80990
[SPARK-18403][SQL] Fix unsafe data false sharing issue in ObjectHashA…
liancheng Nov 29, 2016
71352c9
[SPARK-18523][PYSPARK] Make SparkContext.stop more reliable
kxepal Nov 29, 2016
e64a204
[SPARK-16282][SQL] Follow-up: remove "percentile" from temp function …
lins05 Nov 29, 2016
1633ff3
[SPARK-18588][SS][KAFKA] Ignore the flaky kafka test
zsxwing Nov 29, 2016
8b325b1
[SPARK-18547][CORE] Propagate I/O encryption key when executors regis…
Nov 29, 2016
d449988
[SPARK-18058][SQL][TRIVIAL] Use dataType.sameResult(...) instead equa…
hvanhovell Nov 29, 2016
e2318ed
[SPARK-18544][SQL] Append with df.saveAsTable writes data to wrong lo…
ericl Nov 29, 2016
3c0beea
[SPARK-18339][SPARK-18513][SQL] Don't push down current_timestamp for…
Nov 29, 2016
7d5cb3a
[SPARK-18188] add checksum for blocks of broadcast
Nov 29, 2016
f830bb9
[SPARK-3359][DOCS] Make javadoc8 working for unidoc/genjavadoc compat…
HyukjinKwon Nov 29, 2016
f045d9d
[MINOR][DOCS] Updates to the Accumulator example in the programming g…
Nov 29, 2016
1a87009
[SPARK-18615][DOCS] Switch to multi-line doc to avoid a genjavadoc bu…
HyukjinKwon Nov 29, 2016
95f7985
[SPARK-18592][ML] Move DT/RF/GBT Param setter methods to subclasses
yanboliang Nov 29, 2016
f643fe4
[SPARK-18498][SQL] Revise HDFSMetadataLog API for better testing
Nov 29, 2016
d57a594
[SPARK-18429][SQL] implement a new Aggregate for CountMinSketch
Nov 29, 2016
f8878a4
[SPARK-18631][SQL] Changed ExchangeCoordinator re-partitioning to avo…
markhamstra Nov 29, 2016
3600635
[SPARK-18614][SQL] Incorrect predicate pushdown from ExistenceJoin
nsyca Nov 29, 2016
9a02f68
[SPARK-18553][CORE] Fix leak of TaskSetManager following executor loss
JoshRosen Nov 30, 2016
c3d08e2
[SPARK-18516][SQL] Split state and progress in streaming
tdas Nov 30, 2016
9b670bc
[SPARK-18319][ML][QA2.1] 2.1 QA: API: Experimental, DeveloperApi, fin…
YY-OnCall Nov 30, 2016
af9789a
[SPARK-18632][SQL] AggregateFunction should not implement ImplicitCas…
hvanhovell Nov 30, 2016
489845f
[SPARK-18145] Update documentation for hive partition management in 2.1
ericl Nov 30, 2016
4c82ca8
[SPARK-15819][PYSPARK][ML] Add KMeanSummary in KMeans of PySpark
zjffdu Nov 30, 2016
bc09a2b
[SPARK-18516][STRUCTURED STREAMING] Follow up PR to add StreamingQuer…
tdas Nov 30, 2016
a1d9138
[SPARK-17680][SQL][TEST] Added a Testcase for Verifying Unicode Chara…
gatorsmile Nov 30, 2016
879ba71
[SPARK-18622][SQL] Fix the datatype of the Sum aggregate function
hvanhovell Nov 30, 2016
56c82ed
[SPARK-18617][CORE][STREAMING] Close "kryo auto pick" feature for Spa…
uncleGen Nov 30, 2016
fe854f2
[SPARK-18366][PYSPARK][ML] Add handleInvalid to Pyspark for QuantileD…
techaddict Nov 30, 2016
c5a64d7
[SPARK-18612][MLLIB] Delete broadcasted variable in LBFGS CostFun
Nov 30, 2016
2eb093d
[SPARK-17897][SQL] Fixed IsNotNull Constraint Inference Rule
gatorsmile Nov 30, 2016
c24076d
[SPARK-17932][SQL] Support SHOW TABLES EXTENDED LIKE 'identifier_with…
jiangxb1987 Nov 30, 2016
3f03c90
[SPARK-18220][SQL] read Hive orc table with varchar column should not…
cloud-fan Nov 30, 2016
bc95ea0
[SPARK][EXAMPLE] Added missing semicolon in quick-start-guide example
Nov 30, 2016
c51c772
[SPARK-18640] Add synchronization to TaskScheduler.runningTasksByExec…
JoshRosen Nov 30, 2016
60022bf
[SPARK-18318][ML] ML, Graph 2.1 QA: API: New Scala APIs, docs
yanboliang Nov 30, 2016
f135b70
[SPARK-18251][SQL] the type of Dataset can't be Option of non-flat type
cloud-fan Nov 30, 2016
93e9d88
[SPARK-18546][CORE] Fix merging shuffle spills when using encryption.
Nov 30, 2016
c4979f6
[SPARK-18655][SS] Ignore Structured Streaming 2.0.2 logs in history s…
zsxwing Dec 1, 2016
0a81121
[SPARK-18617][SPARK-18560][TEST] Fix flaky test: StreamingContextSuit…
zsxwing Dec 1, 2016
2eb6764
[SPARK-18476][SPARKR][ML] SparkR Logistic Regression should should su…
wangmiao1981 Dec 1, 2016
b28fe4a
[SPARK-18538][SQL] Fix Concurrent Table Fetching Using DataFrameReade…
gatorsmile Dec 1, 2016
88f559f
[SPARK-18635][SQL] Partition name/values not escaped correctly in som…
ericl Dec 1, 2016
dbf842b
[SPARK-18666][WEB UI] Remove the codes checking deprecated config spa…
viirya Dec 1, 2016
2ab8551
[SPARK-18645][DEPLOY] Fix spark-daemon.sh arguments error lead to thr…
wangyum Dec 1, 2016
e653484
[SPARK-18674][SQL] improve the error message of using join
cloud-fan Dec 1, 2016
78bb7f8
[SPARK-18274][ML][PYSPARK] Memory leak in PySpark JavaWrapper
techaddict Dec 1, 2016
d76dcc9
add support for string
yinxusen Dec 1, 2016
086b0c8
[SPARK-18617][SPARK-18560][TESTS] Fix flaky test: StreamingContextSui…
zsxwing Dec 1, 2016
9a368b5
fix type error
yinxusen Dec 1, 2016
4744cd3
Merge branch 'master' into wip-toPandas_with_arrow-SPARK-13534
yinxusen Dec 1, 2016
1977f25
add string support
yinxusen Dec 2, 2016
38d7420
fix string type convert
yinxusen Dec 3, 2016
2 changes: 1 addition & 1 deletion .github/PULL_REQUEST_TEMPLATE
@@ -7,4 +7,4 @@
(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

Please review https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark before opening a pull request.
Please review http://spark.apache.org/contributing.html before opening a pull request.
2 changes: 2 additions & 0 deletions .gitignore
@@ -57,6 +57,8 @@ project/plugins/project/build.properties
project/plugins/src_managed/
project/plugins/target/
python/lib/pyspark.zip
python/deps
python/pyspark/python
reports/
scalastyle-on-compile.generated.xml
scalastyle-output.xml
4 changes: 2 additions & 2 deletions CONTRIBUTING.md
@@ -1,12 +1,12 @@
## Contributing to Spark

*Before opening a pull request*, review the
[Contributing to Spark wiki](https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark).
[Contributing to Spark guide](http://spark.apache.org/contributing.html).
It lists steps that are required before creating a PR. In particular, consider:

- Is the change important and ready enough to ask the community to spend time reviewing?
- Have you searched for existing, related JIRAs and pull requests?
- Is this a new feature that can stand alone as a [third party project](https://cwiki.apache.org/confluence/display/SPARK/Third+Party+Projects) ?
- Is this a new feature that can stand alone as a [third party project](http://spark.apache.org/third-party-projects.html) ?
- Is the change being proposed clearly explained and motivated?

When you contribute code, you affirm that the contribution is your original work and that you
3 changes: 0 additions & 3 deletions NOTICE
@@ -421,9 +421,6 @@ Copyright (c) 2011, Terrence Parr.
This product includes/uses ASM (http://asm.ow2.org/),
Copyright (c) 2000-2007 INRIA, France Telecom.

This product includes/uses org.json (http://www.json.org/java/index.html),
Copyright (c) 2002 JSON.org

This product includes/uses JLine (http://jline.sourceforge.net/),
Copyright (c) 2002-2006, Marc Prud'hommeaux <[email protected]>.

91 changes: 91 additions & 0 deletions R/CRAN_RELEASE.md
@@ -0,0 +1,91 @@
# SparkR CRAN Release

To release SparkR as a package to CRAN, we would use the `devtools` package. Please work with the
`[email protected]` community and R package maintainer on this.

### Release

First, check that the `Version:` field in the `pkg/DESCRIPTION` file is updated. Also, check for stale files not under source control.
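The `Version:` field can also be read from the shell, much as `check-cran.sh` derives `VERSION` with a `grep` piped into `awk '{print $NF}'`. This is a minimal sketch, assuming the usual `Field: value` layout of an R `DESCRIPTION` file; `description_version` is a hypothetical helper name, and the pattern is anchored to `^Version:` (unlike the unanchored `grep Version` in the script) so that other lines mentioning the word are not picked up.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the Version: field of an R DESCRIPTION file.
# Defaults to pkg/DESCRIPTION, i.e. it assumes it is run from SPARK_HOME/R.
description_version() {
  local desc="${1:-pkg/DESCRIPTION}"
  if [ ! -f "$desc" ]; then
    echo "DESCRIPTION not found: $desc" >&2
    return 1
  fi
  # Anchored variant of check-cran.sh's `grep Version | awk '{print $NF}'`:
  # keep only the "Version:" line and print its last whitespace-separated field.
  grep '^Version:' "$desc" | awk '{print $NF}'
}
```

Run from `SPARK_HOME/R`, `description_version` would print the version that also appears in the tarball name `SparkR_<version>.tar.gz`.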

Note that while `check-cran.sh` runs `R CMD check`, it does so with `--no-manual --no-vignettes`, which skips the vignette and PDF manual checks; therefore it is preferable to run `R CMD check` on the manually built source package before uploading a release.

To upload a release, we need to update `cran-comments.md`. It should generally contain the results of running the `check-cran.sh` script, along with comments on the status of every `WARNING` (there should be none) and `NOTE`. As part of `check-cran.sh` and the release process, the vignettes are built, so make sure `SPARK_HOME` is set and the Spark jars are accessible.
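When preparing `cran-comments.md`, it can help to pull the flagged checks out of a saved check log. The sketch below assumes the usual `* checking <something> ... <RESULT>` line format that `R CMD check` prints (and writes to `00check.log` inside the `SparkR.Rcheck` directory it creates); `summarize_check_log` is a hypothetical helper, not part of the Spark scripts.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count and list the checks that ended in WARNING or NOTE
# in a saved `R CMD check` log, as raw material for cran-comments.md.
summarize_check_log() {
  local log="$1"
  local warnings notes
  # grep -c prints 0 (and exits non-zero) when nothing matches; the count is
  # still captured correctly by the command substitution.
  warnings=$(grep -c '\.\.\. WARNING$' "$log")
  notes=$(grep -c '\.\.\. NOTE$' "$log")
  echo "WARNING: $warnings"
  echo "NOTE: $notes"
  # List the flagged check lines so each one can be commented on.
  grep -E '\.\.\. (WARNING|NOTE)$' "$log" || true
}
```

For example, `./check-cran.sh 2>&1 | tee check.log` followed by `summarize_check_log check.log` would give a starting point for the `WARNING`/`NOTE` comments.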

Once everything is in place, run in R under the `SPARK_HOME/R` directory:

```R
paths <- .libPaths(); .libPaths(c("lib", paths)); Sys.setenv(SPARK_HOME=tools::file_path_as_absolute("..")); devtools::release(); .libPaths(paths)
```

For more information please refer to http://r-pkgs.had.co.nz/release.html#release-check

### Testing: build package manually

To build the package manually, for example to inspect the contents of the resulting `.tar.gz` file, we also use the `devtools` package.

The source package is what gets released to CRAN. CRAN then builds platform-specific binary packages from the source package.

#### Build source package

To build the source package locally without releasing it to CRAN, run in R under the `SPARK_HOME/R` directory:

```R
paths <- .libPaths(); .libPaths(c("lib", paths)); Sys.setenv(SPARK_HOME=tools::file_path_as_absolute("..")); devtools::build("pkg"); .libPaths(paths)
```

(http://r-pkgs.had.co.nz/vignettes.html#vignette-workflow-2)

Similarly, the source package is also created by `check-cran.sh` with `R CMD build pkg`.

For example, this should be the content of the source package:

```sh
DESCRIPTION R inst tests
NAMESPACE build man vignettes

inst/doc/
sparkr-vignettes.html
sparkr-vignettes.Rmd
sparkr-vignettes.Rman

build/
vignette.rds

man/
*.Rd files...

vignettes/
sparkr-vignettes.Rmd
```
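To compare a freshly built tarball against the listing above without extracting it, the archive's table of contents can be reduced to its top-level entries. A minimal sketch, assuming the entries are stored under a `SparkR/` prefix (the layout `R CMD build` produces); `list_source_package` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the distinct top-level entries of a SparkR
# source tarball (DESCRIPTION, NAMESPACE, R, inst, vignettes, ...).
list_source_package() {
  local tarball="$1"
  # Entries look like SparkR/<path>; keep the component right below the
  # package directory, skip the bare "SparkR/" entry, and de-duplicate.
  # LC_ALL=C makes the ordering independent of the user's locale.
  tar -tzf "$tarball" \
    | awk -F/ 'NF > 1 && $2 != "" { print $2 }' \
    | LC_ALL=C sort -u
}
```

For instance, `list_source_package SparkR_2.1.0.tar.gz` should show `vignettes` and `inst` alongside `DESCRIPTION`, `NAMESPACE` and `R` if the vignettes were built.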

#### Test source package

To install, run this:

```sh
R CMD INSTALL SparkR_2.1.0.tar.gz
```

Replace "2.1.0" with the version of SparkR.

This command installs SparkR to the default libPaths. Once that is done, you should be able to start R and run:

```R
library(SparkR)
vignette("sparkr-vignettes", package="SparkR")
```

#### Build binary package

To build the binary package locally, run in R under the `SPARK_HOME/R` directory:

```R
paths <- .libPaths(); .libPaths(c("lib", paths)); Sys.setenv(SPARK_HOME=tools::file_path_as_absolute("..")); devtools::build("pkg", binary = TRUE); .libPaths(paths)
```

For example, this should be the content of the binary package:

```sh
DESCRIPTION Meta R html tests
INDEX NAMESPACE help profile worker
```
10 changes: 5 additions & 5 deletions R/README.md
@@ -6,7 +6,7 @@ SparkR is an R package that provides a light-weight frontend to use Spark from R

Libraries of sparkR need to be created in `$SPARK_HOME/R/lib`. This can be done by running the script `$SPARK_HOME/R/install-dev.sh`.
By default the above script uses the system-wide installation of R. However, this can be changed to any user-installed location of R by setting the environment variable `R_HOME` to the full path of the base directory where R is installed before running the install-dev.sh script.
Example:
Example:
```bash
# where /home/username/R is where R is installed and /home/username/R/bin contains the files R and RScript
export R_HOME=/home/username/R
@@ -46,19 +46,19 @@ Sys.setenv(SPARK_HOME="/Users/username/spark")
# This line loads SparkR from the installed directory
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)
sc <- sparkR.init(master="local")
sparkR.session()
```

#### Making changes to SparkR

The [instructions](https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark) for making contributions to Spark also apply to SparkR.
The [instructions](http://spark.apache.org/contributing.html) for making contributions to Spark also apply to SparkR.
If you only make R file changes (i.e. no Scala changes) then you can just re-install the R package using `R/install-dev.sh` and test your changes.
Once you have made your changes, please include unit tests for them and run existing unit tests using the `R/run-tests.sh` script as described below.

#### Generating documentation

The SparkR documentation (Rd files and HTML files) are not a part of the source repository. To generate them you can run the script `R/create-docs.sh`. This script uses `devtools` and `knitr` to generate the docs and these packages need to be installed on the machine before using the script. Also, you may need to install these [prerequisites](https://github.com/apache/spark/tree/master/docs#prerequisites). See also, `R/DOCUMENTATION.md`

### Examples, Unit tests

SparkR comes with several sample programs in the `examples/src/main/r` directory.
33 changes: 27 additions & 6 deletions R/check-cran.sh
@@ -36,11 +36,27 @@ if [ ! -z "$R_HOME" ]
fi
echo "USING R_HOME = $R_HOME"

# Build the latest docs
# Build the latest docs, but not vignettes, which are built with the package next
$FWDIR/create-docs.sh

# Build a zip file containing the source package
"$R_SCRIPT_PATH/"R CMD build $FWDIR/pkg
# Build source package with vignettes
SPARK_HOME="$(cd "${FWDIR}"/..; pwd)"
. "${SPARK_HOME}"/bin/load-spark-env.sh
if [ -f "${SPARK_HOME}/RELEASE" ]; then
SPARK_JARS_DIR="${SPARK_HOME}/jars"
else
SPARK_JARS_DIR="${SPARK_HOME}/assembly/target/scala-$SPARK_SCALA_VERSION/jars"
fi

if [ -d "$SPARK_JARS_DIR" ]; then
# Build a zip file containing the source package with vignettes
SPARK_HOME="${SPARK_HOME}" "$R_SCRIPT_PATH/"R CMD build $FWDIR/pkg

find pkg/vignettes/. -not -name '.' -not -name '*.Rmd' -not -name '*.md' -not -name '*.pdf' -not -name '*.html' -delete
else
echo "Error Spark JARs not found in $SPARK_HOME"
exit 1
fi

# Run check as-cran.
VERSION=`grep Version $FWDIR/pkg/DESCRIPTION | awk '{print $NF}'`
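The JAR-directory lookup that this hunk adds to `check-cran.sh` distinguishes a binary release, marked by a `RELEASE` file, from a source build. The sketch below restates that branch as a standalone function. The name `find_spark_jars_dir` and its arguments are hypothetical, and the Scala version is passed in explicitly rather than taken from `SPARK_SCALA_VERSION` as set by `load-spark-env.sh`.

```bash
#!/usr/bin/env bash
# Hypothetical standalone version of the jars lookup in check-cran.sh.
# $1: Spark home directory, $2: Scala binary version (e.g. 2.11)
find_spark_jars_dir() {
  local spark_home="$1" scala_version="$2"
  if [ -f "${spark_home}/RELEASE" ]; then
    # Binary distributions ship their JARs in $SPARK_HOME/jars
    echo "${spark_home}/jars"
  else
    # Source builds place them under the assembly module's target directory
    echo "${spark_home}/assembly/target/scala-${scala_version}/jars"
  fi
}
```

As in the script, a caller would then test the result with `[ -d "$dir" ]` and stop with an error if the directory does not exist, since building the vignettes needs the Spark JARs.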
@@ -54,11 +70,16 @@ fi

if [ -n "$NO_MANUAL" ]
then
CRAN_CHECK_OPTIONS=$CRAN_CHECK_OPTIONS" --no-manual"
CRAN_CHECK_OPTIONS=$CRAN_CHECK_OPTIONS" --no-manual --no-vignettes"
fi

echo "Running CRAN check with $CRAN_CHECK_OPTIONS options"

"$R_SCRIPT_PATH/"R CMD check $CRAN_CHECK_OPTIONS SparkR_"$VERSION".tar.gz

if [ -n "$NO_TESTS" ] && [ -n "$NO_MANUAL" ]
then
"$R_SCRIPT_PATH/"R CMD check $CRAN_CHECK_OPTIONS SparkR_"$VERSION".tar.gz
else
# This will run tests and/or build vignettes, and require SPARK_HOME
SPARK_HOME="${SPARK_HOME}" "$R_SCRIPT_PATH/"R CMD check $CRAN_CHECK_OPTIONS SparkR_"$VERSION".tar.gz
fi
popd > /dev/null
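With this change, `R CMD check` only receives `SPARK_HOME` when tests or vignettes will run, that is, unless both `NO_TESTS` and `NO_MANUAL` are set. The sketch below restates that decision, the `NO_MANUAL` options, and the `VERSION` extraction as separate shell functions. It is illustrative only: the function names are hypothetical, and the `--as-cran` starting value of the options is an assumption, because the hunk does not show where `CRAN_CHECK_OPTIONS` is initialized.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of pieces of check-cran.sh; not the script itself.

# Package version: last field of the line matching "Version" in DESCRIPTION.
pkg_version() {
  grep Version "$1" | awk '{print $NF}'
}

# Options for R CMD check. The "--as-cran" starting value is an assumption.
check_options() {
  local opts="--as-cran"
  if [ -n "${NO_MANUAL:-}" ]; then
    # Skipping the manual now also skips building the vignettes
    opts="$opts --no-manual --no-vignettes"
  fi
  echo "$opts"
}

# Exit status 0 when the check will run tests and/or build vignettes
# (i.e. unless both NO_TESTS and NO_MANUAL are set), so SPARK_HOME is needed.
needs_spark_home() {
  ! { [ -n "${NO_TESTS:-}" ] && [ -n "${NO_MANUAL:-}" ]; }
}
```

Applied to the updated `DESCRIPTION` below, `pkg_version` would yield `2.1.0`, so the tarball checked is `SparkR_2.1.0.tar.gz`.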
19 changes: 1 addition & 18 deletions R/create-docs.sh
@@ -20,7 +20,7 @@
# Script to create API docs and vignettes for SparkR
# This requires `devtools`, `knitr` and `rmarkdown` to be installed on the machine.

# After running this script the html docs can be found in
# $SPARK_HOME/R/pkg/html
# The vignettes can be found in
# $SPARK_HOME/R/pkg/vignettes/sparkr_vignettes.html
@@ -52,21 +52,4 @@ Rscript -e 'libDir <- "../../lib"; library(SparkR, lib.loc=libDir); library(knit

popd

# Find Spark jars.
if [ -f "${SPARK_HOME}/RELEASE" ]; then
SPARK_JARS_DIR="${SPARK_HOME}/jars"
else
SPARK_JARS_DIR="${SPARK_HOME}/assembly/target/scala-$SPARK_SCALA_VERSION/jars"
fi

# Only create vignettes if Spark JARs exist
if [ -d "$SPARK_JARS_DIR" ]; then
# render creates SparkR vignettes
Rscript -e 'library(rmarkdown); paths <- .libPaths(); .libPaths(c("lib", paths)); Sys.setenv(SPARK_HOME=tools::file_path_as_absolute("..")); render("pkg/vignettes/sparkr-vignettes.Rmd"); .libPaths(paths)'

find pkg/vignettes/. -not -name '.' -not -name '*.Rmd' -not -name '*.md' -not -name '*.pdf' -not -name '*.html' -delete
else
echo "Skipping R vignettes as Spark JARs not found in $SPARK_HOME"
fi

popd
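The vignette cleanup, removed here from `create-docs.sh` and now run in `check-cran.sh`, uses the same `find ... -delete` expression. It deletes every entry under `pkg/vignettes` except the directory itself (`.`) and files ending in `.Rmd`, `.md`, `.pdf` or `.html`. The demonstration below runs that expression in a scratch directory. The function name `prune_vignettes` and the leftover file names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the find expression used by the SparkR scripts.
# $1: vignettes directory; keeps only .Rmd, .md, .pdf and .html files
prune_vignettes() {
  find "$1"/. -not -name '.' -not -name '*.Rmd' -not -name '*.md' \
    -not -name '*.pdf' -not -name '*.html' -delete
}

# Scratch directory with made-up build leftovers
demo_dir=$(mktemp -d)
touch "$demo_dir/sparkr-vignettes.Rmd" "$demo_dir/sparkr-vignettes.html" \
      "$demo_dir/sparkr-vignettes.R" "$demo_dir/sparkr-vignettes.log"
prune_vignettes "$demo_dir"
ls "$demo_dir"   # only sparkr-vignettes.Rmd and sparkr-vignettes.html remain
```

Because `-delete` implies depth-first traversal, any intermediate files in subdirectories are removed before their parent directories are considered.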
11 changes: 7 additions & 4 deletions R/pkg/DESCRIPTION
@@ -1,8 +1,8 @@
Package: SparkR
Type: Package
Title: R Frontend for Apache Spark
Version: 2.0.0
Date: 2016-08-27
Version: 2.1.0
Date: 2016-11-06
Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),
email = "[email protected]"),
person("Xiangrui", "Meng", role = "aut",
@@ -11,14 +11,16 @@ Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),
email = "[email protected]"),
person(family = "The Apache Software Foundation", role = c("aut", "cph")))
URL: http://www.apache.org/ http://spark.apache.org/
BugReports: https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-ContributingBugReports
BugReports: http://spark.apache.org/contributing.html
Depends:
R (>= 3.0),
methods
Suggests:
testthat,
e1071,
survival
survival,
knitr,
rmarkdown
Description: The SparkR package provides an R frontend for Apache Spark.
License: Apache License (== 2.0)
Collate:
@@ -48,3 +50,4 @@ Collate:
'utils.R'
'window.R'
RoxygenNote: 5.0.1
VignetteBuilder: knitr
16 changes: 14 additions & 2 deletions R/pkg/NAMESPACE
@@ -43,7 +43,10 @@ exportMethods("glm",
"spark.isoreg",
"spark.gaussianMixture",
"spark.als",
"spark.kstest")
"spark.kstest",
"spark.logit",
"spark.randomForest",
"spark.gbt")

# Job group lifecycle management methods
export("setJobGroup",
@@ -124,6 +127,7 @@ exportMethods("arrange",
"selectExpr",
"show",
"showDF",
"storageLevel",
"subset",
"summarize",
"summary",
@@ -348,7 +352,11 @@ export("as.DataFrame",
"uncacheTable",
"print.summary.GeneralizedLinearRegressionModel",
"read.ml",
"print.summary.KSTest")
"print.summary.KSTest",
"print.summary.RandomForestRegressionModel",
"print.summary.RandomForestClassificationModel",
"print.summary.GBTRegressionModel",
"print.summary.GBTClassificationModel")

export("structField",
"structField.jobj",
@@ -373,6 +381,10 @@ S3method(print, structField)
S3method(print, structType)
S3method(print, summary.GeneralizedLinearRegressionModel)
S3method(print, summary.KSTest)
S3method(print, summary.RandomForestRegressionModel)
S3method(print, summary.RandomForestClassificationModel)
S3method(print, summary.GBTRegressionModel)
S3method(print, summary.GBTClassificationModel)
S3method(structField, character)
S3method(structField, jobj)
S3method(structType, jobj)