Closed
Changes from all commits
292 commits
dcbc426
[SPARK-17964][SPARKR] Enable SparkR with Mesos client mode and cluste…
susanxhuynh Nov 5, 2016
e9f1d4a
[MINOR][DOCUMENTATION] Fix some minor descriptions in functions consi…
HyukjinKwon Nov 6, 2016
c42301f
[SPARK-18276][ML] ML models should copy the training summary and set …
sethah Nov 6, 2016
dcbf3fd
[SPARK-17854][SQL] rand/randn allows null/long as input seed
HyukjinKwon Nov 6, 2016
d2f2cf6
[SPARK-18210][ML] Pipeline.copy does not create an instance with the …
wojtek-szymanski Nov 6, 2016
a8fbcdb
[SPARK-18269][SQL] CSV datasource should read null properly when sche…
HyukjinKwon Nov 7, 2016
9c78d35
[SPARK-18173][SQL] data source tables should support truncating parti…
cloud-fan Nov 7, 2016
9ebd5e5
[SPARK-18167][SQL] Disable flaky hive partition pruning test.
rxin Nov 7, 2016
2fa1a63
[SPARK-18296][SQL] Use consistent naming for expression test suites
rxin Nov 7, 2016
4101029
[SPARK-16904][SQL] Removal of Hive Built-in Hash Functions and TestHi…
gatorsmile Nov 7, 2016
df40ee2
[SPARK-18125][SQL] Fix a compilation error in codegen due to splitExp…
viirya Nov 7, 2016
6b33290
[SPARK-18291][SPARKR][ML] SparkR glm predict should output original l…
yanboliang Nov 7, 2016
7a84edb
[SPARK-18283][STRUCTURED STREAMING][KAFKA] Added test to check whethe…
tdas Nov 7, 2016
d1eac3e
[SPARK-17108][SQL] Fix BIGINT and INT comparison failure in spark sql
weiqingy Nov 7, 2016
9873d57
[SPARK-17490][SQL] Optimize SerializeFromObject() for a primitive array
kiszk Nov 7, 2016
4af82d5
[SPARK-18295][SQL] Make to_json function null safe (matching it to fr…
HyukjinKwon Nov 8, 2016
29f59c7
[SPARK-18086] Add support for Hive session vars.
rdblue Nov 8, 2016
4943929
[SPARK-18261][STRUCTURED STREAMING] Add statistics to MemorySink for …
lw-lin Nov 8, 2016
4cb4e5f
[SPARK-18217][SQL] Disallow creating permanent views based on tempora…
gatorsmile Nov 8, 2016
c8879bf
[SPARK-16575][CORE] partition calculation mismatch with sc.binaryFiles
fidato13 Nov 8, 2016
ee400f6
[SPARK-18207][SQL] Fix a compilation error due to HashExpression.doGe…
kiszk Nov 8, 2016
3b360e5
[SPARK-18137][SQL] Fix RewriteDistinctAggregates UnresolvedException …
Nov 8, 2016
ef6b6d3
[SPARK-13770][DOCUMENTATION][ML] Document the ML feature Interaction
Nov 8, 2016
9595a71
[SPARK-18346][SQL] TRUNCATE TABLE should fail if no partition is matc…
cloud-fan Nov 8, 2016
876eee2
[SPARK-18357] Fix yarn files/archive broken issue andd unit tests
kishorvpatil Nov 8, 2016
21bbf94
[SPARK-17748][ML] Minor cleanups to one-pass linear regression with e…
jkbradley Nov 8, 2016
ba80eaf
[SPARK-18280][CORE] Fix potential deadlock in `StandaloneSchedulerBac…
zsxwing Nov 8, 2016
988f908
[SPARK-18342] Make rename failures fatal in HDFSBackedStateStore
brkyvz Nov 8, 2016
98dd7ac
[SPARK-18239][SPARKR] Gradient Boosted Tree for R
felixcheung Nov 9, 2016
0dc14f1
[SPARK-18333][SQL] Revert hacks in parquet and orc reader to support …
ericl Nov 9, 2016
f672083
[SPARK-18368] Fix regexp_replace with task serialization.
rdblue Nov 9, 2016
b89c38b
[SPARK-17659][SQL] Partitioned View is Not Supported By SHOW CREATE T…
gatorsmile Nov 9, 2016
ac441d1
[SPARK-18292][SQL] LogicalPlanToSQLSuite should not use resource depe…
dongjoon-hyun Nov 9, 2016
5bd31dc
[SPARK-16808][CORE] History Server main page does not honor APPLICATI…
vijoshi Nov 9, 2016
626f6d6
Revert "[SPARK-18368] Fix regexp_replace with task serialization."
yhuai Nov 9, 2016
80f5851
[SPARK-18368][SQL] Fix regexp replace when serialized
rdblue Nov 9, 2016
4424c90
[SPARK-18370][SQL] Add table information to InsertIntoHadoopFsRelatio…
hvanhovell Nov 9, 2016
b7d2925
[SPARK-17829][SQL] Stable format for offset log
Nov 9, 2016
8c489a7
[SPARK-18147][SQL] do not fail for very complex aggregator result type
cloud-fan Nov 10, 2016
b54d71b
[MINOR][PYSPARK] Improve error message when running PySpark with diff…
viirya Nov 10, 2016
62236b9
[SPARK-18262][BUILD][SQL] JSON.org license is now CatX
srowen Nov 10, 2016
be3933d
[SPARK-17993][SQL] Fix Parquet log output redirection
Nov 10, 2016
c602894
[SPARK-17990][SPARK-18302][SQL] correct several partition related beh…
cloud-fan Nov 10, 2016
064d431
[SPARK-18185] Fix all forms of INSERT / OVERWRITE TABLE for Datasourc…
ericl Nov 11, 2016
51dca61
[SPARK-18401][SPARKR][ML] SparkR random forest should support output …
yanboliang Nov 11, 2016
00c9c7d
[SPARK-17843][WEB UI] Indicate event logs pending for processing on h…
vijoshi Nov 11, 2016
465e4b4
[SPARK-17982][SQL] SQLBuilder should wrap the generated SQL with pare…
dongjoon-hyun Nov 11, 2016
87820da
[SPARK-18387][SQL] Add serialization to checkEvaluation.
rdblue Nov 11, 2016
c2ebda4
[SPARK-18264][SPARKR] build vignettes with package, update vignettes …
felixcheung Nov 11, 2016
56859c0
[SPARK-18060][ML] Avoid unnecessary computation for MLOR
sethah Nov 12, 2016
8933551
[SPARK-18375][SPARK-18383][BUILD][CORE] Upgrade netty to 4.0.42.Final
witgo Nov 12, 2016
b2ba83d
[SPARK-14077][ML][FOLLOW-UP] Minor refactor and cleanup for NaiveBayes
yanboliang Nov 12, 2016
6fae424
[SPARK-18418] Fix flags for make_binary_release for hadoop profile
holdenk Nov 12, 2016
0c69224
[SPARK-18426][STRUCTURED STREAMING] Python Documentation Fix for Stru…
Nov 14, 2016
8fc6455
[SPARK-18412][SPARKR][ML] Fix exception for some SparkR ML algorithms…
yanboliang Nov 14, 2016
12bde11
[SPARK-18382][WEBUI] "run at null:-1" in UI when no file/line info in…
srowen Nov 14, 2016
d554c02
[SPARK-18166][MLLIB] Fix Poisson GLM bug due to wrong requirement of …
actuaryzhang Nov 14, 2016
518dc1e
[SPARK-18396][HISTORYSERVER] Duration" column makes search result con…
WangTaoTheTonic Nov 14, 2016
c07fe1c
[SPARK-18432][DOC] Changed HDFS default block size from 64MB to 128MB
moomindani Nov 14, 2016
3c623d2
[SPARK-18416][STRUCTURED STREAMING] Fixed temp file leak in state store
tdas Nov 14, 2016
db691f0
[SPARK-17510][STREAMING][KAFKA] config max rate on a per-partition basis
koeninger Nov 14, 2016
cff7a70
[SPARK-11496][GRAPHX][FOLLOWUP] Add param checking for runParallelPer…
zhengruifeng Nov 14, 2016
ae66799
[SPARK-17348][SQL] Incorrect results from subquery transformation
nsyca Nov 14, 2016
27999b3
[SPARK-18124] Observed delay based Event Time Watermarks
marmbrus Nov 15, 2016
649c15f
[SPARK-18428][DOC] Update docs for GraphX
zhengruifeng Nov 15, 2016
a0125fd
[SPARK-18430][SQL] Fixed Exception Messages when Hitting an Invocatio…
gatorsmile Nov 15, 2016
0762c0c
[SPARK-18427][DOC] Update docs of mllib.KMeans
zhengruifeng Nov 15, 2016
0af94e7
[SPARK-18300][SQL] Do not apply foldable propagation with expand as a…
hvanhovell Nov 15, 2016
5f7a9af
[SPARK-13027][STREAMING] Added batch time as a parameter to updateSta…
Nov 15, 2016
f13a33b
[SPARK-18337] Complete mode memory sinks should be able to recover fr…
brkyvz Nov 15, 2016
b424dc9
[SPARK-18440][STRUCTURED STREAMING] Pass correct query execution to F…
tdas Nov 15, 2016
e469d3b
[SPARK-18423][STREAMING] ReceiverTracker should close checkpoint dir …
HyukjinKwon Nov 15, 2016
1126c31
[SPARK-17732][SQL] ALTER TABLE DROP PARTITION should support comparators
dongjoon-hyun Nov 15, 2016
175c478
[SPARK-18300][SQL] Fix scala 2.10 build for FoldablePropagation
hvanhovell Nov 16, 2016
436ae20
[SPARK-18377][SQL] warehouse path should be a static conf
cloud-fan Nov 16, 2016
7b57e48
[SPARK-18438][SPARKR][ML] spark.mlp should support RFormula.
yanboliang Nov 16, 2016
b18c5a9
[SPARK-18433][SQL] Improve DataSource option keys to be more case-ins…
dongjoon-hyun Nov 16, 2016
4567db9
[DOC][MINOR] Kafka doc: breakup into lines
lw-lin Nov 16, 2016
a94659c
[SPARK-18400][STREAMING] NPE when resharding Kinesis Stream
srowen Nov 16, 2016
6b2301b
[SPARK-18410][STREAMING] Add structured kafka example
uncleGen Nov 16, 2016
8208470
[MINOR][DOC] Fix typos in the 'configuration', 'monitoring' and 'sql-…
weiqingy Nov 16, 2016
6b6eb4e
[SPARK-18434][ML] Add missing ParamValidations for ML algos
zhengruifeng Nov 16, 2016
416bc3d
[SPARK-18446][ML][DOCS] Add links to API docs for ML algos
zhengruifeng Nov 16, 2016
b0ae871
[SPARK-18420][BUILD] Fix the errors caused by lint check in Java
Nov 16, 2016
c0dbe08
[SPARK-18415][SQL] Weird Plan Output when CTE used in RunnableCommand
gatorsmile Nov 16, 2016
b86e962
[SPARK-18459][SPARK-18460][STRUCTUREDSTREAMING] Rename triggerId to b…
tdas Nov 16, 2016
3d4756d
[SPARK-18461][DOCS][STRUCTUREDSTREAMING] Added more information about…
tdas Nov 16, 2016
523abfe
[YARN][DOC] Increasing NodeManager's heap size with External Shuffle …
Devian-ua Nov 16, 2016
9515793
[SPARK-18442][SQL] Fix nullability of WrapOption.
ueshin Nov 17, 2016
6a3cbbc
[SPARK-1267][SPARK-18129] Allow PySpark to be pip installed
holdenk Nov 16, 2016
014fcee
[SPARK-18464][SQL] support old table which doesn't store schema in me…
cloud-fan Nov 17, 2016
2ee4fc8
[YARN][DOC] Remove non-Yarn specific configurations from running-on-y…
weiqingy Nov 17, 2016
4fcecb4
[SPARK-18365][DOCS] Improve Sample Method Documentation
bllchmbrs Nov 17, 2016
42777b1
[SPARK-17462][MLLIB]use VersionUtils to parse Spark version strings
Nov 17, 2016
536a215
[SPARK-18480][DOCS] Fix wrong links for ML guide docs
zhengruifeng Nov 17, 2016
9787988
[SPARK-18490][SQL] duplication nodename extrainfo for ShuffleExchange
Nov 17, 2016
fc466be
[SPARK-18360][SQL] default table path of tables in default database s…
cloud-fan Nov 18, 2016
e8b1955
[SPARK-18462] Fix ClassCastException in SparkListenerDriverAccumUpdat…
JoshRosen Nov 18, 2016
5912c19
[SPARK-18187][SQL] CompactibleFileStreamLog should not use "compactIn…
Nov 18, 2016
ec622eb
[SPARK-18457][SQL] ORC and other columnar formats using HiveShim read…
aray Nov 18, 2016
6717981
[SPARK-18422][CORE] Fix wholeTextFiles test to pass on Windows in Jav…
HyukjinKwon Nov 18, 2016
136f687
[SPARK-18477][SS] Enable interrupts for HDFS in HDFSMetadataLog
zsxwing Nov 19, 2016
4b1df0e
[SPARK-18505][SQL] Simplify AnalyzeColumnCommand
rxin Nov 19, 2016
b4bad04
[SPARK-18497][SS] Make ForeachSink support watermark
zsxwing Nov 19, 2016
693401b
[SPARK-18448][CORE] SparkSession should implement java.lang.AutoClose…
srowen Nov 19, 2016
4b396a6
[SPARK-18445][BUILD][DOCS] Fix the markdown for `Note:`/`NOTE:`/`Note…
HyukjinKwon Nov 19, 2016
30a6fbb
[SPARK-18353][CORE] spark.rpc.askTimeout defalut value is not 120s
srowen Nov 19, 2016
15ad3a3
[SPARK-18448][CORE] Fix @since 2.1.0 on new SparkSession.close() method
srowen Nov 19, 2016
15eb86c
[SPARK-18456][ML][FOLLOWUP] Use matrix abstraction for coefficients i…
sethah Nov 20, 2016
b0b2f10
[SPARK-18458][CORE] Fix signed integer overflow problem at an express…
kiszk Nov 20, 2016
94a9eed
[SPARK-18508][SQL] Fix documentation error for DateDiff
rxin Nov 20, 2016
063da0c
[SQL] Fix documentation for Concat and ConcatWs
rxin Nov 20, 2016
bc3e7b3
[SPARK-3359][BUILD][DOCS] Print examples and disable group and tparam…
HyukjinKwon Nov 20, 2016
cffaf50
[SPARK-17732][SQL] Revert ALTER TABLE DROP PARTITION should support c…
hvanhovell Nov 20, 2016
f8662db
[HOTFIX][SQL] Fix DDLSuite failure.
rxin Nov 21, 2016
fb4e635
[SPARK-18467][SQL] Extracts method for preparing arguments from Stati…
ueshin Nov 21, 2016
31002e4
[SPARK-18282][ML][PYSPARK] Add python clustering summaries for GMM an…
sethah Nov 21, 2016
251a992
[SPARK-18398][SQL] Fix nullabilities of MapObjects and ExternalMapToC…
ueshin Nov 21, 2016
b0a73c9
[SPARK-18517][SQL] DROP TABLE IF EXISTS should not warn for non-exist…
dongjoon-hyun Nov 21, 2016
406f339
[SPARK-18361][PYSPARK] Expose RDD localCheckpoint in PySpark
Nov 21, 2016
2afc18b
[SPARK-17765][SQL] Support for writing out user-defined type in ORC d…
HyukjinKwon Nov 21, 2016
6dbe448
[SPARK-18493] Add missing python APIs: withWatermark and checkpoint t…
brkyvz Nov 22, 2016
aaa2a17
[SPARK-18425][STRUCTURED STREAMING][TESTS] Test `CompactibleFileStrea…
lw-lin Nov 22, 2016
c702140
[SPARK-18444][SPARKR] SparkR running in yarn-cluster mode should not …
yanboliang Nov 22, 2016
63aa01f
[SPARK-18514][DOCS] Fix the markdown for `Note:`/`NOTE:`/`Note that` …
HyukjinKwon Nov 22, 2016
36cd10d
[SPARK-18447][DOCS] Fix the markdown for `Note:`/`NOTE:`/`Note that` …
HyukjinKwon Nov 22, 2016
0e60e4b
[SPARK-18519][SQL] map type can not be used in EqualTo
cloud-fan Nov 22, 2016
0e624e9
[SPARK-18504][SQL] Scalar subquery with extra group by columns return…
nsyca Nov 22, 2016
fa36013
[SPARK-18507][SQL] HiveExternalCatalog.listPartitions should only cal…
cloud-fan Nov 22, 2016
fb2ea54
[SPARK-18465] Add 'IF EXISTS' clause to 'UNCACHE' to not throw except…
brkyvz Nov 22, 2016
bd338f6
[SPARK-18373][SPARK-18529][SS][KAFKA] Make failOnDataLoss=false work …
zsxwing Nov 22, 2016
64b9de9
[SPARK-16803][SQL] SaveAsTable does not work when target table is a H…
gatorsmile Nov 22, 2016
4b96ffb
[SPARK-18533] Raise correct error upon specification of schema for da…
dilipbiswal Nov 22, 2016
3be2d1e
[SPARK-18530][SS][KAFKA] Change Kafka timestamp column type to Timest…
zsxwing Nov 23, 2016
fc5fee8
[SPARK-18501][ML][SPARKR] Fix spark.glm errors when fitting on collin…
yanboliang Nov 23, 2016
fabb5ae
[SPARK-18179][SQL] Throws analysis exception with a proper message fo…
HyukjinKwon Nov 23, 2016
5f198d2
[SPARK-18073][DOCS][WIP] Migrate wiki to spark.apache.org web site
srowen Nov 23, 2016
ebeb051
[SPARK-18053][SQL] compare unsafe and safe complex-type values correctly
cloud-fan Nov 23, 2016
539c193
[SPARK-18545][SQL] Verify number of hive client RPCs in PartitionedTa…
ericl Nov 23, 2016
e11d7c6
[SPARK-18557] Downgrade confusing memory leak warning message
rxin Nov 23, 2016
599dac1
[SPARK-18522][SQL] Explicit contract for column stats serialization
rxin Nov 23, 2016
835f03f
[SPARK-18050][SQL] do not create default database if it already exists
cloud-fan Nov 23, 2016
15d2cf2
[SPARK-18510] Fix data corruption from inferred partition column data…
brkyvz Nov 23, 2016
27d81d0
[SPARK-18510][SQL] Follow up to address comments in #15951
zsxwing Nov 24, 2016
04ec74f
[SPARK-18520][ML] Add missing setXXXCol methods for BisectingKMeansMo…
zhengruifeng Nov 24, 2016
a7f4145
[SPARK-18578][SQL] Full outer join in correlated subquery returns inc…
nsyca Nov 24, 2016
57dbc68
[SPARK-18575][WEB] Keep same style: adjust the position of driver log…
uncleGen Nov 25, 2016
a49dfa9
[SPARK-18119][SPARK-CORE] Namenode safemode check is only performed o…
Nov 25, 2016
69856f2
[SPARK-3359][BUILD][DOCS] More changes to resolve javadoc 8 errors th…
HyukjinKwon Nov 25, 2016
b5afdac
[SPARK-18559][SQL] Fix HLL++ with small relative error
wzhfy Nov 25, 2016
906d82c
[SPARK-18436][SQL] isin causing SQL syntax error with JDBC
jiangxb1987 Nov 25, 2016
da66b97
[SPARK-18583][SQL] Fix nullability of InputFileName.
ueshin Nov 26, 2016
830ee13
[SPARK-18481][ML] ML 2.1 QA: Remove deprecated methods for ML
yanboliang Nov 26, 2016
ff69933
[WIP][SQL][DOC] Fix incorrect `code` tag
weiqingy Nov 26, 2016
9c54957
[SPARK-17251][SQL] Improve `OuterReference` to be `NamedExpression`
dongjoon-hyun Nov 26, 2016
1e8fbef
[SPARK-18594][SQL] Name Validation of Databases/Tables
gatorsmile Nov 28, 2016
6b77889
[SPARK-18482][SQL] make sure Spark can access the table metadata crea…
cloud-fan Nov 28, 2016
886f880
[SPARK-18585][SQL] Use `ev.isNull = "false"` if possible for Janino t…
ueshin Nov 28, 2016
d6e027e
[SPARK-18604][SQL] Make sure CollapseWindow returns the attributes in…
hvanhovell Nov 28, 2016
712bd5a
[SPARK-18118][SQL] fix a compilation error due to nested JavaBeans
kiszk Nov 28, 2016
e449f75
[SPARK-18118][SQL] fix a compilation error due to nested JavaBeans
hvanhovell Nov 28, 2016
a9d4feb
[SPARK-17783][SQL] Hide Credentials in CREATE and DESC FORMATTED/EXTE…
gatorsmile Nov 28, 2016
32b259f
[SPARK-18597][SQL] Do not push-down join conditions to the right side…
hvanhovell Nov 28, 2016
34ad4d5
[SPARK-18602] Set the version of org.codehaus.janino:commons-compiler…
yhuai Nov 28, 2016
4d79478
[SQL][MINOR] DESC should use 'Catalog' as partition provider
cloud-fan Nov 28, 2016
81e3f97
[SPARK-16282][SQL] Implement percentile SQL function.
jiangxb1987 Nov 28, 2016
b386943
[SPARK-17680][SQL][TEST] Added test cases for InMemoryRelation
kiszk Nov 28, 2016
80aabc0
Preparing Spark release v2.1.0-rc1
pwendell Nov 28, 2016
75d73d1
Preparing development version 2.1.1-SNAPSHOT
pwendell Nov 28, 2016
cdf315b
[SPARK-18408][ML] API Improvements for LSH
Nov 28, 2016
c46928f
[SPARK-18523][PYSPARK] Make SparkContext.stop more reliable
kxepal Nov 29, 2016
a0c1c69
[SPARK-16282][SQL] Follow-up: remove "percentile" from temp function …
lins05 Nov 29, 2016
45e2b3c
[SPARK-18588][SS][KAFKA] Ignore the flaky kafka test
zsxwing Nov 29, 2016
c4cbdc8
[SPARK-18547][CORE] Propagate I/O encryption key when executors regis…
Nov 29, 2016
1759cf6
[SPARK-18058][SQL][TRIVIAL] Use dataType.sameResult(...) instead equa…
hvanhovell Nov 29, 2016
27a1a5c
[SPARK-18544][SQL] Append with df.saveAsTable writes data to wrong lo…
ericl Nov 29, 2016
ea6957d
[SPARK-18339][SPARK-18513][SQL] Don't push down current_timestamp for…
Nov 29, 2016
06a56df
[SPARK-18188] add checksum for blocks of broadcast
Nov 29, 2016
84b2af2
[SPARK-3359][DOCS] Make javadoc8 working for unidoc/genjavadoc compat…
HyukjinKwon Nov 29, 2016
124944a
[MINOR][DOCS] Updates to the Accumulator example in the programming g…
Nov 29, 2016
086a3bd
[SPARK-18615][DOCS] Switch to multi-line doc to avoid a genjavadoc bu…
HyukjinKwon Nov 29, 2016
d3aaed2
[SPARK-18592][ML] Move DT/RF/GBT Param setter methods to subclasses
yanboliang Nov 29, 2016
e8ca1ae
[SPARK-18498][SQL] Revise HDFSMetadataLog API for better testing
Nov 29, 2016
68e8d24
[SPARK-18614][SQL] Incorrect predicate pushdown from ExistenceJoin
nsyca Nov 29, 2016
045ae29
[SPARK-18553][CORE] Fix leak of TaskSetManager following executor loss
JoshRosen Nov 30, 2016
28b57c8
[SPARK-18516][SQL] Split state and progress in streaming
tdas Nov 30, 2016
eb0b363
[SPARK-18319][ML][QA2.1] 2.1 QA: API: Experimental, DeveloperApi, fin…
YY-OnCall Nov 30, 2016
55b1142
[SPARK-18145] Update documentation for hive partition management in 2.1
ericl Nov 30, 2016
b95aad7
[SPARK-15819][PYSPARK][ML] Add KMeanSummary in KMeans of PySpark
zjffdu Nov 30, 2016
e780733
[SPARK-18516][STRUCTURED STREAMING] Follow up PR to add StreamingQuer…
tdas Nov 30, 2016
a5ec2a7
[SPARK-17680][SQL][TEST] Added a Testcase for Verifying Unicode Chara…
gatorsmile Nov 30, 2016
8cd466e
[SPARK-18622][SQL] Fix the datatype of the Sum aggregate function
hvanhovell Nov 30, 2016
5e4afbf
[SPARK-18617][CORE][STREAMING] Close "kryo auto pick" feature for Spa…
uncleGen Nov 30, 2016
7043c6b
[SPARK-18366][PYSPARK][ML] Add handleInvalid to Pyspark for QuantileD…
techaddict Nov 30, 2016
05ba5ee
[SPARK-18612][MLLIB] Delete broadcasted variable in LBFGS CostFun
Nov 30, 2016
6e044ab
[SPARK-17897][SQL] Fixed IsNotNull Constraint Inference Rule
gatorsmile Nov 30, 2016
3de93fb
[SPARK-18220][SQL] read Hive orc table with varchar column should not…
cloud-fan Nov 30, 2016
eae85da
[SPARK][EXAMPLE] Added missing semicolon in quick-start-guide example
Nov 30, 2016
7c0e296
[SPARK-18640] Add synchronization to TaskScheduler.runningTasksByExec…
JoshRosen Nov 30, 2016
f542df3
[SPARK-18318][ML] ML, Graph 2.1 QA: API: New Scala APIs, docs
yanboliang Nov 30, 2016
9e96ac5
[SPARK-18251][SQL] the type of Dataset can't be Option of non-flat type
cloud-fan Nov 30, 2016
c2c2fdc
[SPARK-18546][CORE] Fix merging shuffle spills when using encryption.
Nov 30, 2016
6e2e987
[SPARK-18655][SS] Ignore Structured Streaming 2.0.2 logs in history s…
zsxwing Dec 1, 2016
7d45967
[SPARK-18617][SPARK-18560][TEST] Fix flaky test: StreamingContextSuit…
zsxwing Dec 1, 2016
e8d8e35
[SPARK-18476][SPARKR][ML] SparkR Logistic Regression should should su…
wangmiao1981 Dec 1, 2016
9dc3ef6
[SPARK-18635][SQL] Partition name/values not escaped correctly in som…
ericl Dec 1, 2016
8579ab5
[SPARK-18666][WEB UI] Remove the codes checking deprecated config spa…
viirya Dec 1, 2016
cbbe217
[SPARK-18645][DEPLOY] Fix spark-daemon.sh arguments error lead to thr…
wangyum Dec 1, 2016
6916ddc
[SPARK-18674][SQL] improve the error message of using join
cloud-fan Dec 1, 2016
4c673c6
[SPARK-18274][ML][PYSPARK] Memory leak in PySpark JavaWrapper
techaddict Dec 1, 2016
4746674
[SPARK-18617][SPARK-18560][TESTS] Fix flaky test: StreamingContextSui…
zsxwing Dec 1, 2016
2d2e801
[SPARK-18639] Build only a single pip package
rxin Dec 2, 2016
2f91b01
[SPARK-18141][SQL] Fix to quote column names in the predicate clause …
sureshthalamati Dec 2, 2016
b9eb100
[SPARK-18538][SQL][BACKPORT-2.1] Fix Concurrent Table Fetching Using …
gatorsmile Dec 2, 2016
fce1be6
[SPARK-18284][SQL] Make ExpressionEncoder.serializer.nullable precise
kiszk Dec 2, 2016
0f0903d
[SPARK-18647][SQL] do not put provider in table properties for Hive s…
cloud-fan Dec 2, 2016
a7f8ebb
[SPARK-17213][SQL] Disable Parquet filter push-down for string and bi…
liancheng Dec 2, 2016
65e896a
[SPARK-18679][SQL] Fix regression in file listing performance for non…
ericl Dec 2, 2016
415730e
[SPARK-18419][SQL] `JDBCRelation.insert` should not remove Spark options
dongjoon-hyun Dec 2, 2016
e374b24
[SPARK-18659][SQL] Incorrect behaviors in overwrite table for datasou…
ericl Dec 2, 2016
32c8538
[SPARK-18674][SQL][FOLLOW-UP] improve the error message of using join
gatorsmile Dec 2, 2016
c69825a
[SPARK-18677] Fix parsing ['key'] in JSON path expressions.
rdblue Dec 2, 2016
f915f81
[SPARK-18291][SPARKR][ML] Revert "[SPARK-18291][SPARKR][ML] SparkR gl…
yanboliang Dec 2, 2016
f537632
[SPARK-18670][SS] Limit the number of StreamingQueryListener.StreamPr…
zsxwing Dec 2, 2016
839d4e9
[SPARK-18324][ML][DOC] Update ML programming and migration guide for …
yanboliang Dec 3, 2016
cf3dbec
[SPARK-18690][PYTHON][SQL] Backward compatibility of unbounded frames
zero323 Dec 3, 2016
28ea432
[SPARK-18685][TESTS] Fix URI and release resources after opening in t…
HyukjinKwon Dec 3, 2016
b098b48
[SPARK-18582][SQL] Whitelist LogicalPlan operators allowed in correla…
nsyca Dec 3, 2016
28f698b
[SPARK-18081][ML][DOCS] Add user guide for Locality Sensitive Hashing…
Yunni Dec 4, 2016
8145c82
[SPARK-18091][SQL] Deep if expressions cause Generated SpecificUnsafe…
Dec 4, 2016
41d698e
[SPARK-18661][SQL] Creating a partitioned datasource table should not…
ericl Dec 4, 2016
c13c293
[SPARK-18643][SPARKR] SparkR hangs at session start when installed as…
felixcheung Dec 5, 2016
88e07ef
[SPARK-18625][ML] OneVsRestModel should support setFeaturesCol and se…
zhengruifeng Dec 5, 2016
1821cbe
[SPARK-18279][DOC][ML][SPARKR] Add R examples to ML programming guide.
yanboliang Dec 5, 2016
afd2321
[MINOR][DOC] Use SparkR `TRUE` value and add default values for `Stru…
dongjoon-hyun Dec 5, 2016
30c0743
Revert "[SPARK-18284][SQL] Make ExpressionEncoder.serializer.nullable…
rxin Dec 5, 2016
e23c8cf
[SPARK-18711][SQL] should disable subexpression elimination for Lambd…
cloud-fan Dec 5, 2016
39759ff
[DOCS][MINOR] Update location of Spark YARN shuffle jar
nchammas Dec 5, 2016
c6a4e3d
[SPARK-18694][SS] Add StreamingQuery.explain and exception to Python …
zsxwing Dec 5, 2016
fecd23d
[SPARK-18634][PYSPARK][SQL] Corruption and Correctness issues with ex…
viirya Dec 6, 2016
6c4c336
[SPARK-18729][SS] Move DataFrame.collect out of synchronized block in…
zsxwing Dec 6, 2016
1946854
[SPARK-18657][SPARK-18668] Make StreamingQuery.id persists across res…
tdas Dec 6, 2016
d458816
[SPARK-18722][SS] Move no data rate limit from StreamExecution to Pro…
zsxwing Dec 6, 2016
8ca6a82
[SPARK-18572][SQL] Add a method `listPartitionNames` to `ExternalCata…
Dec 6, 2016
655297b
[SPARK-18721][SS] Fix ForeachSink with watermark + append
zsxwing Dec 6, 2016
e362d99
[SPARK-18634][SQL][TRIVIAL] Touch-up Generate
hvanhovell Dec 6, 2016
ace4079
[SPARK-18714][SQL] Add a simple time function to SparkSession
rxin Dec 6, 2016
d20e0d6
[SPARK-18671][SS][TEST] Added tests to ensure stability of that all S…
tdas Dec 6, 2016
65f5331
[SPARK-18652][PYTHON] Include the example data and third-party licens…
lins05 Dec 6, 2016
9b5bc2a
[SPARK-18734][SS] Represent timestamp in StreamingQueryProgress as fo…
tdas Dec 7, 2016
2 changes: 1 addition & 1 deletion .github/PULL_REQUEST_TEMPLATE
@@ -7,4 +7,4 @@
(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

Please review https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark before opening a pull request.
Please review http://spark.apache.org/contributing.html before opening a pull request.
2 changes: 2 additions & 0 deletions .gitignore
@@ -57,6 +57,8 @@ project/plugins/project/build.properties
project/plugins/src_managed/
project/plugins/target/
python/lib/pyspark.zip
python/deps
python/pyspark/python
reports/
scalastyle-on-compile.generated.xml
scalastyle-output.xml
4 changes: 2 additions & 2 deletions CONTRIBUTING.md
@@ -1,12 +1,12 @@
## Contributing to Spark

*Before opening a pull request*, review the
[Contributing to Spark wiki](https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark).
[Contributing to Spark guide](http://spark.apache.org/contributing.html).
It lists steps that are required before creating a PR. In particular, consider:

- Is the change important and ready enough to ask the community to spend time reviewing?
- Have you searched for existing, related JIRAs and pull requests?
- Is this a new feature that can stand alone as a [third party project](https://cwiki.apache.org/confluence/display/SPARK/Third+Party+Projects) ?
- Is this a new feature that can stand alone as a [third party project](http://spark.apache.org/third-party-projects.html) ?
- Is the change being proposed clearly explained and motivated?

When you contribute code, you affirm that the contribution is your original work and that you
3 changes: 0 additions & 3 deletions NOTICE
@@ -421,9 +421,6 @@ Copyright (c) 2011, Terrence Parr.
This product includes/uses ASM (http://asm.ow2.org/),
Copyright (c) 2000-2007 INRIA, France Telecom.

This product includes/uses org.json (http://www.json.org/java/index.html),
Copyright (c) 2002 JSON.org

This product includes/uses JLine (http://jline.sourceforge.net/),
Copyright (c) 2002-2006, Marc Prud'hommeaux <[email protected]>.

91 changes: 91 additions & 0 deletions R/CRAN_RELEASE.md
@@ -0,0 +1,91 @@
# SparkR CRAN Release

To release SparkR as a package to CRAN, we would use the `devtools` package. Please work with the
`[email protected]` community and R package maintainer on this.

### Release

First, check that the `Version:` field in the `pkg/DESCRIPTION` file is updated. Also, check for stale files not under source control.

Note that while `check-cran.sh` runs `R CMD check`, it does so with `--no-manual --no-vignettes`, which skips a few vignette and PDF checks - therefore it is preferable to run `R CMD check` on a manually built source package before uploading a release.

To upload a release, we would need to update `cran-comments.md`. This should generally contain the results from running the `check-cran.sh` script, along with comments on the status of any `WARNING` (there should not be any) or `NOTE`. As part of `check-cran.sh` and the release process, the vignettes are built - make sure `SPARK_HOME` is set and the Spark jars are accessible.

Once everything is in place, run in R under the `SPARK_HOME/R` directory:

```R
paths <- .libPaths(); .libPaths(c("lib", paths)); Sys.setenv(SPARK_HOME=tools::file_path_as_absolute("..")); devtools::release(); .libPaths(paths)
```

For more information please refer to http://r-pkgs.had.co.nz/release.html#release-check

### Testing: build package manually

To build the package manually, for example to inspect the resulting `.tar.gz` file content, we would also use the `devtools` package (an inspection sketch follows the content listing below).

The source package is what gets released to CRAN. CRAN then builds platform-specific binary packages from the source package.

#### Build source package

To build the source package locally without releasing to CRAN, run in R under the `SPARK_HOME/R` directory:

```R
paths <- .libPaths(); .libPaths(c("lib", paths)); Sys.setenv(SPARK_HOME=tools::file_path_as_absolute("..")); devtools::build("pkg"); .libPaths(paths)
```

(http://r-pkgs.had.co.nz/vignettes.html#vignette-workflow-2)

Similarly, the source package is also created by `check-cran.sh` with `R CMD build pkg`.

For example, this should be the content of the source package:

```sh
DESCRIPTION R inst tests
NAMESPACE build man vignettes

inst/doc/
sparkr-vignettes.html
sparkr-vignettes.Rmd
sparkr-vignettes.Rman

build/
vignette.rds

man/
*.Rd files...

vignettes/
sparkr-vignettes.Rmd
```
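
To inspect the resulting `.tar.gz` content mentioned above without unpacking it, one option is to list the archive directly; a minimal sketch (the version in the file name is an assumption):

```sh
# List the files inside the built source package and compare against the
# expected layout above; the "2.1.0" version in the name is an assumption.
tar tzf SparkR_2.1.0.tar.gz
```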

#### Test source package

To install, run this:

```sh
R CMD INSTALL SparkR_2.1.0.tar.gz
```

With "2.1.0" replaced with the version of SparkR.

This command installs SparkR to the default libPaths. Once that is done, you should be able to start R and run:

```R
library(SparkR)
vignette("sparkr-vignettes", package="SparkR")
```

#### Build binary package

To build the binary package locally, run in R under the `SPARK_HOME/R` directory:

```R
paths <- .libPaths(); .libPaths(c("lib", paths)); Sys.setenv(SPARK_HOME=tools::file_path_as_absolute("..")); devtools::build("pkg", binary = TRUE); .libPaths(paths)
```

For example, this should be the content of the binary package:

```sh
DESCRIPTION Meta R html tests
INDEX NAMESPACE help profile worker
```
10 changes: 5 additions & 5 deletions R/README.md
@@ -6,7 +6,7 @@ SparkR is an R package that provides a light-weight frontend to use Spark from R

Libraries of SparkR need to be created in `$SPARK_HOME/R/lib`. This can be done by running the script `$SPARK_HOME/R/install-dev.sh`.
By default the above script uses the system-wide installation of R. However, this can be changed to any user-installed location of R by setting the environment variable `R_HOME` to the full path of the base directory where R is installed, before running the install-dev.sh script.
Example:
```bash
# where /home/username/R is where R is installed and /home/username/R/bin contains the files R and RScript
export R_HOME=/home/username/R
@@ -46,19 +46,19 @@ Sys.setenv(SPARK_HOME="/Users/username/spark")
# This line loads SparkR from the installed directory
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)
sc <- sparkR.init(master="local")
sparkR.session()
```

#### Making changes to SparkR

The [instructions](https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark) for making contributions to Spark also apply to SparkR.
The [instructions](http://spark.apache.org/contributing.html) for making contributions to Spark also apply to SparkR.
If you only make R file changes (i.e. no Scala changes) then you can just re-install the R package using `R/install-dev.sh` and test your changes.
Once you have made your changes, please include unit tests for them and run existing unit tests using the `R/run-tests.sh` script as described below.

#### Generating documentation

The SparkR documentation (Rd files and HTML files) is not a part of the source repository. To generate it you can run the script `R/create-docs.sh`. This script uses `devtools` and `knitr` to generate the docs, and these packages need to be installed on the machine before using the script. Also, you may need to install these [prerequisites](https://github.com/apache/spark/tree/master/docs#prerequisites). See also `R/DOCUMENTATION.md`.

### Examples, Unit tests

SparkR comes with several sample programs in the `examples/src/main/r` directory.
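For convenience, the documentation and test prerequisites mentioned in this README (and in the `Suggests:` field updated below) can be installed in one step; a hedged sketch - the exact package list is an assumption assembled from those references:

```R
# Install packages referenced by the SparkR build/doc/test scripts; the list
# is an assumption drawn from the README and the DESCRIPTION Suggests field.
install.packages(c("devtools", "knitr", "rmarkdown", "testthat", "e1071", "survival"))
```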
33 changes: 27 additions & 6 deletions R/check-cran.sh
@@ -36,11 +36,27 @@ if [ ! -z "$R_HOME" ]
fi
echo "USING R_HOME = $R_HOME"

# Build the latest docs
# Build the latest docs, but not vignettes, which is built with the package next
$FWDIR/create-docs.sh

# Build a zip file containing the source package
"$R_SCRIPT_PATH/"R CMD build $FWDIR/pkg
# Build source package with vignettes
SPARK_HOME="$(cd "${FWDIR}"/..; pwd)"
. "${SPARK_HOME}"/bin/load-spark-env.sh
if [ -f "${SPARK_HOME}/RELEASE" ]; then
SPARK_JARS_DIR="${SPARK_HOME}/jars"
else
SPARK_JARS_DIR="${SPARK_HOME}/assembly/target/scala-$SPARK_SCALA_VERSION/jars"
fi

if [ -d "$SPARK_JARS_DIR" ]; then
# Build a zip file containing the source package with vignettes
SPARK_HOME="${SPARK_HOME}" "$R_SCRIPT_PATH/"R CMD build $FWDIR/pkg

find pkg/vignettes/. -not -name '.' -not -name '*.Rmd' -not -name '*.md' -not -name '*.pdf' -not -name '*.html' -delete
else
echo "Error Spark JARs not found in $SPARK_HOME"
exit 1
fi

# Run check as-cran.
VERSION=`grep Version $FWDIR/pkg/DESCRIPTION | awk '{print $NF}'`
@@ -54,11 +70,16 @@ fi

if [ -n "$NO_MANUAL" ]
then
CRAN_CHECK_OPTIONS=$CRAN_CHECK_OPTIONS" --no-manual"
CRAN_CHECK_OPTIONS=$CRAN_CHECK_OPTIONS" --no-manual --no-vignettes"
fi

echo "Running CRAN check with $CRAN_CHECK_OPTIONS options"

"$R_SCRIPT_PATH/"R CMD check $CRAN_CHECK_OPTIONS SparkR_"$VERSION".tar.gz

if [ -n "$NO_TESTS" ] && [ -n "$NO_MANUAL" ]
then
"$R_SCRIPT_PATH/"R CMD check $CRAN_CHECK_OPTIONS SparkR_"$VERSION".tar.gz
else
# This will run tests and/or build vignettes, and require SPARK_HOME
SPARK_HOME="${SPARK_HOME}" "$R_SCRIPT_PATH/"R CMD check $CRAN_CHECK_OPTIONS SparkR_"$VERSION".tar.gz
fi
popd > /dev/null
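
For reference, a hedged sketch of driving the updated `check-cran.sh`, using only the `SPARK_HOME`, `NO_TESTS`, and `NO_MANUAL` variables visible in the diff above; a built Spark distribution is assumed so that the JARs directory exists:

```sh
# Full CRAN check, including building the source package with vignettes.
# Requires Spark JARs under $SPARK_HOME/jars (release) or assembly/target/.
cd "$SPARK_HOME/R"
./check-cran.sh

# Faster pass: skip tests and the manual; per the diff above this also adds
# --no-vignettes to R CMD check, though the build step still needs the JARs.
NO_TESTS=1 NO_MANUAL=1 ./check-cran.sh
```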
19 changes: 1 addition & 18 deletions R/create-docs.sh
@@ -20,7 +20,7 @@
# Script to create API docs and vignettes for SparkR
# This requires `devtools`, `knitr` and `rmarkdown` to be installed on the machine.

# After running this script the html docs can be found in
# $SPARK_HOME/R/pkg/html
# The vignettes can be found in
# $SPARK_HOME/R/pkg/vignettes/sparkr_vignettes.html
@@ -52,21 +52,4 @@ Rscript -e 'libDir <- "../../lib"; library(SparkR, lib.loc=libDir); library(knit

popd

# Find Spark jars.
if [ -f "${SPARK_HOME}/RELEASE" ]; then
SPARK_JARS_DIR="${SPARK_HOME}/jars"
else
SPARK_JARS_DIR="${SPARK_HOME}/assembly/target/scala-$SPARK_SCALA_VERSION/jars"
fi

# Only create vignettes if Spark JARs exist
if [ -d "$SPARK_JARS_DIR" ]; then
# render creates SparkR vignettes
Rscript -e 'library(rmarkdown); paths <- .libPaths(); .libPaths(c("lib", paths)); Sys.setenv(SPARK_HOME=tools::file_path_as_absolute("..")); render("pkg/vignettes/sparkr-vignettes.Rmd"); .libPaths(paths)'

find pkg/vignettes/. -not -name '.' -not -name '*.Rmd' -not -name '*.md' -not -name '*.pdf' -not -name '*.html' -delete
else
echo "Skipping R vignettes as Spark JARs not found in $SPARK_HOME"
fi

popd
11 changes: 7 additions & 4 deletions R/pkg/DESCRIPTION
@@ -1,8 +1,8 @@
Package: SparkR
Type: Package
Title: R Frontend for Apache Spark
Version: 2.0.0
Date: 2016-08-27
Version: 2.1.1
Date: 2016-11-06
Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),
email = "[email protected]"),
person("Xiangrui", "Meng", role = "aut",
@@ -11,14 +11,16 @@ Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),
email = "[email protected]"),
person(family = "The Apache Software Foundation", role = c("aut", "cph")))
URL: http://www.apache.org/ http://spark.apache.org/
BugReports: https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-ContributingBugReports
BugReports: http://spark.apache.org/contributing.html
Depends:
R (>= 3.0),
methods
Suggests:
testthat,
e1071,
survival
survival,
knitr,
rmarkdown
Description: The SparkR package provides an R frontend for Apache Spark.
License: Apache License (== 2.0)
Collate:
@@ -48,3 +50,4 @@ Collate:
'utils.R'
'window.R'
RoxygenNote: 5.0.1
VignetteBuilder: knitr
9 changes: 7 additions & 2 deletions R/pkg/NAMESPACE
@@ -45,7 +45,8 @@ exportMethods("glm",
"spark.als",
"spark.kstest",
"spark.logit",
"spark.randomForest")
"spark.randomForest",
"spark.gbt")

# Job group lifecycle management methods
export("setJobGroup",
Expand Down Expand Up @@ -353,7 +354,9 @@ export("as.DataFrame",
"read.ml",
"print.summary.KSTest",
"print.summary.RandomForestRegressionModel",
"print.summary.RandomForestClassificationModel")
"print.summary.RandomForestClassificationModel",
"print.summary.GBTRegressionModel",
"print.summary.GBTClassificationModel")

export("structField",
"structField.jobj",
@@ -380,6 +383,8 @@ S3method(print, summary.GeneralizedLinearRegressionModel)
S3method(print, summary.KSTest)
S3method(print, summary.RandomForestRegressionModel)
S3method(print, summary.RandomForestClassificationModel)
S3method(print, summary.GBTRegressionModel)
S3method(print, summary.GBTClassificationModel)
S3method(structField, character)
S3method(structField, jobj)
S3method(structType, jobj)
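The `spark.gbt` and GBT summary methods newly exported above correspond to SPARK-18239 ("Gradient Boosted Tree for R") in the commit list; a hedged usage sketch - the specific arguments beyond the data and formula are assumptions, not taken from this diff:

```R
# Fit a gradient boosted tree regression model via the newly exported API.
# Argument names such as type and maxIter are assumptions for illustration.
library(SparkR)
sparkR.session()
df <- createDataFrame(faithful)
model <- spark.gbt(df, waiting ~ eruptions, type = "regression", maxIter = 10)
summary(model)                 # dispatches to print.summary.GBTRegressionModel
predictions <- predict(model, df)
head(predictions)
```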
10 changes: 7 additions & 3 deletions R/pkg/R/DataFrame.R
@@ -936,7 +936,9 @@ setMethod("unique",

#' Sample
#'
#' Return a sampled subset of this SparkDataFrame using a random seed.
#' Note: this is not guaranteed to provide exactly the fraction specified
#' of the total count of the given SparkDataFrame.
#'
#' @param x A SparkDataFrame
#' @param withReplacement Sampling with replacement or not
@@ -2539,7 +2541,8 @@ generateAliasesForIntersectedCols <- function (x, intersectedColNames, suffix) {
#'
#' Return a new SparkDataFrame containing the union of rows in this SparkDataFrame
#' and another SparkDataFrame. This is equivalent to \code{UNION ALL} in SQL.
#' Note that this does not remove duplicate rows across the two SparkDataFrames.
#'
#' Note: This does not remove duplicate rows across the two SparkDataFrames.
#'
#' @param x A SparkDataFrame
#' @param y A SparkDataFrame
@@ -2582,7 +2585,8 @@ setMethod("unionAll",
#' Union two or more SparkDataFrames
#'
#' Union two or more SparkDataFrames. This is equivalent to \code{UNION ALL} in SQL.
#' Note that this does not remove duplicate rows across the two SparkDataFrames.
#'
#' Note: This does not remove duplicate rows across the two SparkDataFrames.
#'
#' @param x a SparkDataFrame.
#' @param ... additional SparkDataFrame(s).
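The documentation notes touched above for `sample` and `unionAll` describe behavior; a hedged sketch of what they mean in practice (assumes a working SparkR session; `distinct` is a standard SparkR function not shown in this diff):

```R
# sample() is not guaranteed to return exactly fraction * count(x) rows, and
# unionAll() keeps duplicate rows - both points restate the doc notes above.
library(SparkR)
sparkR.session()
df <- createDataFrame(faithful)
s <- sample(df, FALSE, 0.5, seed = 42)
count(s)               # close to, but not necessarily exactly, half of count(df)
u <- unionAll(df, df)
count(u)               # 2 * count(df): duplicates are retained
count(distinct(u))     # drop duplicates explicitly if needed
```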