696 commits
db37049
[SPARK-19120] Refresh Metadata Cache After Loading Hive Tables
gatorsmile Jan 15, 2017
bf2f233
[SPARK-19092][SQL][BACKPORT-2.1] Save() API of DataFrameWriter should…
gatorsmile Jan 16, 2017
4f3ce06
[SPARK-19082][SQL] Make ignoreCorruptFiles work for Parquet
viirya Jan 16, 2017
9758905
[SPARK-19232][SPARKR] Update Spark distribution download cache locati…
felixcheung Jan 16, 2017
f4317be
[SPARK-18905][STREAMING] Fix the issue of removing a failed jobset fr…
CodingCat Jan 17, 2017
2ff3669
[SPARK-19019] [PYTHON] Fix hijacked `collections.namedtuple` and port…
HyukjinKwon Jan 17, 2017
13986a7
[SPARK-19065][SQL] Don't inherit expression id in dropDuplicates
zsxwing Jan 17, 2017
3ec3e3f
[SPARK-19129][SQL] SessionCatalog: Disallow empty part col values in …
gatorsmile Jan 17, 2017
29b954b
[SPARK-19066][SPARKR][BACKPORT-2.1] LDA doesn't set optimizer correctly
wangmiao1981 Jan 18, 2017
77202a6
[SPARK-19231][SPARKR] add error handling for download and untar for S…
felixcheung Jan 18, 2017
047506b
[SPARK-19113][SS][TESTS] Ignore StreamingQueryException thrown from a…
zsxwing Jan 18, 2017
4cff0b5
[SPARK-19168][STRUCTURED STREAMING] StateStore should be aborted upon…
lw-lin Jan 18, 2017
7bc3e9b
[SPARK-18899][SPARK-18912][SPARK-18913][SQL] refactor the error check…
cloud-fan Dec 20, 2016
482d361
[SPARK-19314][SS][CATALYST] Do not allow sort before aggregation in S…
tdas Jan 20, 2017
4d286c9
[SPARK-18589][SQL] Fix Python UDF accessing attributes from both side…
Jan 21, 2017
6f0ad57
[SPARK-19267][SS] Fix a race condition when stopping StateStore
zsxwing Jan 21, 2017
8daf10e
[SPARK-19155][ML] MLlib GeneralizedLinearRegression family and link s…
yanboliang Jan 22, 2017
1e07a71
[SPARK-19155][ML] Make family case insensitive in GLM
actuaryzhang Jan 23, 2017
ed5d1e7
[SPARK-19306][CORE] Fix inconsistent state in DiskBlockObject when ex…
jerryshao Jan 23, 2017
4a2be09
[SPARK-9435][SQL] Reuse function in Java UDF to correctly support exp…
HyukjinKwon Jan 24, 2017
570e5e1
[SPARK-19268][SS] Disallow adaptive query execution for streaming que…
zsxwing Jan 24, 2017
9c04e42
[SPARK-18823][SPARKR] add support for assigning to column
felixcheung Jan 24, 2017
d128b6a
[SPARK-16473][MLLIB] Fix BisectingKMeans Algorithm failing in edge case
imatiach-msft Jan 23, 2017
b94fb28
[SPARK-19017][SQL] NOT IN subquery with more than one column may retu…
nsyca Jan 24, 2017
c133787
[SPARK-19330][DSTREAMS] Also show tooltip for successful batches
lw-lin Jan 25, 2017
e2f7739
[SPARK-16046][DOCS] Aggregations in the Spark SQL programming guide
Jan 25, 2017
f391ad2
[SPARK-18750][YARN] Avoid using "mapValues" when allocating containers.
Jan 25, 2017
af95455
[SPARK-18863][SQL] Output non-aggregate expressions without GROUP BY …
nsyca Jan 25, 2017
c9f075a
[SPARK-19307][PYSPARK] Make sure user conf is propagated to SparkCont…
Jan 25, 2017
97d3353
[SPARK-18750][YARN] Follow up: move test to correct directory in 2.1 …
Jan 25, 2017
a5c10ff
[SPARK-19064][PYSPARK] Fix pip installing of sub components
holdenk Jan 25, 2017
0d7e385
[SPARK-14804][SPARK][GRAPHX] Fix checkpointing of VertexRDD/EdgeRDD
tdas Jan 26, 2017
b12a76a
[SPARK-19338][SQL] Add UDF names in explain
maropu Jan 26, 2017
59502bb
[SPARK-19220][UI] Make redirection to HTTPS apply to all URIs. (branc…
Jan 27, 2017
ba2a5ad
[SPARK-18788][SPARKR] Add API for getNumPartitions
felixcheung Jan 27, 2017
4002ee9
[SPARK-19333][SPARKR] Add Apache License headers to R files
felixcheung Jan 27, 2017
9a49f9a
[SPARK-19324][SPARKR] Spark JVM stdout output is getting dropped in S…
felixcheung Jan 27, 2017
445438c
[SPARK-19396][DOC] JDBC Options are Case In-sensitive
gatorsmile Jan 30, 2017
07a1788
[SPARK-19406][SQL] Fix function to_json to respect user-provided options
gatorsmile Jan 31, 2017
e43f161
[BACKPORT-2.1][SPARKR][DOCS] update R API doc for subset/extract
felixcheung Jan 31, 2017
d35a126
[SPARK-19378][SS] Ensure continuity of stateOperator and eventTime me…
brkyvz Feb 1, 2017
61cdc8c
[SPARK-19410][DOC] Fix brokens links in ml-pipeline and ml-tuning
zhengruifeng Feb 1, 2017
f946464
[SPARK-19377][WEBUI][CORE] Killed tasks should have the status as KILLED
Feb 1, 2017
7c23bd4
[SPARK-19432][CORE] Fix an unexpected failure when connecting timeout
zsxwing Feb 2, 2017
f55bd4c
[SPARK-19472][SQL] Parser should not mistake CASE WHEN(...) for a fun…
hvanhovell Feb 6, 2017
62fab5b
[SPARK-19407][SS] defaultFS is used FileSystem.get instead of getting…
uncleGen Feb 7, 2017
dd1abef
[SPARK-19444][ML][DOCUMENTATION] Fix imports not being present in doc…
anshbansal Feb 7, 2017
e642a07
[SPARK-18682][SS] Batch Source for Kafka
Feb 7, 2017
706d6c1
[SPARK-19499][SS] Add more notes in the comments of Sink.addBatch()
CodingCat Feb 8, 2017
4d04029
[MINOR][DOC] Remove parenthesis in readStream() on kafka structured s…
manugarri Feb 8, 2017
71b6eac
[SPARK-18609][SPARK-18841][SQL][BACKPORT-2.1] Fix redundant Alias rem…
hvanhovell Feb 8, 2017
502c927
[SPARK-19413][SS] MapGroupsWithState for arbitrary stateful operation…
tdas Feb 8, 2017
b3fd36a
[SPARK-19481] [REPL] [MAVEN] Avoid to leak SparkContext in Signaling.…
zsxwing Feb 9, 2017
a3d5300
[SPARK-19509][SQL] Grouping Sets do not respect nullable grouping col…
Feb 9, 2017
ff5818b
[SPARK-19512][BACKPORT-2.1][SQL] codegen for compare structs fails #1…
bogdanrdc Feb 10, 2017
7b5ea00
[SPARK-19543] from_json fails when the input row is empty
brkyvz Feb 10, 2017
e580bb0
[SPARK-18717][SQL] Make code generation for Scala Map work with immut…
aray Dec 13, 2016
173c238
[SPARK-19342][SPARKR] bug fixed in collect method for collecting time…
titicaca Feb 12, 2017
06e77e0
[SPARK-19319][BACKPORT-2.1][SPARKR] SparkR Kmeans summary returns err…
wangmiao1981 Feb 12, 2017
fe4fcc5
[SPARK-19564][SPARK-19559][SS][KAFKA] KafkaOffsetReader's consumers s…
lw-lin Feb 13, 2017
a3b6751
[SPARK-19574][ML][DOCUMENTATION] Fix Liquid Exception: Start indices …
gatorsmile Feb 13, 2017
ef4fb7e
[SPARK-19506][ML][PYTHON] Import warnings in pyspark.ml.util
zero323 Feb 13, 2017
c5a7cb0
[SPARK-19542][SS] Delete the temp checkpoint if a query is stopped wi…
zsxwing Feb 13, 2017
328b229
[SPARK-17714][CORE][TEST-MAVEN][TEST-HADOOP2.6] Avoid using ExecutorC…
zsxwing Feb 13, 2017
2968d8c
[HOTFIX][SPARK-19542][SS]Fix the missing import in DataStreamReaderWr…
zsxwing Feb 13, 2017
5db2347
[SPARK-19529] TransportClientFactory.createClient() shouldn't call aw…
JoshRosen Feb 13, 2017
7fe3543
[SPARK-19520][STREAMING] Do not encrypt data written to the WAL.
Feb 13, 2017
c8113b0
[SPARK-19585][DOC][SQL] Fix the cacheTable and uncacheTable api call …
skambha Feb 14, 2017
f837ced
[SPARK-19501][YARN] Reduce the number of HDFS RPCs during YARN deploy…
jongwook Feb 14, 2017
7763b0b
[SPARK-19387][SPARKR] Tests do not run with SparkR source package in …
felixcheung Feb 14, 2017
8ee4ec8
[SPARK-19584][SS][DOCS] update structured streaming documentation aro…
Feb 15, 2017
6c35399
[SPARK-19399][SPARKR] Add R coalesce API for DataFrame and Column
felixcheung Feb 15, 2017
88c43f4
[SPARK-19599][SS] Clean up HDFSMetadataLog
zsxwing Feb 16, 2017
b9ab4c0
[SPARK-19604][TESTS] Log the start of every Python test
yhuai Feb 15, 2017
db7adb6
[SPARK-19603][SS] Fix StreamingQuery explain command
zsxwing Feb 16, 2017
252dd05
[SPARK-19399][SPARKR][BACKPORT-2.1] fix tests broken by merge
felixcheung Feb 16, 2017
55958bc
[SPARK-19622][WEBUI] Fix a http error in a paged table when using a `…
stanzhai Feb 17, 2017
6e3abed
[SPARK-19500] [SQL] Fix off-by-one bug in BytesToBytesMap
Feb 17, 2017
b083ec5
[SPARK-19517][SS] KafkaSource fails to initialize partition offsets
vitillo Feb 17, 2017
7c371de
[SPARK-19646][CORE][STREAMING] binaryRecords replicates records in sc…
srowen Feb 20, 2017
c331674
[SPARK-19646][BUILD][HOTFIX] Fix compile error from cherry-pick of SP…
srowen Feb 20, 2017
6edf02a
[SPARK-19626][YARN] Using the correct config to set credentials updat…
yaooqinn Feb 21, 2017
9a890b5
[SPARK-19617][SS] Fix the race condition when starting and stopping a…
zsxwing Feb 22, 2017
21afc45
[SPARK-19652][UI] Do auth checks for REST API access (branch-2.1).
Feb 22, 2017
d30238f
[SPARK-19682][SPARKR] Issue warning (or error) when subset method "[[…
actuaryzhang Feb 23, 2017
43084b3
[SPARK-19459][SQL][BRANCH-2.1] Support for nested char/varchar fields…
hvanhovell Feb 23, 2017
66a7ca2
[SPARK-19691][SQL][BRANCH-2.1] Fix ClassCastException when calculatin…
maropu Feb 24, 2017
6da6a27
[SPARK-19707][CORE] Improve the invalid path check for sc.addJar
jerryshao Feb 24, 2017
ed9aaa3
[SPARK-19038][YARN] Avoid overwriting keytab configuration in yarn-cl…
jerryshao Feb 24, 2017
97866e1
[MINOR][DOCS] Fixes two problems in the SQL programing guide page
boazmohar Feb 25, 2017
20a4329
[SPARK-14772][PYTHON][ML] Fixed Params.copy method to match Scala imp…
BryanCutler Feb 26, 2017
04fbb9e
[SPARK-19594][STRUCTURED STREAMING] StreamingQueryListener fails to h…
Feb 26, 2017
4b4c3bf
[SPARK-19748][SQL] refresh function has a wrong order to do cache inv…
windpiger Feb 28, 2017
947c0cd
[SPARK-19677][SS] Committing a delta file atop an existing one should…
vitillo Feb 28, 2017
d887f75
[SPARK-19769][DOCS] Update quickstart instructions
elmiko Feb 28, 2017
f719ccc
[SPARK-19572][SPARKR] Allow to disable hive in sparkR shell
zjffdu Mar 1, 2017
bbe0d8c
[SPARK-19766][SQL] Constant alias columns in INNER JOIN should not be…
stanzhai Mar 1, 2017
27347b5
[SPARK-19373][MESOS] Base spark.scheduler.minRegisteredResourceRatio …
Mar 1, 2017
3a7591a
[SPARK-19750][UI][BRANCH-2.1] Fix redirect issue from http to https
jerryshao Mar 3, 2017
1237aae
[SPARK-19779][SS] Delete needless tmp file after restart structured s…
gf53520 Mar 3, 2017
accbed7
[SPARK-19797][DOC] ML pipeline document correction
ymwdalex Mar 3, 2017
da04d45
[SPARK-19774] StreamExecution should call stop() on sources when a st…
brkyvz Mar 3, 2017
664c979
[SPARK-19816][SQL][TESTS] Fix an issue that DataFrameCallbackSuite do…
zsxwing Mar 4, 2017
ca7a7e8
[SPARK-19822][TEST] CheckpointSuite.testCheckpointedOperation: should…
uncleGen Mar 6, 2017
fd6c6d5
[SPARK-19719][SS] Kafka writer for both structured streaming and batc…
Mar 7, 2017
711addd
[SPARK-19561] [PYTHON] cast TimestampType.toInternal output to long
JasonMWhite Mar 7, 2017
551b7bd
[SPARK-19857][YARN] Correctly calculate next credential update time.
Mar 8, 2017
cbc3700
Revert "[SPARK-19561] [PYTHON] cast TimestampType.toInternal output t…
cloud-fan Mar 8, 2017
3b648a6
[SPARK-19859][SS] The new watermark should override the old one
zsxwing Mar 8, 2017
0ba9ecb
[SPARK-19348][PYTHON] PySpark keyword_only decorator is not thread-safe
BryanCutler Mar 8, 2017
320eff1
[SPARK-18055][SQL] Use correct mirror in ExpressionEncoder
marmbrus Mar 8, 2017
f6c1ad2
[SPARK-19813] maxFilesPerTrigger combo latestFirst may miss old files…
brkyvz Mar 8, 2017
3457c32
Revert "[SPARK-19413][SS] MapGroupsWithState for arbitrary stateful o…
zsxwing Mar 8, 2017
78cc572
[MINOR][SQL] The analyzer rules are fired twice for cases when Analys…
dilipbiswal Mar 9, 2017
00859e1
[SPARK-19874][BUILD] Hide API docs for org.apache.spark.sql.internal
zsxwing Mar 9, 2017
0c140c1
[SPARK-19859][SS][FOLLOW-UP] The new watermark should override the ol…
uncleGen Mar 9, 2017
2a76e24
[SPARK-19561][SQL] add int case handling for TimestampType
JasonMWhite Mar 9, 2017
ffe65b0
[SPARK-19861][SS] watermark should not be a negative time.
uncleGen Mar 9, 2017
a59cc36
[SPARK-19886] Fix reportDataLoss if statement in SS KafkaSource
brkyvz Mar 10, 2017
f0d50fd
[SPARK-19891][SS] Await Batch Lock notified on stream execution exit
Mar 10, 2017
5a2ad43
[SPARK-19893][SQL] should not run DataFrame set operations with map type
cloud-fan Mar 11, 2017
e481a73
[SPARK-19611][SQL] Introduce configurable table schema inference
Mar 11, 2017
f9833c6
[DOCS][SS] fix structured streaming python example
uncleGen Mar 12, 2017
8c46080
[SPARK-19853][SS] uppercase kafka topics fail when startingOffsets ar…
uncleGen Mar 13, 2017
4545782
[SPARK-19933][SQL] Do not change output of a subquery
hvanhovell Mar 14, 2017
a0ce845
[SPARK-19887][SQL] dynamic partition keys can be null or empty string
cloud-fan Mar 15, 2017
80ebca6
[SPARK-19944][SQL] Move SQLConf from sql/core to sql/catalyst (branch…
rxin Mar 15, 2017
0622546
[SPARK-19872] [PYTHON] Use the correct deserializer for RDD construct…
HyukjinKwon Mar 15, 2017
9d032d0
[SPARK-19329][SQL][BRANCH-2.1] Reading from or writing to a datasourc…
windpiger Mar 16, 2017
4b977ff
[SPARK-19765][SPARK-18549][SPARK-19093][SPARK-19736][BACKPORT-2.1][SQ…
gatorsmile Mar 17, 2017
710b555
[SPARK-19721][SS][BRANCH-2.1] Good error message for version mismatch…
lw-lin Mar 17, 2017
5fb7083
[SPARK-19986][TESTS] Make pyspark.streaming.tests.CheckpointTests mor…
zsxwing Mar 17, 2017
780f606
[SQL][MINOR] Fix scaladoc for UDFRegistration
jaceklaskowski Mar 18, 2017
b60f690
[SPARK-18817][SPARKR][SQL] change derby log output to temp dir
felixcheung Mar 19, 2017
af8bf21
[SPARK-19994][SQL] Wrong outputOrdering for right/full outer smj
Mar 20, 2017
d205d40
[SPARK-17204][CORE] Fix replicated off heap storage
Mar 21, 2017
c4c7b18
[SPARK-19912][SQL] String literals should be escaped for Hive metasto…
dongjoon-hyun Mar 21, 2017
a88c88a
[SPARK-20017][SQL] change the nullability of function 'StringToMap' f…
zhaorongsheng Mar 21, 2017
5c18b6c
[SPARK-19237][SPARKR][CORE] On Windows spark-submit should handle whe…
felixcheung Mar 21, 2017
9dfdd2a
clarify array_contains function description
lwwmanning Mar 21, 2017
a04428f
[SPARK-19980][SQL][BACKPORT-2.1] Add NULL checks in Bean serializer
maropu Mar 22, 2017
30abb95
Preparing Spark release v2.1.1-rc1
pwendell Mar 22, 2017
c4d2b83
Preparing development version 2.1.2-SNAPSHOT
pwendell Mar 22, 2017
277ed37
[SPARK-19925][SPARKR] Fix SparkR spark.getSparkFiles fails when it wa…
yanboliang Mar 22, 2017
56f997f
[SPARK-20021][PYSPARK] Miss backslash in python code
uncleGen Mar 22, 2017
af960e8
[SPARK-19970][SQL][BRANCH-2.1] Table owner should be USER instead of …
dongjoon-hyun Mar 23, 2017
92f0b01
[SPARK-19959][SQL] Fix to throw NullPointerException in df[java.lang…
kiszk Mar 24, 2017
d989434
[SPARK-19674][SQL] Ignore driver accumulator updates don't belong to …
carsonwang Mar 25, 2017
b6d348e
[SPARK-20086][SQL] CollapseWindow should not collapse dependent adjac…
hvanhovell Mar 26, 2017
4056191
[SPARK-20102] Fix nightly packaging and RC packaging scripts w/ two m…
JoshRosen Mar 27, 2017
4bcb7d6
[SPARK-19995][YARN] Register tokens to current UGI to avoid re-issuin…
jerryshao Mar 28, 2017
fd2e406
[SPARK-20125][SQL] Dataset of type option of map does not work
cloud-fan Mar 28, 2017
e669dd7
[SPARK-14536][SQL][BACKPORT-2.1] fix to handle null value in array ty…
sureshthalamati Mar 28, 2017
02b165d
Preparing Spark release v2.1.1-rc2
pwendell Mar 28, 2017
4964dbe
Preparing development version 2.1.2-SNAPSHOT
pwendell Mar 28, 2017
3095480
[SPARK-20043][ML] DecisionTreeModel: ImpurityCalculator builder fails…
facaiy Mar 28, 2017
f8c1b3e
[SPARK-20134][SQL] SQLMetrics.postDriverMetricUpdates to simplify dri…
rxin Mar 29, 2017
103ff54
[SPARK-20059][YARN] Use the correct classloader for HBaseCredentialPr…
jerryshao Mar 29, 2017
6a1b2eb
[SPARK-20164][SQL] AnalysisException not tolerant of null query plan.
kunalkhamar Mar 31, 2017
e3cec18
[SPARK-20084][CORE] Remove internal.metrics.updatedBlockStatuses from…
rdblue Mar 31, 2017
968eace
[SPARK-19999][BACKPORT-2.1][CORE] Workaround JDK-8165231 to identify …
kiszk Apr 2, 2017
ca14410
[SPARK-20197][SPARKR][BRANCH-2.1] CRAN check fail with package instal…
felixcheung Apr 3, 2017
77700ea
[MINOR][DOCS] Replace non-breaking space to normal spaces that breaks…
HyukjinKwon Apr 3, 2017
f9546da
[SPARK-20190][APP-ID] applications//jobs' in rest api,status should b…
Apr 4, 2017
00c1248
[SPARK-20191][YARN] Create wrapper for RackResolver so tests can overr…
Apr 4, 2017
efc72dc
[SPARK-20042][WEB UI] Fix log page buttons for reverse proxy mode
okoethibm Apr 5, 2017
2b85e05
[SPARK-20223][SQL] Fix typo in tpcds q77.sql
Apr 5, 2017
fb81a41
[SPARK-20214][ML] Make sure converted csc matrix has sorted indices
viirya Apr 6, 2017
7791120
[SPARK-20218][DOC][APP-ID] applications//stages' in REST API,add desc…
Apr 7, 2017
fc242cc
[SPARK-20246][SQL] should not push predicate down through aggregate w…
cloud-fan Apr 8, 2017
658b358
[SPARK-20262][SQL] AssertNotNull should throw NullPointerException
rxin Apr 8, 2017
43a7fca
[SPARK-20260][MLLIB] String interpolation required for error message
Apr 9, 2017
1a73046
[SPARK-20264][SQL] asm should be non-test dependency in sql/core
rxin Apr 10, 2017
bc7304e
[SPARK-20280][CORE] FileStatusCache Weigher integer overflow
bogdanrdc Apr 10, 2017
489c1f3
[SPARK-20285][TESTS] Increase the pyspark streaming test timeout to 3…
zsxwing Apr 10, 2017
b26f2c2
[SPARK-18555][SQL] DataFrameNaFunctions.fill miss up original values …
Dec 6, 2016
f40e44d
[SPARK-20270][SQL] na.fill should not change the values in long or in…
Apr 10, 2017
8eb71b8
[SPARK-17564][TESTS] Fix flaky RequestTimeoutIntegrationSuite.further…
zsxwing Apr 11, 2017
03a42c0
[SPARK-18555][MINOR][SQL] Fix the @since tag when backporting from 2.…
dbtsai Apr 11, 2017
46e212d
[SPARK-20291][SQL] NaNvl(FloatType, NullType) should not be cast to N…
Apr 12, 2017
b2970d9
[MINOR][DOCS] Fix spacings in Structured Streaming Programming Guide
dongjinleekr Apr 12, 2017
dbb6d1b
[SPARK-20296][TRIVIAL][DOCS] Count distinct error message for streaming
jtoka Apr 12, 2017
7e0ddda
[SPARK-20304][SQL] AssertNotNull should not include path in string re…
rxin Apr 12, 2017
be36c2f
[SPARK-20131][CORE] Don't use `this` lock in StandaloneSchedulerBacke…
zsxwing Apr 13, 2017
98ae548
[SPARK-19924][SQL][BACKPORT-2.1] Handle InvocationTargetException for…
gatorsmile Apr 13, 2017
bca7ce2
[SPARK-19946][TESTS][BACKPORT-2.1] DebugFilesystem.assertNoOpenStream…
bogdanrdc Apr 13, 2017
6f715c0
[SPARK-20243][TESTS] DebugFilesystem.assertNoOpenStreams thread race
bogdanrdc Apr 10, 2017
2ed19cf
Preparing Spark release v2.1.1-rc3
pwendell Apr 14, 2017
2a3e50e
Preparing development version 2.1.2-SNAPSHOT
pwendell Apr 14, 2017
efa11a4
[SPARK-20335][SQL][BACKPORT-2.1] Children expressions of Hive UDF imp…
gatorsmile Apr 17, 2017
7aad057
[SPARK-20349][SQL] ListFunctions returns duplicate functions after us…
gatorsmile Apr 17, 2017
db9517c
[SPARK-17647][SQL] Fix backslash escaping in 'LIKE' patterns.
jodersky Apr 17, 2017
622d7a8
[HOTFIX] Fix compilation.
rxin Apr 17, 2017
3808b47
[SPARK-20349][SQL][REVERT-BRANCH2.1] ListFunctions returns duplicate …
gatorsmile Apr 18, 2017
a4c1ebc
[SPARK-17647][SQL][FOLLOWUP][MINOR] fix typo
felixcheung Apr 18, 2017
171bf65
[SPARK-20359][SQL] Avoid unnecessary execution in EliminateOuterJoin …
koertkuipers Apr 19, 2017
9e5dc82
[MINOR][SS] Fix a missing space in UnsupportedOperationChecker error …
zsxwing Apr 20, 2017
66e7a8f
[SPARK-20409][SQL] fail early if aggregate function in GROUP BY
cloud-fan Apr 20, 2017
fb0351a
Small rewording about history server use case
dud225 Apr 21, 2017
ba50580
[SPARK-20407][TESTS][BACKPORT-2.1] ParquetQuerySuite 'Enabling/disabl…
bogdanrdc Apr 22, 2017
d99b49b
[SPARK-20450][SQL] Unexpected first-query schema inference cost with …
ericl Apr 24, 2017
4279665
[SPARK-20451] Filter out nested mapType datatypes from sort order in …
sameeragarwal Apr 25, 2017
65990fc
[SPARK-20455][DOCS] Fix Broken Docker IT Docs
original-brownbear Apr 25, 2017
2d47e1a
[SPARK-20404][CORE] Using Option(name) instead of Some(name)
szhem Apr 25, 2017
359382c
[SPARK-20239][CORE][2.1-BACKPORT] Improve HistoryServer's ACL mechanism
jerryshao Apr 25, 2017
267aca5
Preparing Spark release v2.1.1-rc4
pwendell Apr 25, 2017
8460b09
Preparing development version 2.1.2-SNAPSHOT
pwendell Apr 25, 2017
6696ad0
[SPARK-20439][SQL][BACKPORT-2.1] Fix Catalog API listTables and getTa…
gatorsmile Apr 26, 2017
5131b0a
[SPARK-20496][SS] Bug in KafkaWriter Looks at Unanalyzed Plans
Apr 28, 2017
868b4a1
[SPARK-20517][UI] Fix broken history UI download link
jerryshao May 1, 2017
5915588
[SPARK-20540][CORE] Fix unstable executor requests.
rdblue May 1, 2017
d10b0f6
[SPARK-20558][CORE] clear InheritableThreadLocal variables in SparkCo…
cloud-fan May 3, 2017
179f537
[SPARK-20546][DEPLOY] spark-class gets syntax error in posix mode
jyu00 May 5, 2017
2a7f5da
[SPARK-20613] Remove excess quotes in Windows executable
jarrettmeyer May 5, 2017
704b249
[SPARK-20603][SS][TEST] Set default number of topic partitions to 1 t…
zsxwing May 5, 2017
a1112c6
[SPARK-20616] RuleExecutor logDebug of batch results should show diff…
juliuszsompolski May 5, 2017
f7a91a1
[SPARK-20615][ML][TEST] SparseVector.argmax throws IndexOutOfBoundsEx…
May 9, 2017
12c937e
[SPARK-20627][PYSPARK] Drop the hadoop distribution name from the Pyt…
holdenk May 9, 2017
50f28df
[SPARK-17685][SQL] Make SortMergeJoinExec's currentVars is null when …
wangyum May 10, 2017
8e09789
[SPARK-20686][SQL] PropagateEmptyRelation incorrectly handles aggrega…
JoshRosen May 10, 2017
69786ea
[SPARK-20631][PYTHON][ML] LogisticRegression._checkThresholdConsisten…
zero323 May 10, 2017
bdc08ab
[SPARK-20688][SQL] correctly check analysis for scalar sub-queries
cloud-fan May 10, 2017
92a71a6
[SPARK-20685] Fix BatchPythonEvaluation bug in case of single UDF w/ …
JoshRosen May 10, 2017
6e89d57
[SPARK-20665][SQL] "Bround" and "Round" function return NULL
10110346 May 12, 2017
95de467
[SPARK-17424] Fix unsound substitution bug in ScalaReflection.
rdblue May 12, 2017
62969e9
[SPARK-20705][WEB-UI] The sort function can not be used in the master…
May 15, 2017
14b6a9d
[SPARK-20735][SQL][TEST] Enable cross join in TPCDSQueryBenchmark
dongjoon-hyun May 15, 2017
ba35c6b
[SPARK-20769][DOC] Incorrect documentation for using Jupyter notebook
aray May 17, 2017
e06d936
[SPARK-20796] the location of start-master.sh in spark-standalone.md …
liu-zhaokun May 18, 2017
e326de4
[SPARK-20798] GenerateUnsafeProjection should check if a value is nul…
ala May 19, 2017
c53fe79
[SPARK-20759] SCALA_VERSION in _config.yml should be consistent with …
liu-zhaokun May 19, 2017
e9804b3
[SPARK-20781] the location of Dockerfile in docker.properties.templat…
liu-zhaokun May 19, 2017
c3a986b
[SPARK-20687][MLLIB] mllib.Matrices.fromBreeze may crash when convert…
ghoto May 22, 2017
f5ef076
[SPARK-20756][YARN] yarn-shuffle jar references unshaded guava
markgrover May 22, 2017
f4538c9
[SPARK-20763][SQL][BACKPORT-2.1] The function of `month` and `day` re…
10110346 May 23, 2017
13adc0f
[SPARK-20862][MLLIB][PYTHON] Avoid passing float to ndarray.reshape i…
MrBago May 24, 2017
2f68631
[SPARK-20848][SQL] Shutdown the pool after reading parquet files
viirya May 24, 2017
c3302e8
[SPARK-18406][CORE][BACKPORT-2.1] Race between end-of-task and comple…
jiangxb1987 May 25, 2017
7015f6f
[SPARK-20848][SQL][FOLLOW-UP] Shutdown the pool after reading parquet…
viirya May 25, 2017
7fc2347
[SPARK-20250][CORE] Improper OOM error when a task been killed while …
ConeyLiu May 25, 2017
4f6fccf
[SPARK-20874][EXAMPLES] Add Structured Streaming Kafka Source to exam…
zsxwing May 25, 2017
6e6adcc
[SPARK-20868][CORE] UnsafeShuffleWriter should verify the position af…
cloud-fan May 26, 2017
ebd72f4
[SPARK-20843][CORE] Add a config to set driver terminate timeout
zsxwing May 27, 2017
38f37c5
[SPARK-20393][WEB UI] Strengthen Spark to prevent XSS vulnerabilities
n-marion May 10, 2017
4640086
[SPARK-20275][UI] Do not display "Completed" column for in-progress a…
jerryshao May 31, 2017
dade85f
[SPARK-20940][CORE] Replace IllegalAccessError with IllegalStateExcep…
zsxwing Jun 1, 2017
772a9b9
[SPARK-20922][CORE] Add whitelist of classes that can be deserialized…
Jun 1, 2017
0b25a7d
[SPARK-20922][CORE][HOTFIX] Don't use Java 8 lambdas in older branches.
Jun 1, 2017
afab855
[SPARK-20974][BUILD] we should run REPL tests if SQL module has code …
cloud-fan Jun 3, 2017
03cc18b
[SPARK-20914][DOCS] Javadoc contains code that is invalid
srowen Jun 8, 2017
2 changes: 1 addition & 1 deletion .github/PULL_REQUEST_TEMPLATE
@@ -7,4 +7,4 @@
(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

Please review https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark before opening a pull request.
Please review http://spark.apache.org/contributing.html before opening a pull request.
2 changes: 2 additions & 0 deletions .gitignore
@@ -57,6 +57,8 @@ project/plugins/project/build.properties
project/plugins/src_managed/
project/plugins/target/
python/lib/pyspark.zip
python/deps
python/pyspark/python
reports/
scalastyle-on-compile.generated.xml
scalastyle-output.xml
4 changes: 2 additions & 2 deletions CONTRIBUTING.md
@@ -1,12 +1,12 @@
## Contributing to Spark

*Before opening a pull request*, review the
[Contributing to Spark wiki](https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark).
[Contributing to Spark guide](http://spark.apache.org/contributing.html).
It lists steps that are required before creating a PR. In particular, consider:

- Is the change important and ready enough to ask the community to spend time reviewing?
- Have you searched for existing, related JIRAs and pull requests?
- Is this a new feature that can stand alone as a [third party project](https://cwiki.apache.org/confluence/display/SPARK/Third+Party+Projects) ?
- Is this a new feature that can stand alone as a [third party project](http://spark.apache.org/third-party-projects.html) ?
- Is the change being proposed clearly explained and motivated?

When you contribute code, you affirm that the contribution is your original work and that you
10 changes: 5 additions & 5 deletions LICENSE
@@ -249,11 +249,11 @@ The text of each license is also included at licenses/LICENSE-[project].txt.
(Interpreter classes (all .scala files in repl/src/main/scala
except for Main.Scala, SparkHelper.scala and ExecutorClassLoader.scala),
and for SerializableMapWrapper in JavaUtils.scala)
(BSD-like) Scala Actors library (org.scala-lang:scala-actors:2.11.7 - http://www.scala-lang.org/)
(BSD-like) Scala Compiler (org.scala-lang:scala-compiler:2.11.7 - http://www.scala-lang.org/)
(BSD-like) Scala Compiler (org.scala-lang:scala-reflect:2.11.7 - http://www.scala-lang.org/)
(BSD-like) Scala Library (org.scala-lang:scala-library:2.11.7 - http://www.scala-lang.org/)
(BSD-like) Scalap (org.scala-lang:scalap:2.11.7 - http://www.scala-lang.org/)
(BSD-like) Scala Actors library (org.scala-lang:scala-actors:2.11.8 - http://www.scala-lang.org/)
(BSD-like) Scala Compiler (org.scala-lang:scala-compiler:2.11.8 - http://www.scala-lang.org/)
(BSD-like) Scala Compiler (org.scala-lang:scala-reflect:2.11.8 - http://www.scala-lang.org/)
(BSD-like) Scala Library (org.scala-lang:scala-library:2.11.8 - http://www.scala-lang.org/)
(BSD-like) Scalap (org.scala-lang:scalap:2.11.8 - http://www.scala-lang.org/)
(BSD-style) scalacheck (org.scalacheck:scalacheck_2.11:1.10.0 - http://www.scalacheck.org)
(BSD-style) spire (org.spire-math:spire_2.11:0.7.1 - http://spire-math.org)
(BSD-style) spire-macros (org.spire-math:spire-macros_2.11:0.7.1 - http://spire-math.org)
3 changes: 0 additions & 3 deletions NOTICE
@@ -421,9 +421,6 @@ Copyright (c) 2011, Terrence Parr.
This product includes/uses ASM (http://asm.ow2.org/),
Copyright (c) 2000-2007 INRIA, France Telecom.

This product includes/uses org.json (http://www.json.org/java/index.html),
Copyright (c) 2002 JSON.org

This product includes/uses JLine (http://jline.sourceforge.net/),
Copyright (c) 2002-2006, Marc Prud'hommeaux <[email protected]>.

91 changes: 91 additions & 0 deletions R/CRAN_RELEASE.md
@@ -0,0 +1,91 @@
# SparkR CRAN Release

To release SparkR as a package to CRAN, we use the `devtools` package. Please work with the `[email protected]` community and the R package maintainer on this.

### Release

First, check that the `Version:` field in the `pkg/DESCRIPTION` file is updated. Also, check for stale files not under source control.

Note that while `run-tests.sh` runs `check-cran.sh` (which runs `R CMD check`), it does so with `--no-manual --no-vignettes`, which skips the manual and vignette/PDF checks. It is therefore preferable to run `R CMD check` on the manually built source package before uploading a release. Also note that for the CRAN PDF vignette checks to succeed, the `qpdf` tool must be installed (to install it, e.g. `yum -q -y install qpdf`).
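
For illustration, a hedged sketch of that manual check, assuming it is run from the `SPARK_HOME/R` directory and with `2.1.2` standing in for the actual release version:

```sh
# Build the source package from pkg/, then run the full CRAN checks on it,
# including the manual and PDF/vignette checks that run-tests.sh skips
R CMD build pkg
R CMD check --as-cran SparkR_2.1.2.tar.gz
```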

To upload a release, we need to update `cran-comments.md`. This should generally contain the results from running the `check-cran.sh` script, along with comments on the status of any `WARNING` (there should not be any) or `NOTE`. As part of `check-cran.sh` and the release process, the vignettes are built - make sure `SPARK_HOME` is set and the Spark jars are accessible.

Once everything is in place, run in R under the `SPARK_HOME/R` directory:

```R
paths <- .libPaths(); .libPaths(c("lib", paths)); Sys.setenv(SPARK_HOME=tools::file_path_as_absolute("..")); devtools::release(); .libPaths(paths)
```
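
The same sequence as the one-liner above, expanded here for readability (functionally identical):

```R
paths <- .libPaths()
.libPaths(c("lib", paths))                                   # prefer the locally built SparkR under R/lib
Sys.setenv(SPARK_HOME = tools::file_path_as_absolute(".."))  # point SPARK_HOME at the Spark checkout
devtools::release()                                          # run devtools' interactive CRAN release flow
.libPaths(paths)                                             # restore the original library paths
```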

For more information, please refer to http://r-pkgs.had.co.nz/release.html#release-check

### Testing: build package manually

To build the package manually, for example to inspect the resulting `.tar.gz` file content, we also use the `devtools` package.

The source package is what gets released to CRAN. CRAN then builds platform-specific binary packages from the source package.

#### Build source package

To build the source package locally without releasing to CRAN, run in R under the `SPARK_HOME/R` directory:

```R
paths <- .libPaths(); .libPaths(c("lib", paths)); Sys.setenv(SPARK_HOME=tools::file_path_as_absolute("..")); devtools::build("pkg"); .libPaths(paths)
```

(http://r-pkgs.had.co.nz/vignettes.html#vignette-workflow-2)

Similarly, the source package is also created by `check-cran.sh` with `R CMD build pkg`.

For example, this should be the content of the source package:

```sh
DESCRIPTION R inst tests
NAMESPACE build man vignettes

inst/doc/
sparkr-vignettes.html
sparkr-vignettes.Rmd
sparkr-vignettes.Rman

build/
vignette.rds

man/
*.Rd files...

vignettes/
sparkr-vignettes.Rmd
```

#### Test source package

To install, run this:

```sh
R CMD INSTALL SparkR_2.1.0.tar.gz
```

With "2.1.0" replaced with the version of SparkR.

This command installs SparkR to the default libPaths. Once that is done, you should be able to start R and run:

```R
library(SparkR)
vignette("sparkr-vignettes", package="SparkR")
```

#### Build binary package

To build the binary package locally, run in R under the `SPARK_HOME/R` directory:

```R
paths <- .libPaths(); .libPaths(c("lib", paths)); Sys.setenv(SPARK_HOME=tools::file_path_as_absolute("..")); devtools::build("pkg", binary = TRUE); .libPaths(paths)
```

For example, this should be the content of the binary package:

```sh
DESCRIPTION Meta R html tests
INDEX NAMESPACE help profile worker
```
10 changes: 5 additions & 5 deletions R/README.md
@@ -6,7 +6,7 @@ SparkR is an R package that provides a light-weight frontend to use Spark from R

Libraries of SparkR need to be created in `$SPARK_HOME/R/lib`. This can be done by running the script `$SPARK_HOME/R/install-dev.sh`.
By default the above script uses the system-wide installation of R. However, this can be changed to any user-installed location of R by setting the environment variable `R_HOME` to the full path of the base directory where R is installed, before running the install-dev.sh script.
Example:
```bash
# where /home/username/R is where R is installed and /home/username/R/bin contains the files R and RScript
export R_HOME=/home/username/R
@@ -46,19 +46,19 @@ Sys.setenv(SPARK_HOME="/Users/username/spark")
# This line loads SparkR from the installed directory
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)
sc <- sparkR.init(master="local")
sparkR.session()
```
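
`sparkR.session()` is the Spark 2.x entry point that replaces the older `sparkR.init()` shown above. As a minimal sketch of what a session might look like once SparkR is loaded (the master URL and app name here are illustrative):

```R
sparkR.session(master = "local[*]", appName = "SparkR-example")  # start (or reuse) a Spark session
df <- as.DataFrame(faithful)                                     # convert a local R data.frame to a SparkDataFrame
head(df)                                                         # inspect the first rows
sparkR.session.stop()                                            # shut the session down when finished
```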

#### Making changes to SparkR

The [instructions](https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark) for making contributions to Spark also apply to SparkR.
The [instructions](http://spark.apache.org/contributing.html) for making contributions to Spark also apply to SparkR.
If you only make R file changes (i.e. no Scala changes) then you can just re-install the R package using `R/install-dev.sh` and test your changes.
Once you have made your changes, please include unit tests for them and run existing unit tests using the `R/run-tests.sh` script as described below.

#### Generating documentation

The SparkR documentation (Rd files and HTML files) is not part of the source repository. To generate it, you can run the script `R/create-docs.sh`. This script uses `devtools` and `knitr` to generate the docs, and these packages need to be installed on the machine before using the script. Also, you may need to install these [prerequisites](https://github.com/apache/spark/tree/master/docs#prerequisites). See also `R/DOCUMENTATION.md`
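
For illustration, a hedged sketch of that workflow, assuming the three packages are not yet installed (the CRAN mirror URL is an example):

```sh
# Install the R packages the docs build requires, then generate the Rd and HTML docs;
# the output lands under $SPARK_HOME/R/pkg/html
Rscript -e 'install.packages(c("devtools", "knitr", "rmarkdown"), repos = "https://cloud.r-project.org")'
./R/create-docs.sh
```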

### Examples, Unit tests

SparkR comes with several sample programs in the `examples/src/main/r` directory.
50 changes: 44 additions & 6 deletions R/check-cran.sh
@@ -34,13 +34,30 @@ if [ ! -z "$R_HOME" ]
fi
R_SCRIPT_PATH="$(dirname $(which R))"
fi
echo "USING R_HOME = $R_HOME"
echo "Using R_SCRIPT_PATH = ${R_SCRIPT_PATH}"

# Build the latest docs
# Install the package (this is required for code in vignettes to run when building it later)
# Build the latest docs, but not the vignettes, which are built with the package next
$FWDIR/create-docs.sh

# Build a zip file containing the source package
"$R_SCRIPT_PATH/"R CMD build $FWDIR/pkg
# Build source package with vignettes
SPARK_HOME="$(cd "${FWDIR}"/..; pwd)"
. "${SPARK_HOME}"/bin/load-spark-env.sh
if [ -f "${SPARK_HOME}/RELEASE" ]; then
SPARK_JARS_DIR="${SPARK_HOME}/jars"
else
SPARK_JARS_DIR="${SPARK_HOME}/assembly/target/scala-$SPARK_SCALA_VERSION/jars"
fi

if [ -d "$SPARK_JARS_DIR" ]; then
# Build a zip file containing the source package with vignettes
SPARK_HOME="${SPARK_HOME}" "$R_SCRIPT_PATH/"R CMD build $FWDIR/pkg

find pkg/vignettes/. -not -name '.' -not -name '*.Rmd' -not -name '*.md' -not -name '*.pdf' -not -name '*.html' -delete
else
echo "Error Spark JARs not found in $SPARK_HOME"
exit 1
fi

# Run check as-cran.
VERSION=`grep Version $FWDIR/pkg/DESCRIPTION | awk '{print $NF}'`
@@ -54,11 +71,32 @@ fi

if [ -n "$NO_MANUAL" ]
then
CRAN_CHECK_OPTIONS=$CRAN_CHECK_OPTIONS" --no-manual"
CRAN_CHECK_OPTIONS=$CRAN_CHECK_OPTIONS" --no-manual --no-vignettes"
fi

echo "Running CRAN check with $CRAN_CHECK_OPTIONS options"

"$R_SCRIPT_PATH/"R CMD check $CRAN_CHECK_OPTIONS SparkR_"$VERSION".tar.gz
if [ -n "$NO_TESTS" ] && [ -n "$NO_MANUAL" ]
then
"$R_SCRIPT_PATH/"R CMD check $CRAN_CHECK_OPTIONS SparkR_"$VERSION".tar.gz
else
# This will run tests and/or build vignettes, and require SPARK_HOME
SPARK_HOME="${SPARK_HOME}" "$R_SCRIPT_PATH/"R CMD check $CRAN_CHECK_OPTIONS SparkR_"$VERSION".tar.gz
fi

# Install source package to get it to generate vignettes rds files, etc.
if [ -n "$CLEAN_INSTALL" ]
then
echo "Removing lib path and installing from source package"
LIB_DIR="$FWDIR/lib"
rm -rf $LIB_DIR
mkdir -p $LIB_DIR
"$R_SCRIPT_PATH/"R CMD INSTALL SparkR_"$VERSION".tar.gz --library=$LIB_DIR

# Zip the SparkR package so that it can be distributed to worker nodes on YARN
pushd $LIB_DIR > /dev/null
jar cfM "$LIB_DIR/sparkr.zip" SparkR
popd > /dev/null
fi

popd > /dev/null
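
The script above branches on `NO_TESTS`, `NO_MANUAL`, and `CLEAN_INSTALL`; as a hedged sketch, assuming those flags are picked up from the environment, typical invocations could look like:

```sh
# Fast pass: skip tests, the manual, and the vignettes
NO_TESTS=1 NO_MANUAL=1 ./R/check-cran.sh

# Full check, then remove R/lib and reinstall from the freshly built source package
CLEAN_INSTALL=1 ./R/check-cran.sh
```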
19 changes: 1 addition & 18 deletions R/create-docs.sh
@@ -20,7 +20,7 @@
# Script to create API docs and vignettes for SparkR
# This requires `devtools`, `knitr` and `rmarkdown` to be installed on the machine.

# After running this script the html docs can be found in
# $SPARK_HOME/R/pkg/html
# The vignettes can be found in
# $SPARK_HOME/R/pkg/vignettes/sparkr_vignettes.html
@@ -52,21 +52,4 @@ Rscript -e 'libDir <- "../../lib"; library(SparkR, lib.loc=libDir); library(knit

popd

# Find Spark jars.
if [ -f "${SPARK_HOME}/RELEASE" ]; then
SPARK_JARS_DIR="${SPARK_HOME}/jars"
else
SPARK_JARS_DIR="${SPARK_HOME}/assembly/target/scala-$SPARK_SCALA_VERSION/jars"
fi

# Only create vignettes if Spark JARs exist
if [ -d "$SPARK_JARS_DIR" ]; then
# render creates SparkR vignettes
Rscript -e 'library(rmarkdown); paths <- .libPaths(); .libPaths(c("lib", paths)); Sys.setenv(SPARK_HOME=tools::file_path_as_absolute("..")); render("pkg/vignettes/sparkr-vignettes.Rmd"); .libPaths(paths)'

find pkg/vignettes/. -not -name '.' -not -name '*.Rmd' -not -name '*.md' -not -name '*.pdf' -not -name '*.html' -delete
else
echo "Skipping R vignettes as Spark JARs not found in $SPARK_HOME"
fi

popd
2 changes: 1 addition & 1 deletion R/install-dev.sh
@@ -46,7 +46,7 @@ if [ ! -z "$R_HOME" ]
fi
R_SCRIPT_PATH="$(dirname $(which R))"
fi
echo "USING R_HOME = $R_HOME"
echo "Using R_SCRIPT_PATH = ${R_SCRIPT_PATH}"

# Generate Rd files if devtools is installed
"$R_SCRIPT_PATH/"Rscript -e ' if("devtools" %in% rownames(installed.packages())) { library(devtools); devtools::document(pkg="./pkg", roclets=c("rd")) }'
3 changes: 3 additions & 0 deletions R/pkg/.Rbuildignore
@@ -1,5 +1,8 @@
^.*\.Rproj$
^\.Rproj\.user$
^\.lintr$
^cran-comments\.md$
^NEWS\.md$
^README\.Rmd$
^src-native$
^html$
12 changes: 7 additions & 5 deletions R/pkg/DESCRIPTION
@@ -1,26 +1,27 @@
Package: SparkR
Type: Package
Version: 2.1.2
Title: R Frontend for Apache Spark
Version: 2.0.0
Date: 2016-08-27
Description: The SparkR package provides an R Frontend for Apache Spark.
Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),
email = "[email protected]"),
person("Xiangrui", "Meng", role = "aut",
email = "[email protected]"),
person("Felix", "Cheung", role = "aut",
email = "[email protected]"),
person(family = "The Apache Software Foundation", role = c("aut", "cph")))
License: Apache License (== 2.0)
URL: http://www.apache.org/ http://spark.apache.org/
BugReports: https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-ContributingBugReports
BugReports: http://spark.apache.org/contributing.html
Depends:
R (>= 3.0),
methods
Suggests:
knitr,
rmarkdown,
testthat,
e1071,
survival
Description: The SparkR package provides an R frontend for Apache Spark.
License: Apache License (== 2.0)
Collate:
'schema.R'
'generics.R'
@@ -48,3 +49,4 @@ Collate:
'utils.R'
'window.R'
RoxygenNote: 5.0.1
VignetteBuilder: knitr