Closed
1219 commits
47eb9a6
Preparing Spark release v2.0.0-rc3
pwendell Jul 14, 2016
5244f86
Preparing development version 2.0.1-SNAPSHOT
pwendell Jul 14, 2016
f6eda6b
[SPARK-16503] SparkSession should provide Spark version
lw-lin Jul 14, 2016
48d1fa3
Preparing Spark release v2.0.0-rc3
pwendell Jul 14, 2016
b3ebecb
Preparing development version 2.0.1-SNAPSHOT
pwendell Jul 14, 2016
240c42b
[SPARK-16500][ML][MLLIB][OPTIMIZER] add LBFGS convergence warning for…
WeichenXu123 Jul 14, 2016
4e9080f
[SPARK-16509][SPARKR] Rename window.partitionBy and window.orderBy to…
sun-rui Jul 14, 2016
29281bc
[SPARK-16538][SPARKR] fix R call with namespace operator on SparkSess…
felixcheung Jul 14, 2016
e5f8c11
Preparing Spark release v2.0.0-rc4
pwendell Jul 14, 2016
0a651aa
Preparing development version 2.0.1-SNAPSHOT
pwendell Jul 14, 2016
7418019
[SPARK-16529][SQL][TEST] `withTempDatabase` should set `default` data…
dongjoon-hyun Jul 14, 2016
23e1ab9
[SPARK-16528][SQL] Fix NPE problem in HiveClientImpl
jacek-lewandowski Jul 14, 2016
1fe0bcd
[SPARK-16540][YARN][CORE] Avoid adding jars twice for Spark running o…
jerryshao Jul 14, 2016
5c56bc0
[SPARK-16553][DOCS] Fix SQL example file name in docs
shivaram Jul 14, 2016
aa4690b
[SPARK-16555] Work around Jekyll error-handling bug which led to sile…
JoshRosen Jul 14, 2016
c5f9355
[SPARK-16557][SQL] Remove stale doc in sql/README.md
rxin Jul 15, 2016
90686ab
[SPARK-14817][ML][MLLIB][DOC] Made DataFrame-based API primary in MLl…
jkbradley Jul 15, 2016
e833c90
[SPARK-16538][SPARKR] Add more tests for namespace call to SparkSessi…
felixcheung Jul 15, 2016
34ac45a
[SPARK-16230][CORE] CoarseGrainedExecutorBackend to self kill if ther…
tejasapatil Jul 15, 2016
5d49529
[SPARK-16582][SQL] Explicitly define isNull = false for non-nullable …
sameeragarwal Jul 16, 2016
cad4693
[SPARK-3359][DOCS] More changes to resolve javadoc 8 errors that will…
srowen Jul 16, 2016
8c2ec44
[SPARK-16112][SPARKR] Programming guide for gapply/gapplyCollect
Jul 16, 2016
c527e9e
[SPARK-16507][SPARKR] Add a CRAN checker, fix Rd aliases
shivaram Jul 17, 2016
a4bf13a
[SPARK-16584][SQL] Move regexp unit tests to RegexpExpressionsSuite
rxin Jul 17, 2016
808d69a
[SPARK-16588][SQL] Deprecate monotonicallyIncreasingId in Scala/Java
rxin Jul 18, 2016
2365d63
[MINOR][TYPO] fix fininsh typo
WeichenXu123 Jul 18, 2016
085f3cc
[SPARK-16055][SPARKR] warning added while using sparkPackages with sp…
krishnakalyan3 Jul 18, 2016
33d92f7
[SPARK-16515][SQL] set default record reader and writer for script tr…
adrian-wang Jul 18, 2016
7889585
[SPARKR][DOCS] minor code sample update in R programming guide
felixcheung Jul 18, 2016
aac8608
[SPARK-16590][SQL] Improve LogicalPlanToSQLSuite to check generated S…
dongjoon-hyun Jul 19, 2016
1dd1526
[HOTFIX] Fix Scala 2.10 compilation
rxin Jul 19, 2016
24ea875
[SPARK-16615][SQL] Expose sqlContext in SparkSession
rxin Jul 19, 2016
ef2a6f1
[SPARK-16303][DOCS][EXAMPLES] Minor Scala/Java example update
liancheng Jul 19, 2016
504aa6f
[DOC] improve python doc for rdd.histogram and dataframe.join
mortada Jul 19, 2016
eb1c20f
[MINOR][BUILD] Fix Java Linter `LineLength` errors
dongjoon-hyun Jul 19, 2016
929fa28
[MINOR][SQL][STREAMING][DOCS] Fix minor typos, punctuations and grammar
ahmed-mahran Jul 19, 2016
2c74b6d
[SPARK-16600][MLLIB] fix some latex formula syntax error
WeichenXu123 Jul 19, 2016
6ca1d94
[SPARK-16620][CORE] Add back the tokenization process in `RDD.pipe(co…
lw-lin Jul 19, 2016
f18f9ca
[SPARK-16602][SQL] `Nvl` function should support numeric-string cases
dongjoon-hyun Jul 19, 2016
80ab8b6
[SPARK-15705][SQL] Change the default value of spark.sql.hive.convert…
yhuai Jul 19, 2016
13650fc
Preparing Spark release v2.0.0-rc5
pwendell Jul 19, 2016
307f892
Preparing development version 2.0.1-SNAPSHOT
pwendell Jul 19, 2016
f58fd46
[SPARK-16568][SQL][DOCUMENTATION] update sql programming guide refres…
WeichenXu123 Jul 20, 2016
6f209c8
[SPARK-10683][SPARK-16510][SPARKR] Move SparkR include jar test to Sp…
shivaram Jul 20, 2016
c2b5b3c
[SPARK-16632][SQL] Respect Hive schema when merging parquet schema.
Jul 20, 2016
3f6b272
[SPARK-16440][MLLIB] Destroy broadcasted variables even on driver
Jul 20, 2016
83b957e
[SPARK-15923][YARN] Spark Application rest api returns 'no such app: …
weiqingy Jul 20, 2016
b177e08
[SPARK-16613][CORE] RDD.pipe returns values for empty partitions
srowen Jul 20, 2016
81004f1
[SPARK-16634][SQL] Workaround JVM bug by moving some code out of ctor.
Jul 20, 2016
a804c92
[SPARK-16644][SQL] Aggregate should not propagate constraints contain…
cloud-fan Jul 21, 2016
c2b4228
[MINOR][DOCS][STREAMING] Minor docfix schema of csv rather than parqu…
holdenk Jul 21, 2016
f9367d6
[SPARK-16632][SQL] Use Spark requested schema to guide vectorized Par…
liancheng Jul 21, 2016
933d76a
[SPARK-16632][SQL] Revert PR #14272: Respect Hive schema when merging…
liancheng Jul 21, 2016
cd41e6a
[SPARK-16656][SQL] Try to make CreateTableAsSelectSuite more stable
yhuai Jul 21, 2016
4cb8ff7
[SPARK-16334] Maintain single dictionary per row-batch in vectorized …
sameeragarwal Jul 21, 2016
70bf8ce
[SPARK-16287][SQL] Implement str_to_map SQL function
techaddict Jul 22, 2016
0cc36ca
[SPARK-16287][HOTFIX][BUILD][SQL] Fix annotation argument needs to be…
jaceklaskowski Jul 22, 2016
fb944a1
[SPARK-16650] Improve documentation of spark.task.maxFailures
Jul 22, 2016
28bb2b0
[SPARK-16651][PYSPARK][DOC] Make `withColumnRenamed/drop` description…
dongjoon-hyun Jul 22, 2016
da34e8e
[SPARK-16380][EXAMPLES] Update SQL examples and programming guide for…
liancheng Jul 23, 2016
31c3bcb
[SPARK-16690][TEST] rename SQLTestUtils.withTempTable to withTempView
cloud-fan Jul 23, 2016
198b042
[SPARK-16515][SQL][FOLLOW-UP] Fix test `script` on OS X/Windows...
lw-lin Jul 24, 2016
d226dce
[SPARK-16699][SQL] Fix performance bug in hash aggregate on long stri…
ooq Jul 25, 2016
fcbb7f6
[SPARK-16648][SQL] Make ignoreNullsExpr a child expression of First a…
liancheng Jul 25, 2016
b52e639
[SPARK-16698][SQL] Field names having dots should be allowed for data…
HyukjinKwon Jul 25, 2016
57d65e5
[SPARK-16703][SQL] Remove extra whitespace in SQL generation for wind…
liancheng Jul 25, 2016
d9bd066
[SPARKR][DOCS] fix broken url in doc
felixcheung Jul 25, 2016
f0d05f6
[SPARK-16485][DOC][ML] Fixed several inline formatting in ml features…
lins05 Jul 25, 2016
1b4f7cf
[SQL][DOC] Fix a default name for parquet compression
maropu Jul 25, 2016
41e72f6
[SPARK-16715][TESTS] Fix a potential ExprId conflict for Subexpressio…
zsxwing Jul 25, 2016
b17fe4e
[SPARK-14131][STREAMING] SQL Improved fix for avoiding potential dead…
tdas Jul 25, 2016
9d581dc
[SPARK-16722][TESTS] Fix a StreamingContext leak in StreamingContextS…
zsxwing Jul 26, 2016
3d35474
Fix description of spark.speculation.quantile
nwbvt Jul 26, 2016
aeb6d5c
[SPARK-16672][SQL] SQLBuilder should not raise exceptions on EXISTS q…
dongjoon-hyun Jul 26, 2016
4b38a6a
[SPARK-16724] Expose DefinedByConstructorParams
marmbrus Jul 26, 2016
4391d4a
[SPARK-16633][SPARK-16642][SPARK-16721][SQL] Fixes three issues relat…
yhuai Jul 26, 2016
44234b1
[TEST][STREAMING] Fix flaky Kafka rate controlling test
tdas Jul 26, 2016
be9965b
[SPARK-16621][SQL] Generate stable SQLs in SQLBuilder
dongjoon-hyun Jul 27, 2016
4e98e69
[MINOR][ML] Fix some mistake in LinearRegression formula.
yanboliang Jul 27, 2016
8bc2877
[SPARK-16729][SQL] Throw analysis exception for invalid date casts
petermaxlee Jul 27, 2016
2f4e06e
[MINOR][DOC] missing keyword new
Jul 27, 2016
2d56a21
[SPARK-16730][SQL] Implement function aliases for type casts
petermaxlee Jul 28, 2016
0fd2dfb
[SPARK-15232][SQL] Add subquery SQL building tests to LogicalPlanToSQ…
dongjoon-hyun Jul 28, 2016
825c837
[SPARK-16639][SQL] The query with having condition that contains grou…
viirya Jul 28, 2016
f46a074
[SPARK-16740][SQL] Fix Long overflow in LongToUnsafeRowMap
sylvinus Jul 28, 2016
fb09a69
[SPARK-16764][SQL] Recommend disabling vectorized parquet reader on O…
sameeragarwal Jul 28, 2016
5cd79c3
[SPARK-16772] Correct API doc references to PySpark classes + formatt…
nchammas Jul 28, 2016
ed03d0a
[SPARK-16664][SQL] Fix persist call on Data frames with more than 200…
Jul 29, 2016
efad4aa
[SPARK-16750][ML] Fix GaussianMixture training failed due to feature …
yanboliang Jul 29, 2016
268bf14
[SPARK-16751] Upgrade derby to 10.12.1.1
a-roberts Jul 29, 2016
a32531a
[SPARK-16761][DOC][ML] Fix doc link in docs/ml-guide.md
sundapeng Jul 29, 2016
7d87fc9
[SPARK-16748][SQL] SparkExceptions during planning should not wrapped…
tdas Jul 30, 2016
26da5a7
[SPARK-16800][EXAMPLES][ML] Fix Java examples that fail to run due to…
BryanCutler Jul 30, 2016
75dd781
[SPARK-16812] Open up SparkILoop.getAddedJars
rxin Jul 31, 2016
d357ca3
[SPARK-16813][SQL] Remove private[sql] and private[spark] from cataly…
rxin Jul 31, 2016
c651ff5
[SPARK-16805][SQL] Log timezone when query result does not match
rxin Aug 1, 2016
4bdc558
[SPARK-16778][SQL][TRIVIAL] Fix deprecation warning with SQLContext
holdenk Aug 1, 2016
b49091e
[SPARK-16776][STREAMING] Replace deprecated API in KafkaTestUtils for…
HyukjinKwon Aug 1, 2016
1523bf6
[SPARK-16791][SQL] cast struct with timestamp field fails
Aug 1, 2016
4e73cb8
[SPARK-16774][SQL] Fix use of deprecated timestamp constructor & impr…
holdenk Aug 1, 2016
1813bbd
[SPARK-15869][STREAMING] Fix a potential NPE in StreamingJobProgressL…
zsxwing Aug 1, 2016
5fbf5f9
[SPARK-16818] Exchange reuse incorrectly reuses scans over different …
ericl Aug 2, 2016
9d9956e
[SPARK-16734][EXAMPLES][SQL] Revise examples of all language bindings
liancheng Aug 2, 2016
c5516ab
[SPARK-16558][EXAMPLES][MLLIB] examples/mllib/LDAExample should use M…
yinxusen Aug 2, 2016
fc18e25
[SPARK-15541] Casting ConcurrentHashMap to ConcurrentMap (master branch)
Aug 2, 2016
22f0899
[SPARK-16837][SQL] TimeWindow incorrectly drops slideDuration in cons…
tmagrino Aug 2, 2016
ef7927e
[SPARK-16062] [SPARK-15989] [SQL] Fix two bugs of Python-only UDTs
viirya Aug 2, 2016
a937c9e
[SPARK-16836][SQL] Add support for CURRENT_DATE/CURRENT_TIMESTAMP lit…
hvanhovell Aug 2, 2016
f190bb8
[SPARK-16850][SQL] Improve type checking error message for greatest/l…
petermaxlee Aug 2, 2016
063a507
[SPARK-16787] SparkContext.addFile() should not throw if called twice…
JoshRosen Aug 2, 2016
d9d3504
[SPARK-16831][PYTHON] Fixed bug in CrossValidator.avgMetrics
pkch Aug 3, 2016
969313b
[SPARK-16796][WEB UI] Visible passwords on Spark environment page
Devian-ua Aug 2, 2016
2daab33
[SPARK-16714][SPARK-16735][SPARK-16646] array, map, greatest, least's…
cloud-fan Aug 3, 2016
b44da5b
[SPARK-14204][SQL] register driverClass rather than user-specified class
mchalek Aug 3, 2016
bb30a3d
[SPARK-16770][BUILD] Fix JLine dependency management and version (Sca…
stsc-pentasys Aug 4, 2016
11854e5
[SPARK-16873][CORE] Fix SpillReader NPE when spillFile has no data
sharkdtu Aug 4, 2016
182991e
[SPARK-16802] [SQL] fix overflow in LongToUnsafeRowMap
Aug 4, 2016
ddbff01
[SPARK-16875][SQL] Add args checking for DataSet randomSplit and sample
zhengruifeng Aug 4, 2016
c66338b
[SPARK-16880][ML][MLLIB] make ann training data persisted if needed
WeichenXu123 Aug 4, 2016
818ddcf
[SPARK-16877][BUILD] Add rules for preventing to use Java annotations…
HyukjinKwon Aug 4, 2016
824d626
[SPARK-16863][ML] ProbabilisticClassifier.fit check threshoulds' length
zhengruifeng Aug 4, 2016
dae08fb
[SPARK-16907][SQL] Fix performance regression for parquet table when …
clockfly Aug 5, 2016
b4a89c1
[SPARK-16312][STREAMING][KAFKA][DOC] Doc for Kafka 0.10 integration
koeninger Aug 5, 2016
7fbac48
[MINOR] Update AccumulatorV2 doc to not mention "+=".
petermaxlee Aug 5, 2016
d99d909
[SPARK-16750][FOLLOW-UP][ML] Add transformSchema for StringIndexer/Ve…
yanboliang Aug 5, 2016
b5d65b4
[SPARK-16901] Hive settings in hive-site.xml may be overridden by Hiv…
yhuai Aug 5, 2016
90e0460
[SPARK-16772][PYTHON][DOCS] Fix API doc references to UDFRegistration…
nchammas Aug 6, 2016
d233431
[SPARK-16925] Master should call schedule() after all executor exit e…
JoshRosen Aug 7, 2016
58e7038
document that Mesos cluster mode supports python
Aug 7, 2016
c036448
[SPARK-16932][DOCS] Changed programming guide to not reference old ac…
BryanCutler Aug 7, 2016
3f8a95b
[SPARK-16870][DOCS] Summary:add "spark.sql.broadcastTimeout" into doc…
biglobster Aug 7, 2016
739a333
[SPARK-16911] Fix the links in the programming guide
shiv4nsh Aug 7, 2016
fd828e1
[SPARK-16409][SQL] regexp_extract with optional groups causes NPE
srowen Aug 7, 2016
f37ed6e
[SPARK-16939][SQL] Fix build error by using `Tuple1` explicitly in St…
dongjoon-hyun Aug 7, 2016
ca0c6e6
[SPARK-16457][SQL] Fix Wrong Messages when CTAS with a Partition By C…
gatorsmile Aug 8, 2016
b8a7958
[SPARK-16936][SQL] Case Sensitivity Support for Refresh Temp Table
gatorsmile Aug 8, 2016
69e278e
[SPARK-16586][CORE] Handle JVM errors printed to stdout.
Aug 8, 2016
9748a29
[SPARK-16953] Make requestTotalExecutors public Developer API to be c…
tdas Aug 8, 2016
6fc54b7
Update docs to include SASL support for RPC
Aug 8, 2016
601c649
[SPARK-16563][SQL] fix spark sql thrift server FetchResults bug
Aug 9, 2016
bbbd3cb
[SPARK-16610][SQL] Add `orc.compress` as an alias for `compression` o…
HyukjinKwon Aug 9, 2016
41d9dca
[SPARK-16950] [PYSPARK] fromOffsets parameter support in KafkaUtils.c…
Aug 9, 2016
44115e9
[SPARK-16956] Make ApplicationState.MAX_NUM_RETRY configurable
JoshRosen Aug 9, 2016
2d136db
[SPARK-16905] SQL DDL: MSCK REPAIR TABLE
Aug 9, 2016
475ee38
Fixed typo
jupblb Aug 10, 2016
2285de7
[SPARK-16522][MESOS] Spark application throws exception on exit.
sun-rui Aug 10, 2016
20efb79
[SPARK-16324][SQL] regexp_extract should doc that it returns empty st…
srowen Aug 10, 2016
719ac5f
[SPARK-15899][SQL] Fix the construction of the file path with hadoop …
avulanov Aug 10, 2016
15637f7
Revert "[SPARK-15899][SQL] Fix the construction of the file path with…
srowen Aug 10, 2016
977fbbf
[SPARK-15639] [SPARK-16321] [SQL] Push down filter at RowGroups level…
viirya Aug 10, 2016
d3a30d2
[SPARK-16579][SPARKR] add install.spark function
junyangq Aug 10, 2016
1e40135
[SPARK-17010][MINOR][DOC] Wrong description in memory management docu…
WangTaoTheTonic Aug 11, 2016
8611bc2
[SPARK-16866][SQL] Infrastructure for file-based SQL end-to-end tests
petermaxlee Aug 10, 2016
51b1016
[SPARK-17008][SPARK-17009][SQL] Normalization and isolation in SQLQue…
petermaxlee Aug 11, 2016
ea8a198
[SPARK-17007][SQL] Move test data files into a test-data folder
petermaxlee Aug 11, 2016
4b434e7
[SPARK-17011][SQL] Support testing exceptions in SQLQueryTestSuite
petermaxlee Aug 11, 2016
0ed6236
Correct example value for spark.ssl.YYY.XXX settings
ash211 Aug 11, 2016
33a213f
[SPARK-15899][SQL] Fix the construction of the file path with hadoop …
avulanov Aug 11, 2016
6bf20cd
[SPARK-17015][SQL] group-by/order-by ordinal and arithmetic tests
petermaxlee Aug 11, 2016
bc683f0
[SPARK-17018][SQL] literals.sql for testing literal parsing
petermaxlee Aug 11, 2016
0fb0149
[SPARK-17022][YARN] Handle potential deadlock in driver handling mess…
WangTaoTheTonic Aug 11, 2016
b4047fc
[SPARK-16975][SQL] Column-partition path starting '_' should be handl…
dongjoon-hyun Aug 12, 2016
bde94cd
[SPARK-17013][SQL] Parse negative numeric literals
petermaxlee Aug 12, 2016
38378f5
[SPARK-12370][DOCUMENTATION] Documentation should link to examples …
jagadeesanas2 Aug 13, 2016
a21ecc9
[SPARK-17023][BUILD] Upgrade to Kafka 0.10.0.1 release
lresende Aug 13, 2016
750f880
[SPARK-16966][SQL][CORE] App Name is a randomUUID even when "spark.ap…
srowen Aug 13, 2016
e02d0d0
[SPARK-17027][ML] Avoid integer overflow in PolynomialExpansion.getPo…
zero323 Aug 14, 2016
8f4cacd
[SPARK-16508][SPARKR] Split docs for arrange and orderBy methods
junyangq Aug 15, 2016
4503632
[SPARK-17065][SQL] Improve the error message when encountering an inc…
zsxwing Aug 15, 2016
2e2c787
[SPARK-16964][SQL] Remove private[hive] from sql.hive.execution package
hvanhovell Aug 16, 2016
237ae54
Revert "[SPARK-16964][SQL] Remove private[hive] from sql.hive.executi…
rxin Aug 16, 2016
1c56971
[SPARK-16964][SQL] Remove private[sql] and private[spark] from sql.ex…
hvanhovell Aug 16, 2016
022230c
[SPARK-16519][SPARKR] Handle SparkR RDD generics that create warnings…
felixcheung Aug 16, 2016
6cb3eab
[SPARK-17089][DOCS] Remove api doc link for mapReduceTriplets operator
phalodi Aug 16, 2016
3e0163b
[SPARK-17084][SQL] Rename ParserUtils.assert to validate
hvanhovell Aug 17, 2016
68a24d3
[MINOR][DOC] Fix the descriptions for `properties` argument in the do…
Aug 17, 2016
22c7660
[SPARK-15285][SQL] Generated SpecificSafeProjection.apply method grow…
kiszk Aug 17, 2016
394d598
[SPARK-17102][SQL] bypass UserDefinedGenerator for json format check
cloud-fan Aug 17, 2016
9406f82
[SPARK-17096][SQL][STREAMING] Improve exception string reported throu…
tdas Aug 17, 2016
585d1d9
[SPARK-17038][STREAMING] fix metrics retrieval source of 'lastReceive…
keypointt Aug 17, 2016
91aa532
[SPARK-16995][SQL] TreeNodeException when flat mapping RelationalGrou…
viirya Aug 18, 2016
5735b8b
[SPARK-16391][SQL] Support partial aggregation for reduceGroups
rxin Aug 18, 2016
ec5f157
[SPARK-17117][SQL] 1 / NULL should not fail analysis
petermaxlee Aug 18, 2016
176af17
[MINOR][SPARKR] R API documentation for "coltypes" is confusing
keypointt Aug 10, 2016
ea684b6
[SPARK-17069] Expose spark.range() as table-valued function in SQL
ericl Aug 18, 2016
c180d63
[SPARK-16947][SQL] Support type coercion and foldable expression for …
petermaxlee Aug 19, 2016
05b180f
HOTFIX: compilation broken due to protected ctor.
rxin Aug 19, 2016
d55d1f4
[SPARK-16961][CORE] Fixed off-by-one error that biased randomizeInPlace
nicklavers Aug 19, 2016
e0c60f1
[SPARK-16994][SQL] Whitelist operators for predicate pushdown
rxin Aug 19, 2016
d0707c6
[SPARK-11227][CORE] UnknownHostException can be thrown when NameNode …
sarutak Aug 19, 2016
3276ccf
[SPARK-16686][SQL] Remove PushProjectThroughSample since it is handle…
viirya Jul 26, 2016
ae89c8e
[SPARK-17113] [SHUFFLE] Job failure due to Executor OOM in offheap mode
Aug 19, 2016
efe8322
[SPARK-17149][SQL] array.sql for testing array related functions
petermaxlee Aug 20, 2016
379b127
[SPARK-17158][SQL] Change error message for out of range numeric lite…
srinathshankar Aug 20, 2016
f7458c7
[SPARK-17150][SQL] Support SQL generation for inline tables
petermaxlee Aug 20, 2016
4c4c275
[SPARK-17104][SQL] LogicalRelation.newInstance should follow the sema…
viirya Aug 20, 2016
24dd9a7
[SPARK-17124][SQL] RelationalGroupedDataset.agg should preserve order…
petermaxlee Aug 20, 2016
faff929
[SPARK-12666][CORE] SparkSubmit packages fix for when 'default' conf …
BryanCutler Aug 20, 2016
26d5a8b
[MINOR][R] add SparkR.Rcheck/ and SparkR_*.tar.gz to R/.gitignore
mengxr Aug 21, 2016
0297896
[SPARK-16508][SPARKR] Fix CRAN undocumented/duplicated arguments warn…
junyangq Aug 20, 2016
e62b29f
[SPARK-17098][SQL] Fix `NullPropagation` optimizer to handle `COUNT(N…
dongjoon-hyun Aug 21, 2016
49cc44d
[SPARK-17115][SQL] decrease the threshold when split expressions
Aug 22, 2016
2add45f
[SPARK-17085][STREAMING][DOCUMENTATION AND ACTUAL CODE DIFFERS - UNSU…
jagadeesanas2 Aug 22, 2016
7919598
[SPARKR][MINOR] Fix Cache Folder Path in Windows
junyangq Aug 22, 2016
94eff08
[SPARK-16320][DOC] Document G1 heap region's effect on spark 2.0 vs 1.6
srowen Aug 22, 2016
6dcc1a3
[SPARKR][MINOR] Add Xiangrui and Felix to maintainers
shivaram Aug 22, 2016
01a4d69
[SPARK-17162] Range does not support SQL generation
ericl Aug 22, 2016
b65b041
[SPARK-16508][SPARKR] doc updates and more CRAN check fixes
felixcheung Aug 22, 2016
ff2f873
[SPARK-16550][SPARK-17042][CORE] Certain classes fail to deserialize …
ericl Aug 22, 2016
2258989
[SPARK-16577][SPARKR] Add CRAN documentation checks to run-tests.sh
shivaram Aug 23, 2016
eaea1c8
[SPARK-17182][SQL] Mark Collect as non-deterministic
liancheng Aug 23, 2016
d16f9a0
[SPARKR][MINOR] Update R DESCRIPTION file
felixcheung Aug 23, 2016
811a2ce
[SPARK-13286] [SQL] add the next expression of SQLException as cause
Aug 23, 2016
cc40189
[SPARKR][MINOR] Remove reference link for common Windows environment …
junyangq Aug 23, 2016
a2a7506
[MINOR][DOC] Use standard quotes instead of "curly quote" marks from …
HyukjinKwon Aug 23, 2016
a772b4b
[SPARK-17194] Use single quotes when generating SQL for string literals
JoshRosen Aug 23, 2016
a6e6a04
[MINOR][SQL] Remove implemented functions from comments of 'HiveSessi…
weiqingy Aug 24, 2016
df87f16
[SPARK-17186][SQL] remove catalog table type INDEX
cloud-fan Aug 24, 2016
ce7dce1
[MINOR][BUILD] Fix Java CheckStyle Error
weiqingy Aug 24, 2016
33d79b5
[SPARK-17086][ML] Fix InvalidArgumentException issue in QuantileDiscr…
Aug 24, 2016
29091d7
[SPARKR][MINOR] Fix doc for show method
junyangq Aug 24, 2016
9f924a0
[SPARK-16781][PYSPARK] java launched by PySpark as gateway may not be…
srowen Aug 24, 2016
4327337
[SPARKR][MINOR] Add more examples to window function docs
junyangq Aug 24, 2016
9f363a6
[SPARKR][MINOR] Add installation message for remote master mode and i…
junyangq Aug 24, 2016
3258f27
[SPARK-16216][SQL][BRANCH-2.0] Backport Read/write dateFormat/timesta…
HyukjinKwon Aug 25, 2016
aa57083
[SPARK-17228][SQL] Not infer/propagate non-deterministic constraints
sameeragarwal Aug 25, 2016
c1c4980
[SPARK-17193][CORE] HadoopRDD NPE at DEBUG log level when getLocation…
srowen Aug 25, 2016
fb1c697
[SPARK-17061][SPARK-17093][SQL] MapObjects` should make copies of uns…
lw-lin Aug 25, 2016
88481ea
Revert "[SPARK-17061][SPARK-17093][SQL] MapObjects` should make copie…
hvanhovell Aug 25, 2016
184e78b
[SPARK-17061][SPARK-17093][SQL][BACKPORT] MapObjects should make copi…
lw-lin Aug 25, 2016
48ecf3d
[SPARK-16991][SPARK-17099][SPARK-17120][SQL] Fix Outer Join Eliminati…
gatorsmile Aug 25, 2016
2b32a44
[SPARK-17167][2.0][SQL] Issue Exceptions when Analyze Table on In-Mem…
gatorsmile Aug 25, 2016
356a359
[SPARK-16700][PYSPARK][SQL] create DataFrame from dict/Row with schema
Aug 15, 2016
55db262
[SPARK-15083][WEB UI] History Server can OOM due to unlimited TaskUIData
ajbozarth Aug 25, 2016
b3a4430
[SPARKR][BUILD] ignore cran-check.out under R folder
wangmiao1981 Aug 25, 2016
ff2e270
[SPARK-17205] Literal.sql should handle Infinity and NaN
JoshRosen Aug 25, 2016
73014a2
[SPARK-17231][CORE] Avoid building debug or trace log messages unless…
Aug 25, 2016
27ed6d5
[SPARK-17242][DOCUMENT] Update links of external dstream projects
zsxwing Aug 26, 2016
6f82d2d
[SPARKR][MINOR] Fix example of spark.naiveBayes
junyangq Aug 26, 2016
deb6a54
[SPARK-17165][SQL] FileStreamSource should not track the list of seen…
petermaxlee Aug 26, 2016
52feb3f
[SPARK-17246][SQL] Add BigDecimal literal
hvanhovell Aug 26, 2016
dfdfc30
[SPARK-17235][SQL] Support purging of old logs in MetadataLog
petermaxlee Aug 26, 2016
9c0ac6b
[SPARK-17244] Catalyst should not pushdown non-deterministic join con…
sameeragarwal Aug 26, 2016
94d52d7
[SPARK-17269][SQL] Move finish analysis optimization stage into its o…
rxin Aug 27, 2016
f91614f
[SPARK-17270][SQL] Move object optimization rules into its own file (…
rxin Aug 27, 2016
901ab06
[SPARK-17274][SQL] Move join optimizer rules into a separate file
rxin Aug 27, 2016
56a8426
[SPARK-15382][SQL] Fix a bug in sampling with replacement
maropu Aug 27, 2016
7306c5f
[ML][MLLIB] The require condition and message doesn't match in Sparse…
Aug 27, 2016
5487fa0
[SPARK-17216][UI] fix event timeline bars length
Aug 27, 2016
eec0371
[SPARK-16216][SQL][FOLLOWUP][BRANCH-2.0] Bacoport enabling timestamp …
HyukjinKwon Aug 28, 2016
7 changes: 7 additions & 0 deletions .gitignore
@@ -22,6 +22,7 @@
/lib/
R-unit-tests.log
R/unit-tests.out
R/cran-check.out
build/*.jar
build/apache-maven*
build/scala*
@@ -72,7 +73,13 @@ metastore/
metastore_db/
sql/hive-thriftserver/test_warehouses
warehouse/
spark-warehouse/

# For R session data
.RData
.RHistory
.Rhistory
*.Rproj
*.Rproj.*

.Rproj.user
3 changes: 2 additions & 1 deletion LICENSE
@@ -263,7 +263,7 @@ The text of each license is also included at licenses/LICENSE-[project].txt.
(New BSD license) Protocol Buffer Java API (org.spark-project.protobuf:protobuf-java:2.4.1-shaded - http://code.google.com/p/protobuf)
(The BSD License) Fortran to Java ARPACK (net.sourceforge.f2j:arpack_combined_all:0.1 - http://f2j.sourceforge.net)
(The BSD License) xmlenc Library (xmlenc:xmlenc:0.52 - http://xmlenc.sourceforge.net)
(The New BSD License) Py4J (net.sf.py4j:py4j:0.9.2 - http://py4j.sourceforge.net/)
(The New BSD License) Py4J (net.sf.py4j:py4j:0.10.3 - http://py4j.sourceforge.net/)
(Two-clause BSD-style license) JUnit-Interface (com.novocode:junit-interface:0.10 - http://github.com/szeiger/junit-interface/)
(BSD licence) sbt and sbt-launch-lib.bash
(BSD 3 Clause) d3.min.js (https://github.com/mbostock/d3/blob/master/LICENSE)
@@ -296,3 +296,4 @@ The text of each license is also included at licenses/LICENSE-[project].txt.
(MIT License) blockUI (http://jquery.malsup.com/block/)
(MIT License) RowsGroup (http://datatables.net/license/mit)
(MIT License) jsonFormatter (http://www.jqueryscript.net/other/jQuery-Plugin-For-Pretty-JSON-Formatting-jsonFormatter.html)
(MIT License) modernizr (https://github.com/Modernizr/Modernizr/blob/master/LICENSE)
13 changes: 5 additions & 8 deletions NOTICE
@@ -1,5 +1,5 @@
Apache Spark
Copyright 2014 The Apache Software Foundation.
Copyright 2014 and onwards The Apache Software Foundation.

This product includes software developed at
The Apache Software Foundation (http://www.apache.org/).
@@ -12,7 +12,9 @@ Common Development and Distribution License 1.0
The following components are provided under the Common Development and Distribution License 1.0. See project link for details.

(CDDL 1.0) Glassfish Jasper (org.mortbay.jetty:jsp-2.1:6.1.14 - http://jetty.mortbay.org/project/modules/jsp-2.1)
(CDDL 1.0) JAX-RS (https://jax-rs-spec.java.net/)
(CDDL 1.0) Servlet Specification 2.5 API (org.mortbay.jetty:servlet-api-2.5:6.1.14 - http://jetty.mortbay.org/project/modules/servlet-api-2.5)
(CDDL 1.0) (GPL2 w/ CPE) javax.annotation API (https://glassfish.java.net/nonav/public/CDDL+GPL.html)
(COMMON DEVELOPMENT AND DISTRIBUTION LICENSE (CDDL) Version 1.0) (GNU General Public Library) Streaming API for XML (javax.xml.stream:stax-api:1.0-2 - no url defined)
(Common Development and Distribution License (CDDL) v1.0) JavaBeans Activation Framework (JAF) (javax.activation:activation:1.1 - http://java.sun.com/products/javabeans/jaf/index.jsp)

@@ -22,15 +24,10 @@ Common Development and Distribution License 1.1

The following components are provided under the Common Development and Distribution License 1.1. See project link for details.

(CDDL 1.1) (GPL2 w/ CPE) org.glassfish.hk2 (https://hk2.java.net)
(CDDL 1.1) (GPL2 w/ CPE) JAXB API bundle for GlassFish V3 (javax.xml.bind:jaxb-api:2.2.2 - https://jaxb.dev.java.net/)
(CDDL 1.1) (GPL2 w/ CPE) JAXB RI (com.sun.xml.bind:jaxb-impl:2.2.3-1 - http://jaxb.java.net/)
(CDDL 1.1) (GPL2 w/ CPE) jersey-core (com.sun.jersey:jersey-core:1.8 - https://jersey.dev.java.net/jersey-core/)
(CDDL 1.1) (GPL2 w/ CPE) jersey-core (com.sun.jersey:jersey-core:1.9 - https://jersey.java.net/jersey-core/)
(CDDL 1.1) (GPL2 w/ CPE) jersey-guice (com.sun.jersey.contribs:jersey-guice:1.9 - https://jersey.java.net/jersey-contribs/jersey-guice/)
(CDDL 1.1) (GPL2 w/ CPE) jersey-json (com.sun.jersey:jersey-json:1.8 - https://jersey.dev.java.net/jersey-json/)
(CDDL 1.1) (GPL2 w/ CPE) jersey-json (com.sun.jersey:jersey-json:1.9 - https://jersey.java.net/jersey-json/)
(CDDL 1.1) (GPL2 w/ CPE) jersey-server (com.sun.jersey:jersey-server:1.8 - https://jersey.dev.java.net/jersey-server/)
(CDDL 1.1) (GPL2 w/ CPE) jersey-server (com.sun.jersey:jersey-server:1.9 - https://jersey.java.net/jersey-server/)
(CDDL 1.1) (GPL2 w/ CPE) Jersey 2 (https://jersey.java.net)

========================================================================
Common Public License 1.0
2 changes: 2 additions & 0 deletions R/.gitignore
@@ -4,3 +4,5 @@
lib
pkg/man
pkg/html
SparkR.Rcheck/
SparkR_*.tar.gz
12 changes: 6 additions & 6 deletions R/DOCUMENTATION.md
@@ -1,12 +1,12 @@
# SparkR Documentation

SparkR documentation is generated using in-source comments annotated using using
`roxygen2`. After making changes to the documentation, to generate man pages,
SparkR documentation is generated by using in-source comments and annotated by using
[`roxygen2`](https://cran.r-project.org/web/packages/roxygen2/index.html). After making changes to the documentation and generating man pages,
you can run the following from an R console in the SparkR home directory

library(devtools)
devtools::document(pkg="./pkg", roclets=c("rd"))

```R
library(devtools)
devtools::document(pkg="./pkg", roclets=c("rd"))
```
You can verify if your changes are good by running

R CMD check pkg/
32 changes: 18 additions & 14 deletions R/README.md
@@ -1,12 +1,13 @@
# R on Spark

SparkR is an R package that provides a light-weight frontend to use Spark from R.

### Installing sparkR

Libraries of sparkR need to be created in `$SPARK_HOME/R/lib`. This can be done by running the script `$SPARK_HOME/R/install-dev.sh`.
By default the above script uses the system wide installation of R. However, this can be changed to any user installed location of R by setting the environment variable `R_HOME` the full path of the base directory where R is installed, before running install-dev.sh script.
Example:
```
```bash
# where /home/username/R is where R is installed and /home/username/R/bin contains the files R and RScript
export R_HOME=/home/username/R
./install-dev.sh
@@ -17,8 +18,9 @@ export R_HOME=/home/username/R
#### Build Spark

Build Spark with [Maven](http://spark.apache.org/docs/latest/building-spark.html#building-with-buildmvn) and include the `-Psparkr` profile to build the R package. For example to use the default Hadoop versions you can run
```
build/mvn -DskipTests -Psparkr package

```bash
build/mvn -DskipTests -Psparkr package
```

#### Running sparkR
@@ -37,8 +39,8 @@ To set other options like driver memory, executor memory etc. you can pass in th

#### Using SparkR from RStudio

If you wish to use SparkR from RStudio or other R frontends you will need to set some environment variables which point SparkR to your Spark installation. For example
```
If you wish to use SparkR from RStudio or other R frontends you will need to set some environment variables which point SparkR to your Spark installation. For example
```R
# Set this to where Spark is installed
Sys.setenv(SPARK_HOME="/Users/username/spark")
# This line loads SparkR from the installed directory
@@ -55,23 +57,25 @@ Once you have made your changes, please include unit tests for them and run exis

#### Generating documentation

The SparkR documentation (Rd files and HTML files) are not a part of the source repository. To generate them you can run the script `R/create-docs.sh`. This script uses `devtools` and `knitr` to generate the docs and these packages need to be installed on the machine before using the script.
The SparkR documentation (Rd files and HTML files) are not a part of the source repository. To generate them you can run the script `R/create-docs.sh`. This script uses `devtools` and `knitr` to generate the docs and these packages need to be installed on the machine before using the script. Also, you may need to install these [prerequisites](https://github.com/apache/spark/tree/master/docs#prerequisites). See also, `R/DOCUMENTATION.md`

### Examples, Unit tests

SparkR comes with several sample programs in the `examples/src/main/r` directory.
To run one of them, use `./bin/spark-submit <filename> <args>`. For example:

./bin/spark-submit examples/src/main/r/dataframe.R

You can also run the unit-tests for SparkR by running (you need to install the [testthat](http://cran.r-project.org/web/packages/testthat/index.html) package first):

R -e 'install.packages("testthat", repos="http://cran.us.r-project.org")'
./R/run-tests.sh
```bash
./bin/spark-submit examples/src/main/r/dataframe.R
```
You can also run the unit tests for SparkR. You need to install the [testthat](http://cran.r-project.org/web/packages/testthat/index.html) package first:
```bash
R -e 'install.packages("testthat", repos="http://cran.us.r-project.org")'
./R/run-tests.sh
```

### Running on YARN

The `./bin/spark-submit` script can also be used to submit jobs to YARN clusters. You will need to set the YARN configuration directory before doing so. For example, on CDH you can run
```
```bash
export YARN_CONF_DIR=/etc/hadoop/conf
./bin/spark-submit --master yarn examples/src/main/r/dataframe.R
```
20 changes: 20 additions & 0 deletions R/WINDOWS.md
@@ -11,3 +11,23 @@ include Rtools and R in `PATH`.
directory in Maven in `PATH`.
4. Set `MAVEN_OPTS` as described in [Building Spark](http://spark.apache.org/docs/latest/building-spark.html).
5. Open a command shell (`cmd`) in the Spark directory and run `mvn -DskipTests -Psparkr package`

## Unit tests

To run the SparkR unit tests on Windows, the following steps are required, assuming you are in the Spark root directory and do not already have Apache Hadoop installed:

1. Create a folder to download Hadoop related files for Windows. For example, `cd ..` and `mkdir hadoop`.

2. Download the relevant Hadoop bin package from [steveloughran/winutils](https://github.com/steveloughran/winutils). While these are not official ASF artifacts, they are built from the ASF release git hashes by a Hadoop PMC member on a dedicated Windows VM. For further reading, consult [Windows Problems on the Hadoop wiki](https://wiki.apache.org/hadoop/WindowsProblems).

3. Install the files into `hadoop\bin`; make sure that `winutils.exe` and `hadoop.dll` are present.

4. Set the environment variable `HADOOP_HOME` to the full path to the newly created `hadoop` directory.

5. Run unit tests for SparkR by running the command below. You need to install the [testthat](http://cran.r-project.org/web/packages/testthat/index.html) package first:

```
R -e "install.packages('testthat', repos='http://cran.us.r-project.org')"
.\bin\spark-submit2.cmd --conf spark.hadoop.fs.default.name="file:///" R\pkg\tests\run-all.R
```

64 changes: 64 additions & 0 deletions R/check-cran.sh
@@ -0,0 +1,64 @@
#!/bin/bash

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

set -o pipefail
set -e

FWDIR="$(cd `dirname $0`; pwd)"
pushd $FWDIR > /dev/null

if [ ! -z "$R_HOME" ]
then
R_SCRIPT_PATH="$R_HOME/bin"
else
# if system wide R_HOME is not found, then exit
if [ ! `command -v R` ]; then
echo "Cannot find 'R_HOME'. Please specify 'R_HOME' or make sure R is properly installed."
exit 1
fi
R_SCRIPT_PATH="$(dirname $(which R))"
fi
echo "USING R_HOME = $R_HOME"

# Build the latest docs
$FWDIR/create-docs.sh

# Build a tarball (.tar.gz) containing the source package
"$R_SCRIPT_PATH/"R CMD build $FWDIR/pkg

# Run check as-cran.
VERSION=`grep Version $FWDIR/pkg/DESCRIPTION | awk '{print $NF}'`

CRAN_CHECK_OPTIONS="--as-cran"

if [ -n "$NO_TESTS" ]
then
CRAN_CHECK_OPTIONS=$CRAN_CHECK_OPTIONS" --no-tests"
fi

if [ -n "$NO_MANUAL" ]
then
CRAN_CHECK_OPTIONS=$CRAN_CHECK_OPTIONS" --no-manual"
fi

echo "Running CRAN check with $CRAN_CHECK_OPTIONS options"

"$R_SCRIPT_PATH/"R CMD check $CRAN_CHECK_OPTIONS SparkR_"$VERSION".tar.gz

popd > /dev/null
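
To illustrate how the script above assembles its `R CMD check` options, the following sketch isolates that logic so it can be inspected without running R. The function name `build_cran_check_options` is hypothetical and does not appear in `check-cran.sh`; only the `NO_TESTS` and `NO_MANUAL` variables and the three flags are taken from the script.

```bash
#!/bin/bash
# Hypothetical helper (not part of check-cran.sh): reproduces the
# option-building logic of the script in isolation.
build_cran_check_options() {
  local opts="--as-cran"
  # Any non-empty NO_TESTS value appends --no-tests
  if [ -n "$NO_TESTS" ]; then
    opts="$opts --no-tests"
  fi
  # Any non-empty NO_MANUAL value appends --no-manual
  if [ -n "$NO_MANUAL" ]; then
    opts="$opts --no-manual"
  fi
  echo "$opts"
}

# Each call runs in a subshell so the variables do not leak between examples.
(unset NO_TESTS NO_MANUAL; build_cran_check_options)
(NO_TESTS=1; unset NO_MANUAL; build_cran_check_options)
(NO_TESTS=1; NO_MANUAL=1; build_cran_check_options)
```

Because the script tests only for non-empty values, an invocation such as `NO_TESTS=1 NO_MANUAL=1 ./R/check-cran.sh` should skip both the tests and the PDF manual during the check.
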
7 changes: 6 additions & 1 deletion R/install-dev.sh
@@ -38,7 +38,12 @@ pushd $FWDIR > /dev/null
if [ ! -z "$R_HOME" ]
then
R_SCRIPT_PATH="$R_HOME/bin"
else
else
# if system wide R_HOME is not found, then exit
if [ ! `command -v R` ]; then
echo "Cannot find 'R_HOME'. Please specify 'R_HOME' or make sure R is properly installed."
exit 1
fi
R_SCRIPT_PATH="$(dirname $(which R))"
fi
echo "USING R_HOME = $R_HOME"
5 changes: 5 additions & 0 deletions R/pkg/.Rbuildignore
@@ -0,0 +1,5 @@
^.*\.Rproj$
^\.Rproj\.user$
^\.lintr$
^src-native$
^html$
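
As a rough way to see what the five patterns above exclude from the built package, the sketch below feeds a few sample paths through `grep -E`. This is only an approximation: `R CMD build` applies the patterns as Perl-like regular expressions, case-insensitively, to paths relative to the package root, whereas this example uses case-sensitive POSIX extended regexes. The helper name `ignored_paths` and the sample paths are illustrative assumptions.

```bash
#!/bin/bash
# Hypothetical helper: prints only the input paths that match one of the
# .Rbuildignore patterns, i.e. the paths that would be left out of the build.
ignored_paths() {
  grep -E \
    -e '^.*\.Rproj$' \
    -e '^\.Rproj\.user$' \
    -e '^\.lintr$' \
    -e '^src-native$' \
    -e '^html$'
}

# Sample paths relative to R/pkg (illustrative only).
printf '%s\n' SparkR.Rproj .Rproj.user .lintr src-native html \
  DESCRIPTION R/DataFrame.R | ignored_paths
```

The first five sample paths are printed, while `DESCRIPTION` and `R/DataFrame.R` are not, since the patterns are anchored with `^` and `$` and match only whole top-level names.
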
24 changes: 16 additions & 8 deletions R/pkg/DESCRIPTION
@@ -1,20 +1,25 @@
Package: SparkR
Type: Package
Title: R frontend for Spark
Title: R Frontend for Apache Spark
Version: 2.0.0
Date: 2013-09-09
Author: The Apache Software Foundation
Maintainer: Shivaram Venkataraman <[email protected]>
Imports:
methods
Date: 2016-07-07
Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),
email = "[email protected]"),
person("Xiangrui", "Meng", role = "aut",
email = "[email protected]"),
person("Felix", "Cheung", role = "aut",
email = "[email protected]"),
person(family = "The Apache Software Foundation", role = c("aut", "cph")))
URL: http://www.apache.org/ http://spark.apache.org/
BugReports: https://issues.apache.org/jira/secure/CreateIssueDetails!init.jspa?pid=12315420&components=12325400&issuetype=4
Depends:
R (>= 3.0),
methods,
methods
Suggests:
testthat,
e1071,
survival
Description: R frontend for Spark
Description: The SparkR package provides an R frontend for Apache Spark.
License: Apache License (== 2.0)
Collate:
'schema.R'
@@ -26,16 +31,19 @@ Collate:
'pairRDD.R'
'DataFrame.R'
'SQLContext.R'
'WindowSpec.R'
'backend.R'
'broadcast.R'
'client.R'
'context.R'
'deserialize.R'
'functions.R'
'install.R'
'mllib.R'
'serialize.R'
'sparkR.R'
'stats.R'
'types.R'
'utils.R'
'window.R'
RoxygenNote: 5.0.1