Changes from all commits
313 commits
c596154
[SPARK-15115][SQL] Reorganize whole stage codegen benchmark suites
rxin May 4, 2016
e868a15
[SPARK-15103][SQL] Refactored FileCatalog class to allow StreamFileCa…
tdas May 4, 2016
45862f6
[SPARK-15126][SQL] RuntimeConfig.set should return Unit
rxin May 4, 2016
eeb18f6
[SPARK-15121] Improve logging of external shuffle handler
May 4, 2016
c0715f3
[SPARK-12299][CORE] Remove history serving functionality from Master
BryanCutler May 4, 2016
23789e3
[SPARK-15031][EXAMPLE] Use SparkSession in Scala/Python/Java example.
dongjoon-hyun May 4, 2016
1e7d9bf
[SPARK-13001][CORE][MESOS] Prevent getting offers when reached max cores
sebastienrainville May 4, 2016
701c667
[SPARK-15116] In REPL we should create SparkSession first and get Spa…
cloud-fan May 4, 2016
aca46ec
[MINOR][SQL] Fix typo in DataFrameReader csv documentation
sethah May 4, 2016
fa3c550
[SPARK-14896][SQL] Deprecate HiveContext in python
May 5, 2016
d90359d
[SPARK-6339][SQL] Supports CREATE TEMPORARY VIEW tableIdentifier AS q…
clockfly May 5, 2016
689b0fc
[SPARK-14993][SQL] Fix Partition Discovery Inconsistency when Input i…
gatorsmile May 5, 2016
e12ec46
[SPARK-15131][SQL] Shutdown StateStore management thread when SparkCo…
tdas May 5, 2016
2023faf
[MINOR] remove dead code
davies May 5, 2016
0914296
[SPARK-15132][MINOR][SQL] Debug log for generated code should be prin…
sarutak May 5, 2016
e28d21d
[SPARK-15045] [CORE] Remove dead code in TaskMemoryManager.cleanUpAll…
abhi951990 May 5, 2016
433bc34
[SPARK-15123] upgrade org.json4s to 3.2.11 version
liningalex May 5, 2016
0c4e42b
[SPARK-12154] Upgrade to Jersey 2
mccheah May 5, 2016
743f07d
[SPARK-15106][PYSPARK][ML] Add PySpark package doc for ML component &…
holdenk May 5, 2016
666eb01
[SPARK-14589][SQL] Enhance DB2 JDBC Dialect docker tests
lresende May 5, 2016
80b49be
[SPARK-14915][CORE] Don't re-queue a task if another attempt has alre…
jasonmoore2k May 5, 2016
3468111
[SPARK-14139][SQL] RowEncoder should preserve schema nullability
cloud-fan May 5, 2016
4ec5d93
[SPARK-15148][SQL] Upgrade Univocity library from 2.0.2 to 2.1.0
HyukjinKwon May 5, 2016
c2b100e
[SPARK-15110] [SPARKR] Implement repartitionByColumn for SparkR DataF…
NarineK May 5, 2016
b063d9b
[MINOR][BUILD] Adds spark-warehouse/ to .gitignore
liancheng May 5, 2016
fe268ee
[SPARK-14124][SQL][FOLLOWUP] Implement Database-related DDL Commands
gatorsmile May 5, 2016
59fa480
[SPARK-15072][SQL][REPL][EXAMPLES] Remove SparkSession.withHiveSupport
techaddict May 5, 2016
e78b31b
[SPARK-15135][SQL] Make sure SparkSession thread safe
zsxwing May 5, 2016
8b4ab59
[SPARK-15134][EXAMPLE] Indent SparkSession builder patterns and updat…
dongjoon-hyun May 5, 2016
19a14e8
[SPARK-15158][CORE] downgrade shouldRollover message to debug level
depend May 5, 2016
80a4bfa
[SPARK-9926] Parallelize partition logic in UnionRDD.
rdblue May 5, 2016
1064a33
[SPARK-14893][SQL] Re-enable HiveSparkSubmitSuite SPARK-8489 test aft…
dilipbiswal May 5, 2016
a1887f2
[SPARK-15152][DOC][MINOR] Scaladoc and Code style Improvements
jaceklaskowski May 5, 2016
7dc3fb6
[HOTFIX] Fix MLUtils compile
May 5, 2016
42f2ee6
[SPARK-11395][SPARKR] Support over and window specification in SparkR.
May 6, 2016
1ee621b
[SPARK-14738][BUILD] Separate docker integration tests from main build
lresende May 6, 2016
3f6a13c
[SPARK-14512] [DOC] Add python example for QuantileDiscretizer
zhengruifeng May 6, 2016
d7c7555
[SPARK-14962][SQL] Do not push down isnotnull/isnull on unsuportted t…
HyukjinKwon May 6, 2016
1e6b158
[SPARK-15108][SQL] Describe Permanent UDTF
gatorsmile May 6, 2016
22f9f5f
[SPARK-14050][ML] Add multiple languages support and additional metho…
burakkose May 6, 2016
dc1562e
[SPARK-14997][SQL] Fixed FileCatalog to return correct set of files w…
tdas May 6, 2016
d98dd72
[SPARK-1239] Improve fetching of map output statuses
May 7, 2016
f6d7292
[SPARK-15087][MINOR][DOC] Follow Up: Fix the Comments
techaddict May 7, 2016
4ccc564
[SPARK-15051][SQL] Create a TypedColumn alias
kevinyu98 May 7, 2016
49e6661
[SPARK-15122] [SQL] Fix TPC-DS 41 - Normalize predicates before pulli…
hvanhovell May 7, 2016
d0302a2
[MINOR][ML][PYSPARK] ALS example cleanup
May 7, 2016
9560bad
[DOC][MINOR] Fixed minor errors in feature.ml user guide doc
BryanCutler May 7, 2016
69f3edc
[SPARK-15178][CORE] Remove LazyFileRegion instead use netty's Default…
techaddict May 7, 2016
cf156e6
[SPARK-12479][SPARKR] sparkR collect on GroupedData throws R error "m…
sun-rui May 8, 2016
cb090df
[SPARK-15185][SQL] InMemoryCatalog: Silent Removal of an Existent Tab…
gatorsmile May 9, 2016
c0c5c26
[SPARK-15184][SQL] Fix Silent Removal of An Existent Temp Table by Re…
gatorsmile May 9, 2016
238b7b4
[SPARK-15211][SQL] Select features column from LibSVMRelation causes …
viirya May 9, 2016
eb0db90
[SPARK-14814][MLLIB] API: Java compatibility, docs
hhbyyh May 9, 2016
62333f2
[SPARK-15136][PYSPARK][DOC] Fix links to sphinx style and add a defau…
holdenk May 9, 2016
cbb4fa1
[MINOR][TEST][STREAMING] make "testDir" able to be claened after test.
wei-mao-intel May 9, 2016
8caaaed
[SPARK-14459][SQL] Detect relation partitioning and adjust the logica…
rdblue May 9, 2016
fb73663
[MINOR] [SPARKR] Update data-manipulation.R to use native csv reader
yanboliang May 9, 2016
5cdb7be
[SPARK-15093][SQL] create/delete/rename directory for InMemoryCatalog…
cloud-fan May 9, 2016
29bc8d2
[SPARK-15199][SQL] Disallow Dropping Build-in Functions
gatorsmile May 9, 2016
de6afc8
[SPARK-14127][SQL] Makes 'DESC [EXTENDED|FORMATTED] <table>' support …
liancheng May 9, 2016
6371197
[MINOR][DOCS] Remove remaining sqlContext in documentation at examples
HyukjinKwon May 9, 2016
1b4e99f
[SPARK-15223][DOCS] fix wrongly named config reference
philipphoffmann May 9, 2016
8f0ed28
[SPARK-15225][SQL] Replace SQLContext with SparkSession in Encoder do…
viirya May 9, 2016
3c6f686
[SPARK-15067][YARN] YARN executors are launched with fixed perm gen size
srowen May 9, 2016
1d56158
[MINOR][SQL] Enhance the exception message if checkpointLocation is n…
jerryshao May 9, 2016
c6d23b6
[SAPRK-15220][UI] add hyperlink to running application and completed …
wei-mao-intel May 9, 2016
f81d251
[SPARK-15210][SQL] Add missing @DeveloperApi annotation in sql.types
zhengruifeng May 9, 2016
e3f000a
[SPARK-15166][SQL] Move some hive-specific code from SparkSession
May 9, 2016
40d2468
[SPARK-10653][CORE] Remove unnecessary things from SparkEnv
ajbozarth May 9, 2016
bf53b96
[SPARK-15173][SQL] DataFrameWriter.insertInto should work with dataso…
cloud-fan May 9, 2016
3d69f87
[SPARK-14972] Improve performance of JSON schema inference's compatib…
JoshRosen May 9, 2016
6a5ec08
[SPARK-15209] Fix display of job descriptions with single quotes in w…
JoshRosen May 9, 2016
1bcbf61
[SPARK-15025][SQL] fix duplicate of PATH key in datasource table options
xwu0226 May 10, 2016
036c224
[SPARK-15234][SQL] Fix spark.catalog.listDatabases.show()
May 10, 2016
1d18a6d
[SPARK-15229][SQL] Make case sensitivity setting internal
rxin May 10, 2016
27bb51c
[SPARK-15187][SQL] Disallow Dropping Default Database
gatorsmile May 10, 2016
58f7742
[SPARK-15215][SQL] Fix Explain Parsing and Output
gatorsmile May 10, 2016
ff2b715
[SPARK-14542][CORE] PipeRDD should allow configurable buffer size for…
May 10, 2016
841666d
[SPARK-14127][SQL] "DESC <table>": Extracts schema information from t…
liancheng May 10, 2016
4aa9052
[SPARK-15154] [SQL] Change key types to Long in tests
robbinspg May 10, 2016
1a6272e
[SPARK-14773] [SPARK-15179] [SQL] Fix SQL building and enable Hive tests
hvanhovell May 10, 2016
a66ebbc
[SPARK-13382][DOCS][PYSPARK] Update pyspark testing notes in build docs
holdenk May 10, 2016
918bf6e
[SPARK-13670][LAUNCHER] Propagate error from launcher to shell.
May 10, 2016
af12b0a
[SPARK-11249][LAUNCHER] Throw error if app resource is not provided.
May 10, 2016
19a9c23
[SPARK-12837][CORE] reduce network IO for accumulators
cloud-fan May 10, 2016
5bf74b4
[SPARK-15037][SQL][MLLIB] Use SparkSession instead of SQLContext in S…
techaddict May 10, 2016
42db140
[SPARK-14603][SQL] Verification of Metadata Operations by Session Cat…
gatorsmile May 10, 2016
bd7fd14
[SPARK-15037][HOTFIX] Replace `sqlContext` and `sparkSession` with `s…
dongjoon-hyun May 10, 2016
a432e80
[SPARK-15037][HOTFIX] Don't create 2 SparkSessions in constructor
May 10, 2016
82f6959
[SPARK-15195][PYSPARK][DOCS] Update ml.tuning PyDocs
holdenk May 10, 2016
5a4a188
[SPARK-14642][SQL] import org.apache.spark.sql.expressions._ breaks u…
sbcd90 May 10, 2016
0ab1958
[SPARK-14986][SQL] Return correct result for empty LATERAL VIEW OUTER
hvanhovell May 10, 2016
95f2549
[SPARK-6005][TESTS] Fix flaky test: o.a.s.streaming.kafka.DirectKafka…
zsxwing May 10, 2016
1db027d
[SPARK-15249][SQL] Use FunctionResource instead of (String, String) i…
techaddict May 10, 2016
f021f34
[SPARK-14936][BUILD][TESTS] FlumePollingStreamSuite is slow
keypointt May 10, 2016
d8c2da9
[SPARK-14837][SQL][STREAMING] Added support in file stream source for…
tdas May 10, 2016
5e3192a
[SPARK-14476][SQL] Improve the physical plan visualization by adding …
clockfly May 11, 2016
03dfe78
[SPARK-15261][SQL] Remove experimental tag from DataFrameReader/Writer
rxin May 11, 2016
0ecc105
[SPARK-15250][SQL] Remove deprecated json API in DataFrameReader
HyukjinKwon May 11, 2016
a675f5e
[SPARK-15265][SQL][MINOR] Fix Union query error message indentation
dongjoon-hyun May 11, 2016
1b446a4
[SPARK-15255][SQL] limit the length of name for cached DataFrame
May 11, 2016
d9288b8
[SPARK-15246][SPARK-4452][CORE] Fix code style and improve volatile for
lianhuiwang May 11, 2016
ca5ce53
[SPARK-15235][WEBUI] Corresponding row cannot be highlighted even tho…
sarutak May 11, 2016
a8637f4
[SPARK-15189][PYSPARK][DOCS] Update ml.evaluation PyDoc
holdenk May 11, 2016
2d3c69a
[SPARK-15231][SQL] Document the semantic of saveAsTable and insertInt…
zsxwing May 11, 2016
bee2ddb
[SPARK-15141][EXAMPLE][DOC] Update OneVsRest Examples
zhengruifeng May 11, 2016
73dd889
[SPARK-14340][EXAMPLE][DOC] Update Examples and User Guide for ml.Bis…
zhengruifeng May 11, 2016
36f711d
[SPARK-15149][EXAMPLE][DOC] update kmeans example
zhengruifeng May 11, 2016
1e7d8ba
[SPARK-14976][STREAMING] make StreamingContext.textFileStream support…
wei-mao-intel May 11, 2016
1753f65
[SPARK-15238] Clarify supported Python versions
nchammas May 11, 2016
3bd7a89
[SPARK-15150][EXAMPLE][DOC] Update LDA examples
zhengruifeng May 11, 2016
749c29b
[SPARK-14933][SQL] Failed to create view out of a parquet or orc table
xwu0226 May 11, 2016
0858a82
[SPARK-15268][SQL] Make JavaTypeInference work with UDTRegistration
viirya May 11, 2016
403ba65
[SPARK-14933][HOTFIX] Replace `sqlContext` with `spark`.
dongjoon-hyun May 11, 2016
381a825
[SPARK-15241] [SPARK-15242] [SQL] fix 2 decimal-related issues in Row…
cloud-fan May 11, 2016
1b90adc
[SPARK-15037] [SQL] [MLLIB] Part2: Use SparkSession instead of SQLCon…
techaddict May 11, 2016
e3703c4
[SPARK-15259] Sort time metric should not include spill and record in…
ericl May 11, 2016
56e1e2f
[SPARK-15085][STREAMING][KAFKA] Rename streaming-kafka artifact
koeninger May 11, 2016
6b36185
[SPARK-15248][SQL] Make MetastoreFileCatalog consider directories fro…
tdas May 11, 2016
83050dd
[SPARK-15260] Atomically resize memory pools
May 11, 2016
6e08eb4
[SPARK-12200][SQL] Add __contains__ implementation to Row
May 11, 2016
2454f6a
[SPARK-15262] Synchronize block manager / scheduler executor state
May 11, 2016
0699acc
[SPARK-15270] [SQL] Use SparkSession Builder to build a session with …
techaddict May 11, 2016
b1e14d9
[SPARK-15278] [SQL] Remove experimental tag from Python DataFrame
rxin May 11, 2016
4e56857
[SPARK-15257][SQL] Require CREATE EXTERNAL TABLE to specify LOCATION
May 11, 2016
f9ea545
[SPARK-15256] [SQL] [PySpark] Clarify DataFrameReader.jdbc() docstring
nchammas May 11, 2016
f763c14
[SPARK-15276][SQL] CREATE TABLE with LOCATION should imply EXTERNAL
May 12, 2016
f8804bb
[SPARK-15264][SPARK-15274][SQL] CSV Reader Error on Blank Column Names
May 12, 2016
114be70
[SPARK-15072][SQL][PYSPARK] FollowUp: Remove SparkSession.withHiveSup…
techaddict May 12, 2016
b2b04c6
[SPARK-15080][CORE] Break copyAndReset into copy and reset
techaddict May 12, 2016
0b14b3f
[SPARK-14346] SHOW CREATE TABLE for data source tables
liancheng May 12, 2016
7d18753
[SPARK-15072][SQL][PYSPARK][HOT-FIX] Remove SparkSession.withHiveSupp…
yhuai May 12, 2016
86acb5e
[SPARK-15031][SPARK-15134][EXAMPLE][DOC] Use SparkSession and update …
zhengruifeng May 12, 2016
beda393
[SPARK-15160][SQL] support data source table in InMemoryCatalog
cloud-fan May 12, 2016
6b69b8c
[SPARK-15281][PYSPARK][ML][TRIVIAL] Add impurity param to GBTRegresso…
holdenk May 12, 2016
9098b1a
[SPARK-15171][SQL] Deprecate registerTempTable and add dataset.create…
clockfly May 12, 2016
b3f1454
[HOTFIX] SQL test compilation error from merge conflict
May 10, 2016
68617e1
[SPARK-15094][SPARK-14803][SQL] Remove extra Project added in Elimina…
viirya May 12, 2016
9c5c901
[SPARK-14684][SPARK-15277][SQL] Partition Spec Validation in SessionC…
gatorsmile May 12, 2016
7a14d28
[SPARK-14897][SQL] upgrade to jetty 9.2.16
May 12, 2016
ac6e9a8
[SPARK-14421] Upgrades protobuf dependency to 2.6.1 for the new versi…
boneill42 May 12, 2016
31ea3c7
[SPARK-10605][SQL] Create native collect_list/collect_set aggregates
hvanhovell May 12, 2016
0d24fe0
[SPARK-13902][SCHEDULER] Make DAGScheduler not to create duplicate st…
ueshin May 12, 2016
54c04aa
[SPARK-15202][SPARKR] add dapplyCollect() method for DataFrame in Spa…
sun-rui May 13, 2016
d73ce36
[SPARK-15306][SQL] Move object expressions into expressions.objects p…
rxin May 13, 2016
51706f8
[SPARK-14541][SQL] Support IFNULL, NULLIF, NVL and NVL2
rxin May 13, 2016
7b925e5
[SPARK-13866] [SQL] Handle decimal type in CSV inference at CSV data …
HyukjinKwon May 13, 2016
b6b2c61
[SPARK-15188] Add missing thresholds param to NaiveBayes in PySpark
holdenk May 13, 2016
0076bf0
[MINOR][PYSPARK] update _shared_params_code_gen.py
zhengruifeng May 13, 2016
7affde2
[SPARK-15181][ML][PYSPARK] Python API for GLR summaries.
sethah May 13, 2016
86b8f8a
[SPARK-13961][ML] spark.ml ChiSqSelector and RFormula should support …
BenFradet May 13, 2016
43570c5
[SPARK-15310][SQL] Rename HiveTypeCoercion -> TypeCoercion
rxin May 13, 2016
3727e28
[SPARK-14900][ML] spark.ml classification metrics should include accu…
wangmiao1981 May 13, 2016
beaf703
[SPARK-15061][PYSPARK] Upgrade to Py4J 0.10.1
holdenk May 13, 2016
6c57685
[SPARK-12972][CORE] Update org.apache.httpcomponents.httpclient
srowen May 13, 2016
1390eca
Revert "[SPARK-12972][CORE] Update org.apache.httpcomponents.httpclient"
srowen May 13, 2016
d3110d8
[SPARK-15267][SQL] Refactor options for JDBC and ORC data sources and…
HyukjinKwon May 13, 2016
78bf9a1
[TRIVIAL] Add () to SparkSession's builder function
tejasapatil May 14, 2016
2d6f3bb
[SPARK-15197][DOCS] Added Scaladoc for countApprox and countByValueAp…
May 14, 2016
d305f72
[SPARK-15096][ML] LogisticRegression MultiClassSummarizer numClasses …
wangmiao1981 May 14, 2016
4f2f96f
[SPARK-15253][SQL] Support old table schema config key "spark.sql.sou…
clockfly May 16, 2016
5afde26
[SPARK-15305][ML][DOC] spark.ml document Bisectiong k-means has the i…
wangmiao1981 May 16, 2016
f937ce7
[SPARK-14979][ML][PYSPARK] Add examples for GeneralizedLinearRegression
yanboliang May 16, 2016
0dd1f87
[SPARK-14942][SQL][STREAMING] Reduce delay between batch construction…
lw-lin May 16, 2016
8e3ee68
[SPARK-14906][ML] Copy linalg in PySpark to new ML package
mengxr May 17, 2016
0d5e296
[SPARK-12972][CORE] Update org.apache.httpcomponents.httpclient
srowen May 15, 2016
6d10b28
[SPARK-12972][CORE][TEST-MAVEN][TEST-HADOOP2.2] Update org.apache.htt…
srowen May 16, 2016
1426235
[SPARK-15290][BUILD] Move annotations, like @Since / @DeveloperApi, i…
srowen May 17, 2016
c0bcecf
[SPARK-15351][SQL] RowEncoder should support array as the external ty…
cloud-fan May 17, 2016
b031ea7
[SPARK-14434][ML] User guide doc and examples for GaussianMixture in …
wangmiao1981 May 17, 2016
273f3d0
[SPARK-15333][DOCS] Reorganize building-spark.md; rationalize vs wiki
srowen May 17, 2016
670f482
[SPARK-15318][ML][EXAMPLE] spark.ml Collaborative Filtering example d…
wangmiao1981 May 17, 2016
110876b
[SPARK-15165] [SQL] Codegen can break because toCommentSafeString is …
sarutak May 17, 2016
adc1c26
[SPARK-14346][SQL][FOLLOW-UP] add tests for CREAT TABLE USING with pa…
cloud-fan May 17, 2016
af37bdd
[SPARK-10216][SQL] Avoid creating empty files during overwriting with…
HyukjinKwon May 17, 2016
025b3e9
[SPARK-15182][ML] Copy MLlib doc to ML: ml.feature.tf, idf
hhbyyh May 17, 2016
1ad3bbd
[MINOR][DOCS] Replace remaining 'sqlContext' in ScalaDoc/JavaDoc.
dongjoon-hyun May 17, 2016
ff1cfce
[SPARK-14615][ML] Use the new ML Vector and Matrix in the ML pipeline…
May 17, 2016
c0bb771
[SPARK-15244] [PYTHON] Type of column name created with createDataFra…
dongjoon-hyun May 17, 2016
7b62b7c
[SPARK-11735][CORE][SQL] Add a check in the constructor of SQLContext…
zsxwing May 17, 2016
2dddec4
[SPARK-14346][SQL] Native SHOW CREATE TABLE for Hive tables/views
liancheng May 17, 2016
1db3741
[SPARK-14346] Fix scala-2.10 build
yhuai May 18, 2016
5f5270e
[SPARK-15171][SQL] Remove the references to deprecated method dataset…
clockfly May 18, 2016
c8be3da
Prepare branch for 2.0.0-preview.
rxin May 18, 2016
8f5a04b
Preparing Spark release 2.0.0-preview
pwendell May 18, 2016
b545009
Preparing development version 2.0.0-SNAPSHOT
pwendell May 18, 2016
fc97ff5
[SPARK-14978][PYSPARK] PySpark TrainValidationSplitModel should suppo…
taku-k May 18, 2016
c66da74
[SPARK-15334][SQL] HiveClient facade not compatible with Hive 0.12
clockfly May 18, 2016
35c25be
[SPARK-15307][SQL] speed up listing files for data source
May 18, 2016
14751cd
[SPARK-15322][MLLIB][CORE][SQL] update deprecate accumulator usage in…
WeichenXu123 May 18, 2016
a122a3e
[SPARK-15334][SQL][HOTFIX] Fixes compilation error for Scala 2.10
liancheng May 18, 2016
fe0a068
[SPARK-15346][MLLIB] Reduce duplicate computation in picking initial …
mouendless May 18, 2016
7ae006f
[SPARK-15357] Cooperative spilling should check consumer memory mode
May 18, 2016
67c5472
[MINOR][SQL] Remove unused pattern matching variables in Optimizers.
dongjoon-hyun May 18, 2016
d005f76
[SPARK-15342] [SQL] [PYSPARK] PySpark test for non ascii column name …
viirya May 18, 2016
0da8bce
[SPARK-14891][ML] Add schema validation for ALS
May 18, 2016
d65707b
[SPARK-15373][WEB UI] Spark UI should show consistent timezones.
dongjoon-hyun May 18, 2016
4c0af3b
[SPARK-15392][SQL] fix default value of size estimation of logical plan
May 18, 2016
36acf88
[SPARK-15323][SPARK-14463][SQL] Fix reading of partitioned format=tex…
jurriaan May 18, 2016
f578445
[SPARK-15192][SQL] null check for SparkSession.createDataFrame
cloud-fan May 19, 2016
760e7ac
[SPARK-15297][SQL] Fix Set -V Command
gatorsmile May 19, 2016
595ed8d
[SPARK-14463][SQL] Document the semantics for read.text
rxin May 19, 2016
a1948a0
[SPARK-15395][CORE] Use getHostString to create RpcAddress
zsxwing May 19, 2016
34c743c
[SPARK-15381] [SQL] physical object operator should define reference …
cloud-fan May 19, 2016
b2a4dac
[SPARK-15031][EXAMPLES][FOLLOW-UP] Make Python param example working …
HyukjinKwon May 19, 2016
ff115f5
[SPARK-14939][SQL] Add FoldablePropagation optimizer
dongjoon-hyun May 19, 2016
282a2a7
[SPARK-15362][ML] Make spark.ml KMeansModel load backwards compatible
yanboliang May 19, 2016
9f2730b
[SPARK-15292][ML] ML 2.0 QA: Scala APIs audit for classification
yanboliang May 19, 2016
bd609b0
[SPARK-14613][ML] Add @Since into the matrix and vector classes in sp…
May 19, 2016
2604ead
[SPARK-15390] fix broadcast with 100 millions rows
May 19, 2016
496f6d0
[SPARK-14603][SQL][FOLLOWUP] Verification of Metadata Operations by S…
gatorsmile May 19, 2016
96a473a
[SPARK-15300] Fix writer lock conflict when remove a block
May 19, 2016
9c817d0
[SPARK-15387][SQL] SessionCatalog in SimpleAnalyzer does not need to …
sarutak May 19, 2016
554e0f3
[SPARK-15322][SQL][FOLLOW-UP] Update deprecated accumulator usage int…
HyukjinKwon May 19, 2016
97fd9a0
[SPARK-15316][PYSPARK][ML] Add linkPredictionCol to GeneralizedLinear…
holdenk May 19, 2016
4f8639f
[SPARK-14346][SQL] Lists unsupported Hive features in SHOW CREATE TAB…
liancheng May 19, 2016
62e5158
[SPARK-15317][CORE] Don't store accumulators for every task in listeners
zsxwing May 19, 2016
d1b5df8
[SPARK-15392][SQL] fix default value of size estimation of logical plan
May 19, 2016
4257ba3
Fix the compiler error introduced by #13153 for Scala 2.10
zsxwing May 19, 2016
833dbf9
[SPARK-15411][ML] Add @since to ml.stat.MultivariateOnlineSummarizer.…
May 19, 2016
ebf30ed
[SPARK-15361][ML] ML 2.0 QA: Scala APIs audit for ml.clustering
yanboliang May 19, 2016
758253f
[SPARK-15414][MLLIB] Make the mllib,ml linalg type conversion APIs pu…
techaddict May 20, 2016
2c939e5
[SPARK-15375][SQL][STREAMING] Add ConsoleSink to structure streaming
jerryshao May 20, 2016
b0aff55
[SPARK-15341][DOC][ML] Add documentation for "model.write" to clarify…
yanboliang May 20, 2016
e53a8f2
[MINOR][ML][PYSPARK] ml.evaluation Scala and Python API sync
yanboliang May 20, 2016
7e25131
[SPARK-15416][SQL] Display a better message for not finding classes r…
zsxwing May 20, 2016
5fa2395
[SPARK-15296][MLLIB] Refactor All Java Tests that use SparkSession
techaddict May 20, 2016
c21c691
[SPARK-15321] Fix bug where Array[Timestamp] cannot be encoded/decode…
smungee May 20, 2016
e6810e9
[SPARK-11827][SQL] Adding java.math.BigInteger support in Java type i…
kevinyu98 May 20, 2016
52b967f
[SPARK-15075][SPARK-15345][SQL] Clean up SparkSession builder and pro…
rxin May 20, 2016
c08739a
[SPARK-14990][SQL] Fix checkForSameTypeInputExpr (ignore nullability)
rxin May 20, 2016
7bb3335
[SPARK-14261][SQL] Memory leak in Spark Thrift Server
dosoft May 20, 2016
dcf36ad
[SPARK-15057][GRAPHX] Remove stale TODO comment for making `enum` in …
dongjoon-hyun May 3, 2016
1dc30f1
[DOC][MINOR] ml.feature Scala and Python API sync
BryanCutler May 19, 2016
642f009
[MINOR] Fix Typos
zhengruifeng May 15, 2016
2126fb0
[CORE][MINOR] Remove redundant set master in OutputCommitCoordinatorI…
techaddict May 19, 2016
1fc0f95
[HOTFIX] Test compilation error from 52b967f
May 20, 2016
dd0c7fb
Revert "[HOTFIX] Test compilation error from 52b967f"
rxin May 20, 2016
f8d0177
Revert "[SPARK-15392][SQL] fix default value of size estimation of lo…
davies May 18, 2016
2ef6457
[SPARK-15313][SQL] EmbedSerializerInFilter rule should keep exprIds o…
ueshin May 20, 2016
6128664
[HOTFIX] Add back intended change from SPARK-15392
May 20, 2016
47feebd
[SPARK-15335][SQL] Implement TRUNCATE TABLE Command
lianhuiwang May 20, 2016
8fb0877
[SPARK-15172][ML] Explicitly tell user initial coefficients is ignore…
May 9, 2016
e4e3e98
[SPARK-15363][ML][EXAMPLE] Example code shouldn't use VectorImplicits…
wangmiao1981 May 20, 2016
539dfa2
[SPARK-15398][ML] Update the warning message to recommend ML usage
zhengruifeng May 20, 2016
5f73f62
[SPARK-15394][ML][DOCS] User guide typos and grammar audit
sethah May 20, 2016
9963fd4
[SPARK-15339][ML] ML 2.0 QA: Scala APIs and code audit for regression
yanboliang May 20, 2016
4d13348
[SPARK-15367][SQL] Add refreshTable back
gatorsmile May 20, 2016
4e25d6e
[SPARK-15421][SQL] Validate DDL property values
May 20, 2016
53c09f0
[SPARK-15417][SQL][PYTHON] PySpark shell always uses in-memory catalog
May 20, 2016
1 change: 1 addition & 0 deletions .gitignore
@@ -72,6 +72,7 @@ metastore/
 metastore_db/
 sql/hive-thriftserver/test_warehouses
 warehouse/
+spark-warehouse/

 # For R session data
 .RData
2 changes: 1 addition & 1 deletion LICENSE
@@ -263,7 +263,7 @@ The text of each license is also included at licenses/LICENSE-[project].txt.
 (New BSD license) Protocol Buffer Java API (org.spark-project.protobuf:protobuf-java:2.4.1-shaded - http://code.google.com/p/protobuf)
 (The BSD License) Fortran to Java ARPACK (net.sourceforge.f2j:arpack_combined_all:0.1 - http://f2j.sourceforge.net)
 (The BSD License) xmlenc Library (xmlenc:xmlenc:0.52 - http://xmlenc.sourceforge.net)
-(The New BSD License) Py4J (net.sf.py4j:py4j:0.9.2 - http://py4j.sourceforge.net/)
+(The New BSD License) Py4J (net.sf.py4j:py4j:0.10.1 - http://py4j.sourceforge.net/)
 (Two-clause BSD-style license) JUnit-Interface (com.novocode:junit-interface:0.10 - http://github.com/szeiger/junit-interface/)
 (BSD licence) sbt and sbt-launch-lib.bash
 (BSD 3 Clause) d3.min.js (https://github.com/mbostock/d3/blob/master/LICENSE)
11 changes: 4 additions & 7 deletions NOTICE
@@ -12,7 +12,9 @@ Common Development and Distribution License 1.0
 The following components are provided under the Common Development and Distribution License 1.0. See project link for details.

 (CDDL 1.0) Glassfish Jasper (org.mortbay.jetty:jsp-2.1:6.1.14 - http://jetty.mortbay.org/project/modules/jsp-2.1)
+(CDDL 1.0) JAX-RS (https://jax-rs-spec.java.net/)
 (CDDL 1.0) Servlet Specification 2.5 API (org.mortbay.jetty:servlet-api-2.5:6.1.14 - http://jetty.mortbay.org/project/modules/servlet-api-2.5)
+(CDDL 1.0) (GPL2 w/ CPE) javax.annotation API (https://glassfish.java.net/nonav/public/CDDL+GPL.html)
 (COMMON DEVELOPMENT AND DISTRIBUTION LICENSE (CDDL) Version 1.0) (GNU General Public Library) Streaming API for XML (javax.xml.stream:stax-api:1.0-2 - no url defined)
 (Common Development and Distribution License (CDDL) v1.0) JavaBeans Activation Framework (JAF) (javax.activation:activation:1.1 - http://java.sun.com/products/javabeans/jaf/index.jsp)

@@ -22,15 +24,10 @@ Common Development and Distribution License 1.1

 The following components are provided under the Common Development and Distribution License 1.1. See project link for details.

+(CDDL 1.1) (GPL2 w/ CPE) org.glassfish.hk2 (https://hk2.java.net)
 (CDDL 1.1) (GPL2 w/ CPE) JAXB API bundle for GlassFish V3 (javax.xml.bind:jaxb-api:2.2.2 - https://jaxb.dev.java.net/)
 (CDDL 1.1) (GPL2 w/ CPE) JAXB RI (com.sun.xml.bind:jaxb-impl:2.2.3-1 - http://jaxb.java.net/)
-(CDDL 1.1) (GPL2 w/ CPE) jersey-core (com.sun.jersey:jersey-core:1.8 - https://jersey.dev.java.net/jersey-core/)
-(CDDL 1.1) (GPL2 w/ CPE) jersey-core (com.sun.jersey:jersey-core:1.9 - https://jersey.java.net/jersey-core/)
-(CDDL 1.1) (GPL2 w/ CPE) jersey-guice (com.sun.jersey.contribs:jersey-guice:1.9 - https://jersey.java.net/jersey-contribs/jersey-guice/)
-(CDDL 1.1) (GPL2 w/ CPE) jersey-json (com.sun.jersey:jersey-json:1.8 - https://jersey.dev.java.net/jersey-json/)
-(CDDL 1.1) (GPL2 w/ CPE) jersey-json (com.sun.jersey:jersey-json:1.9 - https://jersey.java.net/jersey-json/)
-(CDDL 1.1) (GPL2 w/ CPE) jersey-server (com.sun.jersey:jersey-server:1.8 - https://jersey.dev.java.net/jersey-server/)
-(CDDL 1.1) (GPL2 w/ CPE) jersey-server (com.sun.jersey:jersey-server:1.9 - https://jersey.java.net/jersey-server/)
+(CDDL 1.1) (GPL2 w/ CPE) Jersey 2 (https://jersey.java.net)

 ========================================================================
 Common Public License 1.0
2 changes: 2 additions & 0 deletions R/pkg/DESCRIPTION
@@ -26,6 +26,7 @@ Collate:
     'pairRDD.R'
     'DataFrame.R'
     'SQLContext.R'
+    'WindowSpec.R'
     'backend.R'
     'broadcast.R'
     'client.R'
@@ -38,4 +39,5 @@ Collate:
     'stats.R'
     'types.R'
     'utils.R'
+    'window.R'
 RoxygenNote: 5.0.1
11 changes: 11 additions & 0 deletions R/pkg/NAMESPACE
@@ -47,6 +47,7 @@ exportMethods("arrange",
               "covar_pop",
               "crosstab",
               "dapply",
+              "dapplyCollect",
               "describe",
               "dim",
               "distinct",
@@ -216,6 +217,7 @@ exportMethods("%in%",
              "next_day",
              "ntile",
              "otherwise",
+              "over",
              "percent_rank",
              "pmod",
              "quarter",
@@ -315,3 +317,12 @@ export("structField",
        "structType.jobj",
        "structType.structField",
        "print.structType")
+
+exportClasses("WindowSpec")
+
+export("partitionBy",
+       "rowsBetween",
+       "rangeBetween")
+
+export("window.partitionBy",
+       "window.orderBy")
127 changes: 104 additions & 23 deletions R/pkg/R/DataFrame.R
@@ -570,10 +570,17 @@ setMethod("unpersist",

 #' Repartition
 #'
-#' Return a new SparkDataFrame that has exactly numPartitions partitions.
-#'
+#' The following options for repartition are possible:
+#' \itemize{
+#'  \item{"Option 1"} {Return a new SparkDataFrame partitioned by
+#'                     the given columns into `numPartitions`.}
+#'  \item{"Option 2"} {Return a new SparkDataFrame that has exactly `numPartitions`.}
+#'  \item{"Option 3"} {Return a new SparkDataFrame partitioned by the given column(s),
+#'                     using `spark.sql.shuffle.partitions` as number of partitions.}
+#'}
 #' @param x A SparkDataFrame
 #' @param numPartitions The number of partitions to use.
+#' @param col The column by which the partitioning will be performed.
 #'
 #' @family SparkDataFrame functions
 #' @rdname repartition
@@ -586,11 +593,31 @@
 #' path <- "path/to/file.json"
 #' df <- read.json(sqlContext, path)
 #' newDF <- repartition(df, 2L)
+#' newDF <- repartition(df, numPartitions = 2L)
+#' newDF <- repartition(df, col = df$"col1", df$"col2")
+#' newDF <- repartition(df, 3L, col = df$"col1", df$"col2")
 #'}
 setMethod("repartition",
-          signature(x = "SparkDataFrame", numPartitions = "numeric"),
-          function(x, numPartitions) {
-            sdf <- callJMethod(x@sdf, "repartition", numToInt(numPartitions))
+          signature(x = "SparkDataFrame"),
+          function(x, numPartitions = NULL, col = NULL, ...) {
+            if (!is.null(numPartitions) && is.numeric(numPartitions)) {
+              # number of partitions and columns both are specified
+              if (!is.null(col) && class(col) == "Column") {
+                cols <- list(col, ...)
+                jcol <- lapply(cols, function(c) { c@jc })
+                sdf <- callJMethod(x@sdf, "repartition", numToInt(numPartitions), jcol)
+              } else {
+                # only number of partitions is specified
+                sdf <- callJMethod(x@sdf, "repartition", numToInt(numPartitions))
+              }
+            } else if (!is.null(col) && class(col) == "Column") {
+              # only columns are specified
+              cols <- list(col, ...)
+              jcol <- lapply(cols, function(c) { c@jc })
+              sdf <- callJMethod(x@sdf, "repartition", jcol)
+            } else {
+              stop("Please, specify the number of partitions and/or a column(s)")
+            }
             dataFrame(sdf)
           })

@@ -1126,9 +1153,27 @@ setMethod("summarize",
             agg(x, ...)
           })

+dapplyInternal <- function(x, func, schema) {
+  packageNamesArr <- serialize(.sparkREnv[[".packages"]],
+                               connection = NULL)
+
+  broadcastArr <- lapply(ls(.broadcastNames),
+                         function(name) { get(name, .broadcastNames) })
+
+  sdf <- callJStatic(
+           "org.apache.spark.sql.api.r.SQLUtils",
+           "dapply",
+           x@sdf,
+           serialize(cleanClosure(func), connection = NULL),
+           packageNamesArr,
+           broadcastArr,
+           if (is.null(schema)) { schema } else { schema$jobj })
+  dataFrame(sdf)
+}
+
 #' dapply
 #'
-#' Apply a function to each partition of a DataFrame.
+#' Apply a function to each partition of a SparkDataFrame.
 #'
 #' @param x A SparkDataFrame
 #' @param func A function to be applied to each partition of the SparkDataFrame.
@@ -1170,21 +1215,57 @@
 setMethod("dapply",
           signature(x = "SparkDataFrame", func = "function", schema = "structType"),
           function(x, func, schema) {
-            packageNamesArr <- serialize(.sparkREnv[[".packages"]],
-                                         connection = NULL)
-
-            broadcastArr <- lapply(ls(.broadcastNames),
-                                   function(name) { get(name, .broadcastNames) })
-
-            sdf <- callJStatic(
-                     "org.apache.spark.sql.api.r.SQLUtils",
-                     "dapply",
-                     x@sdf,
-                     serialize(cleanClosure(func), connection = NULL),
-                     packageNamesArr,
-                     broadcastArr,
-                     schema$jobj)
-            dataFrame(sdf)
+            dapplyInternal(x, func, schema)
           })

+#' dapplyCollect
+#'
+#' Apply a function to each partition of a SparkDataFrame and collect the result back
+#' to R as a data.frame.
+#'
+#' @param x A SparkDataFrame
+#' @param func A function to be applied to each partition of the SparkDataFrame.
+#'             func should have only one parameter, to which a data.frame corresponds
+#'             to each partition will be passed.
+#'             The output of func should be a data.frame.
+#' @family SparkDataFrame functions
+#' @rdname dapply
+#' @name dapplyCollect
+#' @export
+#' @examples
+#' \dontrun{
+#'   df <- createDataFrame (sqlContext, iris)
+#'   ldf <- dapplyCollect(df, function(x) { x })
+#'
+#'   # filter and add a column
+#'   df <- createDataFrame (
+#'           sqlContext,
+#'           list(list(1L, 1, "1"), list(2L, 2, "2"), list(3L, 3, "3")),
+#'           c("a", "b", "c"))
+#'   ldf <- dapplyCollect(
+#'            df,
+#'            function(x) {
+#'              y <- x[x[1] > 1, ]
+#'              y <- cbind(y, y[1] + 1L)
+#'            })
+#'   # the result
+#'   #       a b c d
+#'   #       2 2 2 3
+#'   #       3 3 3 4
+#' }
+setMethod("dapplyCollect",
+          signature(x = "SparkDataFrame", func = "function"),
+          function(x, func) {
+            df <- dapplyInternal(x, func, NULL)
+
+            content <- callJMethod(df@sdf, "collect")
+            # content is a list of items of struct type. Each item has a single field
+            # which is a serialized data.frame corresponds to one partition of the
+            # SparkDataFrame.
+            ldfs <- lapply(content, function(x) { unserialize(x[[1]]) })
+            ldf <- do.call(rbind, ldfs)
+            row.names(ldf) <- NULL
+            ldf
+          })

 ############################## RDD Map Functions ##################################

@@ -1722,8 +1803,8 @@ setMethod("arrange",
 #' @export
 setMethod("orderBy",
           signature(x = "SparkDataFrame", col = "characterOrColumn"),
-          function(x, col) {
-            arrange(x, col)
+          function(x, col, ...) {
+            arrange(x, col, ...)
           })

 #' Filter
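
The DataFrame.R hunks above wire up two new user-facing entry points: column-based repartition (SPARK-15110) and dapplyCollect (SPARK-15202). A short sketch of both, assuming a 2.0.0-preview SparkR session with an active sqlContext; the data and column names are illustrative:

# Hypothetical data; createDataFrame/sqlContext as in the roxygen examples above.
df <- createDataFrame(sqlContext, data.frame(a = 1:6, grp = rep(c("x", "y"), 3)))

# "Option 1" from the new repartition() doc: partition count plus partitioning column.
df2 <- repartition(df, 2L, col = df$"grp")

# dapplyCollect(): run an R function over each partition, then rbind the pieces locally.
ldf <- dapplyCollect(df2, function(p) {
  p$a <- p$a * 2L   # p is a plain R data.frame holding one partition
  p
})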
8 changes: 6 additions & 2 deletions R/pkg/R/RDD.R
@@ -1023,9 +1023,13 @@ setMethod("keyBy",
 #' @aliases repartition,RDD
 #' @noRd
 setMethod("repartition",
-          signature(x = "RDD", numPartitions = "numeric"),
+          signature(x = "RDD"),
           function(x, numPartitions) {
-            coalesce(x, numPartitions, TRUE)
+            if (!is.null(numPartitions) && is.numeric(numPartitions)) {
+              coalesce(x, numPartitions, TRUE)
+            } else {
+              stop("Please, specify the number of partitions")
+            }
           })

 #' Return a new RDD that is reduced into numPartitions partitions.
2 changes: 2 additions & 0 deletions R/pkg/R/SQLContext.R
@@ -298,6 +298,8 @@ parquetFile <- function(sqlContext, ...) {
 #' Create a SparkDataFrame from a text file.
 #'
 #' Loads a text file and returns a SparkDataFrame with a single string column named "value".
+#' If the directory structure of the text files contains partitioning information, those are
+#' ignored in the resulting DataFrame.
 #' Each line in the text file is a new row in the resulting SparkDataFrame.
 #'
 #' @param sqlContext SQLContext to use
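
The two added doc lines record the SPARK-15323 behavior: partitioning directories are ignored when reading plain text. A sketch of the documented call, assuming the 2.0.0-preview SparkR text reader takes a sqlContext as shown above; the function name read.text and the path are assumptions for illustration:

# Every line becomes a row in a single string column named "value";
# key=value partition subdirectories under the path add rows but no extra columns.
df <- read.text(sqlContext, "path/to/text/dir")
printSchema(df)   # root |-- value: string
head(df)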