Changes from all commits
1164 commits
0b5d028
[SPARK-6602][Core] Update MapOutputTrackerMasterActor to MapOutputTra…
zsxwing Apr 6, 2015
49f3882
[SPARK-6673] spark-shell.cmd can't start in Windows even when spark w…
tsudukim Apr 6, 2015
9fe4125
SPARK-6569 [STREAMING] Down-grade same-offset message in Kafka stream…
srowen Apr 6, 2015
30363ed
[MLlib] [SPARK-6713] Iterators in columnSimilarities for mapPartition…
Apr 6, 2015
e40ea87
[Minor] [SQL] [SPARK-6729] Minor fix for DriverQuirks get
vlyubin Apr 7, 2015
a0846c4
[SPARK-6716] Change SparkContext.DRIVER_IDENTIFIER from <driver> to d…
JoshRosen Apr 7, 2015
6f0d55d
[SPARK-6636] Use public DNS hostname everywhere in spark_ec2.py
Apr 7, 2015
ae980eb
[SPARK-6736][GraphX][Doc]Example of Graph#aggregateMessages has error
sasakitoa Apr 7, 2015
b65bad6
[SPARK-3591][YARN]fire and forget for YARN cluster mode
WangTaoTheTonic Apr 7, 2015
7162ecf
[SPARK-6733][ Scheduler]Added scala.language.existentials
Apr 7, 2015
2c32bef
Replace use of .size with .length for Arrays
sksamuel Apr 7, 2015
1232215
[SPARK-6750] Upgrade ScalaStyle to 0.7.
rxin Apr 7, 2015
596ba77
[SPARK-6568] spark-shell.cmd --jars option does not accept the jar th…
tsudukim Apr 7, 2015
e6f08fb
Revert "[SPARK-6568] spark-shell.cmd --jars option does not accept th…
mengxr Apr 7, 2015
fc957dc
[SPARK-6720][MLLIB] PySpark MultivariateStatisticalSummary unit test …
Lewuathe Apr 7, 2015
77bcceb
[SPARK-6748] [SQL] Makes QueryPlan.schema a lazy val
liancheng Apr 7, 2015
c83e039
[SPARK-6737] Fix memory leak in OutputCommitCoordinator
JoshRosen Apr 7, 2015
d138aa8
[SPARK-6705][MLLIB] Add fit intercept api to ml logisticregression
Apr 8, 2015
8d2a36c
[SPARK-6754] Remove unnecessary TaskContextHelper
kayousterhout Apr 8, 2015
15e0d2b
[SPARK-6765] Fix test code style for streaming.
rxin Apr 8, 2015
f7e21dd
[SPARK-6506] [pyspark] Do not try to retrieve SPARK_HOME when not nee…
Apr 8, 2015
9d44ddc
[SPARK-6753] Clone SparkConf in ShuffleSuite tests
kayousterhout Apr 8, 2015
8d812f9
[SPARK-6765] Fix test code style for graphx.
rxin Apr 8, 2015
66159c3
[SPARK-6765] Fix test code style for mllib.
rxin Apr 8, 2015
6ada4f6
[SPARK-6781] [SQL] use sqlContext in python shell
Apr 8, 2015
2f482d7
[SPARK-6767][SQL] Fixed Query DSL error in spark sql Readme
Apr 8, 2015
86403f5
[SPARK-5242]: Add --private-ips flag to EC2 script
Apr 8, 2015
55a92ef
[SPARK-4346][SPARK-3596][YARN] Commonize the monitor logic
Apr 8, 2015
9418280
[SQL][minor] remove duplicated resolveGetField and update comment
cloud-fan Apr 8, 2015
7d7384c
[SPARK-6451][SQL] supported code generation for CombineSum
gvramana Apr 9, 2015
891ada5
[SPARK-6696] [SQL] Adds HiveContext.refreshTable to PySpark
liancheng Apr 9, 2015
1b2aab8
[SPARK-6765] Fix test code style for SQL
rxin Apr 9, 2015
2fe0a1a
[SPARK-5654] Integrate SparkR
shivaram Apr 9, 2015
b9c51c0
[SPARK-6343] Doc driver-worker network reqs
parente Apr 9, 2015
53f6bb1
SPARK-4924 addendum. Minor assembly directory fix in load-spark-env-sh
Raschild Apr 9, 2015
470d745
[minor] [examples] Avoid packaging duplicate classes.
Apr 9, 2015
7d92db3
[SPARK-6758]block the right jetty package in log
WangTaoTheTonic Apr 9, 2015
a0411ae
[SPARK-6264] [MLLIB] Support FPGrowth algorithm in Python API
yanboliang Apr 9, 2015
9c67049
[Spark-6693][MLlib]add tostring with max lines and width for matrix
hhbyyh Apr 9, 2015
b5c51c8
[SPARK-3074] [PySpark] support groupByKey() with single huge key
davies Apr 10, 2015
e236081
[SPARK-6577] [MLlib] [PySpark] SparseMatrix should be supported in Py…
MechCoder Apr 10, 2015
3290d2d
[SPARK-6211][Streaming] Add Python Kafka API unit test
jerryshao Apr 10, 2015
18ca089
[SPARK-6766][Streaming] Fix issue about StreamingListenerBatchSubmitt…
zsxwing Apr 10, 2015
9f5ed99
[SPARK-6773][Tests]Fix RAT checks still passed issue when download ra…
sisihj Apr 10, 2015
b9baa4c
[SQL] [SPARK-6794] Use kryo-based SparkSqlSerializer for GeneralHashe…
vlyubin Apr 10, 2015
0375134
[SPARK-5969][PySpark] Fix descending pyspark.rdd.sortByKey.
foxik Apr 10, 2015
4740d6a
[SPARK-6216] [PySpark] check the python version in worker
Apr 10, 2015
68ecdb7
[SPARK-6850] [SparkR] use one partition when we need to compare the w…
Apr 10, 2015
23d5f88
[SPARK-6851][SQL] Create new instance for each converted parquet rela…
marmbrus Apr 10, 2015
67d0688
[SQL] [SPARK-6620] Speed up toDF() and rdd() functions by constructin…
vlyubin Apr 10, 2015
95a0759
[Minor][Core] Fix typo
viirya Apr 11, 2015
694aef0
[hotfix] [build] Make sure JAVA_HOME is set for tests.
Apr 11, 2015
3ceb810
[SPARK-6835] [SQL] Fix bug of Hive UDTF in Lateral View (ClassNotFound)
chenghao-intel Apr 11, 2015
198cf2a
[SPARK-6858][SQL] Register Java HashMap for SparkSqlSerializer
viirya Apr 11, 2015
5f7b7cd
[SPARK-6611][SQL] Add support for INTEGER as synonym of INT.
smola Apr 11, 2015
6437e7c
[SPARK-6863] Fix formatting on SQL programming guide.
smola Apr 11, 2015
7dbd371
[Minor][SQL] Fix typo in sql
gchen Apr 11, 2015
2f53588
[SPARK-6199] [SQL] Support CTE in HiveContext and SQLContext
Apr 12, 2015
1f39a61
[Spark-5068][SQL]Fix bug query data when path doesn't exist for HiveC…
lazyman500 Apr 12, 2015
48cc840
[SPARK-6179][SQL] Add token for "SHOW PRINCIPALS role_name" and "SHOW…
pzzs Apr 12, 2015
352a5da
[SPARK-6379][SQL] Support a functon to call user-defined functions re…
maropu Apr 12, 2015
d2383fb
[SQL] Handle special characters in the authority of a Path's URI.
yhuai Apr 12, 2015
6d4e854
[SPARK-6367][SQL] Use the proper data type for those expressions that…
yhuai Apr 12, 2015
5c2844c
[SQL][minor] move `resolveGetField` into a object
cloud-fan Apr 12, 2015
dea5dac
[HOTFIX] Add explicit return types to fix lint errors
JoshRosen Apr 12, 2015
1205f7e
SPARK-6710 GraphX Fixed Wrong initial bias in GraphX SVDPlusPlus
michaelmalak Apr 12, 2015
0cc8fcb
MAINTENANCE: Automated closing of pull requests.
pwendell Apr 12, 2015
5d8f7b9
[SPARK-6677] [SQL] [PySpark] fix cached classes
Apr 12, 2015
e9445b1
[SPARK-6866][Build] Remove duplicated dependency in launcher/pom.xml
gchen Apr 12, 2015
ddc1743
[SPARK-6843][core]Add volatile for the "state"
zhichao-li Apr 12, 2015
6ac8eea
[SPARK-6431][Streaming][Kafka] Error message for partition metadata r…
koeninger Apr 12, 2015
04bcd67
[MINOR] a typo: coalesce
adrian-wang Apr 12, 2015
a1fe59d
[SPARK-6765] Fix test code style for core.
rxin Apr 13, 2015
fc17661
[SPARK-6643][MLLIB] Implement StandardScalerModel missing methods
Lewuathe Apr 13, 2015
d3792f5
[SPARK-4081] [mllib] VectorIndexer
jkbradley Apr 13, 2015
685ddcf
[SPARK-5886][ML] Add StringIndexer as a feature transformer
mengxr Apr 13, 2015
9294044
[SPARK-5885][MLLIB] Add VectorAssembler as a feature transformer
mengxr Apr 13, 2015
68d1faa
[SPARK-6562][SQL] DataFrame.replace
rxin Apr 13, 2015
950645d
[SPARK-6868][YARN] Fix broken container log link on executor page whe…
deanchen Apr 13, 2015
cadd7d7
[SPARK-6762]Fix potential resource leaks in CheckPoint CheckpointWrit…
zhichao-li Apr 13, 2015
14ce3ea
[SPARK-6860][Streaming][WebUI] Fix the possible inconsistency of Stre…
zsxwing Apr 13, 2015
9d117ce
[SPARK-6440][CORE]Handle IPv6 addresses properly when constructing URI
nyaapa Apr 13, 2015
240ea03
[SPARK-6671] Add status command for spark daemons
pchanumolu Apr 13, 2015
202ebf0
[SPARK-6870][Yarn] Catch InterruptedException when yarn application s…
Sephiroth-Lin Apr 13, 2015
b29663e
[SPARK-6352] [SQL] Add DirectParquetOutputCommitter
Apr 13, 2015
77620be
[SPARK-6207] [YARN] [SQL] Adds delegation tokens for metastore to conf.
Apr 13, 2015
c5b0b29
[SPARK-6765] Enable scalastyle on test code.
rxin Apr 13, 2015
6cc5b3e
[SPARK-6662][YARN] Allow variable substitution in spark.yarn.historyS…
Apr 13, 2015
1e340c3
[SPARK-5988][MLlib] add save/load for PowerIterationClusteringModel
yinxusen Apr 13, 2015
85ee0ca
[SPARK-6130] [SQL] support if not exists for insert overwrite into pa…
adrian-wang Apr 13, 2015
3a205bb
[SQL][SPARK-6742]: Don't push down predicates which reference partiti…
Apr 13, 2015
2a55cb4
[SPARK-5972] [MLlib] Cache residuals and gradient in GBT during train…
MechCoder Apr 13, 2015
e63a86a
[SPARK-6872] [SQL] add copy in external sort
adrian-wang Apr 13, 2015
c5602bd
[SPARK-5941] [SQL] Unit Test loads the table `src` twice for leftsemi…
chenghao-intel Apr 13, 2015
c4ab255
[SPARK-5931][CORE] Use consistent naming for time properties
Apr 13, 2015
d7f2c19
[SPARK-6881][SparkR] Changes the checkpoint directory name.
hlin09 Apr 13, 2015
5b8b324
[SPARK-6303][SQL] Remove unnecessary Average in GeneratedAggregate
viirya Apr 14, 2015
4898dfa
[SPARK-6877][SQL] Add code generation support for Min
viirya Apr 14, 2015
435b877
[Spark-4848] Allow different Worker configurations in standalone cluster
Apr 14, 2015
3782e1f
[SQL] [Minor] Fix for SqlApp.scala
scwf Apr 14, 2015
b45059d
[SPARK-5794] [SQL] fix add jar
adrian-wang Apr 14, 2015
0ba3fdd
[Minor][SparkR] Minor refactor and removes redundancy related to clea…
hlin09 Apr 14, 2015
971b95b
[SPARK-5957][ML] better handling of parameters
mengxr Apr 14, 2015
77eeb10
[WIP][HOTFIX][SPARK-4123]: Fix bug in PR dependency (all deps. remove…
Apr 14, 2015
628a72f
[SPARK-6731] Bump version of apache commons-math3
punya Apr 14, 2015
51b306b
SPARK-6878 [CORE] Fix for sum on empty RDD fails with exception
Apr 14, 2015
320bca4
[SPARK-6081] Support fetching http/https uris in driver runner.
tnachen Apr 14, 2015
f63b44a
[SPARK-6894]spark.executor.extraLibraryOptions => spark.executor.extr…
WangTaoTheTonic Apr 14, 2015
dcf8a9f
[CORE] SPARK-6880: Fixed null check when all the dependent stages are…
Apr 14, 2015
25998e4
[SPARK-2033] Automatically cleanup checkpoint
witgo Apr 14, 2015
8f8dc45
SPARK-1706: Allow multiple executors per worker in Standalone mode
CodingCat Apr 14, 2015
b075e4b
[SPARK-6700] [yarn] Re-enable flaky test.
Apr 14, 2015
6adb8bc
[SPARK-6905] Upgrade to snappy-java 1.1.1.7
JoshRosen Apr 14, 2015
6577437
[SPARK-5808] [build] Package pyspark files in sbt assembly.
Apr 14, 2015
4d4b249
[SPARK-6769][YARN][TEST] Usage of the ListenerBus in YarnClusterSuite…
sarutak Apr 14, 2015
a76b921
Revert "[SPARK-6352] [SQL] Add DirectParquetOutputCommitter"
JoshRosen Apr 14, 2015
6de282e
[SPARK-6796][Streaming][WebUI] Add "Active Batches" and "Completed Ba…
zsxwing Apr 14, 2015
9717389
[SPARK-6890] [core] Fix launcher lib work with SPARK_PREPEND_CLASSES.
Apr 15, 2015
30a6e0d
[SPARK-5634] [core] Show correct message in HS when no incomplete app…
Apr 15, 2015
6be9189
[SPARK-6871][SQL] WITH clause in CTE can not following another WITH c…
viirya Apr 15, 2015
29aabdd
[HOTFIX] [SPARK-6896] [SQL] fix compile error in hive-thriftserver
adrian-wang Apr 15, 2015
6c5ed8a
SPARK-6861 [BUILD] Scalastyle config prevents building Maven child mo…
srowen Apr 15, 2015
f11288d
[SPARK-6886] [PySpark] fix big closure with shuffle
Apr 15, 2015
b75b307
[SPARK-6730][SQL] Allow using keyword as identifier in OPTIONS
viirya Apr 15, 2015
e3e4e9a
[SPARK-6800][SQL] Update doc for JDBCRelation's columnPartition
viirya Apr 15, 2015
785f955
[SPARK-6887][SQL] ColumnBuilder misses FloatType
yhuai Apr 15, 2015
8584276
[SPARK-6638] [SQL] Improve performance of StringType in SQL
Apr 15, 2015
cf38fe0
[SPARK-6844][SQL] Clean up accumulators used in InMemoryRelation when…
viirya Apr 15, 2015
557a797
[SPARK-6937][MLLIB] Fixed bug in PICExample in which the radius were …
sboeschhuawei Apr 15, 2015
4754e16
[SPARK-6898][SQL] completely support special chars in column names
cloud-fan Apr 15, 2015
585638e
[SPARK-2213] [SQL] sort merge join for spark sql
adrian-wang Apr 15, 2015
d5f1b96
[SPARK-2312] Logging Unhandled messages
isaias Apr 15, 2015
8a53de1
[SPARK-5277][SQL] - SparkSqlSerializer doesn't always register user s…
Apr 15, 2015
52c3439
SPARK-6938: All require statements now have an informative error mess…
Apr 16, 2015
57cd1e8
[SPARK-6893][ML] default pipeline parameter handling in python
mengxr Apr 16, 2015
8370550
[Streaming][minor] Remove additional quote and unneeded imports
jerryshao Apr 16, 2015
6179a94
SPARK-4783 [CORE] System.exit() calls in SparkContext disrupt applica…
srowen Apr 16, 2015
de4fa6b
[SPARK-4194] [core] Make SparkContext initialization exception-safe.
Apr 16, 2015
3ae37b9
[SPARK-6694][SQL]SparkSQL CLI must be able to specify an option --dat…
adachij2002 Apr 16, 2015
ef3fb80
[SPARK-6934][Core] Use 'spark.akka.askTimeout' for the ask timeout
zsxwing Apr 16, 2015
55f553a
[SPARK-6855] [SPARKR] Set R includes to get the right collate order.
shivaram Apr 16, 2015
04e44b3
[SPARK-4897] [PySpark] Python 3 support
Apr 16, 2015
5fe4343
SPARK-6927 [SQL] Sorting Error when codegen on
Apr 17, 2015
6183b5e
[SPARK-6911] [SQL] improve accessor for nested types
Apr 17, 2015
d966086
[SQL][Minor] Fix foreachUp of treenode
scwf Apr 17, 2015
1e43851
[SPARK-6899][SQL] Fix type mismatch when using codegen with Average o…
viirya Apr 17, 2015
e5949c2
[SPARK-6966][SQL] Use correct ClassLoader for JDBC Driver
marmbrus Apr 17, 2015
8220d52
[SPARK-6972][SQL] Add Coalesce to DataFrame
marmbrus Apr 17, 2015
f7a2564
SPARK-6846 [WEBUI] Stage kill URL easy to accidentally trigger and po…
srowen Apr 17, 2015
4527761
[SPARK-6046] [core] Reorganize deprecated config support in SparkConf.
Apr 17, 2015
f6a9a57
[SPARK-6952] Handle long args when detecting PID reuse
Apr 17, 2015
dc48ba9
[SPARK-6604][PySpark]Specify ip of python server scoket
Sephiroth-Lin Apr 17, 2015
c84d916
[SPARK-6957] [SPARK-6958] [SQL] improve API compatibility to pandas
Apr 17, 2015
50ab8a6
[SPARK-2669] [yarn] Distribute client configuration to AM.
Apr 17, 2015
a83571a
[SPARK-6113] [ml] Stabilize DecisionTree API
jkbradley Apr 17, 2015
59e206d
[SPARK-6807] [SparkR] Merge recent SparkR-pkg changes
Apr 17, 2015
d305e68
SPARK-6988 : Fix documentation regarding DataFrames using the Java API
Apr 17, 2015
a452c59
Minor fix to SPARK-6958: Improve Python docstring for DataFrame.sort.
rxin Apr 17, 2015
c5ed510
[SPARK-6703][Core] Provide a way to discover existing SparkContext's
Apr 18, 2015
6fbeb82
[SPARK-6350][Mesos] Make mesosExecutorCores configurable in mesos "fi…
jongyoul Apr 18, 2015
1991337
[SPARK-5933] [core] Move config deprecation warnings to SparkConf.
Apr 18, 2015
d850b4b
[SPARK-6975][Yarn] Fix argument validation error
jerryshao Apr 18, 2015
5f095d5
SPARK-6992 : Fix documentation example for Spark SQL on StructType
Apr 18, 2015
327ebf0
[core] [minor] Make sure ConnectionManager stops.
Apr 18, 2015
28683b4
[SPARK-6219] Reuse pep8.py
nchammas Apr 18, 2015
729885e
Fixed doc
gaurav324 Apr 19, 2015
8fbd45c
SPARK-6993 : Add default min, max methods for JavaDoubleRDD
Apr 19, 2015
0424da6
[SPARK-6963][CORE]Flaky test: o.a.s.ContextCleanerSuite automatically…
witgo Apr 19, 2015
fa73da0
[SPARK-6998][MLlib] Make StreamingKMeans 'Serializable'
zsxwing Apr 20, 2015
d8e1b7b
[SPARK-6983][Streaming] Update ReceiverTrackerActor to use the new Rp…
zsxwing Apr 20, 2015
c776ee8
[SPARK-6979][Streaming] Replace JobScheduler.eventActor and JobGenera…
zsxwing Apr 20, 2015
6fe690d
[doc][mllib] Fix typo of the page title in Isotonic regression documents
dobashim Apr 20, 2015
1be2070
[SPARK-5924] Add the ability to specify withMean or withStd parameter…
jrabary Apr 20, 2015
968ad97
[SPARK-7003] Improve reliability of connection failure detection betw…
aarondav Apr 20, 2015
7717661
[SPARK-6661] Python type errors should print type, not object
31z4 Apr 20, 2015
1ebceaa
[Minor][MLlib] Incorrect path to test data is used in DecisionTreeExa…
viirya Apr 20, 2015
97fda73
fixed doc
ericchiang Apr 20, 2015
517bdf3
[doc][streaming] Fixed broken link in mllib section
BenFradet Apr 20, 2015
ce7ddab
[SPARK-6368][SQL] Build a specialized serializer for Exchange operator.
yhuai Apr 21, 2015
c736220
[SPARK-6635][SQL] DataFrame.withColumn should replace columns with id…
viirya Apr 21, 2015
8136810
[SPARK-6490][Core] Add spark.rpc.* and deprecate spark.akka.*
zsxwing Apr 21, 2015
ab9128f
[SPARK-6949] [SQL] [PySpark] Support Date/Timestamp in Column expression
Apr 21, 2015
1f2f723
[SPARK-5990] [MLLIB] Model import/export for IsotonicRegression
yanboliang Apr 21, 2015
5fea3e5
[SPARK-6985][streaming] Receiver maxRate over 1000 causes a StackOver…
Apr 21, 2015
c035c0f
[SPARK-5360] [SPARK-6606] Eliminate duplicate objects in serialized C…
kayousterhout Apr 21, 2015
c25ca7c
SPARK-3276 Added a new configuration spark.streaming.minRememberDuration
emres Apr 21, 2015
45c47fa
[SPARK-6845] [MLlib] [PySpark] Add isTranposed flag to DenseMatrix
MechCoder Apr 21, 2015
04bf34e
[SPARK-7011] Build(compilation) fails with scala 2.11 option, because…
ScrapCodes Apr 21, 2015
2e8c6ca
[SPARK-6994] Allow to fetch field values by name in sql.Row
Apr 21, 2015
03fd921
[SQL][minor] make it more clear that we only need to re-throw GetFiel…
cloud-fan Apr 21, 2015
6265cba
[SPARK-6969][SQL] Refresh the cached table when REFRESH TABLE is used
yhuai Apr 21, 2015
2a24bf9
[SPARK-6996][SQL] Support map types in java beans
Apr 21, 2015
7662ec2
[SPARK-5817] [SQL] Fix bug of udtf with column names
chenghao-intel Apr 21, 2015
f83c0f1
[SPARK-3386] Share and reuse SerializerInstances in shuffle paths
JoshRosen Apr 21, 2015
a70e849
[minor] [build] Set java options when generating mima ignores.
Apr 21, 2015
7fe6142
[SPARK-6065] [MLlib] Optimize word2vec.findSynonyms using blas calls
MechCoder Apr 21, 2015
686dd74
[SPARK-7036][MLLIB] ALS.train should support DataFrames in PySpark
mengxr Apr 21, 2015
ae036d0
[Minor][MLLIB] Fix a minor formatting bug in toString method in Node.…
Apr 21, 2015
b063a61
Avoid warning message about invalid refuse_seconds value in Mesos >=0…
Apr 22, 2015
e72c16e
[SPARK-6014] [core] Revamp Spark shutdown hooks, fix shutdown races.
Apr 22, 2015
3134c3f
[SPARK-6953] [PySpark] speed up python tests
rxin Apr 22, 2015
41ef78a
Closes #5427
rxin Apr 22, 2015
a0761ec
[SPARK-1684] [PROJECT INFRA] Merge script should standardize SPARK-XX…
texasmichelle Apr 22, 2015
3a3f710
[SPARK-6490][Docs] Add docs for rpc configurations
zsxwing Apr 22, 2015
70f9f8f
[MINOR] Comment improvements in ExternalSorter.
pwendell Apr 22, 2015
607eff0
[SPARK-6113] [ML] Small cleanups after original tree API PR
jkbradley Apr 22, 2015
bdc5c16
[SPARK-6889] [DOCS] CONTRIBUTING.md updates to accompany contribution…
srowen Apr 22, 2015
33b8562
[SPARK-7052][Core] Add ThreadUtils and move thread methods from Utils…
zsxwing Apr 22, 2015
cdf0328
[SQL] Rename some apply functions.
rxin Apr 22, 2015
fbe7106
[SPARK-7039][SQL]JDBCRDD: Add support on type NVARCHAR
szheng79 Apr 22, 2015
baf865d
[SPARK-7059][SQL] Create a DataFrame join API to facilitate equijoin.
rxin Apr 22, 2015
f4f3998
[SPARK-6827] [MLLIB] Wrap FPGrowthModel.freqItemsets and make it cons…
yanboliang Apr 23, 2015
04525c0
[SPARK-6967] [SQL] fix date type convertion in jdbcrdd
adrian-wang Apr 23, 2015
b69c4f9
Disable flaky test: ReceiverSuite "block generator throttling".
rxin Apr 23, 2015
1b85e08
[MLlib] UnaryTransformer nullability should not depend on PrimitiveType.
rxin Apr 23, 2015
d206860
[SPARK-7066][MLlib] VectorAssembler should use NumericType not Native…
rxin Apr 23, 2015
03e85b4
[SPARK-7046] Remove InputMetrics from BlockResult
kayousterhout Apr 23, 2015
d9e70f3
[HOTFIX][SQL] Fix broken cached test
viirya Apr 23, 2015
2d33323
[MLlib] Add support for BooleanType to VectorAssembler.
rxin Apr 23, 2015
29163c5
[SPARK-7068][SQL] Remove PrimitiveType
rxin Apr 23, 2015
f60bece
[SPARK-7069][SQL] Rename NativeType -> AtomicType.
rxin Apr 23, 2015
a7d65d3
[HOTFIX] [SQL] Fix compilation for scala 2.11.
ScrapCodes Apr 23, 2015
975f53e
[minor][streaming]fixed scala string interpolation error
Apr 23, 2015
cc48e63
[SPARK-7044] [SQL] Fix the deadlock in script transformation
chenghao-intel Apr 23, 2015
534f2a4
[SPARK-6752][Streaming] Allow StreamingContext to be recreated from c…
tdas Apr 23, 2015
c1213e6
[SPARK-7055][SQL]Use correct ClassLoader for JDBC Driver in JDBCRDD.g…
Apr 23, 2015
6afde2c
[SPARK-7058] Include RDD deserialization time in "task deserializatio…
JoshRosen Apr 23, 2015
3e91cc2
[SPARK-7085][MLlib] Fix miniBatchFraction parameter in train method c…
Apr 23, 2015
baa83a9
[SPARK-6879] [HISTORYSERVER] check if app is completed before clean i…
WangTaoTheTonic Apr 23, 2015
6d0749c
[SPARK-7087] [BUILD] Fix path issue change version script
Apr 23, 2015
1ed46a6
[SPARK-7070] [MLLIB] LDA.setBeta should call setTopicConcentration.
mengxr Apr 23, 2015
6220d93
[SQL] Break dataTypes.scala into multiple files.
rxin Apr 23, 2015
73db132
[SPARK-6818] [SPARKR] Support column deletion in SparkR DataFrame API.
Apr 23, 2015
336f7f5
[SPARK-7037] [CORE] Inconsistent behavior for non-spark config proper…
Apr 24, 2015
2d010f7
[SPARK-7060][SQL] Add alias function to python dataframe
yhuai Apr 24, 2015
67bccbd
Update sql-programming-guide.md
kgeis Apr 24, 2015
d3a302d
[SQL] Fixed expression data type matching.
rxin Apr 24, 2015
4c722d7
Fixed a typo from the previous commit.
rxin Apr 24, 2015
8509519
[SPARK-5894] [ML] Add polynomial mapper
yinxusen Apr 24, 2015
78b39c7
[SPARK-7115] [MLLIB] skip the very first 1 in poly expansion
mengxr Apr 24, 2015
6e57d57
[SPARK-6528] [ML] Add IDF transformer
yinxusen Apr 24, 2015
ebb77b2
[SPARK-7033] [SPARKR] Clean usage of split. Use partition instead whe…
Apr 24, 2015
caf0136
[SPARK-6852] [SPARKR] Accept numeric as numPartitions in SparkR.
Apr 24, 2015
438859e
[SPARK-6122] [CORE] Upgrade tachyon-client version to 0.6.3
calvinjia Apr 24, 2015
d874f8b
[PySpark][Minor] Update sql example, so that can read file correctly
Sephiroth-Lin Apr 25, 2015
59b7cfc
[SPARK-7136][Docs] Spark SQL and DataFrame Guide fix example file and…
dbsiegel Apr 25, 2015
6ea08f3
some comments
scwf Apr 25, 2015
aa10046
comment for parser
scwf Apr 25, 2015
65fb190
comment for hiveql
scwf Apr 25, 2015
d40318d
comment for sqlcontext
scwf Apr 25, 2015
3 changes: 3 additions & 0 deletions .gitignore
@@ -6,6 +6,7 @@
*.iml
*.iws
*.pyc
*.pyo
.idea/
.idea_modules/
build/*.jar
@@ -62,6 +63,8 @@ ec2/lib/
rat-results.txt
scalastyle.txt
scalastyle-output.xml
R-unit-tests.log
R/unit-tests.out

# For Hive
metastore_db/
4 changes: 4 additions & 0 deletions .rat-excludes
@@ -1,4 +1,5 @@
target
cache
.gitignore
.gitattributes
.project
@@ -18,6 +19,7 @@ fairscheduler.xml.template
spark-defaults.conf.template
log4j.properties
log4j.properties.template
metrics.properties
metrics.properties.template
slaves
slaves.template
@@ -65,3 +67,5 @@ logs
.*scalastyle-output.xml
.*dependency-reduced-pom.xml
known_translations
DESCRIPTION
NAMESPACE
22 changes: 13 additions & 9 deletions CONTRIBUTING.md
@@ -1,12 +1,16 @@
## Contributing to Spark

Contributions via GitHub pull requests are gladly accepted from their original
author. Along with any pull requests, please state that the contribution is
your original work and that you license the work to the project under the
project's open source license. Whether or not you state this explicitly, by
submitting any copyrighted material via pull request, email, or other means
you agree to license the material under the project's open source license and
warrant that you have the legal authority to do so.
*Before opening a pull request*, review the
[Contributing to Spark wiki](https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark).
It lists steps that are required before creating a PR. In particular, consider:

- Is the change important and ready enough to ask the community to spend time reviewing?
- Have you searched for existing, related JIRAs and pull requests?
- Is this a new feature that can stand alone as a package on http://spark-packages.org ?
- Is the change being proposed clearly explained and motivated?

Please see the [Contributing to Spark wiki page](https://cwiki.apache.org/SPARK/Contributing+to+Spark)
for more information.
When you contribute code, you affirm that the contribution is your original work and that you
license the work to the project under the project's open source license. Whether or not you
state this explicitly, by submitting any copyrighted material via pull request, email, or
other means you agree to license the material under the project's open source license and
warrant that you have the legal authority to do so.
16 changes: 16 additions & 0 deletions LICENSE
@@ -771,6 +771,22 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

========================================================================
For TestTimSort (core/src/test/java/org/apache/spark/util/collection/TestTimSort.java):
========================================================================
Copyright (C) 2015 Stijn de Gouw

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

========================================================================
For LimitedInputStream
6 changes: 6 additions & 0 deletions R/.gitignore
@@ -0,0 +1,6 @@
*.o
*.so
*.Rd
lib
pkg/man
pkg/html
12 changes: 12 additions & 0 deletions R/DOCUMENTATION.md
@@ -0,0 +1,12 @@
# SparkR Documentation

SparkR documentation is generated from in-source comments annotated using
`roxygen2`. After making changes to the documentation, you can regenerate the man pages
by running the following from an R console in the SparkR home directory:

library(devtools)
devtools::document(pkg="./pkg", roclets=c("rd"))

You can verify that your changes are good by running:

R CMD check pkg/
67 changes: 67 additions & 0 deletions R/README.md
@@ -0,0 +1,67 @@
# R on Spark

SparkR is an R package that provides a light-weight frontend to use Spark from R.

### SparkR development

#### Build Spark

Build Spark with [Maven](http://spark.apache.org/docs/latest/building-spark.html#building-with-buildmvn) and include the `-Psparkr` profile to build the R package. For example, to use the default Hadoop versions you can run
```
build/mvn -DskipTests -Psparkr package
```

#### Running sparkR

You can start using SparkR by launching the SparkR shell with

./bin/sparkR

The `sparkR` script automatically creates a SparkContext, running Spark in local
mode by default. To specify the Spark master of a cluster for the automatically created
SparkContext, you can run

./bin/sparkR --master "local[2]"

To set other options, such as driver memory and executor memory, you can pass the [spark-submit](http://spark.apache.org/docs/latest/submitting-applications.html) arguments to `./bin/sparkR`.
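
For instance, to give the shell more driver memory while running against a local
master (the flag values below are illustrative; the flags themselves are standard
spark-submit arguments):
```
# spark-submit flags are passed through to the JVM backing the SparkR shell
./bin/sparkR --master "local[2]" --driver-memory 2g
```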

#### Using SparkR from RStudio

If you wish to use SparkR from RStudio or other R frontends, you will need to set some environment variables that point SparkR to your Spark installation. For example:
```
# Set this to where Spark is installed
Sys.setenv(SPARK_HOME="/Users/shivaram/spark")
# This line loads SparkR from the installed directory
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)
sc <- sparkR.init(master="local")
```

#### Making changes to SparkR

The [instructions](https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark) for making contributions to Spark also apply to SparkR.
If you only make R file changes (i.e., no Scala changes), you can just re-install the R package using `R/install-dev.sh` and test your changes, as sketched below.
Once you have made your changes, please include unit tests for them and run the existing unit tests using the `run-tests.sh` script as described below.
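
A minimal sketch of that edit-and-test loop, run from the Spark home directory:
```
# Re-install the SparkR package after editing files under R/pkg
./R/install-dev.sh
# Run the SparkR unit tests (requires the testthat package, see below)
./R/run-tests.sh
```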

#### Generating documentation

The SparkR documentation (Rd files and HTML files) is not part of the source repository. To generate it you can run the script `R/create-docs.sh`. This script uses `devtools` and `knitr` to generate the docs, so these packages need to be installed on the machine before using the script.
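
A typical invocation, assuming `devtools` and `knitr` are already installed:
```
# Regenerates the Rd files, re-installs the package, and knits the HTML docs
./R/create-docs.sh
```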

### Examples, Unit tests

SparkR comes with several sample programs in the `examples/src/main/r` directory.
To run one of them, use `./bin/sparkR <filename> <args>`. For example:

./bin/sparkR examples/src/main/r/pi.R local[2]

You can also run the SparkR unit tests (you need to install the [testthat](http://cran.r-project.org/web/packages/testthat/index.html) package first) by running:

R -e 'install.packages("testthat", repos="http://cran.us.r-project.org")'
./R/run-tests.sh

### Running on YARN
The `./bin/spark-submit` and `./bin/sparkR` scripts can also be used to submit jobs to YARN clusters. You will need to set the YARN configuration directory before doing so. For example, on CDH you can run
```
export YARN_CONF_DIR=/etc/hadoop/conf
./bin/spark-submit --master yarn examples/src/main/r/pi.R 4
```
13 changes: 13 additions & 0 deletions R/WINDOWS.md
@@ -0,0 +1,13 @@
## Building SparkR on Windows

To build SparkR on Windows, the following steps are required:

1. Install R (>= 3.1) and [Rtools](http://cran.r-project.org/bin/windows/Rtools/). Make sure to
include Rtools and R in `PATH`.
2. Install
[JDK7](http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html) and set
`JAVA_HOME` in the system environment variables.
3. Download and install [Maven](http://maven.apache.org/download.html). Also include the `bin`
directory in Maven in `PATH`.
4. Set `MAVEN_OPTS` as described in [Building Spark](http://spark.apache.org/docs/latest/building-spark.html).
5. Open a command shell (`cmd`) in the Spark directory and run `mvn -DskipTests -Psparkr package`
46 changes: 46 additions & 0 deletions R/create-docs.sh
@@ -0,0 +1,46 @@
#!/bin/bash

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Script to create API docs for SparkR
# This requires `devtools` and `knitr` to be installed on the machine.

# After running this script the html docs can be found in
# $SPARK_HOME/R/pkg/html

# Figure out where the script is
export FWDIR="$(cd "`dirname "$0"`"; pwd)"
pushd $FWDIR

# Generate Rd file
Rscript -e 'library(devtools); devtools::document(pkg="./pkg", roclets=c("rd"))'

# Install the package
./install-dev.sh

# Now create HTML files

# knit_rd puts html in current working directory
mkdir -p pkg/html
pushd pkg/html

Rscript -e 'library(SparkR, lib.loc="../../lib"); library(knitr); knit_rd("SparkR")'

popd

popd
27 changes: 27 additions & 0 deletions R/install-dev.bat
@@ -0,0 +1,27 @@
@echo off

rem
rem Licensed to the Apache Software Foundation (ASF) under one or more
rem contributor license agreements. See the NOTICE file distributed with
rem this work for additional information regarding copyright ownership.
rem The ASF licenses this file to You under the Apache License, Version 2.0
rem (the "License"); you may not use this file except in compliance with
rem the License. You may obtain a copy of the License at
rem
rem http://www.apache.org/licenses/LICENSE-2.0
rem
rem Unless required by applicable law or agreed to in writing, software
rem distributed under the License is distributed on an "AS IS" BASIS,
rem WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
rem See the License for the specific language governing permissions and
rem limitations under the License.
rem

rem Install development version of SparkR
rem

set SPARK_HOME=%~dp0..

MKDIR %SPARK_HOME%\R\lib

R.exe CMD INSTALL --library="%SPARK_HOME%\R\lib" %SPARK_HOME%\R\pkg\
36 changes: 36 additions & 0 deletions R/install-dev.sh
@@ -0,0 +1,36 @@
#!/bin/bash

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# This script packages the SparkR source files (R and C files) and
# creates a package that can be loaded in R. The package is by default installed to
# $FWDIR/lib and the package can be loaded by using the following command in R:
#
# library(SparkR, lib.loc="$FWDIR/lib")
#
# NOTE(shivaram): Right now we use $SPARK_HOME/R/lib to be the installation directory
# to load the SparkR package on the worker nodes.


FWDIR="$(cd `dirname $0`; pwd)"
LIB_DIR="$FWDIR/lib"

mkdir -p $LIB_DIR

# Install the SparkR package into $LIB_DIR
R CMD INSTALL --library=$LIB_DIR $FWDIR/pkg/
28 changes: 28 additions & 0 deletions R/log4j.properties
@@ -0,0 +1,28 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Set everything to be logged to the file target/unit-tests.log
log4j.rootCategory=INFO, file
log4j.appender.file=org.apache.log4j.FileAppender
log4j.appender.file.append=true
log4j.appender.file.file=R-unit-tests.log
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss.SSS} %t %p %c{1}: %m%n

# Ignore messages below warning level from Jetty, because it's a bit verbose
log4j.logger.org.eclipse.jetty=WARN
org.eclipse.jetty.LEVEL=WARN
35 changes: 35 additions & 0 deletions R/pkg/DESCRIPTION
@@ -0,0 +1,35 @@
Package: SparkR
Type: Package
Title: R frontend for Spark
Version: 1.4.0
Date: 2013-09-09
Author: The Apache Software Foundation
Maintainer: Shivaram Venkataraman <[email protected]>
Imports:
methods
Depends:
R (>= 3.0),
methods,
Suggests:
testthat
Description: R frontend for Spark
License: Apache License (== 2.0)
Collate:
'generics.R'
'jobj.R'
'RDD.R'
'pairRDD.R'
'schema.R'
'column.R'
'group.R'
'DataFrame.R'
'SQLContext.R'
'backend.R'
'broadcast.R'
'client.R'
'context.R'
'deserialize.R'
'serialize.R'
'sparkR.R'
'utils.R'
'zzz.R'