Commit 2e69f11

Merge branch 'master' of https://github.com/apache/spark into master_nravi

2 parents: efd688a + fd0b32c

1,416 files changed: 69,225 additions and 17,739 deletions


.gitignore

Lines changed: 10 additions & 5 deletions

@@ -1,9 +1,12 @@
 *~
+*.#*
+*#*#
 *.swp
 *.ipr
 *.iml
 *.iws
 .idea/
+.idea_modules/
 sbt/*.jar
 .settings
 .cache
@@ -15,10 +18,11 @@ out/
 third_party/libmesos.so
 third_party/libmesos.dylib
 conf/java-opts
-conf/spark-env.sh
-conf/streaming-env.sh
-conf/log4j.properties
-conf/spark-defaults.conf
+conf/*.sh
+conf/*.cmd
+conf/*.properties
+conf/*.conf
+conf/*.xml
 docs/_site
 docs/api
 target/
@@ -49,10 +53,11 @@ unit-tests.log
 /lib/
 rat-results.txt
 scalastyle.txt
-conf/*.conf
+scalastyle-output.xml

 # For Hive
 metastore_db/
 metastore/
 warehouse/
 TempStatsStore/
+sql/hive-thriftserver/test_warehouses

.rat-excludes

Lines changed: 5 additions & 0 deletions

@@ -20,16 +20,19 @@ log4j.properties.template
 metrics.properties.template
 slaves
 spark-env.sh
+spark-env.cmd
 spark-env.sh.template
 log4j-defaults.properties
 bootstrap-tooltip.js
 jquery-1.11.1.min.js
 sorttable.js
+.*avsc
 .*txt
 .*json
 .*data
 .*log
 cloudpickle.py
+heapq3.py
 join.py
 SparkExprTyper.scala
 SparkILoop.scala
@@ -55,3 +58,5 @@ dist/*
 .*ipr
 .*iws
 logs
+.*scalastyle-output.xml
+.*dependency-reduced-pom.xml

CONTRIBUTING.md

Lines changed: 12 additions & 0 deletions

@@ -0,0 +1,12 @@
+## Contributing to Spark
+
+Contributions via GitHub pull requests are gladly accepted from their original
+author. Along with any pull requests, please state that the contribution is
+your original work and that you license the work to the project under the
+project's open source license. Whether or not you state this explicitly, by
+submitting any copyrighted material via pull request, email, or other means
+you agree to license the material under the project's open source license and
+warrant that you have the legal authority to do so.
+
+Please see the [Contributing to Spark wiki page](https://cwiki.apache.org/SPARK/Contributing+to+Spark)
+for more information.

LICENSE

Lines changed: 305 additions & 3 deletions
Large diffs are not rendered by default.

README.md

Lines changed: 25 additions & 54 deletions

@@ -1,21 +1,31 @@
 # Apache Spark

-Lightning-Fast Cluster Computing - <http://spark.apache.org/>
+Spark is a fast and general cluster computing system for Big Data. It provides
+high-level APIs in Scala, Java, and Python, and an optimized engine that
+supports general computation graphs for data analysis. It also supports a
+rich set of higher-level tools including Spark SQL for SQL and structured
+data processing, MLlib for machine learning, GraphX for graph processing,
+and Spark Streaming for stream processing.
+
+<http://spark.apache.org/>


 ## Online Documentation

 You can find the latest Spark documentation, including a programming
-guide, on the project webpage at <http://spark.apache.org/documentation.html>.
+guide, on the [project web page](http://spark.apache.org/documentation.html).
 This README file only contains basic setup instructions.

 ## Building Spark

-Spark is built on Scala 2.10. To build Spark and its example programs, run:
+Spark is built using [Apache Maven](http://maven.apache.org/).
+To build Spark and its example programs, run:

-    ./sbt/sbt assembly
+    mvn -DskipTests clean package

 (You do not need to do this if you downloaded a pre-built package.)
+More detailed documentation is available from the project site, at
+["Building Spark"](http://spark.apache.org/docs/latest/building-spark.html).

 ## Interactive Scala Shell

@@ -62,65 +72,26 @@ Many of the example programs print usage help if no params are given.
 Testing first requires [building Spark](#building-spark). Once Spark is built, tests
 can be run using:

-    ./sbt/sbt test
+    ./dev/run-tests
+
+Please see the guidance on how to
+[run all automated tests](https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-AutomatedTesting).

 ## A Note About Hadoop Versions

 Spark uses the Hadoop core library to talk to HDFS and other Hadoop-supported
 storage systems. Because the protocols have changed in different versions of
 Hadoop, you must build Spark against the same version that your cluster runs.
-You can change the version by setting `-Dhadoop.version` when building Spark.
-
-For Apache Hadoop versions 1.x, Cloudera CDH MRv1, and other Hadoop
-versions without YARN, use:
-
-    # Apache Hadoop 1.2.1
-    $ sbt/sbt -Dhadoop.version=1.2.1 assembly
-
-    # Cloudera CDH 4.2.0 with MapReduce v1
-    $ sbt/sbt -Dhadoop.version=2.0.0-mr1-cdh4.2.0 assembly
-
-For Apache Hadoop 2.2.X, 2.1.X, 2.0.X, 0.23.x, Cloudera CDH MRv2, and other Hadoop versions
-with YARN, also set `-Pyarn`:
-
-    # Apache Hadoop 2.0.5-alpha
-    $ sbt/sbt -Dhadoop.version=2.0.5-alpha -Pyarn assembly
-
-    # Cloudera CDH 4.2.0 with MapReduce v2
-    $ sbt/sbt -Dhadoop.version=2.0.0-cdh4.2.0 -Pyarn assembly
-
-    # Apache Hadoop 2.2.X and newer
-    $ sbt/sbt -Dhadoop.version=2.2.0 -Pyarn assembly
-
-When developing a Spark application, specify the Hadoop version by adding the
-"hadoop-client" artifact to your project's dependencies. For example, if you're
-using Hadoop 1.2.1 and build your application using SBT, add this entry to
-`libraryDependencies`:
-
-    "org.apache.hadoop" % "hadoop-client" % "1.2.1"
-
-If your project is built with Maven, add this to your POM file's `<dependencies>` section:
-
-    <dependency>
-      <groupId>org.apache.hadoop</groupId>
-      <artifactId>hadoop-client</artifactId>
-      <version>1.2.1</version>
-    </dependency>

+Please refer to the build documentation at
+["Specifying the Hadoop Version"](http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version)
+for detailed guidance on building for a particular distribution of Hadoop, including
+building for particular Hive and Hive Thriftserver distributions. See also
+["Third Party Hadoop Distributions"](http://spark.apache.org/docs/latest/hadoop-third-party-distributions.html)
+for guidance on building a Spark application that works with a particular
+distribution.

 ## Configuration

 Please refer to the [Configuration guide](http://spark.apache.org/docs/latest/configuration.html)
 in the online documentation for an overview on how to configure Spark.
-
-
-## Contributing to Spark
-
-Contributions via GitHub pull requests are gladly accepted from their original
-author. Along with any pull requests, please state that the contribution is
-your original work and that you license the work to the project under the
-project's open source license. Whether or not you state this explicitly, by
-submitting any copyrighted material via pull request, email, or other means
-you agree to license the material under the project's open source license and
-warrant that you have the legal authority to do so.

assembly/pom.xml

Lines changed: 40 additions & 2 deletions

@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent</artifactId>
-    <version>1.1.0-SNAPSHOT</version>
+    <version>1.2.0-SNAPSHOT</version>
     <relativePath>../pom.xml</relativePath>
   </parent>

@@ -39,9 +39,16 @@
     <deb.pkg.name>spark</deb.pkg.name>
     <deb.install.path>/usr/share/spark</deb.install.path>
     <deb.user>root</deb.user>
+    <deb.bin.filemode>744</deb.bin.filemode>
   </properties>

   <dependencies>
+    <!-- Promote Guava to compile scope in this module so it's included while shading. -->
+    <dependency>
+      <groupId>com.google.guava</groupId>
+      <artifactId>guava</artifactId>
+      <scope>compile</scope>
+    </dependency>
     <dependency>
       <groupId>org.apache.spark</groupId>
       <artifactId>spark-core_${scala.binary.version}</artifactId>
@@ -81,6 +88,20 @@

   <build>
     <plugins>
+      <plugin>
+        <groupId>org.apache.maven.plugins</groupId>
+        <artifactId>maven-deploy-plugin</artifactId>
+        <configuration>
+          <skip>true</skip>
+        </configuration>
+      </plugin>
+      <plugin>
+        <groupId>org.apache.maven.plugins</groupId>
+        <artifactId>maven-install-plugin</artifactId>
+        <configuration>
+          <skip>true</skip>
+        </configuration>
+      </plugin>
       <!-- Use the shade plugin to create a big JAR with all the dependencies -->
       <plugin>
         <groupId>org.apache.maven.plugins</groupId>
@@ -112,6 +133,18 @@
               <goal>shade</goal>
             </goals>
             <configuration>
+              <relocations>
+                <relocation>
+                  <pattern>com.google</pattern>
+                  <shadedPattern>org.spark-project.guava</shadedPattern>
+                  <includes>
+                    <include>com.google.common.**</include>
+                  </includes>
+                  <excludes>
+                    <exclude>com.google.common.base.Optional**</exclude>
+                  </excludes>
+                </relocation>
+              </relocations>
               <transformers>
                 <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer" />
                 <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
@@ -162,6 +195,11 @@
           <artifactId>spark-hive_${scala.binary.version}</artifactId>
           <version>${project.version}</version>
         </dependency>
+        <dependency>
+          <groupId>org.apache.spark</groupId>
+          <artifactId>spark-hive-thriftserver_${scala.binary.version}</artifactId>
+          <version>${project.version}</version>
+        </dependency>
       </dependencies>
     </profile>
     <profile>
@@ -276,7 +314,7 @@
               <user>${deb.user}</user>
               <group>${deb.user}</group>
               <prefix>${deb.install.path}/bin</prefix>
-              <filemode>744</filemode>
+              <filemode>${deb.bin.filemode}</filemode>
             </mapper>
           </data>
           <data>
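
The `<relocations>` block above shades Spark's bundled Guava classes under `org.spark-project.guava` inside the assembly jar, so applications can ship their own Guava version without classpath conflicts. `com.google.common.base.Optional` is excluded from the relocation because it is exposed through Spark's public Java API and must stay resolvable under its original name. A minimal sketch of what that exclusion preserves for user code; the object name `OptionalDemo` is a hypothetical example, not part of the commit:

    import com.google.common.base.Optional  // kept un-relocated by the shade exclusion above

    object OptionalDemo {
      def main(args: Array[String]): Unit = {
        val present: Optional[String] = Optional.of("value")
        val absent: Optional[String] = Optional.absent()
        println(present.or("default"))  // prints "value"
        println(absent.or("default"))   // prints "default"
      }
    }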

bagel/pom.xml

Lines changed: 2 additions & 2 deletions

@@ -21,14 +21,14 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent</artifactId>
-    <version>1.1.0-SNAPSHOT</version>
+    <version>1.2.0-SNAPSHOT</version>
     <relativePath>../pom.xml</relativePath>
   </parent>

   <groupId>org.apache.spark</groupId>
   <artifactId>spark-bagel_2.10</artifactId>
   <properties>
-   <sbt.project.name>bagel</sbt.project.name>
+    <sbt.project.name>bagel</sbt.project.name>
   </properties>
   <packaging>jar</packaging>
   <name>Spark Project Bagel</name>

bagel/src/main/scala/org/apache/spark/bagel/Bagel.scala

Lines changed: 5 additions & 0 deletions

@@ -72,6 +72,7 @@ object Bagel extends Logging {
     var verts = vertices
     var msgs = messages
     var noActivity = false
+    var lastRDD: RDD[(K, (V, Array[M]))] = null
     do {
       logInfo("Starting superstep " + superstep + ".")
       val startTime = System.currentTimeMillis
@@ -83,6 +84,10 @@ object Bagel extends Logging {
       val superstep_ = superstep  // Create a read-only copy of superstep for capture in closure
       val (processed, numMsgs, numActiveVerts) =
         comp[K, V, M, C](sc, grouped, compute(_, _, aggregated, superstep_), storageLevel)
+      if (lastRDD != null) {
+        lastRDD.unpersist(false)
+      }
+      lastRDD = processed

       val timeTaken = System.currentTimeMillis - startTime
       logInfo("Superstep %d took %d s".format(superstep, timeTaken / 1000))

bagel/src/test/scala/org/apache/spark/bagel/BagelSuite.scala

Lines changed: 0 additions & 2 deletions

@@ -24,8 +24,6 @@ import org.scalatest.time.SpanSugar._
 import org.apache.spark._
 import org.apache.spark.storage.StorageLevel

-import scala.language.postfixOps
-
 class TestVertex(val active: Boolean, val age: Int) extends Vertex with Serializable
 class TestMessage(val targetId: String) extends Message[String] with Serializable

.travis.yml → bin/beeline

Lines changed: 16 additions & 18 deletions

@@ -1,3 +1,6 @@
+#!/usr/bin/env bash
+
+#
 # Licensed to the Apache Software Foundation (ASF) under one or more
 # contributor license agreements.  See the NOTICE file distributed with
 # this work for additional information regarding copyright ownership.
@@ -12,21 +15,16 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
-
-language: scala
-scala:
-  - "2.10.3"
-jdk:
-  - oraclejdk7
-env:
-  matrix:
-    - TEST="scalastyle assembly/assembly"
-    - TEST="catalyst/test sql/test streaming/test mllib/test graphx/test bagel/test"
-    - TEST=hive/test
-cache:
-  directories:
-    - $HOME/.m2
-    - $HOME/.ivy2
-    - $HOME/.sbt
-script:
-  - "sbt ++$TRAVIS_SCALA_VERSION $TEST"
+#
+
+#
+# Shell script for starting BeeLine
+
+# Enter posix mode for bash
+set -o posix
+
+# Figure out where Spark is installed
+FWDIR="$(cd "`dirname "$0"`"/..; pwd)"
+
+CLASS="org.apache.hive.beeline.BeeLine"
+exec "$FWDIR/bin/spark-class" $CLASS "$@"
