Commit 142a857

merge master
2 parents d58a087 + 81fcdd2 commit 142a857

747 files changed: +17736 -4490 lines changed

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -19,6 +19,7 @@ conf/spark-env.sh
 conf/streaming-env.sh
 conf/log4j.properties
 conf/spark-defaults.conf
+conf/hive-site.xml
 docs/_site
 docs/api
 target/
@@ -56,3 +57,4 @@ metastore_db/
 metastore/
 warehouse/
 TempStatsStore/
+sql/hive-thriftserver/test_warehouses

.rat-excludes

Lines changed: 2 additions & 0 deletions
@@ -22,6 +22,8 @@ slaves
 spark-env.sh
 spark-env.sh.template
 log4j-defaults.properties
+bootstrap-tooltip.js
+jquery-1.11.1.min.js
 sorttable.js
 .*txt
 .*json

LICENSE

Lines changed: 19 additions & 1 deletion
@@ -442,7 +442,7 @@ Written by Pavel Binko, Dino Ferrero Merlino, Wolfgang Hoschek, Tony Johnson, An
 
 
 ========================================================================
-Fo SnapTree:
+For SnapTree:
 ========================================================================
 
 SNAPTREE LICENSE
@@ -482,6 +482,24 @@ OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 SUCH DAMAGE.
 
 
+========================================================================
+For Timsort (core/src/main/java/org/apache/spark/util/collection/Sorter.java):
+========================================================================
+Copyright (C) 2008 The Android Open Source Project
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+
+
 ========================================================================
 BSD-style licenses
 ========================================================================

README.md

Lines changed: 15 additions & 9 deletions
@@ -1,6 +1,13 @@
 # Apache Spark
 
-Lightning-Fast Cluster Computing - <http://spark.apache.org/>
+Spark is a fast and general cluster computing system for Big Data. It provides
+high-level APIs in Scala, Java, and Python, and an optimized engine that
+supports general computation graphs for data analysis. It also supports a
+rich set of higher-level tools including Spark SQL for SQL and structured
+data processing, MLLib for machine learning, GraphX for graph processing,
+and Spark Streaming.
+
+<http://spark.apache.org/>
 
 
 ## Online Documentation
@@ -69,29 +76,28 @@ can be run using:
 Spark uses the Hadoop core library to talk to HDFS and other Hadoop-supported
 storage systems. Because the protocols have changed in different versions of
 Hadoop, you must build Spark against the same version that your cluster runs.
-You can change the version by setting the `SPARK_HADOOP_VERSION` environment
-when building Spark.
+You can change the version by setting `-Dhadoop.version` when building Spark.
 
 For Apache Hadoop versions 1.x, Cloudera CDH MRv1, and other Hadoop
 versions without YARN, use:
 
     # Apache Hadoop 1.2.1
-    $ SPARK_HADOOP_VERSION=1.2.1 sbt/sbt assembly
+    $ sbt/sbt -Dhadoop.version=1.2.1 assembly
 
     # Cloudera CDH 4.2.0 with MapReduce v1
-    $ SPARK_HADOOP_VERSION=2.0.0-mr1-cdh4.2.0 sbt/sbt assembly
+    $ sbt/sbt -Dhadoop.version=2.0.0-mr1-cdh4.2.0 assembly
 
 For Apache Hadoop 2.2.X, 2.1.X, 2.0.X, 0.23.x, Cloudera CDH MRv2, and other Hadoop versions
-with YARN, also set `SPARK_YARN=true`:
+with YARN, also set `-Pyarn`:
 
     # Apache Hadoop 2.0.5-alpha
-    $ SPARK_HADOOP_VERSION=2.0.5-alpha SPARK_YARN=true sbt/sbt assembly
+    $ sbt/sbt -Dhadoop.version=2.0.5-alpha -Pyarn assembly
 
     # Cloudera CDH 4.2.0 with MapReduce v2
-    $ SPARK_HADOOP_VERSION=2.0.0-cdh4.2.0 SPARK_YARN=true sbt/sbt assembly
+    $ sbt/sbt -Dhadoop.version=2.0.0-cdh4.2.0 -Pyarn assembly
 
     # Apache Hadoop 2.2.X and newer
-    $ SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true sbt/sbt assembly
+    $ sbt/sbt -Dhadoop.version=2.2.0 -Pyarn assembly
 
 When developing a Spark application, specify the Hadoop version by adding the
 "hadoop-client" artifact to your project's dependencies. For example, if you're

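The README hunk above replaces the `SPARK_HADOOP_VERSION` and `SPARK_YARN` environment variables with the `-Dhadoop.version` property and the `-Pyarn` profile. As a rough sketch of how those flags combine, the helper below only prints the command line and does not run a build. The function name `sbt_assembly_cmd` and its "yes"/"no" argument are assumptions made for illustration and are not part of Spark.

```shell
# Hypothetical helper (not part of Spark): print the sbt command line that the
# updated README describes for a given Hadoop version, optionally with YARN.
sbt_assembly_cmd() {
  local hadoop_version="$1"   # e.g. 1.2.1 or 2.2.0
  local with_yarn="${2:-no}"  # pass "yes" to add the -Pyarn profile
  local cmd="sbt/sbt -Dhadoop.version=${hadoop_version}"
  if [ "$with_yarn" = "yes" ]; then
    cmd="$cmd -Pyarn"
  fi
  echo "$cmd assembly"
}

# Print (do not run) the commands for two of the README cases.
sbt_assembly_cmd 1.2.1
sbt_assembly_cmd 2.2.0 yes
```

The two calls print the same command lines that the README lists for Apache Hadoop 1.2.1 and for Apache Hadoop 2.2.X and newer.
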
assembly/pom.xml

Lines changed: 13 additions & 1 deletion
@@ -32,12 +32,14 @@
 <packaging>pom</packaging>
 
 <properties>
+  <sbt.project.name>assembly</sbt.project.name>
   <spark.jar.dir>scala-${scala.binary.version}</spark.jar.dir>
   <spark.jar.basename>spark-assembly-${project.version}-hadoop${hadoop.version}.jar</spark.jar.basename>
   <spark.jar>${project.build.directory}/${spark.jar.dir}/${spark.jar.basename}</spark.jar>
   <deb.pkg.name>spark</deb.pkg.name>
   <deb.install.path>/usr/share/spark</deb.install.path>
   <deb.user>root</deb.user>
+  <deb.bin.filemode>744</deb.bin.filemode>
 </properties>
 
 <dependencies>
@@ -163,6 +165,16 @@
     </dependency>
   </dependencies>
 </profile>
+<profile>
+  <id>hive-thriftserver</id>
+  <dependencies>
+    <dependency>
+      <groupId>org.apache.spark</groupId>
+      <artifactId>spark-hive-thriftserver_${scala.binary.version}</artifactId>
+      <version>${project.version}</version>
+    </dependency>
+  </dependencies>
+</profile>
 <profile>
   <id>spark-ganglia-lgpl</id>
   <dependencies>
@@ -275,7 +287,7 @@
     <user>${deb.user}</user>
     <group>${deb.user}</group>
     <prefix>${deb.install.path}/bin</prefix>
-    <filemode>744</filemode>
+    <filemode>${deb.bin.filemode}</filemode>
   </mapper>
 </data>
 <data>

bagel/pom.xml

Lines changed: 3 additions & 0 deletions
@@ -27,6 +27,9 @@
 
 <groupId>org.apache.spark</groupId>
 <artifactId>spark-bagel_2.10</artifactId>
+<properties>
+  <sbt.project.name>bagel</sbt.project.name>
+</properties>
 <packaging>jar</packaging>
 <name>Spark Project Bagel</name>
 <url>http://spark.apache.org/</url>

bagel/src/main/scala/org/apache/spark/bagel/Bagel.scala

Lines changed: 5 additions & 0 deletions
@@ -72,6 +72,7 @@ object Bagel extends Logging {
 var verts = vertices
 var msgs = messages
 var noActivity = false
+var lastRDD: RDD[(K, (V, Array[M]))] = null
 do {
   logInfo("Starting superstep " + superstep + ".")
   val startTime = System.currentTimeMillis
@@ -83,6 +84,10 @@ object Bagel extends Logging {
 val superstep_ = superstep // Create a read-only copy of superstep for capture in closure
 val (processed, numMsgs, numActiveVerts) =
   comp[K, V, M, C](sc, grouped, compute(_, _, aggregated, superstep_), storageLevel)
+if (lastRDD != null) {
+  lastRDD.unpersist(false)
+}
+lastRDD = processed
 
 val timeTaken = System.currentTimeMillis - startTime
 logInfo("Superstep %d took %d s".format(superstep, timeTaken / 1000))

bin/beeline

Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Figure out where Spark is installed
+FWDIR="$(cd `dirname $0`/..; pwd)"
+
+# Find the java binary
+if [ -n "${JAVA_HOME}" ]; then
+  RUNNER="${JAVA_HOME}/bin/java"
+else
+  if [ `command -v java` ]; then
+    RUNNER="java"
+  else
+    echo "JAVA_HOME is not set" >&2
+    exit 1
+  fi
+fi
+
+# Compute classpath using external script
+classpath_output=$($FWDIR/bin/compute-classpath.sh)
+if [[ "$?" != "0" ]]; then
+  echo "$classpath_output"
+  exit 1
+else
+  CLASSPATH=$classpath_output
+fi
+
+CLASS="org.apache.hive.beeline.BeeLine"
+exec "$RUNNER" -cp "$CLASSPATH" $CLASS "$@"
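
The new `bin/beeline` script looks for a Java binary in a fixed order: `$JAVA_HOME/bin/java` first, then `java` on the `PATH`, and otherwise it fails with an error on stderr. The sketch below restates that order as a standalone function so it can be exercised in isolation. The name `find_java_runner` is an assumption made for illustration and does not appear in the commit.

```shell
# Hypothetical function (not in the commit) mirroring the lookup order used by
# bin/beeline: prefer $JAVA_HOME/bin/java, fall back to java on PATH, else fail.
find_java_runner() {
  if [ -n "${JAVA_HOME}" ]; then
    echo "${JAVA_HOME}/bin/java"
  elif command -v java >/dev/null 2>&1; then
    echo "java"
  else
    # Same message as the script; it goes to stderr so stdout stays empty.
    echo "JAVA_HOME is not set" >&2
    return 1
  fi
}

# Possible use in a launcher script:
#   RUNNER="$(find_java_runner)" || exit 1
```

Returning a status instead of calling `exit` inside the function is a choice made for this sketch; it leaves the decision to abort with the caller.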

bin/compute-classpath.sh

Lines changed: 5 additions & 4 deletions
@@ -52,6 +52,7 @@ if [ -n "$SPARK_PREPEND_CLASSES" ]; then
   CLASSPATH="$CLASSPATH:$FWDIR/sql/catalyst/target/scala-$SCALA_VERSION/classes"
   CLASSPATH="$CLASSPATH:$FWDIR/sql/core/target/scala-$SCALA_VERSION/classes"
   CLASSPATH="$CLASSPATH:$FWDIR/sql/hive/target/scala-$SCALA_VERSION/classes"
+  CLASSPATH="$CLASSPATH:$FWDIR/sql/hive-thriftserver/target/scala-$SCALA_VERSION/classes"
   CLASSPATH="$CLASSPATH:$FWDIR/yarn/stable/target/scala-$SCALA_VERSION/classes"
 fi
 
@@ -81,10 +82,10 @@ ASSEMBLY_JAR=$(ls "$assembly_folder"/spark-assembly*hadoop*.jar 2>/dev/null)
 # Verify that versions of java used to build the jars and run Spark are compatible
 jar_error_check=$("$JAR_CMD" -tf "$ASSEMBLY_JAR" nonexistent/class/path 2>&1)
 if [[ "$jar_error_check" =~ "invalid CEN header" ]]; then
-  echo "Loading Spark jar with '$JAR_CMD' failed. "
-  echo "This is likely because Spark was compiled with Java 7 and run "
-  echo "with Java 6. (see SPARK-1703). Please use Java 7 to run Spark "
-  echo "or build Spark with Java 6."
+  echo "Loading Spark jar with '$JAR_CMD' failed. " 1>&2
+  echo "This is likely because Spark was compiled with Java 7 and run " 1>&2
+  echo "with Java 6. (see SPARK-1703). Please use Java 7 to run Spark " 1>&2
+  echo "or build Spark with Java 6." 1>&2
   exit 1
 fi
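
The second hunk moves the error messages in `bin/compute-classpath.sh` from stdout to stderr. The commit does not state why, but one plausible reason is that callers capture the script's stdout with command substitution, as `bin/beeline` does, and command substitution captures stdout only. The sketch below illustrates that behaviour with a stand-in function. The name `emit_classpath`, its message, and the example paths are assumptions and are not taken from Spark.

```shell
# Hypothetical stand-in (not Spark code) for a script that prints its result on
# stdout and a diagnostic on stderr.
emit_classpath() {
  echo "warning: something looks off" 1>&2            # diagnostic goes to stderr
  echo "/opt/spark/conf:/opt/spark/lib/example.jar"   # result goes to stdout
}

# Command substitution captures stdout only, so the diagnostic is not mixed
# into the captured value. stderr is discarded here to keep the demo quiet.
captured=$(emit_classpath 2>/dev/null)
echo "captured: $captured"
```

If the diagnostic were written to stdout instead, it would end up inside `captured` and corrupt the classpath string.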

bin/pyspark

Lines changed: 3 additions & 3 deletions
@@ -26,7 +26,7 @@ export SPARK_HOME="$FWDIR"
 SCALA_VERSION=2.10
 
 if [[ "$@" = *--help ]] || [[ "$@" = *-h ]]; then
-  echo "Usage: ./bin/pyspark [options]"
+  echo "Usage: ./bin/pyspark [options]" 1>&2
   $FWDIR/bin/spark-submit --help 2>&1 | grep -v Usage 1>&2
   exit 0
 fi
@@ -36,8 +36,8 @@ if [ ! -f "$FWDIR/RELEASE" ]; then
   # Exit if the user hasn't compiled Spark
   ls "$FWDIR"/assembly/target/scala-$SCALA_VERSION/spark-assembly*hadoop*.jar >& /dev/null
   if [[ $? != 0 ]]; then
-    echo "Failed to find Spark assembly in $FWDIR/assembly/target" >&2
-    echo "You need to build Spark before running this program" >&2
+    echo "Failed to find Spark assembly in $FWDIR/assembly/target" 1>&2
+    echo "You need to build Spark before running this program" 1>&2
     exit 1
   fi
 fi
