Commit 1c8efbc

Merge remote-tracking branch 'upstream/master' into pyspark-inputformats
Conflicts:
	project/SparkBuild.scala
	python/pyspark/context.py

2 parents: eb40036 + e6ed13f

464 files changed (+14259, −7247 lines)

.gitignore

Lines changed: 3 additions & 0 deletions

@@ -1,7 +1,10 @@
 *~
 *.swp
+*.ipr
 *.iml
+*.iws
 .idea/
+sbt/*.jar
 .settings
 .cache
 /build/

README.md

Lines changed: 16 additions & 10 deletions

@@ -13,20 +13,22 @@ This README file only contains basic setup instructions.
 ## Building
 
 Spark requires Scala 2.10. The project is built using Simple Build Tool (SBT),
-which is packaged with it. To build Spark and its example programs, run:
+which can be obtained [here](http://www.scala-sbt.org). If SBT is installed we
+will use the system version of sbt otherwise we will attempt to download it
+automatically. To build Spark and its example programs, run:
 
-    sbt/sbt assembly
+    ./sbt/sbt assembly
 
 Once you've built Spark, the easiest way to start using it is the shell:
 
-    ./spark-shell
+    ./bin/spark-shell
 
-Or, for the Python API, the Python shell (`./pyspark`).
+Or, for the Python API, the Python shell (`./bin/pyspark`).
 
 Spark also comes with several sample programs in the `examples` directory.
-To run one of them, use `./run-example <class> <params>`. For example:
+To run one of them, use `./bin/run-example <class> <params>`. For example:
 
-    ./run-example org.apache.spark.examples.SparkLR local[2]
+    ./bin/run-example org.apache.spark.examples.SparkLR local[2]
 
 will run the Logistic Regression example locally on 2 CPUs.
 
@@ -36,7 +38,13 @@ All of the Spark samples take a `<master>` parameter that is the cluster URL
 to connect to. This can be a mesos:// or spark:// URL, or "local" to run
 locally with one thread, or "local[N]" to run locally with N threads.
 
+## Running tests
 
+Testing first requires [Building](#building) Spark. Once Spark is built, tests
+can be run using:
+
+`./sbt/sbt test`
+
 ## A Note About Hadoop Versions
 
 Spark uses the Hadoop core library to talk to HDFS and other Hadoop-supported
@@ -54,7 +62,7 @@ versions without YARN, use:
     # Cloudera CDH 4.2.0 with MapReduce v1
     $ SPARK_HADOOP_VERSION=2.0.0-mr1-cdh4.2.0 sbt/sbt assembly
 
-For Apache Hadoop 2.0.X, 2.1.X, 0.23.x, Cloudera CDH MRv2, and other Hadoop versions
+For Apache Hadoop 2.2.X, 2.1.X, 2.0.X, 0.23.x, Cloudera CDH MRv2, and other Hadoop versions
 with YARN, also set `SPARK_YARN=true`:
 
     # Apache Hadoop 2.0.5-alpha
@@ -63,10 +71,8 @@ with YARN, also set `SPARK_YARN=true`:
     # Cloudera CDH 4.2.0 with MapReduce v2
     $ SPARK_HADOOP_VERSION=2.0.0-cdh4.2.0 SPARK_YARN=true sbt/sbt assembly
 
-When building for Hadoop 2.2.X and newer, you'll need to include the additional `new-yarn` profile:
-
     # Apache Hadoop 2.2.X and newer
-    $ mvn -Dyarn.version=2.2.0 -Dhadoop.version=2.2.0 -Pnew-yarn
+    $ SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true sbt/sbt assembly
 
 When developing a Spark application, specify the Hadoop version by adding the
 "hadoop-client" artifact to your project's dependencies. For example, if you're

assembly/lib/PY4J_LICENSE.txt

Lines changed: 0 additions & 27 deletions
This file was deleted.

assembly/lib/PY4J_VERSION.txt

Lines changed: 0 additions & 1 deletion
This file was deleted.
Binary file deleted (−101 KB); contents not shown.

assembly/lib/net/sf/py4j/py4j/0.7/py4j-0.7.pom

Lines changed: 0 additions & 9 deletions
This file was deleted.

assembly/lib/net/sf/py4j/py4j/maven-metadata-local.xml

Lines changed: 0 additions & 12 deletions
This file was deleted.

assembly/pom.xml

Lines changed: 22 additions & 12 deletions

@@ -41,33 +41,33 @@
   <dependencies>
     <dependency>
       <groupId>org.apache.spark</groupId>
-      <artifactId>spark-core_2.10</artifactId>
+      <artifactId>spark-core_${scala.binary.version}</artifactId>
       <version>${project.version}</version>
     </dependency>
     <dependency>
       <groupId>org.apache.spark</groupId>
-      <artifactId>spark-bagel_2.10</artifactId>
+      <artifactId>spark-bagel_${scala.binary.version}</artifactId>
       <version>${project.version}</version>
     </dependency>
     <dependency>
       <groupId>org.apache.spark</groupId>
-      <artifactId>spark-mllib_2.10</artifactId>
+      <artifactId>spark-mllib_${scala.binary.version}</artifactId>
       <version>${project.version}</version>
     </dependency>
     <dependency>
       <groupId>org.apache.spark</groupId>
-      <artifactId>spark-repl_2.10</artifactId>
+      <artifactId>spark-repl_${scala.binary.version}</artifactId>
       <version>${project.version}</version>
     </dependency>
     <dependency>
       <groupId>org.apache.spark</groupId>
-      <artifactId>spark-streaming_2.10</artifactId>
+      <artifactId>spark-streaming_${scala.binary.version}</artifactId>
       <version>${project.version}</version>
     </dependency>
     <dependency>
       <groupId>net.sf.py4j</groupId>
       <artifactId>py4j</artifactId>
-      <version>0.7</version>
+      <version>0.8.1</version>
     </dependency>
   </dependencies>
 
@@ -79,7 +79,7 @@
         <artifactId>maven-shade-plugin</artifactId>
         <configuration>
           <shadedArtifactAttached>false</shadedArtifactAttached>
-          <outputFile>${project.build.directory}/scala-2.10/${project.artifactId}-${project.version}-hadoop${hadoop.version}.jar</outputFile>
+          <outputFile>${project.build.directory}/scala-${scala.binary.version}/${project.artifactId}-${project.version}-hadoop${hadoop.version}.jar</outputFile>
           <artifactSet>
             <includes>
               <include>*:*</include>
@@ -108,12 +108,12 @@
               <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                 <resource>META-INF/services/org.apache.hadoop.fs.FileSystem</resource>
               </transformer>
-            </transformers>
-            <transformers>
-              <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer" />
               <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                 <resource>reference.conf</resource>
               </transformer>
+              <transformer implementation="org.apache.maven.plugins.shade.resource.DontIncludeResourceTransformer">
+                <resource>log4j.properties</resource>
+              </transformer>
             </transformers>
           </configuration>
         </execution>
@@ -124,11 +124,21 @@
 
   <profiles>
     <profile>
-      <id>hadoop2-yarn</id>
+      <id>yarn-alpha</id>
+      <dependencies>
+        <dependency>
+          <groupId>org.apache.spark</groupId>
+          <artifactId>spark-yarn-alpha_${scala.binary.version}</artifactId>
+          <version>${project.version}</version>
+        </dependency>
+      </dependencies>
+    </profile>
+    <profile>
+      <id>yarn</id>
       <dependencies>
         <dependency>
           <groupId>org.apache.spark</groupId>
-          <artifactId>spark-yarn_2.10</artifactId>
+          <artifactId>spark-yarn_${scala.binary.version}</artifactId>
           <version>${project.version}</version>
         </dependency>
       </dependencies>

assembly/src/main/assembly/assembly.xml

Lines changed: 4 additions & 7 deletions

@@ -39,23 +39,20 @@
     </fileSet>
     <fileSet>
       <directory>
-        ${project.parent.basedir}/bin/
+        ${project.parent.basedir}/sbin/
       </directory>
-      <outputDirectory>/bin</outputDirectory>
+      <outputDirectory>/sbin</outputDirectory>
       <includes>
         <include>**/*</include>
       </includes>
     </fileSet>
     <fileSet>
       <directory>
-        ${project.parent.basedir}
+        ${project.parent.basedir}/bin/
       </directory>
       <outputDirectory>/bin</outputDirectory>
       <includes>
-        <include>run-example*</include>
-        <include>spark-class*</include>
-        <include>spark-shell*</include>
-        <include>spark-executor*</include>
+        <include>**/*</include>
       </includes>
     </fileSet>
   </fileSets>

bagel/pom.xml

Lines changed: 5 additions & 5 deletions

@@ -34,7 +34,7 @@
   <dependencies>
     <dependency>
       <groupId>org.apache.spark</groupId>
-      <artifactId>spark-core_2.10</artifactId>
+      <artifactId>spark-core_${scala.binary.version}</artifactId>
       <version>${project.version}</version>
     </dependency>
     <dependency>
@@ -43,18 +43,18 @@
     </dependency>
     <dependency>
      <groupId>org.scalatest</groupId>
-      <artifactId>scalatest_2.10</artifactId>
+      <artifactId>scalatest_${scala.binary.version}</artifactId>
       <scope>test</scope>
     </dependency>
     <dependency>
       <groupId>org.scalacheck</groupId>
-      <artifactId>scalacheck_2.10</artifactId>
+      <artifactId>scalacheck_${scala.binary.version}</artifactId>
       <scope>test</scope>
     </dependency>
   </dependencies>
   <build>
-    <outputDirectory>target/scala-2.10/classes</outputDirectory>
-    <testOutputDirectory>target/scala-2.10/test-classes</testOutputDirectory>
+    <outputDirectory>target/scala-${scala.binary.version}/classes</outputDirectory>
+    <testOutputDirectory>target/scala-${scala.binary.version}/test-classes</testOutputDirectory>
     <plugins>
       <plugin>
         <groupId>org.scalatest</groupId>
