@@ -13,16 +13,19 @@ and Spark Streaming for stream processing.
 ## Online Documentation
 
 You can find the latest Spark documentation, including a programming
-guide, on the project webpage at <http://spark.apache.org/documentation.html>.
+guide, on the [project web page](http://spark.apache.org/documentation.html).
 This README file only contains basic setup instructions.
 
 ## Building Spark
 
-Spark is built on Scala 2.10. To build Spark and its example programs, run:
+Spark is built using [Apache Maven](http://maven.apache.org/).
+To build Spark and its example programs, run:
 
-    ./sbt/sbt assembly
+    mvn -DskipTests clean package
 
 (You do not need to do this if you downloaded a pre-built package.)
+More detailed documentation is available from the project site, at
+["Building Spark"](http://spark.apache.org/docs/latest/building-spark.html).
 
 ## Interactive Scala Shell
 
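+For instance, a quick sanity check of a fresh build (a minimal sketch; `sc` is
+the SparkContext that the shell creates on startup), which should return 1000:
+
+    ./bin/spark-shell
+    scala> sc.parallelize(1 to 1000).count()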
@@ -71,73 +74,24 @@ can be run using:
 
     ./dev/run-tests
 
+Please see the guidance on how to
+[run all automated tests](https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-AutomatedTesting).
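+
+A hedged sketch of running a single suite (assuming the scalatest-maven-plugin
+wiring described in that guide; the suite name here is only illustrative):
+
+    mvn -DwildcardSuites=org.apache.spark.rdd.RDDSuite test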
+
 ## A Note About Hadoop Versions
 
 Spark uses the Hadoop core library to talk to HDFS and other Hadoop-supported
 storage systems. Because the protocols have changed in different versions of
 Hadoop, you must build Spark against the same version that your cluster runs.
-You can change the version by setting `-Dhadoop.version` when building Spark.
-
-For Apache Hadoop versions 1.x, Cloudera CDH MRv1, and other Hadoop
-versions without YARN, use:
-
-    # Apache Hadoop 1.2.1
-    $ sbt/sbt -Dhadoop.version=1.2.1 assembly
-
-    # Cloudera CDH 4.2.0 with MapReduce v1
-    $ sbt/sbt -Dhadoop.version=2.0.0-mr1-cdh4.2.0 assembly
-
-For Apache Hadoop 2.2.X, 2.1.X, 2.0.X, 0.23.x, Cloudera CDH MRv2, and other Hadoop versions
-with YARN, also set `-Pyarn`:
-
-    # Apache Hadoop 2.0.5-alpha
-    $ sbt/sbt -Dhadoop.version=2.0.5-alpha -Pyarn assembly
-
-    # Cloudera CDH 4.2.0 with MapReduce v2
-    $ sbt/sbt -Dhadoop.version=2.0.0-cdh4.2.0 -Pyarn assembly
-
-    # Apache Hadoop 2.2.X and newer
-    $ sbt/sbt -Dhadoop.version=2.2.0 -Pyarn assembly
-
-When developing a Spark application, specify the Hadoop version by adding the
-"hadoop-client" artifact to your project's dependencies. For example, if you're
-using Hadoop 1.2.1 and build your application using SBT, add this entry to
-`libraryDependencies`:
-
-    "org.apache.hadoop" % "hadoop-client" % "1.2.1"
 
-If your project is built with Maven, add this to your POM file's `<dependencies>` section:
-
-    <dependency>
-      <groupId>org.apache.hadoop</groupId>
-      <artifactId>hadoop-client</artifactId>
-      <version>1.2.1</version>
-    </dependency>
-
-
-## A Note About Thrift JDBC server and CLI for Spark SQL
-
-Spark SQL supports Thrift JDBC server and CLI.
-See sql-programming-guide.md for more information about using the JDBC server and CLI.
-You can use those features by setting `-Phive` when building Spark as follows.
-
-    $ sbt/sbt -Phive assembly
+Please refer to the build documentation at
+["Specifying the Hadoop Version"](http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version)
+for detailed guidance on building for a particular distribution of Hadoop, including
+building for particular Hive and Hive Thriftserver distributions. See also
+["Third Party Hadoop Distributions"](http://spark.apache.org/docs/latest/hadoop-third-party-distributions.html)
+for guidance on building a Spark application that works with a particular
+distribution.
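+
+As a sketch following those docs' conventions (the version number and profile
+set here are illustrative, not a recommendation), such a build might look like:
+
+    mvn -Pyarn -Phive -Phive-thriftserver -Dhadoop.version=2.2.0 -DskipTests clean package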
 
 ## Configuration
 
 Please refer to the [Configuration guide](http://spark.apache.org/docs/latest/configuration.html)
 in the online documentation for an overview on how to configure Spark.
-
-
-## Contributing to Spark
-
-Contributions via GitHub pull requests are gladly accepted from their original
-author. Along with any pull requests, please state that the contribution is
-your original work and that you license the work to the project under the
-project's open source license. Whether or not you state this explicitly, by
-submitting any copyrighted material via pull request, email, or other means
-you agree to license the material under the project's open source license and
-warrant that you have the legal authority to do so.
-
-Please see [Contributing to Spark wiki page](https://cwiki.apache.org/SPARK/Contributing+to+Spark)
-for more information.