# Apache Spark

Spark is a fast and general cluster computing system for Big Data. It provides
high-level APIs in Scala, Java, and Python, and an optimized engine that
supports general computation graphs for data analysis. It also supports a
rich set of higher-level tools including Spark SQL for SQL and structured
data processing, MLlib for machine learning, GraphX for graph processing,
and Spark Streaming for stream processing.

<http://spark.apache.org/>
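
As a quick illustration of the high-level API, here is a small word count in
Scala. This is a minimal sketch rather than part of the setup instructions: it
assumes a `SparkContext` named `sc`, such as the one the Spark shell provides,
and that a `README.md` file exists in the working directory.

    val lines = sc.textFile("README.md")         // distributed dataset of lines
    val counts = lines.flatMap(_.split(" "))     // split each line into words
                      .map(word => (word, 1))    // pair each word with a count of 1
                      .reduceByKey(_ + _)        // sum the counts for each word
    counts.take(5).foreach(println)              // print a few (word, count) pairs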

## Online Documentation

You can find the latest Spark documentation, including a programming
guide, on the [project web page](http://spark.apache.org/documentation.html).
This README file only contains basic setup instructions.

## Building Spark

Spark is built using [Apache Maven](http://maven.apache.org/).
To build Spark and its example programs, run:

    mvn -DskipTests clean package

(You do not need to do this if you downloaded a pre-built package.)
More detailed documentation is available from the project site, at
["Building Spark"](http://spark.apache.org/docs/latest/building-spark.html).

## Interactive Scala Shell
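
The easiest way to start using Spark is through the Scala shell. The lines
below are a minimal sketch rather than part of this README's original text:
they assume the `spark-shell` launch script shipped in `bin/` of a built or
pre-built package, and the count should return 1000:

    ./bin/spark-shell
    scala> sc.parallelize(1 to 1000).count()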

## Running Tests

Testing first requires [building Spark](#building-spark). Once Spark is built, tests
can be run using:

    ./dev/run-tests

Please see the guidance on how to
[run all automated tests](https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-AutomatedTesting).

## A Note About Hadoop Versions

Spark uses the Hadoop core library to talk to HDFS and other Hadoop-supported
storage systems. Because the protocols have changed in different versions of
Hadoop, you must build Spark against the same version that your cluster runs.

Please refer to the build documentation at
["Specifying the Hadoop Version"](http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version)
for detailed guidance on building for a particular distribution of Hadoop, including
building for particular Hive and Hive Thriftserver distributions. See also
["Third Party Hadoop Distributions"](http://spark.apache.org/docs/latest/hadoop-third-party-distributions.html)
for guidance on building a Spark application that works with a particular
distribution.
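
For example, a build for a YARN-based Hadoop 2.4 cluster might look like the
following; the `-Pyarn` and `-Phadoop-2.4` profiles and the `hadoop.version`
property are the ones described in that build documentation, and the exact
version string here is only illustrative:

    mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package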

## Configuration

Please refer to the [Configuration guide](http://spark.apache.org/docs/latest/configuration.html)
in the online documentation for an overview on how to configure Spark.
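
As a small illustration, per-application settings can also be supplied
programmatically through `SparkConf` when constructing a `SparkContext`; the
property names below are standard ones from the configuration guide, and the
values are only examples:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("MyApp")                       // application name shown in the UI
      .setMaster("local[2]")                     // run locally with two worker threads
      .set("spark.executor.memory", "2g")        // a standard configuration property
    val sc = new SparkContext(conf)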

## Contributing to Spark

Contributions via GitHub pull requests are gladly accepted from their original
author. Along with any pull requests, please state that the contribution is
your original work and that you license the work to the project under the
project's open source license. Whether or not you state this explicitly, by
submitting any copyrighted material via pull request, email, or other means,
you agree to license the material under the project's open source license and
warrant that you have the legal authority to do so.