|
1 | 1 | # Apache Spark |
2 | 2 |
|
3 | | -Lightning-Fast Cluster Computing - <http://spark.apache.org/> |
| 3 | +Spark is a fast and general cluster computing system for Big Data. It provides |
| 4 | +high-level APIs in Scala, Java, and Python, and an optimized engine that |
| 5 | +supports general computation graphs for data analysis. It also supports a |
| 6 | +rich set of higher-level tools including Spark SQL for SQL and structured |
| 7 | +data processing, MLlib for machine learning, GraphX for graph processing, |
| 8 | +and Spark Streaming. |
| 9 | + |
| 10 | +<http://spark.apache.org/> |
4 | 11 |
|
5 | 12 |
|
6 | 13 | ## Online Documentation |
@@ -69,29 +76,28 @@ can be run using: |
69 | 76 | Spark uses the Hadoop core library to talk to HDFS and other Hadoop-supported |
70 | 77 | storage systems. Because the protocols have changed in different versions of |
71 | 78 | Hadoop, you must build Spark against the same version that your cluster runs. |
72 | | -You can change the version by setting the `SPARK_HADOOP_VERSION` environment |
73 | | -when building Spark. |
| 79 | +You can change the version by setting `-Dhadoop.version` when building Spark. |
74 | 80 |
|
75 | 81 | For Apache Hadoop versions 1.x, Cloudera CDH MRv1, and other Hadoop |
76 | 82 | versions without YARN, use: |
77 | 83 |
|
78 | 84 | # Apache Hadoop 1.2.1 |
79 | | - $ SPARK_HADOOP_VERSION=1.2.1 sbt/sbt assembly |
| 85 | + $ sbt/sbt -Dhadoop.version=1.2.1 assembly |
80 | 86 |
|
81 | 87 | # Cloudera CDH 4.2.0 with MapReduce v1 |
82 | | - $ SPARK_HADOOP_VERSION=2.0.0-mr1-cdh4.2.0 sbt/sbt assembly |
| 88 | + $ sbt/sbt -Dhadoop.version=2.0.0-mr1-cdh4.2.0 assembly |
83 | 89 |
|
84 | 90 | For Apache Hadoop 2.2.X, 2.1.X, 2.0.X, 0.23.x, Cloudera CDH MRv2, and other Hadoop versions |
85 | | -with YARN, also set `SPARK_YARN=true`: |
| 91 | +with YARN, also pass `-Pyarn`: |
86 | 92 |
|
87 | 93 | # Apache Hadoop 2.0.5-alpha |
88 | | - $ SPARK_HADOOP_VERSION=2.0.5-alpha SPARK_YARN=true sbt/sbt assembly |
| 94 | + $ sbt/sbt -Dhadoop.version=2.0.5-alpha -Pyarn assembly |
89 | 95 |
|
90 | 96 | # Cloudera CDH 4.2.0 with MapReduce v2 |
91 | | - $ SPARK_HADOOP_VERSION=2.0.0-cdh4.2.0 SPARK_YARN=true sbt/sbt assembly |
| 97 | + $ sbt/sbt -Dhadoop.version=2.0.0-cdh4.2.0 -Pyarn assembly |
92 | 98 |
|
93 | 99 | # Apache Hadoop 2.2.X and newer |
94 | | - $ SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true sbt/sbt assembly |
| 100 | + $ sbt/sbt -Dhadoop.version=2.2.0 -Pyarn assembly |
95 | 101 |
|
96 | 102 | When developing a Spark application, specify the Hadoop version by adding the |
97 | 103 | "hadoop-client" artifact to your project's dependencies. For example, if you're |
|