Commit 643c49c

[SPARK-11305][DOCS] Remove Third-Party Hadoop Distributions Doc Page
Remove Hadoop third party distro page, and move Hadoop cluster config info to configuration page

CC pwendell

Author: Sean Owen <[email protected]>

Closes #9298 from srowen/SPARK-11305.
1 parent aa494a9 commit 643c49c

6 files changed: +19 -129 lines changed


README.md

Lines changed: 1 addition & 4 deletions
@@ -87,10 +87,7 @@ Hadoop, you must build Spark against the same version that your cluster runs.
 Please refer to the build documentation at
 ["Specifying the Hadoop Version"](http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version)
 for detailed guidance on building for a particular distribution of Hadoop, including
-building for particular Hive and Hive Thriftserver distributions. See also
-["Third Party Hadoop Distributions"](http://spark.apache.org/docs/latest/hadoop-third-party-distributions.html)
-for guidance on building a Spark application that works with a particular
-distribution.
+building for particular Hive and Hive Thriftserver distributions.
 
 ## Configuration
 

docs/_layouts/global.html

Lines changed: 0 additions & 1 deletion
@@ -112,7 +112,6 @@
 <li><a href="job-scheduling.html">Job Scheduling</a></li>
 <li><a href="security.html">Security</a></li>
 <li><a href="hardware-provisioning.html">Hardware Provisioning</a></li>
-<li><a href="hadoop-third-party-distributions.html">3<sup>rd</sup>-Party Hadoop Distros</a></li>
 <li class="divider"></li>
 <li><a href="building-spark.html">Building Spark</a></li>
 <li><a href="https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark">Contributing to Spark</a></li>

docs/configuration.md

Lines changed: 15 additions & 0 deletions
@@ -1674,3 +1674,18 @@ Spark uses [log4j](http://logging.apache.org/log4j/) for logging. You can config
 To specify a different configuration directory other than the default "SPARK_HOME/conf",
 you can set SPARK_CONF_DIR. Spark will use the configuration files (spark-defaults.conf, spark-env.sh, log4j.properties, etc)
 from this directory.
+
+# Inheriting Hadoop Cluster Configuration
+
+If you plan to read and write from HDFS using Spark, there are two Hadoop configuration files that
+should be included on Spark's classpath:
+
+* `hdfs-site.xml`, which provides default behaviors for the HDFS client.
+* `core-site.xml`, which sets the default filesystem name.
+
+The location of these configuration files varies across CDH and HDP versions, but
+a common location is inside of `/etc/hadoop/conf`. Some tools, such as Cloudera Manager, create
+configurations on-the-fly, but offer a mechanism to download copies of them.
+
+To make these files visible to Spark, set `HADOOP_CONF_DIR` in `$SPARK_HOME/spark-env.sh`
+to a location containing the configuration files.
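
As a rough sketch of what the new section describes (not part of the commit; the object name, app name, and input path below are hypothetical), a Spark application picks up `core-site.xml` and `hdfs-site.xml` automatically once `HADOOP_CONF_DIR` is exported in `spark-env.sh`:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object HdfsConfExample {
  def main(args: Array[String]): Unit = {
    // Assumes HADOOP_CONF_DIR (commonly /etc/hadoop/conf) was exported in
    // $SPARK_HOME/conf/spark-env.sh, so core-site.xml and hdfs-site.xml are on
    // the classpath and fs.defaultFS points at the cluster's NameNode.
    val sc = new SparkContext(new SparkConf().setAppName("HdfsConfExample"))

    // With the cluster configuration inherited, a scheme-less path resolves
    // against HDFS rather than the local filesystem.
    val lines = sc.textFile("/user/example/input.txt") // hypothetical path
    println(s"line count: ${lines.count()}")

    sc.stop()
  }
}
```

If `HADOOP_CONF_DIR` were not set, the same scheme-less path would be resolved against the default (local) filesystem instead of HDFS.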

docs/hadoop-third-party-distributions.md

Lines changed: 0 additions & 117 deletions
This file was deleted.

docs/index.md

Lines changed: 0 additions & 1 deletion
@@ -117,7 +117,6 @@ options for deployment:
 * [Job Scheduling](job-scheduling.html): scheduling resources across and within Spark applications
 * [Security](security.html): Spark security support
 * [Hardware Provisioning](hardware-provisioning.html): recommendations for cluster hardware
-* [3<sup>rd</sup> Party Hadoop Distributions](hadoop-third-party-distributions.html): using common Hadoop distributions
 * Integration with other storage systems:
   * [OpenStack Swift](storage-openstack-swift.html)
 * [Building Spark](building-spark.html): build Spark using the Maven system

docs/programming-guide.md

Lines changed: 3 additions & 6 deletions
@@ -34,8 +34,7 @@ To write a Spark application, you need to add a Maven dependency on Spark. Spark
     version = {{site.SPARK_VERSION}}
 
 In addition, if you wish to access an HDFS cluster, you need to add a dependency on
-`hadoop-client` for your version of HDFS. Some common HDFS version tags are listed on the
-[third party distributions](hadoop-third-party-distributions.html) page.
+`hadoop-client` for your version of HDFS.
 
     groupId = org.apache.hadoop
     artifactId = hadoop-client
@@ -66,8 +65,7 @@ To write a Spark application in Java, you need to add a dependency on Spark. Spa
     version = {{site.SPARK_VERSION}}
 
 In addition, if you wish to access an HDFS cluster, you need to add a dependency on
-`hadoop-client` for your version of HDFS. Some common HDFS version tags are listed on the
-[third party distributions](hadoop-third-party-distributions.html) page.
+`hadoop-client` for your version of HDFS.
 
     groupId = org.apache.hadoop
     artifactId = hadoop-client
@@ -93,8 +91,7 @@ This script will load Spark's Java/Scala libraries and allow you to submit appli
 You can also use `bin/pyspark` to launch an interactive Python shell.
 
 If you wish to access HDFS data, you need to use a build of PySpark linking
-to your version of HDFS. Some common HDFS version tags are listed on the
-[third party distributions](hadoop-third-party-distributions.html) page.
+to your version of HDFS.
 [Prebuilt packages](http://spark.apache.org/downloads.html) are also available on the Spark homepage
 for common HDFS versions.
 
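
For illustration only (not taken from the diff), the `hadoop-client` Maven coordinates kept by these hunks correspond to an sbt dependency roughly like the following; both version numbers are placeholders and should match your cluster and Spark build:

```scala
// build.sbt (sketch): mark Spark as provided and pin hadoop-client to the
// HDFS version the cluster actually runs; both versions are placeholders.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.5.1" % "provided",
  "org.apache.hadoop" % "hadoop-client" % "2.6.0"
)
```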
