From 333fdfa7fd910c72d11b1c28fa41f582d1161b1b Mon Sep 17 00:00:00 2001
From: HyukjinKwon
Date: Fri, 20 Nov 2020 09:58:30 +0900
Subject: [PATCH 1/2] Document 'without' value for HADOOP_VERSION in pip
 installation

---
 python/docs/source/getting_started/install.rst | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/python/docs/source/getting_started/install.rst b/python/docs/source/getting_started/install.rst
index 4039698d3995..653e821e04b6 100644
--- a/python/docs/source/getting_started/install.rst
+++ b/python/docs/source/getting_started/install.rst
@@ -48,7 +48,7 @@ If you want to install extra dependencies for a specific componenet, you can ins
 
     pip install pyspark[sql]
 
-For PySpark with a different Hadoop version, you can install it by using ``HADOOP_VERSION`` environment variables as below:
+For PySpark with/without a specific Hadoop version, you can install it by using the ``HADOOP_VERSION`` environment variable as below:
 
 .. code-block:: bash
 
@@ -68,8 +68,8 @@ It is recommended to use ``-v`` option in ``pip`` to track the installation and
 
     HADOOP_VERSION=2.7 pip install pyspark -v
 
-Supported versions of Hadoop are ``HADOOP_VERSION=2.7`` and ``HADOOP_VERSION=3.2`` (default).
-Note that this installation of PySpark with a different version of Hadoop is experimental. It can change or be removed between minor releases.
+Supported values in ``HADOOP_VERSION`` are ``without``, ``2.7`` and ``3.2`` (default).
+Note that this way of installing PySpark with/without a specific Hadoop version is experimental. It can change or be removed between minor releases.
 
 
 Using Conda

From 4826a68e73449cbeb4460121ee3496dcf68b2757 Mon Sep 17 00:00:00 2001
From: HyukjinKwon
Date: Fri, 20 Nov 2020 10:59:13 +0900
Subject: [PATCH 2/2] address comments

---
 python/docs/source/getting_started/install.rst | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/python/docs/source/getting_started/install.rst b/python/docs/source/getting_started/install.rst
index 653e821e04b6..9c9ff7fa7844 100644
--- a/python/docs/source/getting_started/install.rst
+++ b/python/docs/source/getting_started/install.rst
@@ -68,7 +68,12 @@ It is recommended to use ``-v`` option in ``pip`` to track the installation and
 
     HADOOP_VERSION=2.7 pip install pyspark -v
 
-Supported values in ``HADOOP_VERSION`` are ``without``, ``2.7`` and ``3.2`` (default).
+Supported values in ``HADOOP_VERSION`` are:
+
+- ``without``: Spark pre-built with user-provided Apache Hadoop
+- ``2.7``: Spark pre-built for Apache Hadoop 2.7
+- ``3.2``: Spark pre-built for Apache Hadoop 3.2 and later (default)
+
 Note that this way of installing PySpark with/without a specific Hadoop version is experimental. It can change or be removed between minor releases.
 
 
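For reference, a minimal sketch of the ``without`` value these patches document, alongside a post-install sanity check; the ``python -c`` line is illustrative and not part of the patches, and actually running Spark jobs on a "Hadoop free" build additionally requires user-provided Hadoop jars on the classpath:

.. code-block:: bash

    # "without" selects the Spark distribution pre-built with
    # user-provided Apache Hadoop, per the supported values above.
    HADOOP_VERSION=without pip install pyspark -v

    # Illustrative sanity check (assumes Java is available): confirm
    # the package imports and report its version.
    python -c "import pyspark; print(pyspark.__version__)"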