-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-30879][DOCS] Refine workflow for building docs #27534
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
f113896
be8a792
5275521
c6e7695
4ca628e
a1b44f2
5e06418
8954672
b66c077
f3cc9d4
172fb22
bc15b86
25fdaef
2b1b42f
8d05f6d
ebba668
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -20,9 +20,9 @@ | |
| # Includes: | ||
| # * Java 8 | ||
| # * Ivy | ||
| # * Python (2.7.15/3.6.7) | ||
| # * Python 3.7 | ||
| # * Ruby 2.7 | ||
| # * R-base/R-base-dev (3.6.1) | ||
| # * Ruby 2.3 build utilities | ||
|
|
||
| FROM ubuntu:18.04 | ||
|
|
||
|
|
@@ -33,15 +33,11 @@ ENV DEBCONF_NONINTERACTIVE_SEEN true | |
| # These arguments are just for reuse and not really meant to be customized. | ||
| ARG APT_INSTALL="apt-get install --no-install-recommends -y" | ||
|
|
||
| ARG BASE_PIP_PKGS="setuptools wheel" | ||
| ARG PIP_PKGS="pyopenssl numpy sphinx" | ||
| ARG PIP_PKGS="sphinx==2.3.1 mkdocs==1.0.4 numpy==1.18.1" | ||
| ARG GEM_PKGS="jekyll:4.0.0 jekyll-redirect-from:0.16.0 rouge:3.15.0" | ||
|
|
||
| # Install extra needed repos and refresh. | ||
| # - CRAN repo | ||
| # - Ruby repo (for doc generation) | ||
| # | ||
| # This is all in a single "RUN" command so that if anything changes, "apt update" is run to fetch | ||
| # the most current package versions (instead of potentially using old versions cached by docker). | ||
| RUN apt-get clean && apt-get update && $APT_INSTALL gnupg ca-certificates && \ | ||
| echo 'deb https://cloud.r-project.org/bin/linux/ubuntu bionic-cran35/' >> /etc/apt/sources.list && \ | ||
| gpg --keyserver keyserver.ubuntu.com --recv-key E298A3A825C0D65DFD57CBB651716619E084DAB9 && \ | ||
|
|
@@ -50,36 +46,43 @@ RUN apt-get clean && apt-get update && $APT_INSTALL gnupg ca-certificates && \ | |
| rm -rf /var/lib/apt/lists/* && \ | ||
| apt-get clean && \ | ||
| apt-get update && \ | ||
| $APT_INSTALL software-properties-common && \ | ||
| apt-add-repository -y ppa:brightbox/ruby-ng && \ | ||
| apt-get update && \ | ||
| # Install openjdk 8. | ||
| $APT_INSTALL openjdk-8-jdk && \ | ||
| update-alternatives --set java /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java && \ | ||
| # Install build / source control tools | ||
| $APT_INSTALL curl wget git maven ivy subversion make gcc lsof libffi-dev \ | ||
| pandoc pandoc-citeproc libssl-dev libcurl4-openssl-dev libxml2-dev && \ | ||
| pandoc pandoc-citeproc libssl-dev libcurl4-openssl-dev libxml2-dev | ||
|
|
||
| ENV PATH "$PATH:/root/.pyenv/bin:/root/.pyenv/shims" | ||
| RUN curl -L https://github.com/pyenv/pyenv-installer/raw/dd3f7d0914c5b4a416ca71ffabdf2954f2021596/bin/pyenv-installer | bash | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Okay, I suspect it was not tested due to the limitation described in the PR description: Generating SQL API Markdown files.
20/03/31 06:41:42 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Traceback (most recent call last):
File "/opt/spark-rm/output/spark/sql/gen-sql-api-docs.py", line 21, in <module>
from pyspark.java_gateway import launch_gateway
File "/opt/spark-rm/output/spark/python/lib/pyspark.zip/pyspark/__init__.py", line 51, in <module>
File "/opt/spark-rm/output/spark/python/lib/pyspark.zip/pyspark/context.py", line 22, in <module>
ImportError: No module named threading
log4j:WARN No appenders could be found for logger (org.apache.spark.util.ShutdownHookManager).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.Seems the installed Python is weird. Let me revert this to make RC preparation easier.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hmm, that's strange, because I was able to build enough of the Dockerfile to get past this point, and I did not hit this problem. What I couldn't test was to run everything via |
||
| RUN $APT_INSTALL libbz2-dev libreadline-dev libsqlite3-dev | ||
| RUN pyenv install 3.7.6 | ||
| RUN pyenv global 3.7.6 | ||
| RUN python --version | ||
| RUN pip install --upgrade pip | ||
| RUN pip --version | ||
| RUN pip install $PIP_PKGS | ||
|
|
||
| ENV PATH "$PATH:/root/.rbenv/bin:/root/.rbenv/shims" | ||
| RUN curl -fsSL https://github.com/rbenv/rbenv-installer/raw/108c12307621a0aa06f19799641848dde1987deb/bin/rbenv-installer | bash | ||
| RUN rbenv install 2.7.0 | ||
| RUN rbenv global 2.7.0 | ||
| RUN ruby --version | ||
| RUN $APT_INSTALL g++ | ||
| RUN gem --version | ||
| RUN gem install --no-document $GEM_PKGS | ||
|
|
||
| RUN \ | ||
| curl -sL https://deb.nodesource.com/setup_11.x | bash && \ | ||
| $APT_INSTALL nodejs && \ | ||
| # Install needed python packages. Use pip for installing packages (for consistency). | ||
| $APT_INSTALL libpython3-dev python3-pip && \ | ||
| # Change default python version to python3. | ||
| update-alternatives --install /usr/bin/python python /usr/bin/python2.7 1 && \ | ||
| update-alternatives --install /usr/bin/python python /usr/bin/python3.6 2 && \ | ||
| update-alternatives --set python /usr/bin/python3.6 && \ | ||
| pip3 install $BASE_PIP_PKGS && \ | ||
| pip3 install $PIP_PKGS && \ | ||
| # Install R packages and dependencies used when building. | ||
| # R depends on pandoc*, libssl (which are installed above). | ||
| $APT_INSTALL nodejs | ||
|
|
||
| # Install R packages and dependencies used when building. | ||
| # R depends on pandoc*, libssl (which are installed above). | ||
| RUN \ | ||
| $APT_INSTALL r-base r-base-dev && \ | ||
| $APT_INSTALL texlive-latex-base texlive texlive-fonts-extra texinfo qpdf && \ | ||
| Rscript -e "install.packages(c('curl', 'xml2', 'httr', 'devtools', 'testthat', 'knitr', 'rmarkdown', 'roxygen2', 'e1071', 'survival'), repos='https://cloud.r-project.org/')" && \ | ||
| Rscript -e "devtools::install_github('jimhester/lintr')" && \ | ||
| # Install tools needed to build the documentation. | ||
| $APT_INSTALL ruby2.3 ruby2.3-dev mkdocs && \ | ||
| gem install jekyll --no-rdoc --no-ri -v 3.8.6 && \ | ||
| gem install jekyll-redirect-from -v 0.15.0 && \ | ||
| gem install rouge | ||
| Rscript -e "devtools::install_github('jimhester/lintr')" | ||
|
|
||
| WORKDIR /opt/spark-rm/output | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -31,19 +31,49 @@ whichever version of Spark you currently have checked out of revision control. | |
| The Spark documentation build uses a number of tools to build HTML docs and API docs in Scala, Java, | ||
| Python, R and SQL. | ||
|
|
||
| You need to have [Ruby](https://www.ruby-lang.org/en/documentation/installation/) and | ||
| [Python](https://docs.python.org/2/using/unix.html#getting-and-installing-the-latest-version-of-python) | ||
| installed. Also install the following libraries: | ||
| You need to have Ruby 2 (preferably Ruby 2.6+) and Python 3 (preferably Python 3.7+) installed. | ||
|
|
||
| You'll also need to install the following libraries: | ||
|
|
||
| ```sh | ||
| gem install jekyll:4.0.0 jekyll-redirect-from:0.16.0 rouge:3.15.0 | ||
| ``` | ||
|
|
||
| ### Using rbenv and pyenv | ||
|
|
||
| A handy way to install and manage various versions of Ruby and Python is with [`rbenv`] and [`pyenv`]. | ||
|
|
||
| [`rbenv`]: https://github.com/rbenv/rbenv | ||
| [`pyenv`]: https://github.com/pyenv/pyenv | ||
|
|
||
| On macOS you can install them with Homebrew: | ||
|
|
||
| ```sh | ||
| $ sudo gem install jekyll jekyll-redirect-from rouge | ||
| brew install rbenv pyenv | ||
| ``` | ||
|
|
||
| Note: If you are on a system with both Ruby 1.9 and Ruby 2.0 you may need to replace gem with gem2.0. | ||
| To activate them, you'll need to run these commands or add them to the end of your `.bash_profile`: | ||
|
|
||
| ```sh | ||
| eval "$(rbenv init -)" | ||
| eval "$(pyenv init -)" | ||
|
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. One alternative to running these So, for example, |
||
| ``` | ||
|
|
||
| You can now use them to install specific versions of Ruby and Python and associate them with | ||
| the Spark home directory. Whenever you navigate to this directory or any of its subdirectories, these versions of Ruby and Python will be automatically activated. | ||
|
|
||
| ```sh | ||
| rbenv install 2.7.0 | ||
| pyenv install 3.7.6 | ||
|
|
||
| cd /path/to/spark/root | ||
| rbenv local 2.7.0 | ||
| pyenv local 3.7.6 | ||
| ``` | ||
|
|
||
| ### R Documentation | ||
|
|
||
| If you'd like to generate R documentation, you'll need to [install Pandoc](https://pandoc.org/installing.html) | ||
| If you'd like to generate R documentation, you'll need to install R, [install Pandoc](https://pandoc.org/installing.html), | ||
| and install these libraries: | ||
|
|
||
| ```sh | ||
|
|
@@ -58,7 +88,7 @@ Note: Other versions of roxygen2 might work in SparkR documentation generation b | |
| To generate API docs for any language, you'll need to install these libraries: | ||
|
|
||
| ```sh | ||
| $ sudo pip install sphinx mkdocs numpy | ||
| pip install sphinx==2.3.1 mkdocs==1.0.4 numpy==1.18.1 | ||
HyukjinKwon marked this conversation as resolved.
Show resolved
Hide resolved
|
||
| ``` | ||
|
|
||
| ## Generating the Documentation HTML | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dongjoon-hyun and @wangyum might better to review.