diff --git a/README.md b/README.md
index ed2788c8acbe..b9253cdf3ed0 100644
--- a/README.md
+++ b/README.md
@@ -19,7 +19,7 @@
# DataFusion
-
+
DataFusion is an extensible query execution framework, written in
Rust, that uses [Apache Arrow](https://arrow.apache.org) as its
diff --git a/datafusion/docs/cli.md b/datafusion/docs/cli.md
deleted file mode 100644
index d62dcdd5b4f1..000000000000
--- a/datafusion/docs/cli.md
+++ /dev/null
@@ -1,102 +0,0 @@
-
-
-# DataFusion CLI
-
-The DataFusion CLI is a command-line interactive SQL utility that allows queries to be executed against CSV and Parquet files. It is a convenient way to try DataFusion out with your own data sources.
-
-## Run using Cargo
-
-Use the following commands to clone this repository and run the CLI. This will require the Rust toolchain to be installed. Rust can be installed from [https://rustup.rs/](https://rustup.rs/).
-
-```bash
-git clone https://github.com/apache/arrow-datafusion
-cd arrow-datafusion/datafusion-cli
-cargo run --release
-```
-
-## Run using Docker
-
-Use the following commands to clone this repository and build a Docker image containing the CLI tool. Note that there is `.dockerignore` file in the root of the repository that may need to be deleted in order for this to work.
-
-```bash
-git clone https://github.com/apache/arrow-datafusion
-cd arrow-datafusion
-docker build -f datafusion-cli/Dockerfile . --tag datafusion-cli
-docker run -it -v $(your_data_location):/data datafusion-cli
-```
-
-## Usage
-
-```
-DataFusion 4.0.0-SNAPSHOT
-DataFusion is an in-memory query engine that uses Apache Arrow as the memory model. It supports executing SQL queries
-against CSV and Parquet files as well as querying directly against in-memory data.
-
-USAGE:
- datafusion-cli [FLAGS] [OPTIONS]
-
-FLAGS:
- -h, --help Prints help information
- -q, --quiet Reduce printing other than the results and work quietly
- -V, --version Prints version information
-
-OPTIONS:
- -c, --batch-size The batch size of each query, or use DataFusion default
- -p, --data-path Path to your data, default to current directory
- -f, --file Execute commands from file, then exit
- --format Output format [default: table] [possible values: csv, tsv, table, json, ndjson]
-```
-
-Type `exit` or `quit` to exit the CLI.
-
-## Registering Parquet Data Sources
-
-Parquet data sources can be registered by executing a `CREATE EXTERNAL TABLE` SQL statement. It is not necessary to provide schema information for Parquet files.
-
-```sql
-CREATE EXTERNAL TABLE taxi
-STORED AS PARQUET
-LOCATION '/mnt/nyctaxi/tripdata.parquet';
-```
-
-## Registering CSV Data Sources
-
-CSV data sources can be registered by executing a `CREATE EXTERNAL TABLE` SQL statement. It is necessary to provide schema information for CSV files since DataFusion does not automatically infer the schema when using SQL to query CSV files.
-
-```sql
-CREATE EXTERNAL TABLE test (
- c1 VARCHAR NOT NULL,
- c2 INT NOT NULL,
- c3 SMALLINT NOT NULL,
- c4 SMALLINT NOT NULL,
- c5 INT NOT NULL,
- c6 BIGINT NOT NULL,
- c7 SMALLINT NOT NULL,
- c8 INT NOT NULL,
- c9 BIGINT NOT NULL,
- c10 VARCHAR NOT NULL,
- c11 FLOAT NOT NULL,
- c12 DOUBLE NOT NULL,
- c13 VARCHAR NOT NULL
-)
-STORED AS CSV
-WITH HEADER ROW
-LOCATION '/path/to/aggregate_test_100.csv';
-```
diff --git a/docs/user-guide/book.toml b/docs/.gitignore
similarity index 87%
rename from docs/user-guide/book.toml
rename to docs/.gitignore
index efb9212dfdfd..765c378eb3b9 100644
--- a/docs/user-guide/book.toml
+++ b/docs/.gitignore
@@ -15,9 +15,6 @@
# specific language governing permissions and limitations
# under the License.
-[book]
-authors = ["Apache Arrow"]
-language = "en"
-multilingual = false
-src = "src"
-title = "DataFusion User Guide"
+build
+source/python/generated
+venv/
diff --git a/docs/Makefile b/docs/Makefile
new file mode 100644
index 000000000000..6bce19911da5
--- /dev/null
+++ b/docs/Makefile
@@ -0,0 +1,38 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+#
+# Minimal makefile for Sphinx documentation
+#
+
+# You can set these variables from the command line, and also
+# from the environment for the first two.
+SPHINXOPTS ?=
+SPHINXBUILD ?= sphinx-build
+SOURCEDIR = source
+BUILDDIR = build
+
+# Put it first so that "make" without argument is like "make help".
+help:
+ @$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
+
+.PHONY: help Makefile
+
+# Catch-all target: route all unknown targets to Sphinx using the new
+# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
+%: Makefile
+ @$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
diff --git a/docs/user-guide/README.md b/docs/README.md
similarity index 71%
rename from docs/user-guide/README.md
rename to docs/README.md
index 6698e5631d93..4aa9ea92c9b9 100644
--- a/docs/user-guide/README.md
+++ b/docs/README.md
@@ -17,15 +17,19 @@
under the License.
-->
-# DataFusion User Guide Source
+# DataFusion docs
-This directory contains the sources for the DataFusion user guide.
+## Dependencies
-## Generate HTML
+It's recommended to install build dependencies and build the documentation
+inside a Python virtualenv.
-To generate the user guide in HTML format, run the following commands:
+- Python
+- `pip install -r requirements.txt`
+- DataFusion Python package. You can install the latest version by running `maturin develop` inside the `../python` directory.
+
+## Build
```bash
-cargo install mdbook
-mdbook build
+make html
```
diff --git a/docs/make.bat b/docs/make.bat
new file mode 100644
index 000000000000..ded5b4a3e2b6
--- /dev/null
+++ b/docs/make.bat
@@ -0,0 +1,52 @@
+@rem Licensed to the Apache Software Foundation (ASF) under one
+@rem or more contributor license agreements. See the NOTICE file
+@rem distributed with this work for additional information
+@rem regarding copyright ownership. The ASF licenses this file
+@rem to you under the Apache License, Version 2.0 (the
+@rem "License"); you may not use this file except in compliance
+@rem with the License. You may obtain a copy of the License at
+@rem
+@rem http://www.apache.org/licenses/LICENSE-2.0
+@rem
+@rem Unless required by applicable law or agreed to in writing,
+@rem software distributed under the License is distributed on an
+@rem "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+@rem KIND, either express or implied. See the License for the
+@rem specific language governing permissions and limitations
+@rem under the License.
+
+@ECHO OFF
+
+pushd %~dp0
+
+REM Command file for Sphinx documentation
+
+if "%SPHINXBUILD%" == "" (
+ set SPHINXBUILD=sphinx-build
+)
+set SOURCEDIR=source
+set BUILDDIR=build
+
+if "%1" == "" goto help
+
+%SPHINXBUILD% >NUL 2>NUL
+if errorlevel 9009 (
+ echo.
+ echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
+ echo.installed, then set the SPHINXBUILD environment variable to point
+ echo.to the full path of the 'sphinx-build' executable. Alternatively you
+ echo.may add the Sphinx directory to PATH.
+ echo.
+ echo.If you don't have Sphinx installed, grab it from
+ echo.http://sphinx-doc.org/
+ exit /b 1
+)
+
+%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
+goto end
+
+:help
+%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
+
+:end
+popd
diff --git a/docs/requirements.txt b/docs/requirements.txt
new file mode 100644
index 000000000000..0f18a11554cd
--- /dev/null
+++ b/docs/requirements.txt
@@ -0,0 +1,21 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+sphinx==2.4.4
+pydata-sphinx-theme
+myst-parser<1
+maturin<0.12
diff --git a/datafusion/docs/images/DataFusion-Logo-Background-White.png b/docs/source/_static/images/DataFusion-Logo-Background-White.png
similarity index 100%
rename from datafusion/docs/images/DataFusion-Logo-Background-White.png
rename to docs/source/_static/images/DataFusion-Logo-Background-White.png
diff --git a/datafusion/docs/images/DataFusion-Logo-Background-White.svg b/docs/source/_static/images/DataFusion-Logo-Background-White.svg
similarity index 100%
rename from datafusion/docs/images/DataFusion-Logo-Background-White.svg
rename to docs/source/_static/images/DataFusion-Logo-Background-White.svg
diff --git a/datafusion/docs/images/DataFusion-Logo-Dark.png b/docs/source/_static/images/DataFusion-Logo-Dark.png
similarity index 100%
rename from datafusion/docs/images/DataFusion-Logo-Dark.png
rename to docs/source/_static/images/DataFusion-Logo-Dark.png
diff --git a/datafusion/docs/images/DataFusion-Logo-Dark.svg b/docs/source/_static/images/DataFusion-Logo-Dark.svg
similarity index 100%
rename from datafusion/docs/images/DataFusion-Logo-Dark.svg
rename to docs/source/_static/images/DataFusion-Logo-Dark.svg
diff --git a/datafusion/docs/images/DataFusion-Logo-Light.png b/docs/source/_static/images/DataFusion-Logo-Light.png
similarity index 100%
rename from datafusion/docs/images/DataFusion-Logo-Light.png
rename to docs/source/_static/images/DataFusion-Logo-Light.png
diff --git a/datafusion/docs/images/DataFusion-Logo-Light.svg b/docs/source/_static/images/DataFusion-Logo-Light.svg
similarity index 100%
rename from datafusion/docs/images/DataFusion-Logo-Light.svg
rename to docs/source/_static/images/DataFusion-Logo-Light.svg
diff --git a/docs/source/_static/theme_overrides.css b/docs/source/_static/theme_overrides.css
new file mode 100644
index 000000000000..1e972cc6fc4f
--- /dev/null
+++ b/docs/source/_static/theme_overrides.css
@@ -0,0 +1,93 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+
+/* Customizing with theme CSS variables */
+
+:root {
+ --pst-color-active-navigation: 215, 70, 51;
+ --pst-color-link-hover: 215, 70, 51;
+ --pst-color-headerlink: 215, 70, 51;
+ /* Use normal text color (like h3, ..) instead of primary color */
+ --pst-color-h1: var(--color-text-base);
+ --pst-color-h2: var(--color-text-base);
+ /* Use softer blue from bootstrap's default info color */
+ --pst-color-info: 23, 162, 184;
+ --pst-header-height: 0px;
+}
+
+code {
+ color: rgb(215, 70, 51);
+}
+
+.footer {
+ text-align: center;
+}
+
+/* Ensure the logo is properly displayed */
+
+.navbar-brand {
+ height: auto;
+ width: auto;
+}
+
+a.navbar-brand img {
+ height: auto;
+ width: auto;
+ max-height: 15vh;
+ max-width: 100%;
+}
+
+
+/* This is the bootstrap CSS style for "table-striped". Since the theme does
+not yet provide an easy way to configure this globally, it is easier to simply
+include this snippet here than updating each table in all rst files to
+add ":class: table-striped" */
+
+.table tbody tr:nth-of-type(odd) {
+ background-color: rgba(0, 0, 0, 0.05);
+}
+
+
+/* Limit the max height of the sidebar navigation section. Because in our
+customized template, there is more content above the navigation, i.e.
+larger logo: if we don't decrease the max-height, it will overlap with
+the footer.
+Details: min(15vh, 110px) for the logo size, 8rem for search box etc*/
+
+@media (min-width:720px) {
+ @supports (position:-webkit-sticky) or (position:sticky) {
+ .bd-links {
+ max-height: calc(100vh - min(15vh, 110px) - 8rem)
+ }
+ }
+}
+
+
+/* Fix table text wrapping in RTD theme,
+ * see https://rackerlabs.github.io/docs-rackspace/tools/rtd-tables.html
+ */
+
+@media screen {
+ table.docutils td {
+ /* !important prevents the common CSS stylesheets from overriding
+ this as on RTD they are loaded after this stylesheet */
+ white-space: normal !important;
+ }
+}
diff --git a/docs/source/_templates/docs-sidebar.html b/docs/source/_templates/docs-sidebar.html
new file mode 100644
index 000000000000..bc2bf0092204
--- /dev/null
+++ b/docs/source/_templates/docs-sidebar.html
@@ -0,0 +1,19 @@
+
+
+
+
+
+
+
+
diff --git a/docs/source/_templates/layout.html b/docs/source/_templates/layout.html
new file mode 100644
index 000000000000..a9d0f30bcf8e
--- /dev/null
+++ b/docs/source/_templates/layout.html
@@ -0,0 +1,5 @@
+{% extends "pydata_sphinx_theme/layout.html" %}
+
+{# Silence the navbar #}
+{% block docs_navbar %}
+{% endblock %}
diff --git a/docs/source/cli/index.rst b/docs/source/cli/index.rst
new file mode 100644
index 000000000000..93ae173ec43f
--- /dev/null
+++ b/docs/source/cli/index.rst
@@ -0,0 +1,113 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements. See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership. The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License. You may obtain a copy of the License at
+
+.. http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied. See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+=======================
+DataFusion Command-line
+=======================
+
+The Arrow DataFusion CLI is a command-line interactive SQL utility that allows
+queries to be executed against CSV and Parquet files. It is a convenient way to
+try DataFusion out with your own data sources.
+
+Run using Cargo
+===============
+
+Use the following commands to clone this repository and run the CLI. This will require the Rust toolchain to be installed. Rust can be installed from `https://rustup.rs <https://rustup.rs/>`_.
+
+.. code-block:: bash
+
+ git clone https://github.com/apache/arrow-datafusion
+ cd arrow-datafusion/datafusion-cli
+ cargo run --release
+
+
+Run using Docker
+================
+
+Use the following commands to clone this repository and build a Docker image containing the CLI tool. Note that there is a :code:`.dockerignore` file in the root of the repository that may need to be deleted in order for this to work.
+
+.. code-block:: bash
+
+ git clone https://github.com/apache/arrow-datafusion
+ cd arrow-datafusion
+ docker build -f datafusion-cli/Dockerfile . --tag datafusion-cli
+ docker run -it -v $(your_data_location):/data datafusion-cli
+
+
+Usage
+=====
+
+.. code-block:: bash
+
+ DataFusion 5.0.0-SNAPSHOT
+ DataFusion is an in-memory query engine that uses Apache Arrow as the memory model. It supports executing SQL queries
+ against CSV and Parquet files as well as querying directly against in-memory data.
+
+ USAGE:
+ datafusion-cli [FLAGS] [OPTIONS]
+
+ FLAGS:
+ -h, --help Prints help information
+ -q, --quiet Reduce printing other than the results and work quietly
+ -V, --version Prints version information
+
+ OPTIONS:
+ -c, --batch-size The batch size of each query, or use DataFusion default
+ -p, --data-path Path to your data, default to current directory
+ -f, --file Execute commands from file, then exit
+ --format Output format [default: table] [possible values: csv, tsv, table, json, ndjson]
+
+Type :code:`exit` or :code:`quit` to exit the CLI.
+
+
+Registering Parquet Data Sources
+================================
+
+Parquet data sources can be registered by executing a :code:`CREATE EXTERNAL TABLE` SQL statement. It is not necessary to provide schema information for Parquet files.
+
+.. code-block:: sql
+
+ CREATE EXTERNAL TABLE taxi
+ STORED AS PARQUET
+ LOCATION '/mnt/nyctaxi/tripdata.parquet';
+
+
+Registering CSV Data Sources
+============================
+
+CSV data sources can be registered by executing a :code:`CREATE EXTERNAL TABLE` SQL statement. It is necessary to provide schema information for CSV files since DataFusion does not automatically infer the schema when using SQL to query CSV files.
+
+.. code-block:: sql
+
+ CREATE EXTERNAL TABLE test (
+ c1 VARCHAR NOT NULL,
+ c2 INT NOT NULL,
+ c3 SMALLINT NOT NULL,
+ c4 SMALLINT NOT NULL,
+ c5 INT NOT NULL,
+ c6 BIGINT NOT NULL,
+ c7 SMALLINT NOT NULL,
+ c8 INT NOT NULL,
+ c9 BIGINT NOT NULL,
+ c10 VARCHAR NOT NULL,
+ c11 FLOAT NOT NULL,
+ c12 DOUBLE NOT NULL,
+ c13 VARCHAR NOT NULL
+ )
+ STORED AS CSV
+ WITH HEADER ROW
+ LOCATION '/path/to/aggregate_test_100.csv';
diff --git a/docs/source/conf.py b/docs/source/conf.py
new file mode 100644
index 000000000000..797123948f0c
--- /dev/null
+++ b/docs/source/conf.py
@@ -0,0 +1,100 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+# Configuration file for the Sphinx documentation builder.
+#
+# This file only contains a selection of the most common options. For a full
+# list see the documentation:
+# https://www.sphinx-doc.org/en/master/usage/configuration.html
+
+# -- Path setup --------------------------------------------------------------
+
+# If extensions (or modules to document with autodoc) are in another directory,
+# add these directories to sys.path here. If the directory is relative to the
+# documentation root, use os.path.abspath to make it absolute, like shown here.
+#
+# import os
+# import sys
+# sys.path.insert(0, os.path.abspath('.'))
+
+import datafusion
+
+# -- Project information -----------------------------------------------------
+
+project = 'Arrow Datafusion'
+copyright = '2021, Apache Software Foundation'
+author = 'Arrow Datafusion Authors'
+
+
+# -- General configuration ---------------------------------------------------
+
+# Add any Sphinx extension module names here, as strings. They can be
+# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
+# ones.
+extensions = [
+ 'sphinx.ext.autodoc',
+ 'sphinx.ext.autosummary',
+ 'sphinx.ext.doctest',
+ 'sphinx.ext.ifconfig',
+ 'sphinx.ext.mathjax',
+ 'sphinx.ext.viewcode',
+ 'sphinx.ext.napoleon',
+ 'myst_parser',
+]
+
+source_suffix = {
+ '.rst': 'restructuredtext',
+ '.md': 'markdown',
+}
+
+# Add any paths that contain templates here, relative to this directory.
+templates_path = ['_templates']
+
+# List of patterns, relative to source directory, that match files and
+# directories to ignore when looking for source files.
+# This pattern also affects html_static_path and html_extra_path.
+exclude_patterns = []
+
+# Show members for classes in .. autosummary
+autodoc_default_options = {
+ "members": None,
+ "undoc-members": None,
+ "show-inheritance": None,
+ "inherited-members": None,
+}
+
+autosummary_generate = True
+
+# -- Options for HTML output -------------------------------------------------
+
+# The theme to use for HTML and HTML Help pages. See the documentation for
+# a list of builtin themes.
+#
+html_theme = 'pydata_sphinx_theme'
+
+# Add any paths that contain custom static files (such as style sheets) here,
+# relative to this directory. They are copied after the builtin static files,
+# so a file named "default.css" will overwrite the builtin "default.css".
+html_static_path = ['_static']
+
+html_logo = "_static/images/DataFusion-Logo-Background-White.png"
+
+html_css_files = ["theme_overrides.css"]
+
+html_sidebars = {
+ "**": ["docs-sidebar.html"],
+}
diff --git a/docs/source/index.rst b/docs/source/index.rst
new file mode 100644
index 000000000000..eeb89d038ecf
--- /dev/null
+++ b/docs/source/index.rst
@@ -0,0 +1,65 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements. See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership. The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License. You may obtain a copy of the License at
+
+.. http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied. See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+=======================
+Apache Arrow DataFusion
+=======================
+
+Table of contents
+=================
+
+.. _toc.usage:
+
+.. toctree::
+ :maxdepth: 1
+ :caption: Supported Environments
+
+ Rust
+ Python
+ Command line
+
+.. _toc.guide:
+
+.. toctree::
+ :maxdepth: 1
+ :caption: User Guide
+
+ user-guide/introduction
+ user-guide/example-usage
+ user-guide/library
+ user-guide/cli
+ user-guide/sql/index
+ user-guide/distributed/index
+ user-guide/faq
+
+.. _toc.specs:
+
+.. toctree::
+ :maxdepth: 1
+ :caption: Specification
+
+ specification/invariants
+ specification/output-field-name-semantic
+
+.. _toc.readme:
+
+.. toctree::
+ :maxdepth: 1
+ :caption: README
+
+ Datafusion
+ Ballista
diff --git a/docs/source/python/api.rst b/docs/source/python/api.rst
new file mode 100644
index 000000000000..f81753e082e4
--- /dev/null
+++ b/docs/source/python/api.rst
@@ -0,0 +1,30 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements. See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership. The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License. You may obtain a copy of the License at
+
+.. http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied. See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+.. _api:
+
+*************
+API Reference
+*************
+
+.. toctree::
+ :maxdepth: 2
+
+ api/dataframe
+ api/execution_context
+ api/expression
+ api/functions
diff --git a/docs/source/python/api/dataframe.rst b/docs/source/python/api/dataframe.rst
new file mode 100644
index 000000000000..0a3c4c8b1c34
--- /dev/null
+++ b/docs/source/python/api/dataframe.rst
@@ -0,0 +1,27 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements. See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership. The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License. You may obtain a copy of the License at
+
+.. http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied. See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+.. _api.dataframe:
+.. currentmodule:: datafusion
+
+DataFrame
+=========
+
+.. autosummary::
+ :toctree: ../generated/
+
+ DataFrame
diff --git a/docs/source/python/api/execution_context.rst b/docs/source/python/api/execution_context.rst
new file mode 100644
index 000000000000..7f8c840ca0ad
--- /dev/null
+++ b/docs/source/python/api/execution_context.rst
@@ -0,0 +1,27 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements. See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership. The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License. You may obtain a copy of the License at
+
+.. http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied. See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+.. _api.execution_context:
+.. currentmodule:: datafusion
+
+ExecutionContext
+================
+
+.. autosummary::
+ :toctree: ../generated/
+
+ ExecutionContext
diff --git a/docs/source/python/api/expression.rst b/docs/source/python/api/expression.rst
new file mode 100644
index 000000000000..45923fb5447f
--- /dev/null
+++ b/docs/source/python/api/expression.rst
@@ -0,0 +1,27 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements. See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership. The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License. You may obtain a copy of the License at
+
+.. http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied. See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+.. _api.expression:
+.. currentmodule:: datafusion
+
+Expression
+==========
+
+.. autosummary::
+ :toctree: ../generated/
+
+ Expression
diff --git a/docs/source/python/api/functions.rst b/docs/source/python/api/functions.rst
new file mode 100644
index 000000000000..6f10d826e38a
--- /dev/null
+++ b/docs/source/python/api/functions.rst
@@ -0,0 +1,27 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements. See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership. The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License. You may obtain a copy of the License at
+
+.. http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied. See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+.. _api.functions:
+.. currentmodule:: datafusion
+
+Functions
+=========
+
+.. autosummary::
+ :toctree: ../generated/
+
+ functions
diff --git a/docs/source/python/index.rst b/docs/source/python/index.rst
new file mode 100644
index 000000000000..56f9097ffdbd
--- /dev/null
+++ b/docs/source/python/index.rst
@@ -0,0 +1,192 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements. See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership. The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License. You may obtain a copy of the License at
+
+.. http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied. See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+====================
+DataFusion in Python
+====================
+
+This is a Python library that binds to `Apache Arrow `_ in-memory query engine `DataFusion `_.
+
+Like PySpark, it allows you to build a plan through SQL or a DataFrame API against in-memory data, Parquet or CSV files, run it in a multi-threaded environment, and obtain the result back in Python.
+
+It also allows you to use UDFs and UDAFs for complex operations.
+
+The major advantage of this library over other execution engines is that it achieves zero-copy data exchange between Python and its execution engine: using UDFs and UDAFs, and collecting the results into Python, incurs no copying cost. The only overhead is having to hold the GIL while those operations run.
+
+Its query engine, DataFusion, is written in `Rust `_, which makes strong assumptions about thread safety and lack of memory leaks.
+
+Technically, zero-copy is achieved via the `C data interface `_.
+
+How to use it
+=============
+
+Simple usage:
+
+.. code-block:: python
+
+ import datafusion
+ import pyarrow
+
+ # an alias
+ f = datafusion.functions
+
+ # create a context
+ ctx = datafusion.ExecutionContext()
+
+ # create a RecordBatch and a new DataFrame from it
+ batch = pyarrow.RecordBatch.from_arrays(
+ [pyarrow.array([1, 2, 3]), pyarrow.array([4, 5, 6])],
+ names=["a", "b"],
+ )
+ df = ctx.create_dataframe([[batch]])
+
+ # create a new statement
+ df = df.select(
+ f.col("a") + f.col("b"),
+ f.col("a") - f.col("b"),
+ )
+
+ # execute and collect the first (and only) batch
+ result = df.collect()[0]
+
+ assert result.column(0) == pyarrow.array([5, 7, 9])
+ assert result.column(1) == pyarrow.array([-3, -3, -3])
+
+
+UDFs
+----
+
+.. code-block:: python
+
+    def is_null(array: pyarrow.Array) -> pyarrow.Array:
+        return array.is_null()
+
+    udf = f.udf(is_null, [pyarrow.int64()], pyarrow.bool_())
+
+    df = df.select(udf(f.col("a")))
+
+
+UDAF
+----
+
+.. code-block:: python
+
+    import pyarrow
+    import pyarrow.compute
+
+
+    class Accumulator:
+        """
+        Interface of a user-defined accumulation.
+        """
+        def __init__(self):
+            self._sum = pyarrow.scalar(0.0)
+
+        def to_scalars(self) -> [pyarrow.Scalar]:
+            return [self._sum]
+
+        def update(self, values: pyarrow.Array) -> None:
+            # not nice since pyarrow scalars can't be summed yet. This breaks on `None`
+            self._sum = pyarrow.scalar(self._sum.as_py() + pyarrow.compute.sum(values).as_py())
+
+        def merge(self, states: pyarrow.Array) -> None:
+            # not nice since pyarrow scalars can't be summed yet. This breaks on `None`
+            self._sum = pyarrow.scalar(self._sum.as_py() + pyarrow.compute.sum(states).as_py())
+
+        def evaluate(self) -> pyarrow.Scalar:
+            return self._sum
+
+
+    df = ...
+
+    udaf = f.udaf(Accumulator, pyarrow.float64(), pyarrow.float64(), [pyarrow.float64()])
+
+    df = df.aggregate(
+        [],
+        [udaf(f.col("a"))]
+    )
+
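The accumulator interface above splits an aggregation into ``update``, ``to_scalars``, ``merge`` and ``evaluate``. The following is a minimal plain-Python sketch of how those four methods could fit together when input arrives in several partitions. It does not use ``datafusion`` or ``pyarrow``; ``SumAccumulator`` and the ``aggregate`` driver are hypothetical names for illustration, and the order in which the real engine calls these methods is an assumption here.

```python
from typing import List


class SumAccumulator:
    """Plain-Python stand-in for the pyarrow-based Accumulator (illustrative only)."""

    def __init__(self) -> None:
        self._sum = 0.0

    def to_scalars(self) -> List[float]:
        # Expose the partial state so another accumulator can merge it.
        return [self._sum]

    def update(self, values: List[float]) -> None:
        # Fold one batch of raw input values into the running state.
        self._sum += sum(values)

    def merge(self, states: List[float]) -> None:
        # Fold partial states produced by other accumulators.
        self._sum += sum(states)

    def evaluate(self) -> float:
        # Produce the final aggregate value.
        return self._sum


def aggregate(partitions: List[List[List[float]]]) -> float:
    """Hypothetical driver: one accumulator per partition, then a final merge."""
    partials = []
    for batches in partitions:
        acc = SumAccumulator()
        for batch in batches:
            acc.update(batch)
        partials.append(acc.to_scalars())

    final = SumAccumulator()
    for state in partials:
        final.merge(state)
    return final.evaluate()


# Two partitions: the first has two batches, the second has one.
result = aggregate([[[1.0, 2.0], [3.0]], [[4.0, 5.0]]])
```

Keeping ``update`` (raw values) separate from ``merge`` (partial states) lets each partition be aggregated independently before the partial results are combined.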
+
+How to install (from pip)
+=========================
+
+.. code-block:: shell
+
+ pip install datafusion
+
+
+How to develop
+==============
+
+This assumes that you have Rust and Cargo installed. We use the workflow recommended by `pyo3 `_ and `maturin `_.
+
+Bootstrap:
+
+.. code-block:: shell
+
+ # fetch this repo
+ git clone git@github.com:apache/arrow-datafusion.git
+
+ cd arrow-datafusion/python
+
+ # prepare development environment (used to build wheel / install in development)
+ python3 -m venv venv
+ # activate the venv
+ source venv/bin/activate
+ pip install -r requirements.txt
+
+
+Whenever the Rust code changes (through your own changes or via `git pull`):
+
+.. code-block:: shell
+
+ # make sure you activate the venv using "source venv/bin/activate" first
+ maturin develop
+ python -m pytest
+
+
+How to update dependencies
+==========================
+
+To change test dependencies, change the `requirements.in` and run
+
+.. code-block:: shell
+
+ # install pip-tools (this only needs to be done once); consider running it in a venv
+ pip install pip-tools
+
+ # change requirements.in and then run
+ pip-compile --generate-hashes
+
+
+To update dependencies, run
+
+.. code-block:: shell
+
+ pip-compile --upgrade
+
+
+More details about pip-tools `here `_
+
+
+API reference
+=============
+
+.. toctree::
+ :maxdepth: 2
+
+ api
diff --git a/docs/specification/invariants.md b/docs/source/specification/invariants.md
similarity index 100%
rename from docs/specification/invariants.md
rename to docs/source/specification/invariants.md
diff --git a/docs/specification/output-field-name-semantic.md b/docs/source/specification/output-field-name-semantic.md
similarity index 100%
rename from docs/specification/output-field-name-semantic.md
rename to docs/source/specification/output-field-name-semantic.md
diff --git a/docs/user-guide/src/cli.md b/docs/source/user-guide/cli.md
similarity index 97%
rename from docs/user-guide/src/cli.md
rename to docs/source/user-guide/cli.md
index 28716b6ad023..cb95fba5113a 100644
--- a/docs/user-guide/src/cli.md
+++ b/docs/source/user-guide/cli.md
@@ -22,7 +22,7 @@
The DataFusion CLI allows SQL queries to be executed by an in-process DataFusion context, or by a distributed
Ballista context.
-```ignore
+```
USAGE:
datafusion-cli [FLAGS] [OPTIONS]
@@ -44,11 +44,11 @@ OPTIONS:
Create a CSV file to query.
-```bash,ignore
+```bash
$ echo "1,2" > data.csv
```
-```sql,ignore
+```bash
$ datafusion-cli
DataFusion CLI v5.1.0-SNAPSHOT
@@ -69,6 +69,6 @@ DataFusion CLI v5.1.0-SNAPSHOT
The DataFusion CLI can also connect to a Ballista scheduler for query execution.
-```bash,ignore
+```bash
datafusion-cli --host localhost --port 50050
```
diff --git a/docs/source/user-guide/distributed/clients/index.rst b/docs/source/user-guide/distributed/clients/index.rst
new file mode 100644
index 000000000000..c9eb1e1f524b
--- /dev/null
+++ b/docs/source/user-guide/distributed/clients/index.rst
@@ -0,0 +1,25 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements. See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership. The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License. You may obtain a copy of the License at
+
+.. http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied. See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+Clients
+=======
+
+.. toctree::
+ :maxdepth: 2
+
+ rust
+ python
diff --git a/docs/user-guide/src/distributed/client-python.md b/docs/source/user-guide/distributed/clients/python.md
similarity index 100%
rename from docs/user-guide/src/distributed/client-python.md
rename to docs/source/user-guide/distributed/clients/python.md
diff --git a/docs/user-guide/src/distributed/client-rust.md b/docs/source/user-guide/distributed/clients/rust.md
similarity index 99%
rename from docs/user-guide/src/distributed/client-rust.md
rename to docs/source/user-guide/distributed/clients/rust.md
index 4e6ecf5abcc1..ccf19aa70e3c 100644
--- a/docs/user-guide/src/distributed/client-rust.md
+++ b/docs/source/user-guide/distributed/clients/rust.md
@@ -17,7 +17,7 @@
under the License.
-->
-## Ballista Rust Client
+# Ballista Rust Client
Ballista usage is very similar to DataFusion. The main difference is that the starting point is a `BallistaContext`
instead of the DataFusion `ExecutionContext`. Ballista uses the same DataFrame API as DataFusion.
diff --git a/docs/user-guide/src/distributed/cargo-install.md b/docs/source/user-guide/distributed/deployment/cargo-install.md
similarity index 96%
rename from docs/user-guide/src/distributed/cargo-install.md
rename to docs/source/user-guide/distributed/deployment/cargo-install.md
index 504154d97109..22a38d78f650 100644
--- a/docs/user-guide/src/distributed/cargo-install.md
+++ b/docs/source/user-guide/distributed/deployment/cargo-install.md
@@ -17,7 +17,7 @@
under the License.
-->
-## Deploying a standalone Ballista cluster using cargo install
+# Deploying a standalone Ballista cluster using cargo install
A simple way to start a local cluster for testing purposes is to use cargo to install
the scheduler and executor crates.
diff --git a/docs/user-guide/src/distributed/configuration.md b/docs/source/user-guide/distributed/deployment/configuration.md
similarity index 100%
rename from docs/user-guide/src/distributed/configuration.md
rename to docs/source/user-guide/distributed/deployment/configuration.md
diff --git a/docs/user-guide/src/distributed/docker-compose.md b/docs/source/user-guide/distributed/deployment/docker-compose.md
similarity index 100%
rename from docs/user-guide/src/distributed/docker-compose.md
rename to docs/source/user-guide/distributed/deployment/docker-compose.md
diff --git a/docs/user-guide/src/distributed/docker.md b/docs/source/user-guide/distributed/deployment/docker.md
similarity index 98%
rename from docs/user-guide/src/distributed/docker.md
rename to docs/source/user-guide/distributed/deployment/docker.md
index 4892ab83e76e..541a884684db 100644
--- a/docs/user-guide/src/distributed/docker.md
+++ b/docs/source/user-guide/distributed/deployment/docker.md
@@ -17,7 +17,7 @@
under the License.
-->
-## Starting a Ballista cluster using Docker
+# Starting a Ballista cluster using Docker
## Build Docker image
diff --git a/docs/source/user-guide/distributed/deployment/index.rst b/docs/source/user-guide/distributed/deployment/index.rst
new file mode 100644
index 000000000000..f5e41d010843
--- /dev/null
+++ b/docs/source/user-guide/distributed/deployment/index.rst
@@ -0,0 +1,29 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements. See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership. The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License. You may obtain a copy of the License at
+
+.. http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied. See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+Start a Ballista Cluster
+========================
+
+.. toctree::
+ :maxdepth: 2
+
+ cargo-install
+ docker
+ docker-compose
+ kubernetes
+ raspberrypi
+ configuration
diff --git a/docs/user-guide/src/distributed/kubernetes.md b/docs/source/user-guide/distributed/deployment/kubernetes.md
similarity index 100%
rename from docs/user-guide/src/distributed/kubernetes.md
rename to docs/source/user-guide/distributed/deployment/kubernetes.md
diff --git a/docs/user-guide/src/distributed/raspberrypi.md b/docs/source/user-guide/distributed/deployment/raspberrypi.md
similarity index 100%
rename from docs/user-guide/src/distributed/raspberrypi.md
rename to docs/source/user-guide/distributed/deployment/raspberrypi.md
diff --git a/docs/source/user-guide/distributed/index.rst b/docs/source/user-guide/distributed/index.rst
new file mode 100644
index 000000000000..abb3c7b156b8
--- /dev/null
+++ b/docs/source/user-guide/distributed/index.rst
@@ -0,0 +1,26 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements. See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership. The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License. You may obtain a copy of the License at
+
+.. http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied. See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+Ballista Distributed Compute
+============================
+
+.. toctree::
+ :maxdepth: 2
+
+ introduction
+ deployment/index
+ clients/index
diff --git a/docs/user-guide/src/distributed/introduction.md b/docs/source/user-guide/distributed/introduction.md
similarity index 99%
rename from docs/user-guide/src/distributed/introduction.md
rename to docs/source/user-guide/distributed/introduction.md
index aebf7002456b..77db6261c0fd 100644
--- a/docs/user-guide/src/distributed/introduction.md
+++ b/docs/source/user-guide/distributed/introduction.md
@@ -17,7 +17,7 @@
under the License.
-->
-## Overview
+# Overview
Ballista is a distributed compute platform primarily implemented in Rust, and powered by Apache Arrow. It is
built on an architecture that allows other programming languages to be supported as first-class citizens without paying
diff --git a/docs/user-guide/src/example-usage.md b/docs/source/user-guide/example-usage.md
similarity index 100%
rename from docs/user-guide/src/example-usage.md
rename to docs/source/user-guide/example-usage.md
diff --git a/docs/user-guide/src/faq.md b/docs/source/user-guide/faq.md
similarity index 100%
rename from docs/user-guide/src/faq.md
rename to docs/source/user-guide/faq.md
diff --git a/docs/user-guide/src/introduction.md b/docs/source/user-guide/introduction.md
similarity index 99%
rename from docs/user-guide/src/introduction.md
rename to docs/source/user-guide/introduction.md
index 7ba3c963cc86..e16504091571 100644
--- a/docs/user-guide/src/introduction.md
+++ b/docs/source/user-guide/introduction.md
@@ -17,7 +17,7 @@
under the License.
-->
-# DataFusion
+# Introduction
DataFusion is an extensible query execution framework, written in
Rust, that uses [Apache Arrow](https://arrow.apache.org) as its
diff --git a/docs/user-guide/src/library.md b/docs/source/user-guide/library.md
similarity index 96%
rename from docs/user-guide/src/library.md
rename to docs/source/user-guide/library.md
index 1a1bbfbeb46e..bfaf741cc2f1 100644
--- a/docs/user-guide/src/library.md
+++ b/docs/source/user-guide/library.md
@@ -58,4 +58,6 @@ Finally, in order to build with the `simd` optimization `cargo nightly` is requi
set architecture you are building on you will want to configure the `target-cpu` as well, ideally
with `native` or at least `avx2`.
-`RUSTFLAGS='-C target-cpu=native' cargo +nightly run --release`
+```
+RUSTFLAGS='-C target-cpu=native' cargo +nightly run --release
+```
diff --git a/docs/user-guide/src/sql/datafusion-functions.md b/docs/source/user-guide/sql/datafusion-functions.md
similarity index 100%
rename from docs/user-guide/src/sql/datafusion-functions.md
rename to docs/source/user-guide/sql/datafusion-functions.md
diff --git a/docs/user-guide/src/sql/ddl.md b/docs/source/user-guide/sql/ddl.md
similarity index 100%
rename from docs/user-guide/src/sql/ddl.md
rename to docs/source/user-guide/sql/ddl.md
diff --git a/docs/source/user-guide/sql/index.rst b/docs/source/user-guide/sql/index.rst
new file mode 100644
index 000000000000..2489f6ba1f10
--- /dev/null
+++ b/docs/source/user-guide/sql/index.rst
@@ -0,0 +1,26 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements. See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership. The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License. You may obtain a copy of the License at
+
+.. http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied. See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+SQL Reference
+=============
+
+.. toctree::
+ :maxdepth: 2
+
+ select
+ ddl
+ DataFusion Functions
diff --git a/docs/user-guide/src/sql/select.md b/docs/source/user-guide/sql/select.md
similarity index 94%
rename from docs/user-guide/src/sql/select.md
rename to docs/source/user-guide/sql/select.md
index 348ffff2887f..49399c93c60d 100644
--- a/docs/user-guide/src/sql/select.md
+++ b/docs/source/user-guide/sql/select.md
@@ -37,7 +37,7 @@ DataFusion supports the following syntax for queries:
-# WITH clause
+## WITH clause
A `WITH` clause lets you give names to queries and reference them by name.
@@ -46,7 +46,7 @@ WITH x AS (SELECT a, MAX(b) AS b FROM t GROUP BY a)
SELECT a, b FROM x;
```
-# SELECT clause
+## SELECT clause
Example:
@@ -61,7 +61,7 @@ By default `ALL` will be used, which returns all the rows.
SELECT DISTINCT person, age FROM employees
```
-# FROM clause
+## FROM clause
Example:
@@ -69,7 +69,7 @@ Example:
SELECT t.a FROM table AS t
```
-# WHERE clause
+## WHERE clause
Example:
@@ -77,7 +77,7 @@ Example:
SELECT a FROM table WHERE a > 10
```
-# GROUP BY clause
+## GROUP BY clause
Example:
@@ -85,7 +85,7 @@ Example:
SELECT a, b, MAX(c) FROM table GROUP BY a, b
```
-# HAVING clause
+## HAVING clause
Example:
@@ -93,7 +93,7 @@ Example:
SELECT a, b, MAX(c) FROM table GROUP BY a, b HAVING MAX(c) > 10
```
-# UNION clause
+## UNION clause
Example:
@@ -111,7 +111,7 @@ SELECT
FROM table2
```
-# ORDER BY clause
+## ORDER BY clause
Orders the results by the referenced expression. By default it uses ascending order (`ASC`).
This order can be changed to descending by adding `DESC` after the order-by expressions.
@@ -124,7 +124,7 @@ SELECT age, person FROM table ORDER BY age DESC;
SELECT age, person FROM table ORDER BY age, person DESC;
```
-# LIMIT clause
+## LIMIT clause
Limits the number of rows to be a maximum of `count` rows. `count` should be a non-negative integer.
diff --git a/docs/user-guide/.gitignore b/docs/user-guide/.gitignore
deleted file mode 100644
index e9c072897d55..000000000000
--- a/docs/user-guide/.gitignore
+++ /dev/null
@@ -1 +0,0 @@
-book
\ No newline at end of file
diff --git a/docs/user-guide/src/SUMMARY.md b/docs/user-guide/src/SUMMARY.md
deleted file mode 100644
index 362103106fed..000000000000
--- a/docs/user-guide/src/SUMMARY.md
+++ /dev/null
@@ -1,43 +0,0 @@
-
-
-# Summary
-
-- [Introduction](introduction.md)
-- [Example Usage](example-usage.md)
-- [Use as a Library](library.md)
-- [DataFusion CLI](cli.md)
-- [SQL Reference](sql/introduction.md)
-
- - [SELECT](sql/select.md)
- - [DDL](sql/ddl.md)
- - [Datafusion Specific Functions](sql/datafusion-functions.md)
-
-- [Ballista Distributed Compute](distributed/introduction.md)
- - [Start a Ballista Cluster](distributed/deployment.md)
- - [Cargo Install](distributed/cargo-install.md)
- - [Docker](distributed/docker.md)
- - [Docker Compose](distributed/docker-compose.md)
- - [Kubernetes](distributed/kubernetes.md)
- - [Raspberry Pi](distributed/raspberrypi.md)
- - [Ballista Configuration](distributed/configuration.md)
- - [Clients](distributed/clients.md)
- - [Rust](distributed/client-rust.md)
- - [Python](distributed/client-python.md)
-- [Frequently Asked Questions](faq.md)
diff --git a/docs/user-guide/src/distributed/clients.md b/docs/user-guide/src/distributed/clients.md
deleted file mode 100644
index 7b69f195b1a2..000000000000
--- a/docs/user-guide/src/distributed/clients.md
+++ /dev/null
@@ -1,23 +0,0 @@
-
-
-## Clients
-
-- [Rust](client-rust.md)
-- [Python](client-python.md)
diff --git a/docs/user-guide/src/distributed/deployment.md b/docs/user-guide/src/distributed/deployment.md
deleted file mode 100644
index fee020c1abcd..000000000000
--- a/docs/user-guide/src/distributed/deployment.md
+++ /dev/null
@@ -1,28 +0,0 @@
-
-
-# Deployment
-
-There are multiple ways that a Ballista cluster can be deployed.
-
-- [Create a cluster using Cargo install](cargo-install.md)
-- [Create a cluster using Docker](docker.md)
-- [Create a cluster using Docker Compose](docker-compose.md)
-- [Create a cluster using Kubernetes](kubernetes.md)
-- [Create a cluster on Raspberry Pi](raspberrypi.md)
diff --git a/docs/user-guide/src/sql/introduction.md b/docs/user-guide/src/sql/introduction.md
deleted file mode 100644
index 89ed2777618d..000000000000
--- a/docs/user-guide/src/sql/introduction.md
+++ /dev/null
@@ -1,20 +0,0 @@
-
-
-# SQL Reference