@@ -185,3 +185,59 @@ the interface as describe in the :ref:`Custom Table Provider <io_custom_table_pr
185185section. This is an advanced topic, but a
186186`user example <https://github.com/apache/datafusion-python/tree/main/examples/ffi-table-provider>`_
187187is provided in the DataFusion repository.
188+
189+ Catalog
190+ =======
191+
192+ A common technique for organizing tables is a three-level hierarchy. DataFusion
193+ supports this form of organization through the :py:class:`~datafusion.catalog.Catalog`,
194+ :py:class:`~datafusion.catalog.Schema`, and :py:class:`~datafusion.catalog.Table` classes. By default,
195+ a :py:class:`~datafusion.context.SessionContext` comes with a single Catalog and a single Schema
196+ with the names ``datafusion`` and ``default``, respectively.
197+
198+ The default implementation stores the catalog and schema in memory. You can also add
199+ further in-memory catalogs and schemas, as in the following
200+ example:
201+
202+ .. code-block:: python
203+
204+     from datafusion.catalog import Catalog, Schema
205+
206+     my_catalog = Catalog.memory_catalog()
207+     my_schema = Schema.memory_schema()
208+
209+     my_catalog.register_schema("my_schema_name", my_schema)
210+
211+     ctx.register_catalog("my_catalog_name", my_catalog)
212+
213+ You can then register tables in ``my_schema`` and access them either through the DataFrame
214+ API or via SQL commands such as ``"SELECT * from my_catalog_name.my_schema_name.my_table"``.
215+
216+ User Defined Catalog and Schema
217+ -------------------------------
218+
219+ If the in-memory catalogs are insufficient for your needs, there are two approaches you can take
220+ to implement a custom catalog and/or schema. The discussion below describes how to
221+ implement a Catalog, but the approach for a Schema is nearly
222+ identical.
223+
224+ DataFusion supports Catalogs written in either Rust or Python. If you write a Catalog in Rust,
225+ you will need to export it as a Python library via PyO3. There is a complete example of a
226+ catalog implemented this way in the
227+ `examples folder <https://github.com/apache/datafusion-python/tree/main/examples/>`_
228+ of our repository. Writing catalog providers in Rust can typically lead to significant
229+ performance improvements over the Python-based approach.
230+
231+ To implement a Catalog in Python, you will need to inherit from the abstract base class
232+ :py:class:`~datafusion.catalog.CatalogProvider`. There are examples in the
233+ `unit tests <https://github.com/apache/datafusion-python/tree/main/python/tests>`_ of
234+ implementing a basic Catalog in Python where we simply keep a dictionary of the
235+ registered Schemas.
236+
237+ One important note for developers is that when a Catalog is defined in Python, there are
238+ two different ways of accessing it. First, we register the catalog with a Rust
239+ wrapper. This allows any Rust-based code to call the Python functions as necessary.
240+ Second, if the user accesses the Catalog via the Python API, we identify this and return
241+ the original Python object that implements the Catalog. This is an important distinction
242+ for developers because we do *not* return a Python wrapper around the Rust wrapper of the
243+ original Python object.
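
The distinction between the two access paths can be modelled in plain Python. The sketch below is not how DataFusion is implemented: ``RustSideWrapper``, ``ContextSketch``, and ``engine_catalog`` are hypothetical names invented for illustration. It only shows the observable behaviour described above, namely that the engine sees a wrapper while Python callers get the original object back.

```python
class RustSideWrapper:
    """Hypothetical stand-in for the Rust wrapper around a Python catalog."""

    def __init__(self, py_catalog):
        self.py_catalog = py_catalog

    def schema_names(self):
        # Engine-side calls are forwarded to the Python implementation.
        return self.py_catalog.schema_names()


class ContextSketch:
    """Hypothetical model of a session context (not the real SessionContext)."""

    def __init__(self):
        self._catalogs = {}

    def register_catalog(self, name, py_catalog):
        # Path 1: the catalog is stored behind a wrapper for engine use.
        self._catalogs[name] = RustSideWrapper(py_catalog)

    def engine_catalog(self, name):
        # What engine-side code would see: the wrapper.
        return self._catalogs[name]

    def catalog(self, name):
        # Path 2: Python callers receive the original object,
        # not a second wrapper around the first one.
        return self._catalogs[name].py_catalog


class MyCatalog:
    def schema_names(self):
        return {"my_schema_name"}


original = MyCatalog()
ctx_sketch = ContextSketch()
ctx_sketch.register_catalog("my_catalog_name", original)
returned = ctx_sketch.catalog("my_catalog_name")
```

Because ``returned is original`` holds in this model, any extra attributes or methods defined on the Python class remain available after a round trip through the context.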