37 changes: 21 additions & 16 deletions plugins/spark/README.md
@@ -21,12 +21,16 @@

The Polaris Spark plugin provides a SparkCatalog class, which communicates with the Polaris
REST endpoints, and provides implementations for Apache Spark's
-[TableCatalog](https://github.com/apache/spark/blob/v3.5.6/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableCatalog.java),
-[SupportsNamespaces](https://github.com/apache/spark/blob/v3.5.6/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsNamespaces.java),
-[ViewCatalog](https://github.com/apache/spark/blob/v3.5.6/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/ViewCatalog.java) classes.
+- [TableCatalog](https://github.com/apache/spark/blob/v3.5.6/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableCatalog.java)
+- [ViewCatalog](https://github.com/apache/spark/blob/v3.5.6/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/ViewCatalog.java)
+- [SupportsNamespaces](https://github.com/apache/spark/blob/v3.5.6/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsNamespaces.java)
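
For orientation, here is a skeleton (not the actual `SparkCatalog` from this plugin) showing what implementing these Spark interfaces involves; a real implementation fills in each catalog method by calling the Polaris REST endpoints:

```java
// Skeleton only: illustrates the Spark connector interfaces the plugin
// implements, not the plugin's real code. Declared abstract so the many
// table/view/namespace methods those interfaces require can stay unwritten.
import org.apache.spark.sql.connector.catalog.SupportsNamespaces;
import org.apache.spark.sql.connector.catalog.TableCatalog;
import org.apache.spark.sql.connector.catalog.ViewCatalog;
import org.apache.spark.sql.util.CaseInsensitiveStringMap;

public abstract class RestBackedCatalogSketch
    implements TableCatalog, ViewCatalog, SupportsNamespaces {
  private String name;

  @Override
  public void initialize(String name, CaseInsensitiveStringMap options) {
    // Connection options (service URI, warehouse, credentials) would be
    // read from `options` here.
    this.name = name;
  }

  @Override
  public String name() {
    return name;
  }
}
```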

-Right now, the plugin only provides support for Spark 3.5, Scala version 2.12 and 2.13,
-and depends on iceberg-spark-runtime 1.9.1.
+Right now, the plugin only provides support for Spark 3.5, Scala versions 2.12 and 2.13, and depends on iceberg-spark-runtime 1.9.1.

+The Polaris Spark client supports catalog management for both Iceberg and Delta tables. It routes all Iceberg table
+requests to the Iceberg REST endpoints and routes all Delta table requests to the Generic Table REST endpoints.
+
+The Spark client requires at least Delta Lake 3.2.1 to work with Delta tables, which in turn requires at least Apache Spark 3.5.3.
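
To make the routing concrete, here is a hedged sketch (not from this repo) of what it looks like from the Spark side, assuming a session whose `polaris` catalog has been configured as in the start-up command below; the namespace, table names, and storage location are hypothetical:

```java
// Sketch: one catalog, two table formats. Iceberg DDL is served by the
// Iceberg REST endpoints, Delta DDL by the Generic Table REST endpoints.
import org.apache.spark.sql.SparkSession;

public class PolarisRoutingSketch {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder().getOrCreate();

    spark.sql("CREATE NAMESPACE IF NOT EXISTS polaris.demo");

    // Handled via the Iceberg REST endpoints.
    spark.sql("CREATE TABLE polaris.demo.events_iceberg (id BIGINT, ts TIMESTAMP) USING iceberg");

    // Handled via the Generic Table REST endpoints; note the explicit
    // LOCATION, which the client currently requires for Delta tables.
    spark.sql("CREATE TABLE polaris.demo.events_delta (id BIGINT, ts TIMESTAMP) "
        + "USING delta LOCATION 's3://my-bucket/demo/events_delta'");
  }
}
```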

# Start Spark with local Polaris service using the Polaris Spark plugin
The following command starts a Polaris server for local testing; it runs on localhost:8181 with default
@@ -112,15 +116,16 @@ bin/spark-shell \
--conf spark.sql.sources.useV1SourceList=''
```
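
For reference, a rough sketch of configuring the same session programmatically instead of via spark-shell flags. The catalog name (`polaris`), endpoint, and warehouse values are placeholders, and the Polaris-specific option keys are assumptions based on this README rather than authoritative settings:

```java
// Assumed configuration sketch, not authoritative: option keys and values
// below are placeholders modeled on the spark-shell flags in this README.
import org.apache.spark.sql.SparkSession;

public class PolarisSessionSketch {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("polaris-local")
        // Register the plugin's SparkCatalog under the name `polaris`.
        .config("spark.sql.catalog.polaris", "org.apache.polaris.spark.SparkCatalog")
        // Point the catalog at the local Polaris service on localhost:8181.
        .config("spark.sql.catalog.polaris.uri", "http://localhost:8181/api/catalog")
        .config("spark.sql.catalog.polaris.warehouse", "my_catalog")
        // Keep all sources on the DataSource V2 path, mirroring the
        // spark-shell flag above.
        .config("spark.sql.sources.useV1SourceList", "")
        .getOrCreate();

    spark.sql("SHOW NAMESPACES IN polaris").show();
  }
}
```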

-# Limitations
-The Polaris Spark client supports catalog management for both Iceberg and Delta tables, it routes all Iceberg table
-requests to the Iceberg REST endpoints, and routes all Delta table requests to the Generic Table REST endpoints.
-
-The Spark Client requires at least delta 3.2.1 to work with Delta tables, which requires at least Apache Spark 3.5.3.
-
-Following describes the current functionality limitations of the Polaris Spark client:
-1) Create table as select (CTAS) is not supported for Delta tables. As a result, the `saveAsTable` method of `Dataframe`
-is also not supported, since it relies on the CTAS support.
-2) Create a Delta table without explicit location is not supported.
-3) Rename a Delta table is not supported.
-4) ALTER TABLE ... SET LOCATION is not supported for DELTA table.
-5) For other non-Iceberg tables like csv, it is not supported today.
+# Current Limitations
+The following describes the current limitations of the Polaris Spark client:
+
+## General Limitations
+1. The Polaris Spark client only supports Iceberg and Delta tables. It does not support other table formats such as CSV or JSON.
+2. Generic tables (non-Iceberg tables) do not currently support credential vending.
+
+## Delta Table Limitations
+1. Create table as select (CTAS) is not supported for Delta tables. As a result, the `saveAsTable` method of `DataFrame`
+   is also not supported, since it relies on CTAS support.
+2. Creating a Delta table without an explicit location is not supported.
+3. Renaming a Delta table is not supported.
+4. ALTER TABLE ... SET LOCATION is not supported for Delta tables.
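
Given Delta limitations 1 and 2, here is a sketch of the write pattern that remains supported: create the table with an explicit location first, then populate it with `INSERT` instead of CTAS or `saveAsTable`. Catalog, namespace, and path are hypothetical:

```java
// Two-step workaround for the Delta limitations above; names and the
// storage path are illustrative only.
import org.apache.spark.sql.SparkSession;

public class DeltaWriteWorkaround {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder().getOrCreate();

    // Delta limitation 2: the location must be given explicitly.
    spark.sql("CREATE TABLE IF NOT EXISTS polaris.demo.events (id BIGINT, ts TIMESTAMP) "
        + "USING delta LOCATION 's3://my-bucket/demo/events'");

    // INSERT works where CTAS / saveAsTable do not (Delta limitation 1).
    spark.sql("INSERT INTO polaris.demo.events "
        + "SELECT id, current_timestamp() AS ts FROM range(10)");
  }
}
```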
4 changes: 2 additions & 2 deletions plugins/spark/v3.5/getting-started/README.md
@@ -20,7 +20,7 @@
# Getting Started with Apache Spark and Apache Polaris With Delta and Iceberg

This getting started guide provides a `docker-compose` file to set up [Apache Spark](https://spark.apache.org/) with Apache Polaris using
the new Polaris Spark Client.

The Polaris Spark Client enables management of both Delta and Iceberg tables using Apache Polaris.

@@ -48,7 +48,7 @@ To start the `docker-compose` file, run this command from the repo's root directory:
docker-compose -f plugins/spark/v3.5/getting-started/docker-compose.yml up
```

-This will spin up 2 container services
+This will spin up 2 container services:
* The `polaris` service for running Apache Polaris using an in-memory metastore
* The `jupyter` service for running Jupyter notebook with PySpark

@@ -28,6 +28,19 @@
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

+/**
+ * Helper class for integrating Delta table functionality with Polaris Spark Catalog.
+ *
+ * <p>This class is responsible for dynamically loading and configuring a Delta Catalog
+ * implementation to work with Polaris. It sets up the Delta Catalog as a delegating catalog
+ * extension with Polaris Spark Catalog as the delegate, enabling Delta table operations through
+ * Polaris.
+ *
+ * <p>The class uses reflection to configure the Delta Catalog to behave identically to Unity
+ * Catalog, as the current Delta Catalog implementation is hardcoded for Unity Catalog. This is a
+ * temporary workaround until Delta extends support for other catalog implementations (see
+ * https://github.com/delta-io/delta/issues/4306).
+ */
public class DeltaHelper {
private static final Logger LOG = LoggerFactory.getLogger(DeltaHelper.class);
