From 888ba372275864f7f19b0f27bce30aa8e76ef511 Mon Sep 17 00:00:00 2001 From: Yun Zou Date: Wed, 30 Apr 2025 16:59:59 -0700 Subject: [PATCH 01/12] update doc --- plugins/spark/README.md | 10 +- .../in-dev/unreleased/polaris-spark-client.md | 114 ++++++++++++++++++ 2 files changed, 123 insertions(+), 1 deletion(-) create mode 100644 site/content/in-dev/unreleased/polaris-spark-client.md diff --git a/plugins/spark/README.md b/plugins/spark/README.md index 0340ea9b7c..4c45ddfb43 100644 --- a/plugins/spark/README.md +++ b/plugins/spark/README.md @@ -30,6 +30,12 @@ and depends on iceberg-spark-runtime 1.8.1. # Build Plugin Jar A task createPolarisSparkJar is added to build a jar for the Polaris Spark plugin, the jar is named as: +`polaris-iceberg--spark-runtime-_-.jar`. For example: +`polaris-iceberg-1.8.1-spark-runtime-3.5_2.12-0.10.0-beta-incubating-SNAPSHOT.jar`. + +- `./gradlew :polaris-spark-3.5_2.12:createPolarisSparkJar` -- build jar for Spark 3.5 with Scala version 2.12. +- `./gradlew :polaris-spark-3.5_2.13:createPolarisSparkJar` -- build jar for Spark 3.5 with Scala version 2.13. + The result jar is located at plugins/spark/v3.5/build//libs after the build. # Start Spark with Local Polaris Service using built Jar @@ -86,10 +92,12 @@ bin/spark-shell \ The Polaris Spark client supports catalog management for both Iceberg and Delta tables, it routes all Iceberg table requests to the Iceberg REST endpoints, and routes all Delta table requests to the Generic Table REST endpoints. -Following describes the current limitations of the Polaris Spark client: +The Spark Client requires at least delta 3.2.1 to work with Delta tables, which requires at least Apache Spark 3.5.3. +Following describes the current functionality limitations of the Polaris Spark client: 1) Create table as select (CTAS) is not supported for Delta tables. As a result, the `saveAsTable` method of `Dataframe` is also not supported, since it relies on the CTAS support. 2) Create a Delta table without explicit location is not supported. 3) Rename a Delta table is not supported. 4) ALTER TABLE ... SET LOCATION/SET FILEFORMAT/ADD PARTITION is not supported for DELTA table. 5) For other non-iceberg tables like csv, there is no specific guarantee provided today. +6) Role-based RBAC support for Delta table write is not available. Create, Drop and List RBAC support is available. diff --git a/site/content/in-dev/unreleased/polaris-spark-client.md b/site/content/in-dev/unreleased/polaris-spark-client.md new file mode 100644 index 0000000000..3f0668935b --- /dev/null +++ b/site/content/in-dev/unreleased/polaris-spark-client.md @@ -0,0 +1,114 @@ +--- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. 
+# +Title: Polaris Spark Client +type: docs +weight: 400 +--- + +Apache Polaris now provides Catalog support for Generic Tables (non-iceberg tables), please check out +the [Catalog API Spec]({{% ref "polaris-catalog-service" %}}) for Generic Table API specs. + +Along with the Generic Table Catalog support, Polaris is also releasing a Spark Client, which help +providing an end-to-end solution for Apache Spark to manage Delta tables using Polaris. + +Note the Polaris Spark Client is able to handle both Iceberg and Delta tables, not just Delta. + +This pages documents how to build and use the Apache Polaris Spark Client before formal release. + +## Prerequisite +1. Check out the polaris repo +```shell +cd ~ +git clone https://github.com/apache/polaris.git +``` +2. Spark with version >= 3.5.3 and <= 3.5.5, recommended with 3.5.5. +```shell +cd ~ +wget https://archive.apache.org/dist/spark/spark-3.5.5/spark-3.5.5-bin-hadoop3.tgz +mkdir spark-3.5 +tar xzvf spark-3.5.5-bin-hadoop3.tgz -C spark-3.5 --strip-components=1 +cd spark-3.5 +``` + +All Spark Client code is available under plugins/spark of the polaris repo, and we currently only provide +support for Spark 3.5. + +## Quick Start with Local Polaris Service +If you want to quickly try out the functionality with a local Polaris service, you can follow the instructions +in plugins/spark/v3.5/getting-started/README.md. + +The getting-started will start two containers: +1) The `polaris` service for running Apache Polaris using an in-memory metastore +2) The `jupyter` service for running Jupyter notebook with PySpark (Spark 3.5.5 is used) + +The notebook SparkPolaris.ipynb provided under plugins/spark/v3.5/getting-started/notebooks provides examples +of basic commands, includes: +1) Connect to Polaris using Python client to create Catalog and Roles +2) Start Spark session using the Polaris Spark Client +3) Using Spark to perform table operations for both Delta and Iceberg + +## Start Spark against a deployed Polaris Service +If you want to start Spark with a deployed Polaris service, you can follow the following instructions. + +Before start, Make sure the service deployed is up-to-date, and Spark 3.5 with at least version 3.5.3 is installed. + +### Build Spark Client Jars +A task createPolarisSparkJar is added to project polaris-spark to help building jars for the Polaris Spark plugin, +the jar is named as: +`polaris-iceberg--spark-runtime-_-.jar`. +For example: +`polaris-iceberg-1.8.1-spark-runtime-3.5_2.12-0.10.0-beta-incubating-SNAPSHOT.jar`. + +Run the following to build a Spark Client jar that is compatible with Spark 3.5 and Scala 3.12. +```shell +cd ~/polaris +./gradlew :polaris-spark-3.5_2.12:createPolarisSparkJar +``` +If you want to build a Scala 2.13 compatible jar, you can use the following command: +- `./gradlew :polaris-spark-3.5_2.13:createPolarisSparkJar` -- build jar for Spark 3.5 with Scala version 2.13. + +The result jar is located at plugins/spark/v3.5/build//libs after the build. You can also copy the +corresponding jar to any location your Spark will have access. + +### Connecting with Spark Using the built jar +The following CLI command can be used to start the spark with connection to the deployed Polaris service using +the Polaris Spark Client. 
+ +```shell +bin/spark-shell \ +--jars \ +--packages org.apache.hadoop:hadoop-aws:3.4.0,io.delta:delta-spark_2.12:3.3.1 \ +--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,io.delta.sql.DeltaSparkSessionExtension \ +--conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog \ +--conf spark.sql.catalog..warehouse= \ +--conf spark.sql.catalog..header.X-Iceberg-Access-Delegation=vended-credentials \ +--conf spark.sql.catalog.=org.apache.polaris.spark.SparkCatalog \ +--conf spark.sql.catalog..uri= \ +--conf spark.sql.catalog..credential=':' \ +--conf spark.sql.catalog..scope='PRINCIPAL_ROLE:ALL' \ +--conf spark.sql.catalog..token-refresh-enabled=true +``` + +Replace `path-to-spark-client-jar` to where the built jar is located. The `spark-catalog-name` is the catalog name you +wil use with spark, and `polaris-catalog-name` is the catalog name used by Polaris service, for simplicity, you can use +the same name. Replace the `polaris-service-uri`, `client-id` and `client-secret` accordingly, you can refer to +[Quick Start]({{% ref "../0.9.0/quickstart" %}}) for more details about those fields. + +Or you can star \ No newline at end of file From ae805effcc99ecba6a30f24b3a72c478d200db7f Mon Sep 17 00:00:00 2001 From: Yun Zou Date: Wed, 30 Apr 2025 17:30:35 -0700 Subject: [PATCH 02/12] update instruction --- .../in-dev/unreleased/polaris-spark-client.md | 69 +++++++++++++++---- 1 file changed, 55 insertions(+), 14 deletions(-) diff --git a/site/content/in-dev/unreleased/polaris-spark-client.md b/site/content/in-dev/unreleased/polaris-spark-client.md index 3f0668935b..a6c2374335 100644 --- a/site/content/in-dev/unreleased/polaris-spark-client.md +++ b/site/content/in-dev/unreleased/polaris-spark-client.md @@ -47,19 +47,18 @@ tar xzvf spark-3.5.5-bin-hadoop3.tgz -C spark-3.5 --strip-components=1 cd spark-3.5 ``` -All Spark Client code is available under plugins/spark of the polaris repo, and we currently only provide -support for Spark 3.5. +All Spark Client code is available under plugins/spark of the polaris repo. ## Quick Start with Local Polaris Service If you want to quickly try out the functionality with a local Polaris service, you can follow the instructions -in plugins/spark/v3.5/getting-started/README.md. +in `plugins/spark/v3.5/getting-started/README.md`. The getting-started will start two containers: 1) The `polaris` service for running Apache Polaris using an in-memory metastore 2) The `jupyter` service for running Jupyter notebook with PySpark (Spark 3.5.5 is used) -The notebook SparkPolaris.ipynb provided under plugins/spark/v3.5/getting-started/notebooks provides examples -of basic commands, includes: +The notebook `SparkPolaris.ipynb` provided under `plugins/spark/v3.5/getting-started/notebooks` provides examples +with basic commands, includes: 1) Connect to Polaris using Python client to create Catalog and Roles 2) Start Spark session using the Polaris Spark Client 3) Using Spark to perform table operations for both Delta and Iceberg @@ -70,26 +69,26 @@ If you want to start Spark with a deployed Polaris service, you can follow the f Before start, Make sure the service deployed is up-to-date, and Spark 3.5 with at least version 3.5.3 is installed. 
### Build Spark Client Jars -A task createPolarisSparkJar is added to project polaris-spark to help building jars for the Polaris Spark plugin, -the jar is named as: +The polaris-spark project provides a task createPolarisSparkJar to help building jars for the Polaris Spark client, +The built jar is named as: `polaris-iceberg--spark-runtime-_-.jar`. -For example: -`polaris-iceberg-1.8.1-spark-runtime-3.5_2.12-0.10.0-beta-incubating-SNAPSHOT.jar`. -Run the following to build a Spark Client jar that is compatible with Spark 3.5 and Scala 3.12. +For example: `polaris-iceberg-1.8.1-spark-runtime-3.5_2.12-0.10.0-beta-incubating-SNAPSHOT.jar`. + +Run the following commands to build a Spark Client jar that is compatible with Spark 3.5 and Scala 2.12. ```shell cd ~/polaris ./gradlew :polaris-spark-3.5_2.12:createPolarisSparkJar ``` If you want to build a Scala 2.13 compatible jar, you can use the following command: -- `./gradlew :polaris-spark-3.5_2.13:createPolarisSparkJar` -- build jar for Spark 3.5 with Scala version 2.13. +- `./gradlew :polaris-spark-3.5_2.13:createPolarisSparkJar` The result jar is located at plugins/spark/v3.5/build//libs after the build. You can also copy the corresponding jar to any location your Spark will have access. ### Connecting with Spark Using the built jar The following CLI command can be used to start the spark with connection to the deployed Polaris service using -the Polaris Spark Client. +the Polaris Spark client jar. ```shell bin/spark-shell \ @@ -107,8 +106,50 @@ bin/spark-shell \ ``` Replace `path-to-spark-client-jar` to where the built jar is located. The `spark-catalog-name` is the catalog name you -wil use with spark, and `polaris-catalog-name` is the catalog name used by Polaris service, for simplicity, you can use +will use with spark, and `polaris-catalog-name` is the catalog name used by Polaris service, for simplicity, you can use the same name. Replace the `polaris-service-uri`, `client-id` and `client-secret` accordingly, you can refer to [Quick Start]({{% ref "../0.9.0/quickstart" %}}) for more details about those fields. -Or you can star \ No newline at end of file +Or you can create a spark session start the connection, following is an example with pyspark +```shell +from pyspark.sql import SparkSession + +spark = SparkSession.builder + .config("spark.jars", ) + .config("spark.jars.packages", "org.apache.hadoop:hadoop-aws:3.3.4,io.delta:delta-spark_2.12:3.3.1") + .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog") + .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,io.delta.sql.DeltaSparkSessionExtension") + .config("spark.sql.catalog.", "org.apache.polaris.spark.SparkCatalog") + .config("spark.sql.catalog..uri", ) + .config("spark.sql.catalog..token-refresh-enabled", "true") + .config("spark.sql.catalog..credential", ":") + .config("spark.sql.catalog..warehouse", ) + .config("spark.sql.catalog.polaris.scope", 'PRINCIPAL_ROLE:ALL') + .config("spark.sql.catalog.polaris.header.X-Iceberg-Access-Delegation", 'vended-credentials') + .getOrCreate() +``` +Similar as the CLI command, make sure the corresponding fields are replaced correctly. 
+ +### Create tables with Spark +After the spark is started, you can use it to create and access Iceberg and Delta table like what you are doing before, +for example: +```shell +spark.sql("USE polaris") +spark.sql("CREATE NAMESPACE IF NOT EXISTS DELTA_NS") +spark.sql("CREATE NAMESPACE IF NOT EXISTS DELTA_NS.PUBLIC") +spark.sql("USE NAMESPACE DELTA_NS.PUBLIC") +spark.sql("""CREATE TABLE IF NOT EXISTS PEOPLE ( + id int, name string) +USING delta LOCATION 'file:///tmp/delta_tables/people'; +""") +``` + +## Limitations +The Polaris Spark client has the following functionality limitations: +1) Create table as select (CTAS) is not supported for Delta tables. As a result, the `saveAsTable` method of `Dataframe` + is also not supported, since it relies on the CTAS support. +2) Create a Delta table without explicit location is not supported. +3) Rename a Delta table is not supported. +4) ALTER TABLE ... SET LOCATION/SET FILEFORMAT/ADD PARTITION is not supported for DELTA table. +5) For other non-iceberg tables like csv, there is no specific guarantee provided today. +6) Role-based RBAC support for Delta table write is not available. Create, Drop and List RBAC support is available. From bf27caea03fd92c4b3c77d7ae3c8f9d417ca5472 Mon Sep 17 00:00:00 2001 From: Yun Zou Date: Wed, 30 Apr 2025 17:33:07 -0700 Subject: [PATCH 03/12] update instrucrtion --- plugins/spark/README.md | 6 ++---- site/content/in-dev/unreleased/polaris-spark-client.md | 2 +- 2 files changed, 3 insertions(+), 5 deletions(-) diff --git a/plugins/spark/README.md b/plugins/spark/README.md index 4c45ddfb43..e84e7e6693 100644 --- a/plugins/spark/README.md +++ b/plugins/spark/README.md @@ -57,13 +57,12 @@ bin/spark-shell \ --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,io.delta.sql.DeltaSparkSessionExtension \ --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog \ --conf spark.sql.catalog..warehouse= \ ---conf spark.sql.catalog..header.X-Iceberg-Access-Delegation=true \ +--conf spark.sql.catalog..header.X-Iceberg-Access-Delegation=vended-credentials \ --conf spark.sql.catalog.=org.apache.polaris.spark.SparkCatalog \ --conf spark.sql.catalog..uri=http://localhost:8181/api/catalog \ --conf spark.sql.catalog..credential="root:secret" \ --conf spark.sql.catalog..scope='PRINCIPAL_ROLE:ALL' \ --conf spark.sql.catalog..token-refresh-enabled=true \ ---conf spark.sql.catalog..type=rest \ --conf spark.sql.sources.useV1SourceList='' ``` @@ -78,13 +77,12 @@ bin/spark-shell \ --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,io.delta.sql.DeltaSparkSessionExtension \ --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog \ --conf spark.sql.catalog.polaris.warehouse= \ ---conf spark.sql.catalog.polaris.header.X-Iceberg-Access-Delegation=true \ +--conf spark.sql.catalog.polaris.header.X-Iceberg-Access-Delegation=vended-credentials \ --conf spark.sql.catalog.polaris=org.apache.polaris.spark.SparkCatalog \ --conf spark.sql.catalog.polaris.uri=http://localhost:8181/api/catalog \ --conf spark.sql.catalog.polaris.credential="root:secret" \ --conf spark.sql.catalog.polaris.scope='PRINCIPAL_ROLE:ALL' \ --conf spark.sql.catalog.polaris.token-refresh-enabled=true \ ---conf spark.sql.catalog.polaris.type=rest \ --conf spark.sql.sources.useV1SourceList='' ``` diff --git a/site/content/in-dev/unreleased/polaris-spark-client.md b/site/content/in-dev/unreleased/polaris-spark-client.md index a6c2374335..fcb9ab00de 
100644 --- a/site/content/in-dev/unreleased/polaris-spark-client.md +++ b/site/content/in-dev/unreleased/polaris-spark-client.md @@ -47,7 +47,7 @@ tar xzvf spark-3.5.5-bin-hadoop3.tgz -C spark-3.5 --strip-components=1 cd spark-3.5 ``` -All Spark Client code is available under plugins/spark of the polaris repo. +All Spark Client code is available under `plugins/spark` of the polaris repo. ## Quick Start with Local Polaris Service If you want to quickly try out the functionality with a local Polaris service, you can follow the instructions From f448b6d3b90cd78174a5883bae4d5c0bdcc0f180 Mon Sep 17 00:00:00 2001 From: Yun Zou Date: Thu, 1 May 2025 11:27:51 -0700 Subject: [PATCH 04/12] address feedback --- plugins/spark/README.md | 5 ++-- .../in-dev/unreleased/polaris-spark-client.md | 23 ++++++++++--------- 2 files changed, 15 insertions(+), 13 deletions(-) diff --git a/plugins/spark/README.md b/plugins/spark/README.md index e84e7e6693..87e7841b9f 100644 --- a/plugins/spark/README.md +++ b/plugins/spark/README.md @@ -97,5 +97,6 @@ Following describes the current functionality limitations of the Polaris Spark c 2) Create a Delta table without explicit location is not supported. 3) Rename a Delta table is not supported. 4) ALTER TABLE ... SET LOCATION/SET FILEFORMAT/ADD PARTITION is not supported for DELTA table. -5) For other non-iceberg tables like csv, there is no specific guarantee provided today. -6) Role-based RBAC support for Delta table write is not available. Create, Drop and List RBAC support is available. +5) For other non-Iceberg tables like csv, there is no specific guarantee provided today. +6) TABLE_WRITE_DATA privilege is not supported for Delta Table. +7) Credential Vending is not supported for Delta Table. diff --git a/site/content/in-dev/unreleased/polaris-spark-client.md b/site/content/in-dev/unreleased/polaris-spark-client.md index fcb9ab00de..6362a1f26f 100644 --- a/site/content/in-dev/unreleased/polaris-spark-client.md +++ b/site/content/in-dev/unreleased/polaris-spark-client.md @@ -25,12 +25,12 @@ weight: 400 Apache Polaris now provides Catalog support for Generic Tables (non-iceberg tables), please check out the [Catalog API Spec]({{% ref "polaris-catalog-service" %}}) for Generic Table API specs. -Along with the Generic Table Catalog support, Polaris is also releasing a Spark Client, which help -providing an end-to-end solution for Apache Spark to manage Delta tables using Polaris. +Along with the Generic Table Catalog support, Polaris is also releasing a Spark Client, which helps to +provide an end-to-end solution for Apache Spark to manage Delta tables using Polaris. Note the Polaris Spark Client is able to handle both Iceberg and Delta tables, not just Delta. -This pages documents how to build and use the Apache Polaris Spark Client before formal release. +This page documents how to build and use the Polaris Spark Client directly with the source repo. ## Prerequisite 1. 
Check out the polaris repo @@ -59,14 +59,14 @@ The getting-started will start two containers: The notebook `SparkPolaris.ipynb` provided under `plugins/spark/v3.5/getting-started/notebooks` provides examples with basic commands, includes: -1) Connect to Polaris using Python client to create Catalog and Roles +1) Connect to Polaris using Python client to create a Catalog and Roles 2) Start Spark session using the Polaris Spark Client 3) Using Spark to perform table operations for both Delta and Iceberg ## Start Spark against a deployed Polaris Service -If you want to start Spark with a deployed Polaris service, you can follow the following instructions. +If you want to start Spark with a deployed Polaris service, you can follow the instructions below. -Before start, Make sure the service deployed is up-to-date, and Spark 3.5 with at least version 3.5.3 is installed. +Before starting, make sure the service deployed is up-to-date, and that Spark 3.5 with at least version 3.5.3 is installed. ### Build Spark Client Jars The polaris-spark project provides a task createPolarisSparkJar to help building jars for the Polaris Spark client, @@ -83,7 +83,7 @@ cd ~/polaris If you want to build a Scala 2.13 compatible jar, you can use the following command: - `./gradlew :polaris-spark-3.5_2.13:createPolarisSparkJar` -The result jar is located at plugins/spark/v3.5/build//libs after the build. You can also copy the +The result jar is located at `plugins/spark/v3.5/build//libs` after the build. You can also copy the corresponding jar to any location your Spark will have access. ### Connecting with Spark Using the built jar @@ -111,7 +111,7 @@ the same name. Replace the `polaris-service-uri`, `client-id` and `client-secret [Quick Start]({{% ref "../0.9.0/quickstart" %}}) for more details about those fields. Or you can create a spark session start the connection, following is an example with pyspark -```shell +```python from pyspark.sql import SparkSession spark = SparkSession.builder @@ -133,7 +133,7 @@ Similar as the CLI command, make sure the corresponding fields are replaced corr ### Create tables with Spark After the spark is started, you can use it to create and access Iceberg and Delta table like what you are doing before, for example: -```shell +```python spark.sql("USE polaris") spark.sql("CREATE NAMESPACE IF NOT EXISTS DELTA_NS") spark.sql("CREATE NAMESPACE IF NOT EXISTS DELTA_NS.PUBLIC") @@ -151,5 +151,6 @@ The Polaris Spark client has the following functionality limitations: 2) Create a Delta table without explicit location is not supported. 3) Rename a Delta table is not supported. 4) ALTER TABLE ... SET LOCATION/SET FILEFORMAT/ADD PARTITION is not supported for DELTA table. -5) For other non-iceberg tables like csv, there is no specific guarantee provided today. -6) Role-based RBAC support for Delta table write is not available. Create, Drop and List RBAC support is available. +5) For other non-Iceberg tables like csv, there is no specific guarantee provided today. +6) TABLE_WRITE_DATA privileges is not supported for Delta Table. +7) Credential Vending is not supported for Delta Table. 
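
To make the Delta-specific limitations listed in the patch above concrete, the following is a minimal PySpark sketch of the write pattern those limitations leave available — an explicit-location `CREATE TABLE` followed by `INSERT`, rather than CTAS or `saveAsTable`. It assumes a `spark` session already configured against the Polaris catalog as in the documented examples, with the Spark catalog named `polaris`; the namespace, table, and path names are illustrative placeholders only.

```python
# Assumes a SparkSession `spark` configured against the Polaris catalog as shown
# in the docs above, with the Spark catalog named "polaris".
# Namespace, table, and path names are illustrative placeholders.
spark.sql("USE polaris")
spark.sql("CREATE NAMESPACE IF NOT EXISTS DELTA_NS")
spark.sql("CREATE NAMESPACE IF NOT EXISTS DELTA_NS.PUBLIC")
spark.sql("USE NAMESPACE DELTA_NS.PUBLIC")

# Available pattern: create the Delta table with an explicit LOCATION, then write
# into it (subject to the write-privilege caveats noted in the limitations).
spark.sql("""
  CREATE TABLE IF NOT EXISTS PEOPLE (id INT, name STRING)
  USING delta LOCATION 'file:///tmp/delta_tables/people'
""")
spark.sql("INSERT INTO PEOPLE VALUES (1, 'anna'), (2, 'bob')")

# Not available per the limitations above: CTAS and DataFrame.saveAsTable for Delta, e.g.
# spark.sql("CREATE TABLE PEOPLE_COPY USING delta LOCATION '...' AS SELECT * FROM PEOPLE")
# spark.table("PEOPLE").write.format("delta").saveAsTable("PEOPLE_COPY")
```
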
From 0b4ee9888930faca0fd5196bf2bb0faa0e0745bd Mon Sep 17 00:00:00 2001 From: Yun Zou Date: Thu, 1 May 2025 13:31:49 -0700 Subject: [PATCH 05/12] update reference --- site/content/in-dev/unreleased/polaris-spark-client.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/site/content/in-dev/unreleased/polaris-spark-client.md b/site/content/in-dev/unreleased/polaris-spark-client.md index 6362a1f26f..852b386daf 100644 --- a/site/content/in-dev/unreleased/polaris-spark-client.md +++ b/site/content/in-dev/unreleased/polaris-spark-client.md @@ -108,7 +108,7 @@ bin/spark-shell \ Replace `path-to-spark-client-jar` to where the built jar is located. The `spark-catalog-name` is the catalog name you will use with spark, and `polaris-catalog-name` is the catalog name used by Polaris service, for simplicity, you can use the same name. Replace the `polaris-service-uri`, `client-id` and `client-secret` accordingly, you can refer to -[Quick Start]({{% ref "../0.9.0/quickstart" %}}) for more details about those fields. +[Using Polaris]({{% ref "getting-started/using-polaris" %}}) for more details about those fields. Or you can create a spark session start the connection, following is an example with pyspark ```python From 842c132be7515905e7e6b512b611a934d8b3e871 Mon Sep 17 00:00:00 2001 From: Yun Zou Date: Thu, 1 May 2025 15:36:03 -0700 Subject: [PATCH 06/12] update change --- .../in-dev/unreleased/polaris-spark-client.md | 116 ++++++++++-------- 1 file changed, 68 insertions(+), 48 deletions(-) diff --git a/site/content/in-dev/unreleased/polaris-spark-client.md b/site/content/in-dev/unreleased/polaris-spark-client.md index 852b386daf..b8aab4a62e 100644 --- a/site/content/in-dev/unreleased/polaris-spark-client.md +++ b/site/content/in-dev/unreleased/polaris-spark-client.md @@ -22,34 +22,26 @@ type: docs weight: 400 --- -Apache Polaris now provides Catalog support for Generic Tables (non-iceberg tables), please check out +Apache Polaris now provides Catalog support for Generic Tables (non-Iceberg tables), please check out the [Catalog API Spec]({{% ref "polaris-catalog-service" %}}) for Generic Table API specs. -Along with the Generic Table Catalog support, Polaris is also releasing a Spark Client, which helps to +Along with the Generic Table Catalog support, Polaris is also releasing a Spark client, which helps to provide an end-to-end solution for Apache Spark to manage Delta tables using Polaris. -Note the Polaris Spark Client is able to handle both Iceberg and Delta tables, not just Delta. +Note the Polaris Spark client is able to handle both Iceberg and Delta tables, not just Delta. -This page documents how to build and use the Polaris Spark Client directly with the source repo. +This page documents how to connect Spark with Polaris Service using the Polaris Spark client. ## Prerequisite -1. Check out the polaris repo +Check out the Polaris repo: ```shell cd ~ git clone https://github.com/apache/polaris.git ``` -2. Spark with version >= 3.5.3 and <= 3.5.5, recommended with 3.5.5. -```shell -cd ~ -wget https://archive.apache.org/dist/spark/spark-3.5.5/spark-3.5.5-bin-hadoop3.tgz -mkdir spark-3.5 -tar xzvf spark-3.5.5-bin-hadoop3.tgz -C spark-3.5 --strip-components=1 -cd spark-3.5 -``` -All Spark Client code is available under `plugins/spark` of the polaris repo. +All Spark client code is available under `plugins/spark` of the Polaris repo. 
-## Quick Start with Local Polaris Service +## Quick Start with Local Polaris service If you want to quickly try out the functionality with a local Polaris service, you can follow the instructions in `plugins/spark/v3.5/getting-started/README.md`. @@ -60,40 +52,27 @@ The getting-started will start two containers: The notebook `SparkPolaris.ipynb` provided under `plugins/spark/v3.5/getting-started/notebooks` provides examples with basic commands, includes: 1) Connect to Polaris using Python client to create a Catalog and Roles -2) Start Spark session using the Polaris Spark Client +2) Start Spark session using the Polaris Spark client 3) Using Spark to perform table operations for both Delta and Iceberg -## Start Spark against a deployed Polaris Service -If you want to start Spark with a deployed Polaris service, you can follow the instructions below. - -Before starting, make sure the service deployed is up-to-date, and that Spark 3.5 with at least version 3.5.3 is installed. - -### Build Spark Client Jars -The polaris-spark project provides a task createPolarisSparkJar to help building jars for the Polaris Spark client, -The built jar is named as: -`polaris-iceberg--spark-runtime-_-.jar`. - -For example: `polaris-iceberg-1.8.1-spark-runtime-3.5_2.12-0.10.0-beta-incubating-SNAPSHOT.jar`. - -Run the following commands to build a Spark Client jar that is compatible with Spark 3.5 and Scala 2.12. +## Start Spark against a deployed Polaris service +Before starting, make sure the service deployed is up-to-date, and that Spark 3.5 with at least version 3.5.3 is installed. +Spark 3.5.5 is recommended, and you can follow the instructions below to get a Spark 3.5.5 distribution. ```shell -cd ~/polaris -./gradlew :polaris-spark-3.5_2.12:createPolarisSparkJar +cd ~ +wget https://archive.apache.org/dist/spark/spark-3.5.5/spark-3.5.5-bin-hadoop3.tgz +mkdir spark-3.5 +tar xzvf spark-3.5.5-bin-hadoop3.tgz -C spark-3.5 --strip-components=1 +cd spark-3.5 ``` -If you want to build a Scala 2.13 compatible jar, you can use the following command: -- `./gradlew :polaris-spark-3.5_2.13:createPolarisSparkJar` -The result jar is located at `plugins/spark/v3.5/build//libs` after the build. You can also copy the -corresponding jar to any location your Spark will have access. - -### Connecting with Spark Using the built jar +### Connecting with Spark using the Polaris Spark client The following CLI command can be used to start the spark with connection to the deployed Polaris service using -the Polaris Spark client jar. +a released Polaris Spark client. ```shell bin/spark-shell \ ---jars \ ---packages org.apache.hadoop:hadoop-aws:3.4.0,io.delta:delta-spark_2.12:3.3.1 \ +--packages ,org.apache.hadoop:hadoop-aws:3.4.0,io.delta:delta-spark_2.12:3.3.1 \ --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,io.delta.sql.DeltaSparkSessionExtension \ --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog \ --conf spark.sql.catalog..warehouse= \ @@ -104,19 +83,21 @@ bin/spark-shell \ --conf spark.sql.catalog..scope='PRINCIPAL_ROLE:ALL' \ --conf spark.sql.catalog..token-refresh-enabled=true ``` +Assume the released Polaris Spark client you want to use is `org.apache.polaris:polaris-iceberg-1.8.1-spark-runtime-3.5_2.12:1.0.0`, +replace the `polaris-spark-client-package` field with the release. 
+ +The `spark-catalog-name` is the catalog name you will use with Spark, and `polaris-catalog-name` is the catalog name used +by Polaris service, for simplicity, you can use the same name. -Replace `path-to-spark-client-jar` to where the built jar is located. The `spark-catalog-name` is the catalog name you -will use with spark, and `polaris-catalog-name` is the catalog name used by Polaris service, for simplicity, you can use -the same name. Replace the `polaris-service-uri`, `client-id` and `client-secret` accordingly, you can refer to +Replace the `polaris-service-uri`, `client-id` and `client-secret` accordingly, you can refer to [Using Polaris]({{% ref "getting-started/using-polaris" %}}) for more details about those fields. -Or you can create a spark session start the connection, following is an example with pyspark +You can also start the connection by creating a Spark session, following is an example with PySpark: ```python from pyspark.sql import SparkSession spark = SparkSession.builder - .config("spark.jars", ) - .config("spark.jars.packages", "org.apache.hadoop:hadoop-aws:3.3.4,io.delta:delta-spark_2.12:3.3.1") + .config("spark.jars.packages", ",org.apache.hadoop:hadoop-aws:3.3.4,io.delta:delta-spark_2.12:3.3.1") .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog") .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,io.delta.sql.DeltaSparkSessionExtension") .config("spark.sql.catalog.", "org.apache.polaris.spark.SparkCatalog") @@ -131,7 +112,7 @@ spark = SparkSession.builder Similar as the CLI command, make sure the corresponding fields are replaced correctly. ### Create tables with Spark -After the spark is started, you can use it to create and access Iceberg and Delta table like what you are doing before, +After the Spark is started, you can use it to create and access Iceberg and Delta table like what you are doing before, for example: ```python spark.sql("USE polaris") @@ -144,6 +125,45 @@ USING delta LOCATION 'file:///tmp/delta_tables/people'; """) ``` +### Build Spark Client jars locally +If there is no released Spark client, or you want to try the Spark client that is currently not yet released. You can +build a Spark Client jar locally with the source repo, and use the local jar to connect Spark with Polaris Service. + +The polaris-spark project provides a task createPolarisSparkJar to help building jars for the Polaris Spark client, +The built jar is named as: +`polaris-iceberg--spark-runtime-_-.jar`. + +For example: `polaris-iceberg-1.8.1-spark-runtime-3.5_2.12-0.10.0-beta-incubating-SNAPSHOT.jar`. + +Run the following commands to build a Spark client jar that is compatible with Spark 3.5 and Scala 2.12. +```shell +cd ~/polaris +./gradlew :polaris-spark-3.5_2.12:createPolarisSparkJar +``` +If you want to build a Scala 2.13 compatible jar, you can use the following command: +- `./gradlew :polaris-spark-3.5_2.13:createPolarisSparkJar` + +The result jar is located at `plugins/spark/v3.5/build//libs` after the build. You can also copy the +corresponding jar to any location your Spark will have access. 
+ +When starting Spark or create Spark session, instead of providing the Polaris Spark client as a `packages` configuration, we +need to provide the `jars` configuration as follows: +```shell +bin/spark-shell \ +--jars \ +--packages org.apache.hadoop:hadoop-aws:3.4.0,io.delta:delta-spark_2.12:3.3.1 \ +--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,io.delta.sql.DeltaSparkSessionExtension \ +--conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog \ +--conf spark.sql.catalog..warehouse= \ +--conf spark.sql.catalog..header.X-Iceberg-Access-Delegation=vended-credentials \ +--conf spark.sql.catalog.=org.apache.polaris.spark.SparkCatalog \ +--conf spark.sql.catalog..uri= \ +--conf spark.sql.catalog..credential=':' \ +--conf spark.sql.catalog..scope='PRINCIPAL_ROLE:ALL' \ +--conf spark.sql.catalog..token-refresh-enabled=true +``` +Replace `path-to-spark-client-jar` with where the built jar is located. + ## Limitations The Polaris Spark client has the following functionality limitations: 1) Create table as select (CTAS) is not supported for Delta tables. As a result, the `saveAsTable` method of `Dataframe` @@ -152,5 +172,5 @@ The Polaris Spark client has the following functionality limitations: 3) Rename a Delta table is not supported. 4) ALTER TABLE ... SET LOCATION/SET FILEFORMAT/ADD PARTITION is not supported for DELTA table. 5) For other non-Iceberg tables like csv, there is no specific guarantee provided today. -6) TABLE_WRITE_DATA privileges is not supported for Delta Table. +6) TABLE_WRITE_DATA privilege is not supported for Delta Table. 7) Credential Vending is not supported for Delta Table. From 2d19c57980c55c3f57eef6b41db1fcd3d29a3c7e Mon Sep 17 00:00:00 2001 From: Yun Zou Date: Thu, 1 May 2025 15:40:17 -0700 Subject: [PATCH 07/12] add change --- site/content/in-dev/unreleased/polaris-spark-client.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/site/content/in-dev/unreleased/polaris-spark-client.md b/site/content/in-dev/unreleased/polaris-spark-client.md index b8aab4a62e..8b5dc9e542 100644 --- a/site/content/in-dev/unreleased/polaris-spark-client.md +++ b/site/content/in-dev/unreleased/polaris-spark-client.md @@ -125,7 +125,7 @@ USING delta LOCATION 'file:///tmp/delta_tables/people'; """) ``` -### Build Spark Client jars locally +## Connecting with Spark using local Polaris Spark client If there is no released Spark client, or you want to try the Spark client that is currently not yet released. You can build a Spark Client jar locally with the source repo, and use the local jar to connect Spark with Polaris Service. 
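
The patches above distinguish two ways of supplying the Polaris Spark client to spark-shell: a released artifact via `--packages`, or a locally built jar via `--jars`. For the PySpark route, the following is a minimal sketch of the locally-built-jar variant under the same catalog settings as the documented examples; the jar path, warehouse name, service URI, and credentials are placeholders to be replaced with your own values.

```python
# A sketch only: point spark.jars at the locally built client jar instead of
# pulling a released client through spark.jars.packages.
# The jar path, warehouse, URI, and credential values below are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.jars",
            "/path/to/polaris-iceberg-1.8.1-spark-runtime-3.5_2.12-0.10.0-beta-incubating-SNAPSHOT.jar")
    .config("spark.jars.packages",
            "org.apache.hadoop:hadoop-aws:3.4.0,io.delta:delta-spark_2.12:3.3.1")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,"
            "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .config("spark.sql.catalog.polaris", "org.apache.polaris.spark.SparkCatalog")
    .config("spark.sql.catalog.polaris.uri", "http://localhost:8181/api/catalog")
    .config("spark.sql.catalog.polaris.credential", "root:secret")
    .config("spark.sql.catalog.polaris.scope", "PRINCIPAL_ROLE:ALL")
    .config("spark.sql.catalog.polaris.token-refresh-enabled", "true")
    .config("spark.sql.catalog.polaris.warehouse", "my_polaris_catalog")
    .config("spark.sql.catalog.polaris.header.X-Iceberg-Access-Delegation", "vended-credentials")
    .getOrCreate()
)
```
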
From 492459d276fd2669392ca5c30082d743c73918f6 Mon Sep 17 00:00:00 2001 From: Yun Zou Date: Fri, 2 May 2025 11:06:51 -0700 Subject: [PATCH 08/12] add change --- .../in-dev/unreleased/polaris-spark-client.md | 25 +++++++++---------- 1 file changed, 12 insertions(+), 13 deletions(-) diff --git a/site/content/in-dev/unreleased/polaris-spark-client.md b/site/content/in-dev/unreleased/polaris-spark-client.md index 8b5dc9e542..e8cd997662 100644 --- a/site/content/in-dev/unreleased/polaris-spark-client.md +++ b/site/content/in-dev/unreleased/polaris-spark-client.md @@ -19,7 +19,7 @@ # Title: Polaris Spark Client type: docs -weight: 400 +weight: 650 --- Apache Polaris now provides Catalog support for Generic Tables (non-Iceberg tables), please check out @@ -67,7 +67,7 @@ cd spark-3.5 ``` ### Connecting with Spark using the Polaris Spark client -The following CLI command can be used to start the spark with connection to the deployed Polaris service using +The following CLI command can be used to start the Spark with connection to the deployed Polaris service using a released Polaris Spark client. ```shell @@ -89,10 +89,12 @@ replace the `polaris-spark-client-package` field with the release. The `spark-catalog-name` is the catalog name you will use with Spark, and `polaris-catalog-name` is the catalog name used by Polaris service, for simplicity, you can use the same name. -Replace the `polaris-service-uri`, `client-id` and `client-secret` accordingly, you can refer to -[Using Polaris]({{% ref "getting-started/using-polaris" %}}) for more details about those fields. +Replace the `polaris-service-uri` with the uri to the deployed Polaris service you want to use. -You can also start the connection by creating a Spark session, following is an example with PySpark: +For `client-id` and `client-secret` values, you can refer to [Using Polaris]({{% ref "getting-started/using-polaris" %}}) +for more details. + +You can also start the connection by programmatically initialize a SparkSession, following is an example with PySpark: ```python from pyspark.sql import SparkSession @@ -112,8 +114,7 @@ spark = SparkSession.builder Similar as the CLI command, make sure the corresponding fields are replaced correctly. ### Create tables with Spark -After the Spark is started, you can use it to create and access Iceberg and Delta table like what you are doing before, -for example: +After Spark is started, you can use it to create and access Iceberg and Delta tables, for example: ```python spark.sql("USE polaris") spark.sql("CREATE NAMESPACE IF NOT EXISTS DELTA_NS") @@ -121,13 +122,13 @@ spark.sql("CREATE NAMESPACE IF NOT EXISTS DELTA_NS.PUBLIC") spark.sql("USE NAMESPACE DELTA_NS.PUBLIC") spark.sql("""CREATE TABLE IF NOT EXISTS PEOPLE ( id int, name string) -USING delta LOCATION 'file:///tmp/delta_tables/people'; +USING delta LOCATION 'file:///tmp/var/delta_tables/people'; """) ``` ## Connecting with Spark using local Polaris Spark client -If there is no released Spark client, or you want to try the Spark client that is currently not yet released. You can -build a Spark Client jar locally with the source repo, and use the local jar to connect Spark with Polaris Service. +If you would like to use a version of the Spark client that is currently not yet released, you can +build a Spark client jar locally from source. 
The polaris-spark project provides a task createPolarisSparkJar to help building jars for the Polaris Spark client, The built jar is named as: @@ -171,6 +172,4 @@ The Polaris Spark client has the following functionality limitations: 2) Create a Delta table without explicit location is not supported. 3) Rename a Delta table is not supported. 4) ALTER TABLE ... SET LOCATION/SET FILEFORMAT/ADD PARTITION is not supported for DELTA table. -5) For other non-Iceberg tables like csv, there is no specific guarantee provided today. -6) TABLE_WRITE_DATA privilege is not supported for Delta Table. -7) Credential Vending is not supported for Delta Table. +5) For other non-Iceberg tables like csv, it is not supported. From f898e93371b9aa1311d40b8040fac722cbb16a98 Mon Sep 17 00:00:00 2001 From: Yun Zou Date: Fri, 2 May 2025 11:13:16 -0700 Subject: [PATCH 09/12] address feedback --- plugins/spark/README.md | 4 +--- site/content/in-dev/unreleased/polaris-spark-client.md | 3 ++- 2 files changed, 3 insertions(+), 4 deletions(-) diff --git a/plugins/spark/README.md b/plugins/spark/README.md index 87e7841b9f..66d4c29834 100644 --- a/plugins/spark/README.md +++ b/plugins/spark/README.md @@ -97,6 +97,4 @@ Following describes the current functionality limitations of the Polaris Spark c 2) Create a Delta table without explicit location is not supported. 3) Rename a Delta table is not supported. 4) ALTER TABLE ... SET LOCATION/SET FILEFORMAT/ADD PARTITION is not supported for DELTA table. -5) For other non-Iceberg tables like csv, there is no specific guarantee provided today. -6) TABLE_WRITE_DATA privilege is not supported for Delta Table. -7) Credential Vending is not supported for Delta Table. +5) For other non-Iceberg tables like csv, it is not supported today. diff --git a/site/content/in-dev/unreleased/polaris-spark-client.md b/site/content/in-dev/unreleased/polaris-spark-client.md index e8cd997662..94bf8939c4 100644 --- a/site/content/in-dev/unreleased/polaris-spark-client.md +++ b/site/content/in-dev/unreleased/polaris-spark-client.md @@ -89,7 +89,8 @@ replace the `polaris-spark-client-package` field with the release. The `spark-catalog-name` is the catalog name you will use with Spark, and `polaris-catalog-name` is the catalog name used by Polaris service, for simplicity, you can use the same name. -Replace the `polaris-service-uri` with the uri to the deployed Polaris service you want to use. +Replace the `polaris-service-uri` with the uri of the deployed Polaris service. For example, with a locally deployed +Polaris service, the uri would be `http://localhost:8181/api/catalog`. For `client-id` and `client-secret` values, you can refer to [Using Polaris]({{% ref "getting-started/using-polaris" %}}) for more details. From 6d09e16754f7deedb0bc8600501f170efccfe4f6 Mon Sep 17 00:00:00 2001 From: Yun Zou Date: Fri, 2 May 2025 16:52:31 -0700 Subject: [PATCH 10/12] address feedback --- .../in-dev/unreleased/polaris-spark-client.md | 62 +++---------------- 1 file changed, 8 insertions(+), 54 deletions(-) diff --git a/site/content/in-dev/unreleased/polaris-spark-client.md b/site/content/in-dev/unreleased/polaris-spark-client.md index 94bf8939c4..6167231f2b 100644 --- a/site/content/in-dev/unreleased/polaris-spark-client.md +++ b/site/content/in-dev/unreleased/polaris-spark-client.md @@ -32,31 +32,19 @@ Note the Polaris Spark client is able to handle both Iceberg and Delta tables, n This page documents how to connect Spark with Polaris Service using the Polaris Spark client. 
-## Prerequisite +## Quick Start with Local Polaris service +If you want to quickly try out the functionality with a local Polaris service, simply check out the Polaris repo +and follow the instructions in the Spark plugin getting-started +[README](https://github.com/apache/polaris/blob/main/plugins/spark/v3.5/getting-started/README.md). + Check out the Polaris repo: ```shell cd ~ git clone https://github.com/apache/polaris.git ``` -All Spark client code is available under `plugins/spark` of the Polaris repo. - -## Quick Start with Local Polaris service -If you want to quickly try out the functionality with a local Polaris service, you can follow the instructions -in `plugins/spark/v3.5/getting-started/README.md`. - -The getting-started will start two containers: -1) The `polaris` service for running Apache Polaris using an in-memory metastore -2) The `jupyter` service for running Jupyter notebook with PySpark (Spark 3.5.5 is used) - -The notebook `SparkPolaris.ipynb` provided under `plugins/spark/v3.5/getting-started/notebooks` provides examples -with basic commands, includes: -1) Connect to Polaris using Python client to create a Catalog and Roles -2) Start Spark session using the Polaris Spark client -3) Using Spark to perform table operations for both Delta and Iceberg - ## Start Spark against a deployed Polaris service -Before starting, make sure the service deployed is up-to-date, and that Spark 3.5 with at least version 3.5.3 is installed. +Before starting, ensure that the deployed Polaris service supports Generic Tables, and that Spark 3.5(version 3.5.3 or later is installed). Spark 3.5.5 is recommended, and you can follow the instructions below to get a Spark 3.5.5 distribution. ```shell cd ~ @@ -129,42 +117,8 @@ USING delta LOCATION 'file:///tmp/var/delta_tables/people'; ## Connecting with Spark using local Polaris Spark client If you would like to use a version of the Spark client that is currently not yet released, you can -build a Spark client jar locally from source. - -The polaris-spark project provides a task createPolarisSparkJar to help building jars for the Polaris Spark client, -The built jar is named as: -`polaris-iceberg--spark-runtime-_-.jar`. - -For example: `polaris-iceberg-1.8.1-spark-runtime-3.5_2.12-0.10.0-beta-incubating-SNAPSHOT.jar`. - -Run the following commands to build a Spark client jar that is compatible with Spark 3.5 and Scala 2.12. -```shell -cd ~/polaris -./gradlew :polaris-spark-3.5_2.12:createPolarisSparkJar -``` -If you want to build a Scala 2.13 compatible jar, you can use the following command: -- `./gradlew :polaris-spark-3.5_2.13:createPolarisSparkJar` - -The result jar is located at `plugins/spark/v3.5/build//libs` after the build. You can also copy the -corresponding jar to any location your Spark will have access. 
- -When starting Spark or create Spark session, instead of providing the Polaris Spark client as a `packages` configuration, we -need to provide the `jars` configuration as follows: -```shell -bin/spark-shell \ ---jars \ ---packages org.apache.hadoop:hadoop-aws:3.4.0,io.delta:delta-spark_2.12:3.3.1 \ ---conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,io.delta.sql.DeltaSparkSessionExtension \ ---conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog \ ---conf spark.sql.catalog..warehouse= \ ---conf spark.sql.catalog..header.X-Iceberg-Access-Delegation=vended-credentials \ ---conf spark.sql.catalog.=org.apache.polaris.spark.SparkCatalog \ ---conf spark.sql.catalog..uri= \ ---conf spark.sql.catalog..credential=':' \ ---conf spark.sql.catalog..scope='PRINCIPAL_ROLE:ALL' \ ---conf spark.sql.catalog..token-refresh-enabled=true -``` -Replace `path-to-spark-client-jar` with where the built jar is located. +build a Spark client jar locally from source. Please refer to the Spark plugin +[README](https://github.com/apache/polaris/blob/main/plugins/spark/README.md) for detailed instructions. ## Limitations The Polaris Spark client has the following functionality limitations: From 8c1643783ba3ce837786153ac1b6440c8e45dc26 Mon Sep 17 00:00:00 2001 From: Yun Zou Date: Fri, 2 May 2025 16:54:50 -0700 Subject: [PATCH 11/12] update --- site/content/in-dev/unreleased/polaris-spark-client.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/site/content/in-dev/unreleased/polaris-spark-client.md b/site/content/in-dev/unreleased/polaris-spark-client.md index 6167231f2b..9e20f15a47 100644 --- a/site/content/in-dev/unreleased/polaris-spark-client.md +++ b/site/content/in-dev/unreleased/polaris-spark-client.md @@ -117,7 +117,7 @@ USING delta LOCATION 'file:///tmp/var/delta_tables/people'; ## Connecting with Spark using local Polaris Spark client If you would like to use a version of the Spark client that is currently not yet released, you can -build a Spark client jar locally from source. Please refer to the Spark plugin +build a Spark client jar locally from source. Please check out the Polaris repo and refer to the Spark plugin [README](https://github.com/apache/polaris/blob/main/plugins/spark/README.md) for detailed instructions. ## Limitations From d1d6e8a735ceec72844e4c24756131acc9116256 Mon Sep 17 00:00:00 2001 From: Yun Zou Date: Fri, 2 May 2025 16:57:30 -0700 Subject: [PATCH 12/12] address feedback --- site/content/in-dev/unreleased/polaris-spark-client.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/site/content/in-dev/unreleased/polaris-spark-client.md b/site/content/in-dev/unreleased/polaris-spark-client.md index 9e20f15a47..46796cdc64 100644 --- a/site/content/in-dev/unreleased/polaris-spark-client.md +++ b/site/content/in-dev/unreleased/polaris-spark-client.md @@ -115,7 +115,7 @@ USING delta LOCATION 'file:///tmp/var/delta_tables/people'; """) ``` -## Connecting with Spark using local Polaris Spark client +## Connecting with Spark using local Polaris Spark client jar If you would like to use a version of the Spark client that is currently not yet released, you can build a Spark client jar locally from source. Please check out the Polaris repo and refer to the Spark plugin [README](https://github.com/apache/polaris/blob/main/plugins/spark/README.md) for detailed instructions.
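
The final version of the page closes with a Delta-only table example, while the client is documented as handling both Iceberg and Delta tables through the same catalog. As a complement, here is a minimal sketch of the corresponding Iceberg flow, assuming a `spark` session configured as in the documented examples with the catalog named `polaris`; the namespace and table names are illustrative placeholders.

```python
# Assumes a SparkSession `spark` configured against the Polaris catalog as in the
# docs above, with the Spark catalog named "polaris".
# Namespace and table names are illustrative placeholders.
spark.sql("USE polaris")
spark.sql("CREATE NAMESPACE IF NOT EXISTS ICEBERG_NS")
spark.sql("USE NAMESPACE ICEBERG_NS")

# Iceberg table requests are routed to the catalog's Iceberg REST endpoints;
# an explicit LOCATION is not required here since the catalog warehouse supplies it.
spark.sql("CREATE TABLE IF NOT EXISTS CUSTOMERS (id INT, name STRING) USING iceberg")
spark.sql("INSERT INTO CUSTOMERS VALUES (1, 'anna')")
spark.sql("SELECT * FROM CUSTOMERS").show()
```
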