From 8bb73c378ed81dfb5f0cfb33dbb77ee749dc787e Mon Sep 17 00:00:00 2001 From: Adam Christian Date: Thu, 6 Nov 2025 11:21:54 -0800 Subject: [PATCH 1/3] test: Add some Spark client tests --- plugins/spark/README.md | 37 +- plugins/spark/v3.5/getting-started/README.md | 8 +- .../polaris/spark/utils/DeltaHelper.java | 13 + .../polaris/spark/PolarisRESTCatalogTest.java | 345 ++++++++++++++++++ .../polaris/spark/utils/DeltaHelperTest.java | 164 +++++++++ .../spark/utils/PolarisCatalogUtilsTest.java | 162 ++++++++ .../in-dev/unreleased/generic-table.md | 30 +- .../in-dev/unreleased/polaris-spark-client.md | 35 +- 8 files changed, 743 insertions(+), 51 deletions(-) create mode 100644 plugins/spark/v3.5/spark/src/test/java/org/apache/polaris/spark/PolarisRESTCatalogTest.java create mode 100644 plugins/spark/v3.5/spark/src/test/java/org/apache/polaris/spark/utils/DeltaHelperTest.java create mode 100644 plugins/spark/v3.5/spark/src/test/java/org/apache/polaris/spark/utils/PolarisCatalogUtilsTest.java diff --git a/plugins/spark/README.md b/plugins/spark/README.md index 1bdfe3dd70..03ddf50465 100644 --- a/plugins/spark/README.md +++ b/plugins/spark/README.md @@ -21,12 +21,16 @@ The Polaris Spark plugin provides a SparkCatalog class, which communicates with the Polaris REST endpoints, and provides implementations for Apache Spark's -[TableCatalog](https://github.com/apache/spark/blob/v3.5.6/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableCatalog.java), -[ViewCatalog](https://github.com/apache/spark/blob/v3.5.6/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/ViewCatalog.java) classes. -[SupportsNamespaces](https://github.com/apache/spark/blob/v3.5.6/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsNamespaces.java), +- [TableCatalog](https://github.com/apache/spark/blob/v3.5.6/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableCatalog.java) +- [ViewCatalog](https://github.com/apache/spark/blob/v3.5.6/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/ViewCatalog.java) +- [SupportsNamespaces](https://github.com/apache/spark/blob/v3.5.6/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsNamespaces.java) -Right now, the plugin only provides support for Spark 3.5, Scala version 2.12 and 2.13, -and depends on iceberg-spark-runtime 1.9.1. +Right now, the plugin only provides support for Spark 3.5, Scala version 2.12 and 2.13, and depends on iceberg-spark-runtime 1.9.1. + +The Polaris Spark client supports catalog management for both Iceberg and Delta Lake tables. It routes all Iceberg table +requests to the Iceberg REST endpoints and routes all Delta Lake table requests to the Generic Table REST endpoints. + +The Spark Client requires at least delta 3.2.1 to work with Delta Lake tables, which requires at least Apache Spark 3.5.3. # Start Spark with local Polaris service using the Polaris Spark plugin The following command starts a Polaris server for local testing, it runs on localhost:8181 with default @@ -113,14 +117,15 @@ bin/spark-shell \ ``` # Limitations -The Polaris Spark client supports catalog management for both Iceberg and Delta tables, it routes all Iceberg table -requests to the Iceberg REST endpoints, and routes all Delta table requests to the Generic Table REST endpoints. - -The Spark Client requires at least delta 3.2.1 to work with Delta tables, which requires at least Apache Spark 3.5.3. 
-Following describes the current functionality limitations of the Polaris Spark client: -1) Create table as select (CTAS) is not supported for Delta tables. As a result, the `saveAsTable` method of `Dataframe` - is also not supported, since it relies on the CTAS support. -2) Create a Delta table without explicit location is not supported. -3) Rename a Delta table is not supported. -4) ALTER TABLE ... SET LOCATION is not supported for DELTA table. -5) For other non-Iceberg tables like csv, it is not supported today. +The following describes the current limitations of the Polaris Spark client: + +## General Limitations +1. The Polaris Spark client only supports Iceberg and Delta Lake tables. It does not support other table formats like CSV, JSON, etc. +2. Generic tables (non-Iceberg tables) do not currently support credential vending. + +## Delta Lake Limitations +1. Create table as select (CTAS) is not supported for Delta Lake tables. As a result, the `saveAsTable` method of `Dataframe` + is also not supported, since it relies on the CTAS support. +2. Create a Delta Lake table without explicit location is not supported. +3. Rename a Delta Lake table is not supported. +4. ALTER TABLE ... SET LOCATION is not supported for DELTA table. diff --git a/plugins/spark/v3.5/getting-started/README.md b/plugins/spark/v3.5/getting-started/README.md index 582bd177a8..ac831a52e6 100644 --- a/plugins/spark/v3.5/getting-started/README.md +++ b/plugins/spark/v3.5/getting-started/README.md @@ -17,12 +17,12 @@ under the License. --> -# Getting Started with Apache Spark and Apache Polaris With Delta and Iceberg +# Getting Started with Apache Spark and Apache Polaris With Delta Lake and Iceberg This getting started guide provides a `docker-compose` file to set up [Apache Spark](https://spark.apache.org/) with Apache Polaris using -the new Polaris Spark Client. +the new Polaris Spark Client. -The Polaris Spark Client enables manage of both Delta and Iceberg tables using Apache Polaris. +The Polaris Spark Client enables manage of both Delta Lake and Iceberg tables using Apache Polaris. A Jupyter notebook is started to run PySpark, and Polaris Python client is also installed to call Polaris APIs directly through Python Client. @@ -48,7 +48,7 @@ To start the `docker-compose` file, run this command from the repo's root direct docker-compose -f plugins/spark/v3.5/getting-started/docker-compose.yml up ``` -This will spin up 2 container services +This will spin up 2 container services: * The `polaris` service for running Apache Polaris using an in-memory metastore * The `jupyter` service for running Jupyter notebook with PySpark diff --git a/plugins/spark/v3.5/spark/src/main/java/org/apache/polaris/spark/utils/DeltaHelper.java b/plugins/spark/v3.5/spark/src/main/java/org/apache/polaris/spark/utils/DeltaHelper.java index 2974384247..679d444858 100644 --- a/plugins/spark/v3.5/spark/src/main/java/org/apache/polaris/spark/utils/DeltaHelper.java +++ b/plugins/spark/v3.5/spark/src/main/java/org/apache/polaris/spark/utils/DeltaHelper.java @@ -28,6 +28,19 @@ import org.slf4j.Logger; import org.slf4j.LoggerFactory; +/** + * Helper class for integrating Delta Lake table functionality with Polaris Spark Catalog. + * + *
<p>
This class is responsible for dynamically loading and configuring a Delta Catalog + * implementation to work with Polaris. It sets up the Delta Catalog as a delegating catalog + * extension with Polaris Spark Catalog as the delegate, enabling Delta Lake table operations + * through Polaris. + * + *
<p>
The class uses reflection to configure the Delta Catalog to behave identically to Unity + * Catalog, as the current Delta Catalog implementation is hardcoded for Unity Catalog. This is a + * temporary workaround until Delta extends support for other catalog implementations (see + * https://github.com/delta-io/delta/issues/4306). + */ public class DeltaHelper { private static final Logger LOG = LoggerFactory.getLogger(DeltaHelper.class); diff --git a/plugins/spark/v3.5/spark/src/test/java/org/apache/polaris/spark/PolarisRESTCatalogTest.java b/plugins/spark/v3.5/spark/src/test/java/org/apache/polaris/spark/PolarisRESTCatalogTest.java new file mode 100644 index 0000000000..ee27dc84ec --- /dev/null +++ b/plugins/spark/v3.5/spark/src/test/java/org/apache/polaris/spark/PolarisRESTCatalogTest.java @@ -0,0 +1,345 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.polaris.spark; + +import static org.assertj.core.api.Assertions.assertThat; +import static org.assertj.core.api.Assertions.assertThatThrownBy; +import static org.mockito.ArgumentMatchers.any; +import static org.mockito.ArgumentMatchers.anyMap; +import static org.mockito.ArgumentMatchers.eq; +import static org.mockito.Mockito.mock; +import static org.mockito.Mockito.verify; +import static org.mockito.Mockito.when; + +import com.google.common.collect.ImmutableList; +import com.google.common.collect.ImmutableMap; +import com.google.common.collect.ImmutableSet; +import java.util.List; +import java.util.Map; +import org.apache.iceberg.CatalogProperties; +import org.apache.iceberg.catalog.Namespace; +import org.apache.iceberg.catalog.TableIdentifier; +import org.apache.iceberg.exceptions.NoSuchTableException; +import org.apache.iceberg.rest.RESTClient; +import org.apache.iceberg.rest.auth.OAuth2Util; +import org.apache.iceberg.rest.responses.ConfigResponse; +import org.apache.polaris.core.rest.PolarisEndpoints; +import org.apache.polaris.spark.rest.GenericTable; +import org.apache.polaris.spark.rest.ListGenericTablesRESTResponse; +import org.apache.polaris.spark.rest.LoadGenericTableRESTResponse; +import org.junit.jupiter.api.BeforeEach; +import org.junit.jupiter.api.Test; +import org.mockito.ArgumentCaptor; + +public class PolarisRESTCatalogTest { + + private RESTClient mockClient; + private OAuth2Util.AuthSession mockAuthSession; + private PolarisRESTCatalog catalog; + + @BeforeEach + public void setup() { + mockClient = mock(RESTClient.class); + mockAuthSession = mock(OAuth2Util.AuthSession.class); + when(mockAuthSession.headers()).thenReturn(ImmutableMap.of("Authorization", "Bearer token")); + when(mockClient.withAuthSession(any())).thenReturn(mockClient); + + catalog = new PolarisRESTCatalog(config -> mockClient); + } + + @Test + public void 
testInitializeWithDefaultEndpoints() { + ConfigResponse configResponse = + ConfigResponse.builder() + .withDefaults(ImmutableMap.of()) + .withOverrides(ImmutableMap.of()) + .build(); + + when(mockClient.get(any(), anyMap(), eq(ConfigResponse.class), anyMap(), any())) + .thenReturn(configResponse); + + Map properties = + ImmutableMap.of(CatalogProperties.URI, "http://localhost:8181"); + + catalog.initialize(properties, mockAuthSession); + + verify(mockClient).get(any(), anyMap(), eq(ConfigResponse.class), anyMap(), any()); + } + + @Test + public void testInitializeWithCustomEndpoints() { + ConfigResponse configResponse = + ConfigResponse.builder() + .withDefaults(ImmutableMap.of()) + .withOverrides(ImmutableMap.of()) + .withEndpoints( + ImmutableList.of( + PolarisEndpoints.V1_LIST_GENERIC_TABLES, + PolarisEndpoints.V1_CREATE_GENERIC_TABLE)) + .build(); + + when(mockClient.get(any(), anyMap(), eq(ConfigResponse.class), anyMap(), any())) + .thenReturn(configResponse); + + Map properties = + ImmutableMap.of(CatalogProperties.URI, "http://localhost:8181"); + + catalog.initialize(properties, mockAuthSession); + + verify(mockClient).get(any(), anyMap(), eq(ConfigResponse.class), anyMap(), any()); + } + + @Test + public void testInitializeWithPageSize() { + ConfigResponse configResponse = + ConfigResponse.builder() + .withDefaults(ImmutableMap.of()) + .withOverrides(ImmutableMap.of()) + .build(); + + when(mockClient.get(any(), anyMap(), eq(ConfigResponse.class), anyMap(), any())) + .thenReturn(configResponse); + + Map properties = + ImmutableMap.of( + CatalogProperties.URI, + "http://localhost:8181", + PolarisRESTCatalog.REST_PAGE_SIZE, + "10"); + + catalog.initialize(properties, mockAuthSession); + + verify(mockClient).get(any(), anyMap(), eq(ConfigResponse.class), anyMap(), any()); + } + + @Test + public void testInitializeWithInvalidPageSize() { + ConfigResponse configResponse = + ConfigResponse.builder() + .withDefaults(ImmutableMap.of()) + .withOverrides(ImmutableMap.of()) + .build(); + + when(mockClient.get(any(), anyMap(), eq(ConfigResponse.class), anyMap(), any())) + .thenReturn(configResponse); + + Map properties = + ImmutableMap.of( + CatalogProperties.URI, + "http://localhost:8181", + PolarisRESTCatalog.REST_PAGE_SIZE, + "-1"); + + assertThatThrownBy(() -> catalog.initialize(properties, mockAuthSession)) + .isInstanceOf(IllegalArgumentException.class) + .hasMessageContaining("must be a positive integer"); + } + + @Test + public void testInitializeWithNullConfig() { + assertThatThrownBy(() -> catalog.initialize(null, mockAuthSession)) + .isInstanceOf(IllegalArgumentException.class) + .hasMessageContaining("Invalid configuration: null"); + } + + @Test + public void testListGenericTables() { + initializeCatalog(); + + Namespace namespace = Namespace.of("test_ns"); + TableIdentifier table1 = TableIdentifier.of(namespace, "table1"); + TableIdentifier table2 = TableIdentifier.of(namespace, "table2"); + + ListGenericTablesRESTResponse response = + new ListGenericTablesRESTResponse(null, ImmutableSet.of(table1, table2)); + + when(mockClient.get(any(), anyMap(), eq(ListGenericTablesRESTResponse.class), anyMap(), any())) + .thenReturn(response); + + List tables = catalog.listGenericTables(namespace); + + assertThat(tables).hasSize(2); + assertThat(tables).contains(table1, table2); + } + + @Test + public void testListGenericTablesWithPagination() { + ConfigResponse configResponse = + ConfigResponse.builder() + .withDefaults(ImmutableMap.of()) + .withOverrides(ImmutableMap.of()) + .build(); + + 
when(mockClient.get(any(), anyMap(), eq(ConfigResponse.class), anyMap(), any())) + .thenReturn(configResponse); + + Map properties = + ImmutableMap.of( + CatalogProperties.URI, "http://localhost:8181", PolarisRESTCatalog.REST_PAGE_SIZE, "2"); + + catalog.initialize(properties, mockAuthSession); + + Namespace namespace = Namespace.of("test_ns"); + TableIdentifier table1 = TableIdentifier.of(namespace, "table1"); + TableIdentifier table2 = TableIdentifier.of(namespace, "table2"); + TableIdentifier table3 = TableIdentifier.of(namespace, "table3"); + + ListGenericTablesRESTResponse response1 = + new ListGenericTablesRESTResponse("page2", ImmutableSet.of(table1, table2)); + ListGenericTablesRESTResponse response2 = + new ListGenericTablesRESTResponse(null, ImmutableSet.of(table3)); + + when(mockClient.get(any(), anyMap(), eq(ListGenericTablesRESTResponse.class), anyMap(), any())) + .thenReturn(response1, response2); + + List tables = catalog.listGenericTables(namespace); + + assertThat(tables).hasSize(3); + assertThat(tables).contains(table1, table2, table3); + } + + @Test + public void testCreateGenericTable() { + initializeCatalog(); + + TableIdentifier identifier = TableIdentifier.of("test_ns", "test_table"); + GenericTable table = + GenericTable.builder() + .setName("test_table") + .setFormat("delta") + .setBaseLocation("s3://bucket/path") + .setDoc("Test table") + .setProperties(ImmutableMap.of("key", "value")) + .build(); + + LoadGenericTableRESTResponse response = new LoadGenericTableRESTResponse(table); + + when(mockClient.post(any(), any(), eq(LoadGenericTableRESTResponse.class), anyMap(), any())) + .thenReturn(response); + + GenericTable result = + catalog.createGenericTable( + identifier, "delta", "s3://bucket/path", "Test table", ImmutableMap.of("key", "value")); + + assertThat(result.getName()).isEqualTo("test_table"); + assertThat(result.getFormat()).isEqualTo("delta"); + assertThat(result.getBaseLocation()).isEqualTo("s3://bucket/path"); + } + + @Test + public void testLoadGenericTable() { + initializeCatalog(); + + TableIdentifier identifier = TableIdentifier.of("test_ns", "test_table"); + GenericTable table = GenericTable.builder().setName("test_table").setFormat("delta").build(); + + LoadGenericTableRESTResponse response = new LoadGenericTableRESTResponse(table); + + when(mockClient.get(any(), any(), eq(LoadGenericTableRESTResponse.class), anyMap(), any())) + .thenReturn(response); + + GenericTable result = catalog.loadGenericTable(identifier); + + assertThat(result.getName()).isEqualTo("test_table"); + assertThat(result.getFormat()).isEqualTo("delta"); + } + + @Test + public void testDropGenericTableSuccess() { + initializeCatalog(); + + TableIdentifier identifier = TableIdentifier.of("test_ns", "test_table"); + + when(mockClient.delete(any(), any(), anyMap(), any())).thenReturn(null); + + boolean result = catalog.dropGenericTable(identifier); + + assertThat(result).isTrue(); + } + + @Test + public void testDropGenericTableNotFound() { + initializeCatalog(); + + TableIdentifier identifier = TableIdentifier.of("test_ns", "test_table"); + + when(mockClient.delete(any(), any(), anyMap(), any())) + .thenThrow(new NoSuchTableException("Table not found")); + + boolean result = catalog.dropGenericTable(identifier); + + assertThat(result).isFalse(); + } + + @Test + public void testFetchConfigWithWarehouseLocation() { + RESTClient client = mock(RESTClient.class); + Map headers = ImmutableMap.of("Authorization", "Bearer token"); + Map properties = + ImmutableMap.of( + 
CatalogProperties.URI, + "http://localhost:8181", + CatalogProperties.WAREHOUSE_LOCATION, + "s3://warehouse"); + + ConfigResponse expectedResponse = + ConfigResponse.builder() + .withDefaults(ImmutableMap.of()) + .withOverrides(ImmutableMap.of()) + .build(); + + when(client.get(any(), anyMap(), eq(ConfigResponse.class), anyMap(), any())) + .thenReturn(expectedResponse); + + ConfigResponse response = PolarisRESTCatalog.fetchConfig(client, headers, properties); + + assertThat(response).isNotNull(); + + @SuppressWarnings("unchecked") + ArgumentCaptor> queryParamsCaptor = ArgumentCaptor.forClass(Map.class); + verify(client) + .get(any(), queryParamsCaptor.capture(), eq(ConfigResponse.class), anyMap(), any()); + + Map capturedParams = queryParamsCaptor.getValue(); + assertThat(capturedParams) + .containsEntry(CatalogProperties.WAREHOUSE_LOCATION, "s3://warehouse"); + } + + private void initializeCatalog() { + ConfigResponse configResponse = + ConfigResponse.builder() + .withDefaults(ImmutableMap.of()) + .withOverrides(ImmutableMap.of()) + .withEndpoints( + ImmutableList.of( + PolarisEndpoints.V1_LIST_GENERIC_TABLES, + PolarisEndpoints.V1_CREATE_GENERIC_TABLE, + PolarisEndpoints.V1_LOAD_GENERIC_TABLE, + PolarisEndpoints.V1_DELETE_GENERIC_TABLE)) + .build(); + + when(mockClient.get(any(), anyMap(), eq(ConfigResponse.class), anyMap(), any())) + .thenReturn(configResponse); + + Map properties = + ImmutableMap.of(CatalogProperties.URI, "http://localhost:8181"); + + catalog.initialize(properties, mockAuthSession); + } +} diff --git a/plugins/spark/v3.5/spark/src/test/java/org/apache/polaris/spark/utils/DeltaHelperTest.java b/plugins/spark/v3.5/spark/src/test/java/org/apache/polaris/spark/utils/DeltaHelperTest.java new file mode 100644 index 0000000000..d85c7f0df3 --- /dev/null +++ b/plugins/spark/v3.5/spark/src/test/java/org/apache/polaris/spark/utils/DeltaHelperTest.java @@ -0,0 +1,164 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ +package org.apache.polaris.spark.utils; + +import static org.assertj.core.api.Assertions.assertThat; +import static org.assertj.core.api.Assertions.assertThatThrownBy; +import static org.mockito.Mockito.mock; + +import com.google.common.collect.ImmutableMap; +import org.apache.polaris.spark.NoopDeltaCatalog; +import org.apache.polaris.spark.PolarisSparkCatalog; +import org.apache.spark.sql.connector.catalog.DelegatingCatalogExtension; +import org.apache.spark.sql.connector.catalog.TableCatalog; +import org.apache.spark.sql.util.CaseInsensitiveStringMap; +import org.junit.jupiter.api.Test; + +public class DeltaHelperTest { + + @Test + public void testConstructorWithCustomDeltaCatalogImpl() { + CaseInsensitiveStringMap options = + new CaseInsensitiveStringMap( + ImmutableMap.of( + DeltaHelper.DELTA_CATALOG_IMPL_KEY, "org.apache.polaris.spark.NoopDeltaCatalog")); + DeltaHelper helper = new DeltaHelper(options); + + assertThat(helper).isNotNull(); + } + + @Test + public void testLoadDeltaCatalogWithNoopDeltaCatalog() { + CaseInsensitiveStringMap options = + new CaseInsensitiveStringMap( + ImmutableMap.of( + DeltaHelper.DELTA_CATALOG_IMPL_KEY, "org.apache.polaris.spark.NoopDeltaCatalog")); + DeltaHelper helper = new DeltaHelper(options); + PolarisSparkCatalog polarisSparkCatalog = mock(PolarisSparkCatalog.class); + + TableCatalog deltaCatalog = helper.loadDeltaCatalog(polarisSparkCatalog); + + assertThat(deltaCatalog).isNotNull(); + assertThat(deltaCatalog).isInstanceOf(NoopDeltaCatalog.class); + assertThat(deltaCatalog).isInstanceOf(DelegatingCatalogExtension.class); + } + + @Test + public void testLoadDeltaCatalogCachesInstance() { + CaseInsensitiveStringMap options = + new CaseInsensitiveStringMap( + ImmutableMap.of( + DeltaHelper.DELTA_CATALOG_IMPL_KEY, "org.apache.polaris.spark.NoopDeltaCatalog")); + DeltaHelper helper = new DeltaHelper(options); + PolarisSparkCatalog polarisSparkCatalog = mock(PolarisSparkCatalog.class); + + TableCatalog deltaCatalog1 = helper.loadDeltaCatalog(polarisSparkCatalog); + TableCatalog deltaCatalog2 = helper.loadDeltaCatalog(polarisSparkCatalog); + + // Should return the same cached instance + assertThat(deltaCatalog1).isSameAs(deltaCatalog2); + } + + @Test + public void testLoadDeltaCatalogWithNonExistentClass() { + CaseInsensitiveStringMap options = + new CaseInsensitiveStringMap( + ImmutableMap.of( + DeltaHelper.DELTA_CATALOG_IMPL_KEY, "com.example.NonExistentDeltaCatalog")); + DeltaHelper helper = new DeltaHelper(options); + PolarisSparkCatalog polarisSparkCatalog = mock(PolarisSparkCatalog.class); + + assertThatThrownBy(() -> helper.loadDeltaCatalog(polarisSparkCatalog)) + .isInstanceOf(IllegalArgumentException.class) + .hasMessageContaining("Cannot initialize Delta Catalog") + .hasMessageContaining("com.example.NonExistentDeltaCatalog"); + } + + @Test + public void testLoadDeltaCatalogWithNonTableCatalogClass() { + // Use a class that exists but doesn't implement TableCatalog + CaseInsensitiveStringMap options = + new CaseInsensitiveStringMap( + ImmutableMap.of(DeltaHelper.DELTA_CATALOG_IMPL_KEY, "java.lang.String")); + DeltaHelper helper = new DeltaHelper(options); + PolarisSparkCatalog polarisSparkCatalog = mock(PolarisSparkCatalog.class); + + assertThatThrownBy(() -> helper.loadDeltaCatalog(polarisSparkCatalog)) + .isInstanceOf(IllegalArgumentException.class) + .hasMessageContaining("Cannot initialize Delta Catalog") + .hasMessageContaining("java.lang.String"); + } + + @Test + public void testLoadDeltaCatalogSetsIsUnityCatalogField() throws 
Exception { + CaseInsensitiveStringMap options = + new CaseInsensitiveStringMap( + ImmutableMap.of( + DeltaHelper.DELTA_CATALOG_IMPL_KEY, "org.apache.polaris.spark.NoopDeltaCatalog")); + DeltaHelper helper = new DeltaHelper(options); + PolarisSparkCatalog polarisSparkCatalog = mock(PolarisSparkCatalog.class); + + TableCatalog deltaCatalog = helper.loadDeltaCatalog(polarisSparkCatalog); + + // Verify that the isUnityCatalog field is set to true using reflection + java.lang.reflect.Field field = deltaCatalog.getClass().getDeclaredField("isUnityCatalog"); + field.setAccessible(true); + boolean isUnityCatalog = (boolean) field.get(deltaCatalog); + + assertThat(isUnityCatalog).isTrue(); + } + + @Test + public void testLoadDeltaCatalogWithCaseInsensitiveOptions() { + // Test that options are case-insensitive + CaseInsensitiveStringMap options = + new CaseInsensitiveStringMap( + ImmutableMap.of("DELTA-CATALOG-IMPL", "org.apache.polaris.spark.NoopDeltaCatalog")); + DeltaHelper helper = new DeltaHelper(options); + PolarisSparkCatalog polarisSparkCatalog = mock(PolarisSparkCatalog.class); + + TableCatalog deltaCatalog = helper.loadDeltaCatalog(polarisSparkCatalog); + + assertThat(deltaCatalog).isNotNull(); + assertThat(deltaCatalog).isInstanceOf(NoopDeltaCatalog.class); + } + + @Test + public void testMultipleDeltaHelperInstances() { + CaseInsensitiveStringMap options1 = + new CaseInsensitiveStringMap( + ImmutableMap.of( + DeltaHelper.DELTA_CATALOG_IMPL_KEY, "org.apache.polaris.spark.NoopDeltaCatalog")); + CaseInsensitiveStringMap options2 = + new CaseInsensitiveStringMap( + ImmutableMap.of( + DeltaHelper.DELTA_CATALOG_IMPL_KEY, "org.apache.polaris.spark.NoopDeltaCatalog")); + + DeltaHelper helper1 = new DeltaHelper(options1); + DeltaHelper helper2 = new DeltaHelper(options2); + + PolarisSparkCatalog polarisSparkCatalog = mock(PolarisSparkCatalog.class); + + TableCatalog deltaCatalog1 = helper1.loadDeltaCatalog(polarisSparkCatalog); + TableCatalog deltaCatalog2 = helper2.loadDeltaCatalog(polarisSparkCatalog); + + // Different helper instances should create different delta catalog instances + assertThat(deltaCatalog1).isNotSameAs(deltaCatalog2); + } +} diff --git a/plugins/spark/v3.5/spark/src/test/java/org/apache/polaris/spark/utils/PolarisCatalogUtilsTest.java b/plugins/spark/v3.5/spark/src/test/java/org/apache/polaris/spark/utils/PolarisCatalogUtilsTest.java new file mode 100644 index 0000000000..563ee07bfc --- /dev/null +++ b/plugins/spark/v3.5/spark/src/test/java/org/apache/polaris/spark/utils/PolarisCatalogUtilsTest.java @@ -0,0 +1,162 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ +package org.apache.polaris.spark.utils; + +import static org.assertj.core.api.Assertions.assertThat; + +import com.google.common.collect.ImmutableMap; +import java.util.HashMap; +import java.util.Map; +import org.apache.spark.sql.connector.catalog.TableCatalog; +import org.junit.jupiter.api.Test; +import org.junit.jupiter.params.ParameterizedTest; +import org.junit.jupiter.params.provider.CsvSource; +import org.junit.jupiter.params.provider.ValueSource; + +public class PolarisCatalogUtilsTest { + + @Test + public void testIsTableWithSparkManagedLocationWithNoLocationOrPath() { + Map properties = ImmutableMap.of("key1", "value1", "key2", "value2"); + + assertThat(PolarisCatalogUtils.isTableWithSparkManagedLocation(properties)).isTrue(); + } + + @Test + public void testIsTableWithSparkManagedLocationWithLocation() { + Map properties = + ImmutableMap.of(TableCatalog.PROP_LOCATION, "s3://bucket/path"); + + assertThat(PolarisCatalogUtils.isTableWithSparkManagedLocation(properties)).isFalse(); + } + + @Test + public void testIsTableWithSparkManagedLocationWithPath() { + Map properties = + ImmutableMap.of(PolarisCatalogUtils.TABLE_PATH_KEY, "s3://bucket/path"); + + assertThat(PolarisCatalogUtils.isTableWithSparkManagedLocation(properties)).isFalse(); + } + + @Test + public void testIsTableWithSparkManagedLocationWithBothLocationAndPath() { + Map properties = + ImmutableMap.of( + TableCatalog.PROP_LOCATION, + "s3://bucket/location", + PolarisCatalogUtils.TABLE_PATH_KEY, + "s3://bucket/path"); + + assertThat(PolarisCatalogUtils.isTableWithSparkManagedLocation(properties)).isFalse(); + } + + @Test + public void testIsTableWithSparkManagedLocationWithEmptyProperties() { + Map properties = ImmutableMap.of(); + + assertThat(PolarisCatalogUtils.isTableWithSparkManagedLocation(properties)).isTrue(); + } + + @ParameterizedTest + @CsvSource({ + "parquet, false, false", + "csv, false, false", + "orc, false, false", + "json, false, false", + "avro, false, false", + "delta, false, true", + "iceberg, true, false", + "DELTA, false, true", + "ICEBERG, true, false", + "DeLta, false, true", + "IceBerg, true, false" + }) + public void testProviderDetectionForOtherFormats( + String provider, boolean expectedIceberg, boolean expectedDelta) { + assertThat(PolarisCatalogUtils.useIceberg(provider)).isEqualTo(expectedIceberg); + assertThat(PolarisCatalogUtils.useDelta(provider)).isEqualTo(expectedDelta); + } + + @Test + public void testIsTableWithSparkManagedLocationWithMutableMap() { + Map properties = new HashMap<>(); + properties.put("key1", "value1"); + + assertThat(PolarisCatalogUtils.isTableWithSparkManagedLocation(properties)).isTrue(); + + properties.put(TableCatalog.PROP_LOCATION, "s3://bucket/path"); + assertThat(PolarisCatalogUtils.isTableWithSparkManagedLocation(properties)).isFalse(); + + properties.remove(TableCatalog.PROP_LOCATION); + properties.put(PolarisCatalogUtils.TABLE_PATH_KEY, "s3://bucket/path"); + assertThat(PolarisCatalogUtils.isTableWithSparkManagedLocation(properties)).isFalse(); + } + + @Test + public void testIsTableWithSparkManagedLocationWithNullValues() { + Map properties = new HashMap<>(); + properties.put("key1", "value1"); + properties.put(TableCatalog.PROP_LOCATION, null); + + // Even with null value, the key exists, so it should return false + assertThat(PolarisCatalogUtils.isTableWithSparkManagedLocation(properties)).isFalse(); + } + + @Test + public void testIsTableWithSparkManagedLocationWithEmptyStringValues() { + Map properties = new HashMap<>(); + properties.put("key1", 
"value1"); + properties.put(TableCatalog.PROP_LOCATION, ""); + + // Even with empty string value, the key exists, so it should return false + assertThat(PolarisCatalogUtils.isTableWithSparkManagedLocation(properties)).isFalse(); + } + + @ParameterizedTest + @ValueSource( + strings = { + "s3://bucket/path", + "s3a://bucket/path", + "file:///local/path", + "hdfs://namenode/path", + "abfs://container@account.dfs.core.windows.net/path", + "gs://bucket/path" + }) + public void testIsTableWithSparkManagedLocationWithVariousLocationSchemes(String location) { + Map properties = ImmutableMap.of(TableCatalog.PROP_LOCATION, location); + + assertThat(PolarisCatalogUtils.isTableWithSparkManagedLocation(properties)).isFalse(); + } + + @ParameterizedTest + @ValueSource( + strings = { + "s3://bucket/path", + "s3a://bucket/path", + "file:///local/path", + "hdfs://namenode/path", + "abfs://container@account.dfs.core.windows.net/path", + "gs://bucket/path" + }) + public void testIsTableWithSparkManagedLocationWithVariousPathSchemes(String path) { + Map properties = ImmutableMap.of(PolarisCatalogUtils.TABLE_PATH_KEY, path); + + assertThat(PolarisCatalogUtils.isTableWithSparkManagedLocation(properties)).isFalse(); + } +} diff --git a/site/content/in-dev/unreleased/generic-table.md b/site/content/in-dev/unreleased/generic-table.md index 3c48338d01..46d45fa4a7 100644 --- a/site/content/in-dev/unreleased/generic-table.md +++ b/site/content/in-dev/unreleased/generic-table.md @@ -22,17 +22,19 @@ type: docs weight: 435 --- -The Generic Table in Apache Polaris is designed to provide support for non-Iceberg tables across different table formats includes delta, csv etc. It currently provides the following capabilities: +The generic tables framework provides support for non-Iceberg table formats including Delta Lake, CSV, etc. With this framework, you can: - Create a generic table under a namespace - Load a generic table - Drop a generic table - List all generic tables under a namespace -**NOTE** The current generic table is in beta release. Please use it with caution and report any issue if encountered. +{{< alert important >}} +Generic tables are in beta. Please use it with caution and report any issue if encountered. +{{< /alert >}} ## What is a Generic Table? -A generic table in Polaris is an entity that defines the following fields: +A generic table is an entity that defines the following fields: - **name** (required): A unique identifier for the table within a namespace - **format** (required): The format for the generic table, i.e. "delta", "csv" @@ -47,8 +49,8 @@ A generic table in Polaris is an entity that defines the following fields: ## Generic Table API Vs. Iceberg Table API -Generic Table provides a different set of APIs to operate on the generic table entities while Iceberg APIs operates on -the Iceberg table entities. +Polaris provides a set of generic table APIs different from the Iceberg APIs. The following table +shows the comparison between the two APIs: | Operations | **Iceberg Table API** | **Generic Table API** | |--------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------| @@ -57,8 +59,7 @@ the Iceberg table entities. | Drop Table | Drop an Iceberg table. 
Similar as load table, if the table to drop is a Generic table, a tableNotFoundException will be thrown. | Drop a generic table. Drop an Iceberg table through Generic table endpoint will thrown an TableNotFound Exception | | List Table | List all Iceberg tables | List all generic tables | -Note that generic table shares the same namespace with Iceberg tables, the table name has to be unique under the same namespace. Furthermore, since -there is currently no support for Update Generic Table, any update to the existing table requires a drop and re-create. +Note that generic table shares the same namespace with Iceberg tables, the table name has to be unique under the same namespace. ## Working with Generic Table @@ -157,13 +158,10 @@ curl -X DELETE http://localhost:8181/api/catalog/polaris/v1/delta_catalog/namesp For the complete and up-to-date API specification, see the [Catalog API Spec](https://editor-next.swagger.io/?url=https://raw.githubusercontent.com/apache/polaris/refs/heads/main/spec/generated/bundled-polaris-catalog-service.yaml). -## Limitations +## Known Limitations -Current limitations of Generic Table support: -1) Limited spec information. Currently, there is no spec for information like Schema, Partition etc. -2) No commit coordination or update capability provided at the catalog service level. - -Therefore, the catalog itself is unaware of anything about the underlying table except some of the loosely defined metadata. -It is the responsibility of the engine (and plugins used by the engine) to determine exactly how loading or committing data -should look like based on the metadata. For example, with the delta support, th delta log serialization, deserialization -and update all happens at client side. +There are some known limitations for the generic table support: +1. Generic tables provide limited spec information. For example, there is no spec for Schema or Partition. +2. There is no commit coordination provided by Polaris. It is the responsibility of the engine to coordinate commits. +3. There is no update capability provided by Polaris. Any update to a generic table must be done through a drop and create. +4. Generic tables do not support credential vending. diff --git a/site/content/in-dev/unreleased/polaris-spark-client.md b/site/content/in-dev/unreleased/polaris-spark-client.md index c990e565a5..6af32d1d9b 100644 --- a/site/content/in-dev/unreleased/polaris-spark-client.md +++ b/site/content/in-dev/unreleased/polaris-spark-client.md @@ -22,17 +22,17 @@ type: docs weight: 650 --- -Apache Polaris now provides Catalog support for Generic Tables (non-Iceberg tables), please check out -the [Polaris Catalog OpenAPI Spec]({{% ref "polaris-api-specs/polaris-catalog-api.md" %}}) for Generic Table API specs. +Polaris provides a Spark client to manage non-Iceberg tables through [Generic Tables]({{% ref "generic-table.md" %}}). -Along with the Generic Table Catalog support, Polaris is also releasing a Spark client, which helps to -provide an end-to-end solution for Apache Spark to manage Delta tables using Polaris. +{{< alert note >}} +The Spark client can manage Iceberg tables and non-Iceberg tables. -Note the Polaris Spark client is able to handle both Iceberg and Delta tables, not just Delta. +Users who only use Iceberg tables can use Spark without this client. +{{< /alert >}} This page documents how to connect Spark with Polaris Service using the Polaris Spark client. 
-## Quick Start with Local Polaris service +## Quick Start with Local Polaris Service If you want to quickly try out the functionality with a local Polaris service, simply check out the Polaris repo and follow the instructions in the Spark plugin getting-started [README](https://github.com/apache/polaris/blob/main/plugins/spark/v3.5/getting-started/README.md). @@ -42,7 +42,7 @@ Check out the Polaris repo: git clone https://github.com/apache/polaris.git ~/polaris ``` -## Start Spark against a deployed Polaris service +## Start Spark against a Deployed Polaris Service Before starting, ensure that the deployed Polaris service supports Generic Tables, and that Spark 3.5(version 3.5.3 or later is installed). Spark 3.5.6 is recommended, and you can follow the instructions below to get a Spark 3.5.6 distribution. ```shell @@ -53,7 +53,7 @@ tar xzvf spark-3.5.6-bin-hadoop3.tgz -C spark-3.5 --strip-components=1 cd spark-3.5 ``` -### Connecting with Spark using the Polaris Spark client +### Connecting with Spark using the Polaris Spark Client The following CLI command can be used to start the Spark with connection to the deployed Polaris service using a released Polaris Spark client. @@ -102,7 +102,7 @@ spark = SparkSession.builder Similar as the CLI command, make sure the corresponding fields are replaced correctly. ### Create tables with Spark -After Spark is started, you can use it to create and access Iceberg and Delta tables, for example: +After Spark is started, you can use it to create and access Iceberg and Delta tables. For example: ```python spark.sql("USE polaris") spark.sql("CREATE NAMESPACE IF NOT EXISTS DELTA_NS") @@ -120,10 +120,15 @@ build a Spark client jar locally from source. Please check out the Polaris repo [README](https://github.com/apache/polaris/blob/main/plugins/spark/README.md) for detailed instructions. ## Limitations -The Polaris Spark client has the following functionality limitations: -1) Create table as select (CTAS) is not supported for Delta tables. As a result, the `saveAsTable` method of `Dataframe` +The following describes the current limitations of the Polaris Spark client: + +### General Limitations +1. The Polaris Spark client only supports Iceberg and Delta Lake tables. It does not support other table formats like CSV, JSON, etc. +2. Generic tables (non-Iceberg tables) do not currently support credential vending. + +### Delta Lake Limitations +1. Create table as select (CTAS) is not supported for Delta Lake tables. As a result, the `saveAsTable` method of `Dataframe` is also not supported, since it relies on the CTAS support. -2) Create a Delta table without explicit location is not supported. -3) Rename a Delta table is not supported. -4) ALTER TABLE ... SET LOCATION is not supported for DELTA table. -5) For other non-Iceberg tables like csv, it is not supported. +2. Create a Delta Lake table without explicit location is not supported. +3. Rename a Delta Lake table is not supported. +4. ALTER TABLE ... SET LOCATION is not supported for DELTA table. 
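The new `DeltaHelper` javadoc above, together with the `isUnityCatalog` assertion in `DeltaHelperTest`, boils down to a small piece of reflective wiring. Below is a minimal, self-contained sketch of that flow; the class name `DeltaCatalogWiringSketch` is invented for illustration, and the real `DeltaHelper.loadDeltaCatalog` additionally caches the loaded catalog and wraps reflection failures in `IllegalArgumentException`, as the new tests verify.

```java
import java.lang.reflect.Field;
import org.apache.spark.sql.connector.catalog.CatalogPlugin;
import org.apache.spark.sql.connector.catalog.DelegatingCatalogExtension;
import org.apache.spark.sql.connector.catalog.TableCatalog;

public class DeltaCatalogWiringSketch {

  /**
   * Load a Delta Catalog implementation by class name, make it delegate to the
   * given Polaris catalog, and flip its private isUnityCatalog flag, the
   * temporary workaround tracked in https://github.com/delta-io/delta/issues/4306.
   */
  static TableCatalog wireDeltaCatalog(String deltaCatalogImpl, CatalogPlugin polarisCatalog)
      throws ReflectiveOperationException {
    // Instantiate whatever class the delta-catalog-impl catalog option points at.
    TableCatalog deltaCatalog =
        (TableCatalog) Class.forName(deltaCatalogImpl).getDeclaredConstructor().newInstance();

    // Delta's catalog is a DelegatingCatalogExtension, so every request it does
    // not handle itself falls through to the Polaris Spark Catalog delegate.
    ((DelegatingCatalogExtension) deltaCatalog).setDelegateCatalog(polarisCatalog);

    // The current Delta Catalog hard-codes its catalog-specific behavior behind a
    // private boolean, so it is toggled reflectively, exactly as DeltaHelperTest asserts.
    Field isUnityCatalog = deltaCatalog.getClass().getDeclaredField("isUnityCatalog");
    isUnityCatalog.setAccessible(true);
    isUnityCatalog.set(deltaCatalog, true);
    return deltaCatalog;
  }
}
```

Spark's `DelegatingCatalogExtension` makes the delegation step a one-liner; only the Unity Catalog flag requires reflection.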
From cfb3732d1ba14f47ac43ff0c0b6ed3f148cd1a38 Mon Sep 17 00:00:00 2001 From: Adam Christian Date: Thu, 6 Nov 2025 11:31:45 -0800 Subject: [PATCH 2/3] Update test classes --- .../polaris/spark/utils/DeltaHelperTest.java | 49 ------------- .../spark/utils/PolarisCatalogUtilsTest.java | 69 ------------------- 2 files changed, 118 deletions(-) diff --git a/plugins/spark/v3.5/spark/src/test/java/org/apache/polaris/spark/utils/DeltaHelperTest.java b/plugins/spark/v3.5/spark/src/test/java/org/apache/polaris/spark/utils/DeltaHelperTest.java index d85c7f0df3..15a551a437 100644 --- a/plugins/spark/v3.5/spark/src/test/java/org/apache/polaris/spark/utils/DeltaHelperTest.java +++ b/plugins/spark/v3.5/spark/src/test/java/org/apache/polaris/spark/utils/DeltaHelperTest.java @@ -32,17 +32,6 @@ public class DeltaHelperTest { - @Test - public void testConstructorWithCustomDeltaCatalogImpl() { - CaseInsensitiveStringMap options = - new CaseInsensitiveStringMap( - ImmutableMap.of( - DeltaHelper.DELTA_CATALOG_IMPL_KEY, "org.apache.polaris.spark.NoopDeltaCatalog")); - DeltaHelper helper = new DeltaHelper(options); - - assertThat(helper).isNotNull(); - } - @Test public void testLoadDeltaCatalogWithNoopDeltaCatalog() { CaseInsensitiveStringMap options = @@ -90,21 +79,6 @@ public void testLoadDeltaCatalogWithNonExistentClass() { .hasMessageContaining("com.example.NonExistentDeltaCatalog"); } - @Test - public void testLoadDeltaCatalogWithNonTableCatalogClass() { - // Use a class that exists but doesn't implement TableCatalog - CaseInsensitiveStringMap options = - new CaseInsensitiveStringMap( - ImmutableMap.of(DeltaHelper.DELTA_CATALOG_IMPL_KEY, "java.lang.String")); - DeltaHelper helper = new DeltaHelper(options); - PolarisSparkCatalog polarisSparkCatalog = mock(PolarisSparkCatalog.class); - - assertThatThrownBy(() -> helper.loadDeltaCatalog(polarisSparkCatalog)) - .isInstanceOf(IllegalArgumentException.class) - .hasMessageContaining("Cannot initialize Delta Catalog") - .hasMessageContaining("java.lang.String"); - } - @Test public void testLoadDeltaCatalogSetsIsUnityCatalogField() throws Exception { CaseInsensitiveStringMap options = @@ -138,27 +112,4 @@ public void testLoadDeltaCatalogWithCaseInsensitiveOptions() { assertThat(deltaCatalog).isNotNull(); assertThat(deltaCatalog).isInstanceOf(NoopDeltaCatalog.class); } - - @Test - public void testMultipleDeltaHelperInstances() { - CaseInsensitiveStringMap options1 = - new CaseInsensitiveStringMap( - ImmutableMap.of( - DeltaHelper.DELTA_CATALOG_IMPL_KEY, "org.apache.polaris.spark.NoopDeltaCatalog")); - CaseInsensitiveStringMap options2 = - new CaseInsensitiveStringMap( - ImmutableMap.of( - DeltaHelper.DELTA_CATALOG_IMPL_KEY, "org.apache.polaris.spark.NoopDeltaCatalog")); - - DeltaHelper helper1 = new DeltaHelper(options1); - DeltaHelper helper2 = new DeltaHelper(options2); - - PolarisSparkCatalog polarisSparkCatalog = mock(PolarisSparkCatalog.class); - - TableCatalog deltaCatalog1 = helper1.loadDeltaCatalog(polarisSparkCatalog); - TableCatalog deltaCatalog2 = helper2.loadDeltaCatalog(polarisSparkCatalog); - - // Different helper instances should create different delta catalog instances - assertThat(deltaCatalog1).isNotSameAs(deltaCatalog2); - } } diff --git a/plugins/spark/v3.5/spark/src/test/java/org/apache/polaris/spark/utils/PolarisCatalogUtilsTest.java b/plugins/spark/v3.5/spark/src/test/java/org/apache/polaris/spark/utils/PolarisCatalogUtilsTest.java index 563ee07bfc..cfe3236954 100644 --- 
a/plugins/spark/v3.5/spark/src/test/java/org/apache/polaris/spark/utils/PolarisCatalogUtilsTest.java +++ b/plugins/spark/v3.5/spark/src/test/java/org/apache/polaris/spark/utils/PolarisCatalogUtilsTest.java @@ -21,13 +21,11 @@ import static org.assertj.core.api.Assertions.assertThat; import com.google.common.collect.ImmutableMap; -import java.util.HashMap; import java.util.Map; import org.apache.spark.sql.connector.catalog.TableCatalog; import org.junit.jupiter.api.Test; import org.junit.jupiter.params.ParameterizedTest; import org.junit.jupiter.params.provider.CsvSource; -import org.junit.jupiter.params.provider.ValueSource; public class PolarisCatalogUtilsTest { @@ -92,71 +90,4 @@ public void testProviderDetectionForOtherFormats( assertThat(PolarisCatalogUtils.useIceberg(provider)).isEqualTo(expectedIceberg); assertThat(PolarisCatalogUtils.useDelta(provider)).isEqualTo(expectedDelta); } - - @Test - public void testIsTableWithSparkManagedLocationWithMutableMap() { - Map properties = new HashMap<>(); - properties.put("key1", "value1"); - - assertThat(PolarisCatalogUtils.isTableWithSparkManagedLocation(properties)).isTrue(); - - properties.put(TableCatalog.PROP_LOCATION, "s3://bucket/path"); - assertThat(PolarisCatalogUtils.isTableWithSparkManagedLocation(properties)).isFalse(); - - properties.remove(TableCatalog.PROP_LOCATION); - properties.put(PolarisCatalogUtils.TABLE_PATH_KEY, "s3://bucket/path"); - assertThat(PolarisCatalogUtils.isTableWithSparkManagedLocation(properties)).isFalse(); - } - - @Test - public void testIsTableWithSparkManagedLocationWithNullValues() { - Map properties = new HashMap<>(); - properties.put("key1", "value1"); - properties.put(TableCatalog.PROP_LOCATION, null); - - // Even with null value, the key exists, so it should return false - assertThat(PolarisCatalogUtils.isTableWithSparkManagedLocation(properties)).isFalse(); - } - - @Test - public void testIsTableWithSparkManagedLocationWithEmptyStringValues() { - Map properties = new HashMap<>(); - properties.put("key1", "value1"); - properties.put(TableCatalog.PROP_LOCATION, ""); - - // Even with empty string value, the key exists, so it should return false - assertThat(PolarisCatalogUtils.isTableWithSparkManagedLocation(properties)).isFalse(); - } - - @ParameterizedTest - @ValueSource( - strings = { - "s3://bucket/path", - "s3a://bucket/path", - "file:///local/path", - "hdfs://namenode/path", - "abfs://container@account.dfs.core.windows.net/path", - "gs://bucket/path" - }) - public void testIsTableWithSparkManagedLocationWithVariousLocationSchemes(String location) { - Map properties = ImmutableMap.of(TableCatalog.PROP_LOCATION, location); - - assertThat(PolarisCatalogUtils.isTableWithSparkManagedLocation(properties)).isFalse(); - } - - @ParameterizedTest - @ValueSource( - strings = { - "s3://bucket/path", - "s3a://bucket/path", - "file:///local/path", - "hdfs://namenode/path", - "abfs://container@account.dfs.core.windows.net/path", - "gs://bucket/path" - }) - public void testIsTableWithSparkManagedLocationWithVariousPathSchemes(String path) { - Map properties = ImmutableMap.of(PolarisCatalogUtils.TABLE_PATH_KEY, path); - - assertThat(PolarisCatalogUtils.isTableWithSparkManagedLocation(properties)).isFalse(); - } } From 134ce549ef92a1b3f65bdce55de8ae11f6433261 Mon Sep 17 00:00:00 2001 From: Adam Christian Date: Fri, 7 Nov 2025 09:03:25 -0800 Subject: [PATCH 3/3] Update based on PR feedback --- plugins/spark/README.md | 18 +++++++++--------- plugins/spark/v3.5/getting-started/README.md | 4 ++-- 
 .../polaris/spark/utils/DeltaHelper.java       |  6 +++---
 .../content/in-dev/unreleased/generic-table.md |  4 ++--
 .../in-dev/unreleased/polaris-spark-client.md  | 12 ++++++------
 5 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/plugins/spark/README.md b/plugins/spark/README.md
index 03ddf50465..adaccec70c 100644
--- a/plugins/spark/README.md
+++ b/plugins/spark/README.md
@@ -27,10 +27,10 @@ REST endpoints, and provides implementations for Apache Spark's
 
 Right now, the plugin only provides support for Spark 3.5, Scala version 2.12 and 2.13, and depends on iceberg-spark-runtime 1.9.1.
 
-The Polaris Spark client supports catalog management for both Iceberg and Delta Lake tables. It routes all Iceberg table
-requests to the Iceberg REST endpoints and routes all Delta Lake table requests to the Generic Table REST endpoints.
+The Polaris Spark client supports catalog management for both Iceberg and Delta tables. It routes all Iceberg table
+requests to the Iceberg REST endpoints and routes all Delta table requests to the Generic Table REST endpoints.
 
-The Spark Client requires at least delta 3.2.1 to work with Delta Lake tables, which requires at least Apache Spark 3.5.3.
+The Spark Client requires at least Delta 3.2.1 to work with Delta tables, which in turn requires at least Apache Spark 3.5.3.
 
 # Start Spark with local Polaris service using the Polaris Spark plugin
 The following command starts a Polaris server for local testing, it runs on localhost:8181 with default
@@ -116,16 +116,16 @@ bin/spark-shell \
     --conf spark.sql.sources.useV1SourceList=''
 ```
 
-# Limitations
+# Current Limitations
 The following describes the current limitations of the Polaris Spark client:
 
 ## General Limitations
-1. The Polaris Spark client only supports Iceberg and Delta Lake tables. It does not support other table formats like CSV, JSON, etc.
+1. The Polaris Spark client only supports Iceberg and Delta tables. It does not support other table formats like CSV, JSON, etc.
 2. Generic tables (non-Iceberg tables) do not currently support credential vending.
 
-## Delta Lake Limitations
-1. Create table as select (CTAS) is not supported for Delta Lake tables. As a result, the `saveAsTable` method of `Dataframe`
+## Delta Table Limitations
+1. Create table as select (CTAS) is not supported for Delta tables. As a result, the `saveAsTable` method of `Dataframe`
    is also not supported, since it relies on the CTAS support.
-2. Create a Delta Lake table without explicit location is not supported.
-3. Rename a Delta Lake table is not supported.
+2. Create a Delta table without explicit location is not supported.
+3. Rename a Delta table is not supported.
 4. ALTER TABLE ... SET LOCATION is not supported for DELTA table.
diff --git a/plugins/spark/v3.5/getting-started/README.md b/plugins/spark/v3.5/getting-started/README.md
index ac831a52e6..dc52331653 100644
--- a/plugins/spark/v3.5/getting-started/README.md
+++ b/plugins/spark/v3.5/getting-started/README.md
@@ -17,12 +17,12 @@ under the License.
 -->
 
-# Getting Started with Apache Spark and Apache Polaris With Delta Lake and Iceberg
+# Getting Started with Apache Spark and Apache Polaris With Delta and Iceberg
 
 This getting started guide provides a `docker-compose` file to set up [Apache Spark](https://spark.apache.org/) with Apache Polaris using
 the new Polaris Spark Client.
 
-The Polaris Spark Client enables manage of both Delta Lake and Iceberg tables using Apache Polaris.
+The Polaris Spark Client enables management of both Delta and Iceberg tables using Apache Polaris.
A Jupyter notebook is started to run PySpark, and Polaris Python client is also installed to call Polaris APIs directly through Python Client. diff --git a/plugins/spark/v3.5/spark/src/main/java/org/apache/polaris/spark/utils/DeltaHelper.java b/plugins/spark/v3.5/spark/src/main/java/org/apache/polaris/spark/utils/DeltaHelper.java index 679d444858..09316ba124 100644 --- a/plugins/spark/v3.5/spark/src/main/java/org/apache/polaris/spark/utils/DeltaHelper.java +++ b/plugins/spark/v3.5/spark/src/main/java/org/apache/polaris/spark/utils/DeltaHelper.java @@ -29,12 +29,12 @@ import org.slf4j.LoggerFactory; /** - * Helper class for integrating Delta Lake table functionality with Polaris Spark Catalog. + * Helper class for integrating Delta table functionality with Polaris Spark Catalog. * *
<p>
This class is responsible for dynamically loading and configuring a Delta Catalog * implementation to work with Polaris. It sets up the Delta Catalog as a delegating catalog - * extension with Polaris Spark Catalog as the delegate, enabling Delta Lake table operations - * through Polaris. + * extension with Polaris Spark Catalog as the delegate, enabling Delta table operations through + * Polaris. * *
<p>
The class uses reflection to configure the Delta Catalog to behave identically to Unity
 * Catalog, as the current Delta Catalog implementation is hardcoded for Unity Catalog. This is a
diff --git a/site/content/in-dev/unreleased/generic-table.md b/site/content/in-dev/unreleased/generic-table.md
index 46d45fa4a7..a1437321fe 100644
--- a/site/content/in-dev/unreleased/generic-table.md
+++ b/site/content/in-dev/unreleased/generic-table.md
@@ -22,7 +22,7 @@ type: docs
 weight: 435
 ---
 
-The generic tables framework provides support for non-Iceberg table formats including Delta Lake, CSV, etc. With this framework, you can:
+Generic tables are non-Iceberg tables, and they can be in a variety of formats including Delta, CSV, etc. With this framework, you can:
 - Create a generic table under a namespace
 - Load a generic table
 - Drop a generic table
@@ -162,6 +162,6 @@ For the complete and up-to-date API specification, see the [Catalog API Spec](ht
 
 There are some known limitations for the generic table support:
 1. Generic tables provide limited spec information. For example, there is no spec for Schema or Partition.
-2. There is no commit coordination provided by Polaris. It is the responsibility of the engine to coordinate commits.
+2. There is no commit coordination provided by Polaris. It is the responsibility of the engine to coordinate loading and committing data. The catalog is only aware of the generic table fields above.
 3. There is no update capability provided by Polaris. Any update to a generic table must be done through a drop and create.
 4. Generic tables do not support credential vending.
diff --git a/site/content/in-dev/unreleased/polaris-spark-client.md b/site/content/in-dev/unreleased/polaris-spark-client.md
index 6af32d1d9b..66974508ad 100644
--- a/site/content/in-dev/unreleased/polaris-spark-client.md
+++ b/site/content/in-dev/unreleased/polaris-spark-client.md
@@ -119,16 +119,16 @@ If you would like to use a version of the Spark client that is currently not yet
 build a Spark client jar locally from source. Please check out the Polaris repo and refer to the Spark plugin
 [README](https://github.com/apache/polaris/blob/main/plugins/spark/README.md) for detailed instructions.
 
-## Limitations
+## Known Limitations
 The following describes the current limitations of the Polaris Spark client:
 
 ### General Limitations
-1. The Polaris Spark client only supports Iceberg and Delta Lake tables. It does not support other table formats like CSV, JSON, etc.
+1. The Polaris Spark client only supports Iceberg and Delta tables. It does not support other table formats like CSV, JSON, etc.
 2. Generic tables (non-Iceberg tables) do not currently support credential vending.
 
-### Delta Lake Limitations
-1. Create table as select (CTAS) is not supported for Delta Lake tables. As a result, the `saveAsTable` method of `Dataframe`
+### Delta Table Limitations
+1. Create table as select (CTAS) is not supported for Delta tables. As a result, the `saveAsTable` method of `Dataframe`
    is also not supported, since it relies on the CTAS support.
-2. Create a Delta Lake table without explicit location is not supported.
-3. Rename a Delta Lake table is not supported.
+2. Create a Delta table without explicit location is not supported.
+3. Rename a Delta table is not supported.
 4. ALTER TABLE ... SET LOCATION is not supported for DELTA table.
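Taken together, the Generic Table tests added in PATCH 1/3 document a complete round trip against the new endpoints. Below is a minimal sketch of that flow using only the method signatures exercised by `PolarisRESTCatalogTest`; the wrapper class name, the pre-built `OAuth2Util.AuthSession`, and the example identifiers and locations are assumptions for illustration.

```java
import com.google.common.collect.ImmutableMap;
import java.util.List;
import org.apache.iceberg.CatalogProperties;
import org.apache.iceberg.catalog.Namespace;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.rest.auth.OAuth2Util;
import org.apache.polaris.spark.PolarisRESTCatalog;
import org.apache.polaris.spark.rest.GenericTable;

public class GenericTableRoundTripSketch {

  /** Walks the four Generic Table operations covered by PolarisRESTCatalogTest. */
  static void roundTrip(PolarisRESTCatalog catalog, OAuth2Util.AuthSession authSession) {
    // initialize() fetches /config from the server; REST_PAGE_SIZE is optional
    // and must be a positive integer when present.
    catalog.initialize(
        ImmutableMap.of(CatalogProperties.URI, "http://localhost:8181"), authSession);

    Namespace namespace = Namespace.of("delta_ns");
    TableIdentifier id = TableIdentifier.of(namespace, "delta_tbl");

    // Routed to the Generic Table REST endpoints, not the Iceberg table endpoints.
    GenericTable created =
        catalog.createGenericTable(
            id, "delta", "s3://bucket/path", "example Delta table", ImmutableMap.of());
    System.out.println("created: " + created.getName() + " (" + created.getFormat() + ")");

    GenericTable loaded = catalog.loadGenericTable(id);
    System.out.println("base location: " + loaded.getBaseLocation());

    // Listing follows server-side page tokens transparently (see the pagination test).
    List<TableIdentifier> tables = catalog.listGenericTables(namespace);
    System.out.println("tables in namespace: " + tables);

    // Returns false instead of throwing when the table does not exist.
    boolean dropped = catalog.dropGenericTable(id);
    System.out.println("dropped: " + dropped);
  }
}
```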