LoadGenericTable table properties not being set correctly when using Polaris Spark Client #1785

@rahil-c

Describe the bug

Issue Summary

When using the Polaris Spark plugin to create generic tables in Delta format, I run into the issue described below.

I followed the docs here: https://polaris.apache.org/in-dev/unreleased/polaris-spark-client/. However, the artifact org.apache.polaris:polaris-iceberg-1.8.1-spark-runtime-3.5_2.12:1.0.0 is missing.

So I had to build the latest Polaris Spark client jar myself. Note also that I am using S3 as my storage type and creating a Delta table, so I included some additional jars (org.apache.hadoop:hadoop-aws:3.3.4, com.amazonaws:aws-java-sdk-bundle:1.12.671). See the repro steps below.

To Reproduce

spark-sql \
  --jars /Users/rahil/workplace/polaris/plugins/spark/v3.5/build/2.12/libs/polaris-iceberg-1.8.1-spark-runtime-3.5_2.12-0.10.0-beta-incubating-SNAPSHOT.jar \
  --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.9.0,org.apache.iceberg:iceberg-aws-bundle:1.9.0,io.delta:delta-spark_2.12:3.3.1,org.apache.hadoop:hadoop-aws:3.3.4,com.amazonaws:aws-java-sdk-bundle:1.12.671 \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,io.delta.sql.DeltaSparkSessionExtension \
  --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog \
  --conf spark.sql.catalog.quickstart_catalog.warehouse=quickstart_catalog \
  --conf spark.sql.catalog.quickstart_catalog.header.X-Iceberg-Access-Delegation=vended-credentials \
  --conf spark.sql.catalog.quickstart_catalog=org.apache.polaris.spark.SparkCatalog \
  --conf spark.sql.catalog.quickstart_catalog.uri=http://localhost:8181/api/catalog \
  --conf spark.sql.catalog.quickstart_catalog.credential=${USER_CLIENT_ID}:${USER_CLIENT_SECRET} \
  --conf spark.sql.catalog.quickstart_catalog.scope='PRINCIPAL_ROLE:ALL' \
  --conf spark.sql.catalog.quickstart_catalog.token-refresh-enabled=true \
  --conf spark.sql.catalog.quickstart_catalog.client.region=us-east-1

USE quickstart_catalog;
CREATE NAMESPACE IF NOT EXISTS quickstart_namespace;
USE NAMESPACE quickstart_namespace;
CREATE TABLE IF NOT EXISTS people2 (
    id int, name string)
USING delta LOCATION 's3a://polaris-onehouse-bucket/people2'
TBLPROPERTIES ('enabled-read-table-formats' = 'ICEBERG');

Actual Behavior

While debugging the loadGenericTable Java code path, I saw that the additional property I had set, 'enabled-read-table-formats' = 'ICEBERG', was not present in the response. (Note: this property is for a feature I'm working on in Polaris for table conversion.)
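One way to check what the server itself returns, independent of the Spark client, is to call the load endpoint directly. The sketch below is untested against a live server. It assumes the generic table endpoint lives at /polaris/v1/{catalog}/namespaces/{namespace}/generic-tables/{table} under the catalog URI, and that tokens are issued at /v1/oauth/tokens. Both paths and all three helper names are assumptions for illustration, not something confirmed in this issue.

```shell
# Sketch only: endpoint paths are assumptions; adjust them to your deployment.
POLARIS_URI="${POLARIS_URI:-http://localhost:8181/api/catalog}"

# Build the (assumed) LoadGenericTable URL from catalog, namespace, and table.
generic_table_url() {
  printf '%s/polaris/v1/%s/namespaces/%s/generic-tables/%s\n' \
    "$POLARIS_URI" "$1" "$2" "$3"
}

# Request an OAuth token with the same credential and scope as the repro.
# Prints the raw JSON response; the token is in its access_token field.
fetch_token() {
  curl -s -X POST "$POLARIS_URI/v1/oauth/tokens" \
    -d grant_type=client_credentials \
    -d "client_id=${USER_CLIENT_ID}" \
    -d "client_secret=${USER_CLIENT_SECRET}" \
    -d scope=PRINCIPAL_ROLE:ALL
}

# Fetch the table from the repro as the server returns it.
# $1 is the bearer token.
load_generic_table() {
  curl -s -H "Authorization: Bearer $1" \
    "$(generic_table_url quickstart_catalog quickstart_namespace people2)"
}
```

To use it, extract access_token from the output of fetch_token, pass it to load_generic_table, and check whether the returned table's properties include enabled-read-table-formats. If they do, the property is being dropped on the client side. If they do not, it was likely never persisted by createGenericTable.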

Expected Behavior

Ideally, the Delta table properties should be present in both the createGenericTable and loadGenericTable responses when using the Polaris Spark client.

Interestingly, when I run DESCRIBE TABLE EXTENDED, I can see the table properties at the end of the output, but I think this is likely going through some Spark catalog code path to return them.

# Detailed Table Information
Name                	delta.`s3a://polaris-onehouse-bucket/people2`
Type                	MANAGED
Location            	s3a://polaris-onehouse-bucket/people2
Provider            	delta
Table Properties    	[delta.minReaderVersion=1,delta.minWriterVersion=2,enabled-read-table-formats=ICEBERG]
Time taken: 5.166 seconds, Fetched 9 row(s)

cc @gh-yzou @eric-maynard

Labels: bug