
Conversation

@adam-christian-software (Contributor) commented Nov 6, 2025

Context

As I was investigating how to support Parquet files using Generic Tables, I performed a few miscellaneous clean-ups:

  1. I did some minor README cleanup on the Spark Client to streamline the reading and to call out credential vending as a known limitation.
  2. I added a JavaDoc to DeltaHelper to clarify what the class does.
  3. I added more test coverage of the Spark Client through PolarisRESTCatalogTest, DeltaHelperTest, and PolarisCatalogUtilsTest, which increased code coverage in the plugin (see the test sketch just after this list).
  4. I updated the user-facing docs for generic-table.md and polaris-spark-client.md to document the credential vending limitation and streamline the content.
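As a flavor of the new coverage, here is a hedged sketch of the shape of such a unit test; `isDeltaProvider` is a hypothetical stand-in defined inline, not an actual method of `PolarisCatalogUtils`:

```java
import static org.junit.jupiter.api.Assertions.assertFalse;
import static org.junit.jupiter.api.Assertions.assertTrue;

import org.junit.jupiter.api.Test;

class CatalogUtilsSketchTest {

  // Hypothetical stand-in for the kind of small, pure helper such suites
  // exercise; the real PolarisCatalogUtils methods differ.
  static boolean isDeltaProvider(String provider) {
    return provider != null && provider.equalsIgnoreCase("delta");
  }

  @Test
  void deltaProviderIsDetectedCaseInsensitively() {
    assertTrue(isDeltaProvider("Delta"));
  }

  @Test
  void nonDeltaProviderIsRejected() {
    assertFalse(isDeltaProvider("csv"));
    assertFalse(isDeltaProvider(null));
  }
}
```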

Checklist

  • [x] 🛡️ Don't disclose security issues! (contact [email protected])
  • [x] 🔗 Clearly explained why the changes are needed, or linked related issues: Fixes #
  • [x] 🧪 Added/updated tests with good coverage, or manually tested (and explained how)
  • [ ] 💡 Added comments for complex logic
  • [Not needed] 🧾 Updated CHANGELOG.md (if needed)
  • [x] 📚 Updated documentation in site/content/in-dev/unreleased (if needed)

Docs Pictures

[Six screenshots of the updated generic-table.md and polaris-spark-client.md pages]

@adam-christian-software changed the title from "test: Add some Spark client tests" to "test: Add Some Spark Client Tests and Update Documentation on Generic Tables" Nov 6, 2025
@adam-christian-software (Contributor, Author) commented:

@gh-yzou & @flyrain - Here's the docs update for the limitation around credential vending: https://apache-polaris.slack.com/archives/C084QSKD6S2/p1762453063740029?thread_ts=1762203273.837449&cid=C084QSKD6S2

I know @gh-yzou did a good job answering Abed's question, so we can potentially build on Yun's answers there.
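For context on the limitation itself: since Polaris does not vend credentials for generic tables, the engine has to bring its own storage credentials. A minimal sketch, assuming S3 storage via Hadoop's standard `fs.s3a.*` settings (placeholder values, not text from the PR):

```java
import org.apache.spark.sql.SparkSession;

public class GenericTableCredentialsSketch {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("polaris-generic-tables")
        .master("local[*]")
        // Placeholder credentials: standard Hadoop S3A keys, supplied by the
        // user because Polaris does not vend credentials for generic tables.
        .config("spark.hadoop.fs.s3a.access.key", "<ACCESS_KEY>")
        .config("spark.hadoop.fs.s3a.secret.key", "<SECRET_KEY>")
        .getOrCreate();
    spark.stop();
  }
}
```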

```
--conf spark.sql.sources.useV1SourceList=''
```

# Limitations
Reviewer comment:

How about updating this title to `# Current Limitations`? All of these limitations can eventually be removed; they just require extra work.

2. Generic tables (non-Iceberg tables) do not currently support credential vending.

## Delta Lake Limitations
1. Create table as select (CTAS) is not supported for Delta Lake tables. As a result, the `saveAsTable` method of `Dataframe`
Reviewer comment:

I see we updated "Delta tables" to "Delta Lake Tables". I think Delta is the actual table metadata format; Delta Lake seems to indicate the storage layer or system, and within Delta Lake the table is stored in the Delta format. So from the table format's point of view, I think "Delta table" is the more accurate term.
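To make the CTAS limitation quoted above concrete, here is a hedged sketch of a workaround a user might apply: create the Delta table first, then append to it. Table name, schema, and location are placeholders, not text from the PR:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class DeltaCtasWorkaroundSketch {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder().appName("ctas-workaround").getOrCreate();
    Dataset<Row> df = spark.range(10).toDF("id");

    // Would hit the documented limitation (CTAS on a Delta table):
    // df.write().format("delta").saveAsTable("polaris.ns.events");

    // Workaround: declare the table first, then append to it.
    spark.sql("CREATE TABLE polaris.ns.events (id BIGINT) USING DELTA "
        + "LOCATION 's3://my-bucket/ns/events'");
    df.write().format("delta").mode("append").saveAsTable("polaris.ns.events");
    spark.stop();
  }
}
```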

---

The Generic Table in Apache Polaris is designed to provide support for non-Iceberg tables across different table formats includes delta, csv etc. It currently provides the following capabilities:
The generic tables framework provides support for non-Iceberg table formats including Delta Lake, CSV, etc. With this framework, you can:
Reviewer comment:

I don't think a generic table is a framework; it's a catalog concept. Can we keep what we originally had?

2) No commit coordination or update capability provided at the catalog service level.

Therefore, the catalog itself is unaware of anything about the underlying table except some of the loosely defined metadata.
It is the responsibility of the engine (and plugins used by the engine) to determine exactly how loading or committing data
Reviewer comment:

This part explains the current contract between the server and the client; can we add it back?
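As an illustration of that contract (a hypothetical sketch, not Polaris code): the catalog hands back only loose metadata such as a format and a location, and the engine alone decides how to load the data and how commits happen:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class GenericTableContractSketch {
  // Loose metadata the catalog might hand back: a format name and a location.
  record GenericTableMetadata(String format, String location) {}

  static Dataset<Row> load(SparkSession spark, GenericTableMetadata meta) {
    // No commit coordination happens at the catalog service level; the
    // dispatch below is a purely engine-side decision based on the
    // advertised format.
    return switch (meta.format().toLowerCase()) {
      case "delta" -> spark.read().format("delta").load(meta.location());
      case "csv" -> spark.read().option("header", "true").csv(meta.location());
      default -> throw new IllegalArgumentException("Unsupported format: " + meta.format());
    };
  }
}
```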

The Spark client can manage Iceberg tables and non-Iceberg tables.

Note the Polaris Spark client is able to handle both Iceberg and Delta tables, not just Delta.
Users who only use Iceberg tables can use Spark without this client.
Reviewer comment:

-> For users who only need to interact with Iceberg tables, the Polaris Spark Client is not strictly required; the regular Iceberg-provided Spark client should continue to work.
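A hedged sketch of the distinction in that suggested wording: Iceberg-only users can keep the stock Iceberg Spark catalog, while users who also need generic (e.g. Delta) tables configure the Polaris Spark client catalog. Catalog names and URIs are placeholders; the class names follow the Iceberg and Polaris docs as I understand them:

```java
import org.apache.spark.sql.SparkSession;

public class CatalogChoiceSketch {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("catalog-choice")
        // Iceberg-only: the standard Iceberg Spark catalog pointing at the
        // Polaris REST endpoint is enough.
        .config("spark.sql.catalog.iceberg_only", "org.apache.iceberg.spark.SparkCatalog")
        .config("spark.sql.catalog.iceberg_only.type", "rest")
        .config("spark.sql.catalog.iceberg_only.uri", "http://localhost:8181/api/catalog")
        // Iceberg + generic (e.g. Delta) tables: the Polaris Spark client catalog.
        .config("spark.sql.catalog.polaris", "org.apache.polaris.spark.SparkCatalog")
        .config("spark.sql.catalog.polaris.uri", "http://localhost:8181/api/catalog")
        .getOrCreate();
    spark.stop();
  }
}
```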

@gh-yzou requested a review from flyrain November 7, 2025 01:46
