Support retrieving the latest Iceberg table on table scan #11
What changes are included in this PR?
Makes the `IcebergTableProvider::try_new` method public. It takes an `Arc<dyn Catalog>` and a `TableIdent`, uses them to fetch the current table metadata when the DataFusion `TableProvider` is created, and also stores a reference to the `Arc<dyn Catalog>`. When the DataFusion `TableProvider` is asked to scan the table, it uses the catalog to fetch the latest table metadata.

This lets the `TableProvider` pick up the latest changes to the Iceberg table instead of being stuck on the snapshot that was loaded when the provider was created. This is closer to what users expect from DataFusion `TableProvider`s, where a scan is expected to read the latest data.
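The pattern described above (keep a catalog handle and re-resolve the table on every scan, rather than holding a fixed snapshot) can be sketched as follows. This is a minimal, self-contained illustration, not the actual iceberg-rust or DataFusion code: `Catalog`, `TableIdent`, `TableMetadata`, `InMemoryCatalog`, and `LatestTableProvider` are simplified, hypothetical stand-ins. In iceberg-rust, `Catalog::load_table` is async and returns a `Result`, and the real provider implements DataFusion's async `TableProvider::scan`; both are reduced to synchronous, `Option`-returning methods here so the sketch runs with the standard library only.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical, simplified table identifier (namespace + name).
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct TableIdent {
    pub namespace: String,
    pub name: String,
}

/// Hypothetical, simplified table metadata: only the current snapshot id.
#[derive(Clone, Debug, PartialEq)]
pub struct TableMetadata {
    pub current_snapshot_id: i64,
}

/// Hypothetical synchronous catalog trait (the real one is async and fallible).
pub trait Catalog: Send + Sync {
    fn load_table(&self, ident: &TableIdent) -> Option<TableMetadata>;
}

/// In-memory catalog used only for illustration; `commit` simulates a writer.
#[derive(Default)]
pub struct InMemoryCatalog {
    tables: RwLock<HashMap<TableIdent, TableMetadata>>,
}

impl InMemoryCatalog {
    /// Replace the table's metadata, as a new commit to the table would.
    pub fn commit(&self, ident: TableIdent, metadata: TableMetadata) {
        self.tables.write().unwrap().insert(ident, metadata);
    }
}

impl Catalog for InMemoryCatalog {
    fn load_table(&self, ident: &TableIdent) -> Option<TableMetadata> {
        self.tables.read().unwrap().get(ident).cloned()
    }
}

/// Sketch of the provider: it keeps the catalog handle, not just a snapshot.
pub struct LatestTableProvider {
    catalog: Arc<dyn Catalog>,
    ident: TableIdent,
    /// Metadata loaded at construction time (e.g. to report a schema).
    initial: TableMetadata,
}

impl LatestTableProvider {
    /// Load the current metadata once, and remember how to reload it later.
    pub fn try_new(catalog: Arc<dyn Catalog>, ident: TableIdent) -> Option<Self> {
        let initial = catalog.load_table(&ident)?;
        Some(Self { catalog, ident, initial })
    }

    pub fn initial_metadata(&self) -> &TableMetadata {
        &self.initial
    }

    /// On every scan, ask the catalog for the latest metadata instead of
    /// reusing the metadata captured in `try_new`.
    pub fn scan(&self) -> Option<TableMetadata> {
        self.catalog.load_table(&self.ident)
    }
}
```

In this sketch, committing a new snapshot to the catalog after `try_new` is visible to the next `scan()`, while `initial_metadata()` still reports the snapshot loaded at construction. The trade-off is one extra catalog round-trip per scan in exchange for not serving stale data.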
Are these changes tested?
Covered by the existing integration tests in `crates/integrations/datafusion/tests/integration_datafusion_test.rs`.