Support Snowflake-Managed Iceberg Tables via SnowflakeCatalog #1834
@srilman From what I understand from Snowflake, it is also transitioning to the REST catalog protocol. I would much rather use that, since it is properly tested. As you mentioned, there are some unsolved issues with the Snowflake catalog.
Hi @Fokko, are you aware of any public announcement from Snowflake signalling that they are transitioning to the REST catalog for Snowflake-managed tables? AFAIK this is not on their roadmap.
@iamontheinet @jdanielmyers What's your take on this? At my org, we'd love to use Snowflake-managed Iceberg tables but the lack of support for them in pyiceberg has been blocking us for months. Thanks for the help!
@monti-python I think the best way to unblock yourself for Snowflake is to do as this article suggests: sync the Snowflake-managed Iceberg table with an "Open Catalog" (aka Snowflake-managed Polaris) and then query it using pyiceberg/Spark/etc., since Open Catalog/Polaris implements the REST catalog spec.
Hi @corleyma, Thanks for the tip! Unfortunately, that approach would require configuring the catalog as "external", which only supports reads (not writes) from third-party tools such as PyIceberg, Spark, etc. That limitation makes it an unworkable solution in this case. Until either:
the only way to enable bi-directional read/write access between Python and Snowflake is to create an "internal" catalog in Open Catalog. However, doing so would mean replicating our entire Snowflake RBAC framework in Open Catalog, a tool that doesn’t offer equivalent semantics or authentication/authorization features. For these reasons, a proper solution here is badly needed. Reference: https://docs.snowflake.com/en/user-guide/opencatalog/overview#catalog-types
Closes #685.
Rationale for this change
Reopens PR #687, which added a Snowflake Catalog and was closed. I addressed the review comments and made some additional changes based on errors found when using it with the Bodo data processing library.
One way Snowflake supports Iceberg is via managed tables, where Snowflake has both read and write access to these tables. They are basically regular Snowflake tables with an Iceberg backend. Outside of Snowflake, these tables are read-only. To work with them, we wrap some SQL calls in a Catalog API.
I skipped some of the less commonly used APIs, which can be filled in later.
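As a rough illustration of what "wrapping a SQL call" can look like (a minimal sketch, not the code in this PR): the helper below asks Snowflake for a managed table's current Iceberg metadata file and returns its location. The function name `metadata_location` and the error handling are hypothetical. It assumes a DB-API style cursor using the `%s` bind style, and that `SYSTEM$GET_ICEBERG_TABLE_INFORMATION` returns a JSON string with `metadataLocation` and `status` fields.

```python
import json
from typing import Any


def metadata_location(cursor: Any, table: str) -> str:
    """Return the current Iceberg metadata file location for a managed table.

    `cursor` is any DB-API style cursor connected to Snowflake (assumption);
    `table` is a fully qualified name such as "db.schema.table".
    """
    # Bind the table name as a parameter instead of formatting it into the SQL.
    cursor.execute("SELECT SYSTEM$GET_ICEBERG_TABLE_INFORMATION(%s)", (table,))
    row = cursor.fetchone()
    if row is None:
        raise LookupError(f"No Iceberg table information returned for {table}")

    # Assumed response shape: {"metadataLocation": "...", "status": "success"}
    info = json.loads(row[0])
    if info.get("status") != "success":
        raise LookupError(f"Snowflake reported status {info.get('status')!r} for {table}")
    return info["metadataLocation"]
```

A catalog's load-table method could then read the table metadata from the returned location with the configured FileIO, which is also where unsupported path schemes would surface.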
Are these changes tested?
Tested manually by itself and with the Bodo library on both AWS and Azure. Some of the Azure tests don't currently work because Snowflake uses path prefixes like wasb://, wasbs://, etc. Waiting on the other PR that adds Azure support to PyArrowFileIO. Also copied the mock tests from the original PR.
Are there any user-facing changes?
Users can read and query Snowflake-managed Iceberg tables, with minimal write operations.