@srilman
Contributor

@srilman srilman commented Mar 24, 2025

Closes #685.

Rationale for this change

Reopens the closed PR #687, which adds a Snowflake Catalog. I addressed the review comments and applied some additional changes based on errors found when using the catalog with the Bodo data processing library.

One way Snowflake supports Iceberg is via managed tables, where Snowflake has both read and write access to these tables. They are basically regular Snowflake tables with an Iceberg backend. Outside of Snowflake, these tables are read-only. To work with them, we wrap some SQL calls in a Catalog API.
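To illustrate what "wrapping SQL calls in a Catalog API" can look like, here is a minimal sketch, not the code in this PR. It assumes a DB-API style cursor (such as one from the Snowflake Python connector with its default `%s` parameter style) and assumes that Snowflake's `SYSTEM$GET_ICEBERG_TABLE_INFORMATION` function returns a JSON string with a `metadataLocation` key. The `load_metadata_location` helper and the `FakeCursor` stand-in are hypothetical names used only for this example.

```python
import json


def load_metadata_location(cursor, table_name: str) -> str:
    """Ask Snowflake where the table's current Iceberg metadata file lives."""
    # Assumption: the function returns one row holding a JSON string that
    # contains a "metadataLocation" key.
    cursor.execute("SELECT SYSTEM$GET_ICEBERG_TABLE_INFORMATION(%s)", (table_name,))
    row = cursor.fetchone()
    if row is None:
        raise LookupError(f"No Iceberg table information for {table_name}")
    info = json.loads(row[0])
    return info["metadataLocation"]


class FakeCursor:
    """Hypothetical stand-in for a real cursor, so the sketch runs offline."""

    def __init__(self, payload: str):
        self._payload = payload
        self.executed = []

    def execute(self, sql, params=None):
        # Record the statement instead of sending it to Snowflake.
        self.executed.append((sql, params))

    def fetchone(self):
        return (self._payload,)


cursor = FakeCursor(
    '{"metadataLocation": "s3://bucket/t/metadata/v1.metadata.json", "status": "success"}'
)
location = load_metadata_location(cursor, "db.schema.t")
```

A catalog's `load_table` could then hand that location to a reader, for example PyIceberg's `StaticTable.from_metadata`, which matches the read-only behavior described above.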

I skipped some of the less commonly used APIs; they can be filled in later.

Are these changes tested?

Tested manually, both standalone and with the Bodo library, on AWS and Azure. Some of the Azure tests don't currently work because Snowflake uses path prefixes like wasb://, wasbs://, etc. These are waiting on the other PR that adds Azure support to PyArrowFileIO.
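For context on the prefix problem, here is a small sketch of how a `wasb://` or `wasbs://` location could be mapped to the `abfs://` or `abfss://` form. It is not part of this PR, and `wasb_to_abfs` is a hypothetical helper. It assumes the usual `<container>@<account>.blob.core.windows.net` authority layout and that the storage account is also reachable through the `dfs` endpoint, which may not hold for every account.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed mapping between the legacy WASB schemes and their ABFS counterparts.
_WASB_TO_ABFS = {"wasb": "abfs", "wasbs": "abfss"}


def wasb_to_abfs(location: str) -> str:
    """Rewrite a wasb(s):// location to abfs(s)://; leave other schemes alone."""
    parts = urlsplit(location)
    scheme = _WASB_TO_ABFS.get(parts.scheme)
    if scheme is None:
        return location
    # Swap the Blob endpoint for the Data Lake (dfs) endpoint.
    netloc = parts.netloc.replace(".blob.core.windows.net", ".dfs.core.windows.net")
    return urlunsplit((scheme, netloc, parts.path, parts.query, parts.fragment))


converted = wasb_to_abfs(
    "wasbs://data@acct.blob.core.windows.net/t/metadata/v1.metadata.json"
)
untouched = wasb_to_abfs("s3://bucket/t/metadata/v1.metadata.json")
```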

Also copied the mock tests from the original PR.

Are there any user-facing changes?

Users can read and query Snowflake-managed Iceberg tables, with minimal write operations.

@srilman srilman marked this pull request as ready for review March 24, 2025 01:13
@Fokko
Contributor

Fokko commented Apr 4, 2025

@srilman From what I understand, Snowflake is also transitioning to the REST catalog protocol. I would much rather use that, since it is properly tested. As you mentioned, there are some unsolved issues with the Snowflake catalog.

@monti-python

monti-python commented Aug 15, 2025

Hi @Fokko,

Are you aware of any public announcement from Snowflake signalling they are transitioning to REST catalog for Snowflake-managed tables?

AFAIK this is not in their roadmap.

@monti-python

monti-python commented Sep 19, 2025

@iamontheinet @jdanielmyers What's your take on this?

At my org, we'd love to use Snowflake-managed Iceberg tables but the lack of support for them in pyiceberg has been blocking us for months.

Thanks for the help!

@corleyma

corleyma commented Sep 19, 2025

@monti-python I think the best way to unblock yourself for Snowflake is to do as this article suggests: sync the Snowflake-managed Iceberg table with an "Open Catalog" (aka Snowflake-managed Polaris) and then query it using pyiceberg/Spark/etc., since Open Catalog/Polaris implements the REST catalog spec.
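A rough sketch of the PyIceberg side of that setup, under some assumptions: the URI pattern, the `PRINCIPAL_ROLE:ALL` scope, and the use of the catalog name as `warehouse` are taken from my understanding of the Open Catalog docs and should be checked against your account. `open_catalog_properties` and `connect` are hypothetical helper names, and the account, catalog, and credential values are placeholders.

```python
def open_catalog_properties(
    account: str, catalog_name: str, client_id: str, client_secret: str
) -> dict:
    """Build REST catalog properties for an Open Catalog account (assumed layout)."""
    return {
        "type": "rest",
        # Assumed endpoint pattern for Snowflake Open Catalog.
        "uri": f"https://{account}.snowflakecomputing.com/polaris/api/catalog",
        "warehouse": catalog_name,
        "credential": f"{client_id}:{client_secret}",
        "scope": "PRINCIPAL_ROLE:ALL",
    }


def connect(name: str, **kwargs):
    """Load the catalog through PyIceberg's REST implementation."""
    # Imported lazily so the properties helper works without pyiceberg installed.
    from pyiceberg.catalog import load_catalog

    return load_catalog(name, **open_catalog_properties(**kwargs))


props = open_catalog_properties(
    account="myorg-myaccount",
    catalog_name="my_catalog",
    client_id="my-client-id",
    client_secret="my-client-secret",
)
```

With real credentials, `connect("open_catalog", account=..., catalog_name=..., client_id=..., client_secret=...)` would return a catalog object on which `load_table("namespace.table")` can be called.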

@monti-python

monti-python commented Oct 27, 2025

Hi @corleyma,

Thanks for the tip!

Unfortunately, that approach would require configuring the catalog as "external", which only supports reads (not writes) from third-party tools such as PyIceberg, Spark, etc. That limitation makes it an unworkable solution in this case.

Until either:

  • PyIceberg supports the Snowflake-managed catalog, or
  • Snowflake natively supports the REST protocol,

the only way to enable bi-directional read/write access between Python and Snowflake is to create an "internal" catalog in Open Catalog. However, doing so would mean replicating our entire Snowflake RBAC framework in Open Catalog, a tool that doesn’t offer equivalent semantics or authentication/authorization features.

For these reasons, a proper solution here is badly needed.

Reference: https://docs.snowflake.com/en/user-guide/opencatalog/overview#catalog-types

CC @iamontheinet @jdanielmyers @Fokko

Development

Successfully merging this pull request may close these issues.

Support for snowflake catalog in apache iceberg

4 participants