-
Notifications
You must be signed in to change notification settings - Fork 330
Support HMS Federation #2355
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support HMS Federation #2355
Conversation
efa4a18 to
bffb367
Compare
eric-maynard
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we get a regression test like the one in #2286?
|
Yes, I have a test that I used locally. Unfortunately, I just can't add it to the test until we get this PR baked into the Docker image. :( |
|
Here is a jupyter notebook I've used in the past to demo the feature. (in lieu of the regtests that I will add after this PR is merged). |
dennishuo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, though is this the case where we wanted compile-time flags to allow the Polaris user to choose to exclude the whole org.apache.hive:hive-metastore dependency, even if we chose not to use a compile-time flag for HadoopCatalog due to that being a breaking change?
| String warehouse = ((HiveConnectionConfigInfoDpo) connectionConfigInfoDpo).getWarehouse(); | ||
| // Unlike Hadoop, HiveCatalog does not require us to create a Configuration object, the iceberg | ||
| // rest library find the default configuration by reading hive-site.xml in the classpath | ||
| // (including HADOOP_CONF_DIR classpath). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be worth adding a TODO here to better qualify the assertion about not requiring Configuration and requiring hive-site.xml (and so we don't forget), that if we do want Hadoop Configuration to come from runtime Polaris properties instead, that we need to call hiveCatalog.setConf before the call to hiveCatalog.initialize and our constructed Configuration should begin with new Configuration(false) to avoid loading the default hive-site.xml?
AFAICT doing those two actions should be sufficient in potential multi-catalog environments to at least prevent basic conf-leakage between HiveCatalog instances.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
99d0867
In #2369 Iceberg table federation was refactored around the new `IcebergRESTExternalCatalogFactory` type based on discussion in the community sync. This has unblocked the ability to federate to more non-Iceberg catalogs, such as in #2355. This PR refactors generic table federation to go through the same mechanism. After this, we can go through and implement generic table federation for the existing `IcebergRESTExternalCatalogFactory` implementations.
In apache#2369 Iceberg table federation was refactored around the new `IcebergRESTExternalCatalogFactory` type based on discussion in the community sync. This has unblocked the ability to federate to more non-Iceberg catalogs, such as in apache#2355. This PR refactors generic table federation to go through the same mechanism. After this, we can go through and implement generic table federation for the existing `IcebergRESTExternalCatalogFactory` implementations.
Supports federating to HiveCatalog using the Iceberg REST library.
All hive dependencies are added in an independent module, i.e.,
polaris-extensions-federation-hiveand can be removed/converted to a compile time flag if necessary.Similar to HadoopCatalog, HMS federation support is currently restricted to
IMPLICITauth. The underlying authentication can be any form that Hive supports, however Polaris will not store and manage any of these credentials. Again, similar to HadoopCatalog, this version supports federating to a single Hive instance.This PR relies on Polaris discovering the
hive-site.xmlfile to get the configuration options from the classpath (includingHADOOP_CONF_DIR).The spec change has been discussed in the dev mailing list, followed by a discussion in the Polaris community sync on Aug 7, 2025.
Testing:
Modified the regression test to locally test that Hive federation works as expected. The next step would be to add a regression test once the change is baked into the Polaris docker image (for CI builds).
This PR primarily builds on #1305 and #1466. Thank you @dennishuo and @eric-maynard for helping out with this!