Contributor

@poojanilangekar poojanilangekar commented Aug 7, 2025

Based on the discussion in the community sync, this PR modularizes federation calls into a separate factory. The specific implementation of NonRESTCatalogFactory is loaded only when a non-REST catalog is initialized and accessed. Beyond catalog initialization, all calls are identical for internal and federated catalogs.

After this change, Polaris needs to be compiled with the NonRESTCatalogs property set to the list of catalog types that are implemented/supported. Unless a catalog type is part of that property, Polaris will not load its implementation at runtime.
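
For readers skimming the thread, the lookup behaviour described above can be pictured roughly as follows. This is only a sketch in plain Java, not the code in this PR: `SketchCatalogFactory`, `FactoryRegistry`, and the string-based catalog are hypothetical stand-ins, and the real factories are discovered through the server's dependency injection rather than a hand-built map.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for the factory interface discussed in this PR;
// the real interface in Polaris has a different shape.
interface SketchCatalogFactory {
  String createCatalog(String catalogName);
}

// Hypothetical registry holding only the factories that were packaged with the build.
final class FactoryRegistry {
  private final Map<String, SketchCatalogFactory> factories;

  FactoryRegistry(Map<String, SketchCatalogFactory> factories) {
    this.factories = Map.copyOf(factories);
  }

  // Returns the factory for a connection type, or empty if none was packaged.
  Optional<SketchCatalogFactory> find(String connectionType) {
    return Optional.ofNullable(factories.get(connectionType));
  }

  // Initializes a catalog, failing with an explicit message when the type is unsupported.
  String initialize(String connectionType, String catalogName) {
    return find(connectionType)
        .map(factory -> factory.createCatalog(catalogName))
        .orElseThrow(
            () ->
                new IllegalStateException(
                    "No factory packaged for connection type: " + connectionType));
  }
}

class FactoryRegistrySketch {
  public static void main(String[] args) {
    // Only a HADOOP factory is "packaged" in this illustration.
    FactoryRegistry registry =
        new FactoryRegistry(Map.of("HADOOP", name -> "hadoop-catalog:" + name));
    System.out.println(registry.initialize("HADOOP", "sales"));
    System.out.println(registry.find("HIVE").isPresent());
  }
}
```

The point of the sketch is that callers only ever see the interface; whether a given catalog type works depends on what was packaged, not on the calling code.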

Testing:
Applied the regtest in PR #2286.

@poojanilangekar
Contributor Author

CC @eric-maynard @dennishuo

@poojanilangekar
Contributor Author

The conditional runtime dependency could rely on one of the two options:

  1. Based on the NonRESTCatalogs property (an additional property to be set in gradle.properties or as a JVM argument).
  2. Reuse the existing polaris.features.SUPPORTED_EXTERNAL_CATALOG_AUTHENTICATION_TYPES but require the user to declare it in gradle.properties instead of application.properties.

The current implementation uses option 1.

I can change the option or name if necessary. If you have other possible suggestions, please let me know.

Contributor

@eric-maynard eric-maynard left a comment

LGTM overall, just one concern about how we can unify the approach taken for IRC and non-IRC federation

Member

@snazy snazy left a comment

Looks like this PR introduces a breaking change that removes functionality.

implementation(project(":polaris-api-management-service"))
implementation(project(":polaris-api-iceberg-service"))
implementation(project(":polaris-api-catalog-service"))
if ((project.findProperty("NonRESTCatalogs") as String?)?.contains("HADOOP") == true) {

Member

This is a breaking change to Polaris that effectively removes functionality in the Polaris server.

Contributor

IIUC the functionality is still there, but the way you need to build Polaris to use that functionality does change (across versions).

Contributor

In any case, shouldn't this block be declared rather in runtime/server?

Contributor Author

Regarding runtime/server module: We reference the ExternalCatalogFactory in the IcebergCatalogHandler which is in runtime/service module. So isn't this the correct location?

Contributor

But afaict ExternalCatalogFactory is an interface in polaris-core. We don't reference HadoopFederatedCatalogFactory directly in code.

In runtime/server we already have other similar runtimeOnly declarations:

  runtimeOnly(project(":polaris-eclipselink"))
  runtimeOnly("org.postgresql:postgresql")
  runtimeOnly(project(":polaris-relational-jdbc"))
  runtimeOnly("io.quarkus:quarkus-jdbc-postgresql")

Contributor Author

I had that until c73966c

I can do either. From the Polaris OSS sync last week (the discussion regarding Hive federation), my takeaway was that we wanted to avoid having the default Polaris JAR depend on anything Hadoop.

However, if you'd prefer compiling it each time (and only loading it if necessary), I can revert that change. I will send out a separate PR without dynamic compilation and update a README.md for this PR. Please pick whichever option you think is best.

Contributor

+1 to adding something to the README/getting-started or similar that provides an easy copy/paste command for deciding whether to compile with or without the extended dependencies.

My understanding is indeed that this check was directly to address the concern others had about having Hadoop (or Hive in the future) compile-time dependencies be always present for all Polaris builds.

Personally I don't feel too strongly either way, so I'm okay with or without the additional compilation property.

Contributor

@dennishuo dennishuo Aug 12, 2025

Also, per offline discussion, +1 to what @adutra said about putting the Hadoop extension into runtime/server if it works correctly for Quarkus finding it in the runtime assembly.

Contributor Author

Done.

Comment on lines 59 to 63
java { toolchain { languageVersion.set(JavaLanguageVersion.of(21)) } }

tasks.withType<JavaCompile> { options.encoding = "UTF-8" }

tasks.test { useJUnitPlatform() }

Contributor

These lines shouldn't be required.

Contributor Author

Done.

eric-maynard previously approved these changes Aug 12, 2025
@github-project-automation github-project-automation bot moved this from PRs In Progress to Ready to merge in Basic Kanban Board Aug 12, 2025
dennishuo previously approved these changes Aug 12, 2025

Instance<ExternalCatalogFactory> externalCatalogFactory =
externalCatalogFactories.select(
Identifier.Literal.of(connectionType.getFactoryIdentifier()));
if (!externalCatalogFactory.isUnsatisfied()) {

Contributor

Generally I like to avoid double negatives where possible, so it could be worth swapping the if/else here to get rid of the double negative.

Contributor Author

Done, I've used isResolvable() instead.
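
One nuance worth noting for later readers: in CDI, `Instance.isResolvable()` is true only when exactly one bean matches, so it is slightly stricter than `!isUnsatisfied()`, which is also true when the lookup is ambiguous. The sketch below illustrates that difference with a hypothetical stand-in class (`Candidates`) that mirrors the three query methods; it is not the CDI type and not the code in this PR.

```java
import java.util.List;

// Hypothetical stand-in mirroring three query methods of
// jakarta.enterprise.inject.Instance; this is not the CDI type itself.
final class Candidates<T> {
  private final List<T> beans;

  Candidates(List<T> beans) {
    this.beans = List.copyOf(beans);
  }

  // No matching candidate at all.
  boolean isUnsatisfied() {
    return beans.isEmpty();
  }

  // More than one matching candidate.
  boolean isAmbiguous() {
    return beans.size() > 1;
  }

  // Exactly one matching candidate.
  boolean isResolvable() {
    return beans.size() == 1;
  }

  T get() {
    if (!isResolvable()) {
      throw new IllegalStateException("Expected exactly one candidate, found " + beans.size());
    }
    return beans.get(0);
  }
}

class ResolvableSketch {
  // Positive condition first, which avoids the double negative.
  static String describe(Candidates<String> candidates) {
    if (candidates.isResolvable()) {
      return "use " + candidates.get();
    }
    return "no single factory available";
  }

  public static void main(String[] args) {
    System.out.println(describe(new Candidates<>(List.of("hadoop"))));
    System.out.println(describe(new Candidates<>(List.of())));
    System.out.println(describe(new Candidates<>(List.of("a", "b"))));
  }
}
```

With a single factory per identifier the two checks behave the same; they only diverge if two factories ever register under the same identifier.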

@poojanilangekar poojanilangekar dismissed stale reviews from dennishuo and eric-maynard via 7c7e257 August 12, 2025 21:05
dennishuo previously approved these changes Aug 12, 2025
Member

@snazy snazy left a comment

Sorry, my point that this is a breaking change as it removes functionality from Apache Polaris still stands.

Nothing ever sets this Gradle property.

"Projects MUST direct outsiders towards official releases rather than raw source repositories, nightly builds, snapshots, release candidates, or any other similar packages." Asking users to rebuild Polaris from source means that they have to be active developers and are "... aware of the conditions placed on unreleased materials."

Yes, "nothing Hadoop" should end up in Polaris, and that should be the eventual goal.
But removing existing user facing functionality deserves a broader audience, deprecation and eventual removal.

@github-project-automation github-project-automation bot moved this from Ready to merge to PRs In Progress in Basic Kanban Board Aug 13, 2025
@poojanilangekar
Contributor Author

@snazy I've sent out an alternative in #2332, which does not remove the functionality by default but instead creates a separate module that can be removed at a later point. If you oppose removing the functionality, then simply modularizing it makes no user-visible changes. We need one of these two options to move forward with Hive federation (which I proposed over a month ago and which we agreed upon during the last community sync). Hence I request that you pick one of the two options so we can make progress on supporting various federation options for our users.

@github-project-automation github-project-automation bot moved this from PRs In Progress to Done in Basic Kanban Board Aug 14, 2025