-
Notifications
You must be signed in to change notification settings - Fork 332
Modularize calls to federated catalogs #2301
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
The conditional runtime dependency could rely on one of the two options:
The current implementation uses option 1. I can change the option or name if necessary. If you have other possible suggestions, please let me know. |
polaris-core/src/main/java/org/apache/polaris/core/catalog/NonRESTCatalogFactory.java
Outdated
Show resolved
Hide resolved
.../service/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalogHandler.java
Outdated
Show resolved
Hide resolved
eric-maynard
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall, just one concern about how we can unify the approach taken for IRC and non-IRC federation
snazy
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this PR introduces a breaking change that removes functionality.
runtime/service/build.gradle.kts
Outdated
| implementation(project(":polaris-api-management-service")) | ||
| implementation(project(":polaris-api-iceberg-service")) | ||
| implementation(project(":polaris-api-catalog-service")) | ||
| if ((project.findProperty("NonRESTCatalogs") as String?)?.contains("HADOOP") == true) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a breaking change to Polaris that effectively removes functionality in the Polaris server.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC the functionality is still there, but the way you need to build Polaris to use that functionality does change (across versions).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In any case, shouldn't this block be declared rather in runtime/server?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding runtime/server module: We reference the ExternalCatalogFactory in the IcebergCatalogHandler which is in runtime/service module. So isn't this the correct location?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But afaict ExternalCatalogFactory is an interface in polaris-core. We don't reference HadoopFederatedCatalogFactory directly in code.
In runtime/server we already have other similar runtimeOnly declarations:
runtimeOnly(project(":polaris-eclipselink"))
runtimeOnly("org.postgresql:postgresql")
runtimeOnly(project(":polaris-relational-jdbc"))
runtimeOnly("io.quarkus:quarkus-jdbc-postgresql")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had that until c73966c
I can do either, from the Polaris OSS sync last week (discussion about regarding hive federation), my takeaway was that we wanted to avoid having the default Polaris JAR depend on anything hadoop.
However, if you'd prefer compiling it each time (and only loading if necessary), I can revert that change. I will send out a separate PR without dynamic compilation and update a README.md for this PR. Please pick the option that's best suited according to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to adding something to the README/getting-started or similar to better make it an easy copy/paste command to decide whether to compile with or without the extended dependencies.
My understanding is indeed that this check was directly to address the concern others had about having Hadoop (or Hive in the future) compile-time dependencies be always present for all Polaris builds.
Personally I don't feel too strongly either way, so I'm okay with or without the additional compilation property.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, per offline discussion, +1 to what @adutra said about putting the Hadoop extension into runtime/server if it works correctly for Quarkus finding it in the runtime assembly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
.../service/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalogHandler.java
Outdated
Show resolved
Hide resolved
| java { toolchain { languageVersion.set(JavaLanguageVersion.of(21)) } } | ||
|
|
||
| tasks.withType<JavaCompile> { options.encoding = "UTF-8" } | ||
|
|
||
| tasks.test { useJUnitPlatform() } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These lines shouldn't be required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
runtime/service/build.gradle.kts
Outdated
| implementation(project(":polaris-api-management-service")) | ||
| implementation(project(":polaris-api-iceberg-service")) | ||
| implementation(project(":polaris-api-catalog-service")) | ||
| if ((project.findProperty("NonRESTCatalogs") as String?)?.contains("HADOOP") == true) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In any case, shouldn't this block be declared rather in runtime/server?
.../service/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalogHandler.java
Outdated
Show resolved
Hide resolved
runtime/service/build.gradle.kts
Outdated
| implementation(project(":polaris-api-management-service")) | ||
| implementation(project(":polaris-api-iceberg-service")) | ||
| implementation(project(":polaris-api-catalog-service")) | ||
| if ((project.findProperty("NonRESTCatalogs") as String?)?.contains("HADOOP") == true) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to adding something to the README/getting-started or similar to better make it an easy copy/paste command to decide whether to compile with or without the extended dependencies.
My understanding is indeed that this check was directly to address the concern others had about having Hadoop (or Hive in the future) compile-time dependencies be always present for all Polaris builds.
Personally I don't feel too strongly either way, so I'm okay with or without the additional compilation property.
| Instance<ExternalCatalogFactory> externalCatalogFactory = | ||
| externalCatalogFactories.select( | ||
| Identifier.Literal.of(connectionType.getFactoryIdentifier())); | ||
| if (!externalCatalogFactory.isUnsatisfied()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally I like to avoid double-negatives where possible, so could be worth considering swapping the if/else here to get rid of the double-negative.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, I've used isResolvable() instead.
7c7e257
snazy
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, my point that this is a breaking change as it removes functionality from Apache Polaris still stands.
Nothing ever sets this Gradle property.
Projects MUST direct outsiders towards official releases rather than raw source repositories, nightly builds, snapshots, release candidates, or any other similar packages.. Asking users to re-build Polaris from source means that that have to be active developers and are "... aware of the conditions placed on unreleased materials."
Yes, "nothing Hadoop" should end up in Polaris, and that should be the eventual goal.
But removing existing user facing functionality deserves a broader audience, deprecation and eventual removal.
|
@snazy I've sent out an alternate in #2332 which does not remove the functionality by default but rather creates a separate module that can be removed at a later point. If you oppose removing the functionality, then simply modularizing it makes no user visible changes. We need one of these two options to move forward with Hive federation (which I've proposed over a month ago and we agreed upon during the last community sync). Hence I request you to pick one of the two options so we can make progress on supporting various federation options for our users. |
Based on the discussion in the community sync, this PR modularizes federation calls to a separate factory. The specific instances of the NonRESTCatalogFactory is loaded only when a non-REST catalog is initialized and accessed. Beyond the catalog initialization, all the calls are identical for internal and federated catalogs.
After this change, Polaris needs to be compiled with the
NonRESTCatalogsproperty set to the list of catalogs for which are implemented/supported. Unless a catalog type is a part of the flag, Polaris will not load its implementation at runtime.Testing:
Applied the regtest in PR #2286.