-
Notifications
You must be signed in to change notification settings - Fork 332
NoSQL persistence: add Java/Vert.X executor abstraction layer #2527
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Why do we want an abstraction layer here and why do we need two implementations of that interface? I'm not sure where this fit's into the whole picture of the NoSQL persistence effort. |
There are a couple of task submissions required by the NoSQL work and also the async+reliable tasks work. Within Quarkus it's better to leverage the already present Vert.X timer infrastructure, based on Netty. For testing purposes, there's no Quarkus, so another solution (based on Java's scheduled executor) is needed. |
|
I'm not sure I follow, why can't we use Vert.x outside of Quarkus? Doesn't it exist as a stand alone library? |
|
Vert.X is a whole library, not just a thread-pool. |
dimas-b
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re: Vert.x - it is currently used inside Quarkus. Yet, Quarkus is our standard runtime env. in this project, so I think it is fine for now. If there is a way (or need) to use Vert.x with this executor tool outside of Quarkus, let's do that in a separate follow-up PR.
My question here is whether we need this PR at all, I'm not sure why we would want an abstraction layer over something which is a required component. I'm not sure having the flexibility here is worth the complexity. This PR is just a building block for something else so I think it's worth considering whether the PR is actually required or whether we can build everything without it. In this case I'm not sure why we wouldn't just have everything using Vert.X and not use an abstraction at all? |
PR #1189 uses the java impl. in some test contexts where a full Quarkus server is not available. We probably could reduce all usage to just the Vert.x implementation, but that would cause (the heavy) Quarkus dependencies to propagate to modules where they would really be excessive. |
But shouldn't that PR just use the Vert.X implementation in those tests as a library? It doesn't make sense to me that we would add an abstraction on top of a critical library (which effects all of the code written in the project using the library) just so we can avoid using that library during testing. I may not understand the problems here but I feel like there must be a way to run the tests with Vert.X but without Quarkus? |
I do not know such a way off the top of my head 😅 ... that is without some extra scaffolding, which would then make the scaffolded runtime env. pretty much equivalent to the "plain java" implementation in terms of code correctness validation "power", but the plain impl. is easier, IMHO :) Edit: running Vert.x in a normal way (e.g. in unit tests), still pulls in the whole framework, including Netty, so it is a lot heavier than the java impl. |
| * specific language governing permissions and limitations | ||
| * under the License. | ||
| */ | ||
| package org.apache.polaris.async; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this is only used within MangoDB persistence. I'd suggest a different package name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Package names are meant to represent what the code is the package "is" or "does", not where it is used 🙂 The current package rename reflects the purpose of the contained code well, IMHO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your point, but I think usage context matters too. If a package name only describes what the code “is/does,” it could become too generic. Tying the name to where and how it’s used helps maintain clarity and avoids misplacement over time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suppose we had the package name org.apache.polaris.nosql.async and one week from now reuse it for handling events? Would that package name work in the events context?
The code is already under the nosql directory name (as requested in other discussions). That provides isolation for current use cases. However, if it later becomes used in another context, it will be possible to relocate the module to another directory (for visibility / sharing) without affecting existing code with package renames.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's an interesting consideration. Generally I'd agree that abstractly useful packages shouldn't bake assumed use sites into its package name. On the other hand, the reusability of packages should be a somewhat intentional choice, and a lot of the injection/application-scoped patterns in these impls make it hard to reason about what it should be used for or what it's safe to be used for.
For example, naively just diving in without context, as a developer trying to assess what this package does and if it would work for some new feature I'm adding elsewhere in Polaris, I see JavaPoolAsyncExec being annotated for injection as ApplicationScoped and then there's
private static final AtomicInteger POOL_ID = new AtomicInteger();
private final int poolId = POOL_ID.incrementAndGet();
It's unclear whether I'm supposed to use some process-wide shared pool or spin up a new pool for the task type, etc.
Also, as the scope implies broad usage for any "async" things, we'd want to more clearly document the semantics of the various wrapper idioms and how they relate to either Vertx or to Java core concurrency libraries. For example, naively jumping in it's hard to understand how to wield the interface, and why there's a Cancelable interface defined here in Polaris and how that relates to cancelling a Future. My guess would've been if it's trying to be a common interface between two async exec frameworks that don't consistently use Futures, but then Cancelable has a completionStage method and CompletionStage has toCompletableFuture(), so it's unclear what that means. And both actual AsyncExec impls in here actually to just return their instances of CancelableFuture which is a private final class CancelableFuture<R> implements Cancelable<R>, Runnable defined in AsyncExec.
So the caller will get something called CancelableFuture but the public interface only gives you a Cancelable, and a CancelableFuture doesn't even implement Future, but apparently it contains a Future which has a cancel method. And this AsyncExec interface is basically like ScheduledExecutorService, except it returns these Cancelable instead of ScheduledFuture.
If we want to make this a more broadly used library within Polaris we can do it but it'll just need a lot more scrutiny, compared to chalking it up to idiosyncracies of usage in a particular submodule.
For me, maybe it's just that I'm not as familiar with Vertx. Is this Cancelable stuff a Vertx idiom?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You bring good points / questions, @dennishuo . I think it's certainly possible to get a more thorough discussion about them... However, I wonder if it might be preferable to do that on the dev ML.
From the practical point of view the Polaris codebase is still in rapid development are we're adding a lot of code in other areas too. Polishing everything to perfection is probably going to slow down progress to crawl 😅
So, to allow the bigger NoSQL feature work to progress, would you be ok with renaming the package to something NoSQL-specific for now so that prospective users are less likely to assume it is "trivial" to use in other contexts?
My suggestion: org.apache.polaris.nodes.async (since it is currently used only within the node management code in #1189).
I think adding some more javadoc in this PR would be reasonable, but I'm not sure adding exhaustive javadoc would be practical.
@dennishuo @flyrain WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The package name is org.apache.polaris.nosql.async now... Does it work from you POV?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Narrowing the package name to org.apache.polaris.nosql.async for now works for me. If we ever want to open it up to broader use we should be ready to discuss and change the Cancelable and AsyncExec interfaces.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good! Thanks for the review @dennishuo !
|
Re: Abstraction: The "api" module defines the functionality expected by other modules for executing tasks inside the JVM. It is not quite the same as the usual java "executor" / "scheduler" interfaces, but defines something more specific and smaller in terms of API surface. The Vert.x implementation is provided to take advantage of the runtime efficiencies within Quarkus. The "java" implementation is provided for use in non-Quarkus environments (e.g. tests). It is lighter than Vert.x (in runtime). Another point for having the java impl. is to validate the Vert.x impl. by having and alternative implementation (kind of like solving a math problem in two ways for extra certainty). The java impl. can also be used as a fallback, should unexpected issues come up with the Vert.x impl. (during dependency upgrades, for example). |
| * specific language governing permissions and limitations | ||
| * under the License. | ||
| */ | ||
| package org.apache.polaris.async; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's an interesting consideration. Generally I'd agree that abstractly useful packages shouldn't bake assumed use sites into its package name. On the other hand, the reusability of packages should be a somewhat intentional choice, and a lot of the injection/application-scoped patterns in these impls make it hard to reason about what it should be used for or what it's safe to be used for.
For example, naively just diving in without context, as a developer trying to assess what this package does and if it would work for some new feature I'm adding elsewhere in Polaris, I see JavaPoolAsyncExec being annotated for injection as ApplicationScoped and then there's
private static final AtomicInteger POOL_ID = new AtomicInteger();
private final int poolId = POOL_ID.incrementAndGet();
It's unclear whether I'm supposed to use some process-wide shared pool or spin up a new pool for the task type, etc.
Also, as the scope implies broad usage for any "async" things, we'd want to more clearly document the semantics of the various wrapper idioms and how they relate to either Vertx or to Java core concurrency libraries. For example, naively jumping in it's hard to understand how to wield the interface, and why there's a Cancelable interface defined here in Polaris and how that relates to cancelling a Future. My guess would've been if it's trying to be a common interface between two async exec frameworks that don't consistently use Futures, but then Cancelable has a completionStage method and CompletionStage has toCompletableFuture(), so it's unclear what that means. And both actual AsyncExec impls in here actually to just return their instances of CancelableFuture which is a private final class CancelableFuture<R> implements Cancelable<R>, Runnable defined in AsyncExec.
So the caller will get something called CancelableFuture but the public interface only gives you a Cancelable, and a CancelableFuture doesn't even implement Future, but apparently it contains a Future which has a cancel method. And this AsyncExec interface is basically like ScheduledExecutorService, except it returns these Cancelable instead of ScheduledFuture.
If we want to make this a more broadly used library within Polaris we can do it but it'll just need a lot more scrutiny, compared to chalking it up to idiosyncracies of usage in a particular submodule.
For me, maybe it's just that I'm not as familiar with Vertx. Is this Cancelable stuff a Vertx idiom?
| /** | ||
| * Abstraction for platform/environment-specific scheduler implementations. | ||
| * | ||
| * <p>Quarkus production systems use Vert.x, tests usually use Java executors. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if not using Java executors, can't we still use Java interfaces like ScheduledExecutorService? Is Vertx fundamentally incompatible with ScheduledFutures that are returned in ScheduledExecutorService?
|
|
||
| import java.util.concurrent.CompletionStage; | ||
|
|
||
| public interface Cancelable<R> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have some javadoc comments explaining the difference between this and Future and why we want to partially hide the Futures returned by executors under this interface?
Can callers still get a future via completionStage().toCompletableFuture()? Why is it a "has a future" pattern instead of an "is a future" pattern?
| import java.util.concurrent.atomic.AtomicInteger; | ||
|
|
||
| @ApplicationScoped | ||
| class AppScopedChecker { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could use a javadoc for how this is supposed to be used in tests. Does it make sense to use this across different usage domains? Like if we had some persistence and some events-handling features all using this package, and an integration test exercises both, is there a semantic to sharing a global singleton AtomicInteger between them via injection?
Is the CDI usage important here compared to just declaring a static final AtomicInteger within the test class itself?
dennishuo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the move to the more contained package, and the expanded javadoc comments, I'm okay going ahead with this. If we want to start using this outside of the nosql-specific code I'd want to dig a bit more into the necessity of the Cancelable and AsyncExec interfaces, in contrast to structuring any wrappers as adapters over core Java interfaces instead.
|
Thanks all, merging. |
* (Based on PR#2223)Support Namespace/Table level RBAC for external passthrough catalogs (apache#2673) Creates missing synthetic entities for securables in external passthrough catalogs. Based on Option 1 discussed in the RBAC section of catalog federation design doc. In the future, we could remove calls to PolarisEntity.Builder() and replace them with entities fetched from the remote catalog. (enabling Option 2). --------- Co-authored-by: Pooja Nilangekar <[email protected]> * Docs: Add more details about v1 schema user to upgrade from 1.0 to 1.1 (apache#2674) * Site: The link https://iceberg.apache.org/concepts/catalog/ doesn't exist anymore. (apache#2683) * Docs: Add analytics for polaris.apache.org (apache#2676) * Make ENABLE_SUB_CATALOG_RBAC_FOR_FEDERATED_CATALOGS configurable per catalog (apache#2688) * Update ENABLE_SUB_CATALOG_RBAC_FOR_FEDERATED_CATALOGS to be configurable per catalog * chore(deps): update postgres docker tag to v18 (apache#2692) * fix(deps): update dependency org.eclipse.persistence:eclipselink to v4.0.8 (apache#2682) * fix(deps): update dependency org.apache.logging.log4j:log4j-core to v2.25.2 (apache#2646) * chore(deps): update dependency openapi-generator-cli to v7.15.0 (apache#2410) * chore(deps): update dependency io.quarkus to v3.27.0 (apache#2663) Co-authored-by: Mend Renovate <[email protected]> * Publish Develocity builds scans for PRs and local use (apache#2596) This PR enables Develocity build scans for all PRs and contributors w/o an Apache account. CI build scans in the `apache/polaris` repo against branches and tags and having access to the ASF's Develocity secret continue to publish to the ASF's Develocity instance (no behavioral change). All other build scans are published to Gradle's public Develocity instance: - Build scans from local developer (non-CI) runs are only published, if Gradle is invoked with the `--scan` option. - Build scans from or targeting another repository than `apache/polaris` do need be enabled explicity by accepting Gradle's terms of service, via a repository variable, because this is a decision of the owner of a repository. Advanced options to configure another Develocity server or project-ID are available (for non-`apache/polaris` repositories). Detailed instructions in the `README.md`. * Fix & enhancements to the Events API hierarchy (apache#2629) Summary of changes: - Turned `PolarisEventListener` into an interface to facilitate implementation / mocking - Added missing `implements PolarisEvent` to many event records - Removed unused method overrides - Added missing method overrides to `TestPolarisEventListener` * fix(deps): update dependency org.kordamp.gradle:jandex-gradle-plugin to v2.3.0 (apache#2694) * Auth: reorganize internal authentication components (apache#2634) This PR contains no functional and no user-facing change. It is merely a refactor to better organize auth code. Summary of changes: - Moved all internal authentication components to the `org.apache.polaris.service.auth.internal` package and subpackages - Reduced visibility of utility classes - Renamed `TokenBroker` class hierarchy to stick to the naming standard: `<Algorithm>JWTBroker` - Introduced `@PolarisImmutable` whenever appropriate - Removed unused `NoneTokenBrokerFactory` (we already have `DisabledOAuth2ApiService`) - Removed unused `TokenBrokerFactoryConfig` * Enhancement : adding support for Aurora postgres AWS IAM authentication (apache#2650) Add support for postgres AWS IAM authentication using the `apache-client` lib. * Remove unused `name` arg from findCatalogByName in PolarisAdminService (apache#2691) * remove unused name param * Rename for better readability * Fix a race condition in sendNotification where concurrent parent-namespace creation causes failures (apache#2693) * Fix a race condition in sendNotification where concurrent parent-namespace creation causes failures The semantics of the createNonExistingNamespaces method used during sendNotification were supposed to be "create if needed". However, the behavior ended up surfacing an AlreadyExistsException if multiple concurrent sendNotification attempts were made for a brand-new namespace (where the notifications may be different tables). This would cause a table sync to fail if a sibling table was being synced at the same time, even though the new table should successfully get created under the shared namespace. * Also better future-proof the createNamespaceInternal logic by explicitly checking for ENTITY_ALREADY_EXISTS, per review suggestion. Log a less scary message since it's not an error scenario type of race condition, per review suggestion * Client: add credential reset option (apache#2698) * Client: add credential reset option * Client: add credential reset option * Client: add credential reset option * Add integration testing * Fix lint * fix(deps): update dependency software.amazon.awssdk:bom to v2.34.5 (apache#2702) * fix(deps): update dependency com.gradleup.shadow:shadow-gradle-plugin to v9.2.2 (apache#2661) * Support S3 storage that does not have STS (apache#2672) * Support S3 storage that does not have STS This change is backward compatible with old catalogs that have storage configuration for S3 systems with STS. * Add new property to S3 storage config: `stsUnavailable` (defaults to "available"). * Do not call STS when unavailable in `AwsCredentialsStorageIntegration`, but still put other properties (e.g. s3.endpoint) into `AccessConfig` Relates to apache#2615 Relates apache#2207 * Docs/improve idp documentation (apache#2695) * Fix Github links in IDP documentation * Separate IDP docs for usage and development * - Add telemetry config example - Fix link to getting started from landing page - Fix mentioning role-arn as required * Fix some relative links (local Hugo resolves them properly, but PR auto checks still fails) * Docs: narrow down --role-arn usage for AWS S3 only; fix a link in keycloak guide. * Docs: fix a link in keycloak guide. * chore(deps): update gradle/actions digest to 748248d (apache#2708) * Client: fix integration testing (apache#2700) * Add fallback in case the VERSION table is not present (apache#2653) * initial commit * wire up * pastefix * change to postgres specific code * [Catalog Federation] Add feature flag to disallow setting sub-RBAC for federated catalog at catalog level (apache#2696) In apache#2688 (comment), we've identified that configuring polaris.config.enable-sub-catalog-rbac-for-federated-catalogs at catalog level should not be allowed in all cases, especially when the owner is not the same subject as the catalog user or admin. This PR add a feature flag, ALLOW_SETTING_SUB_CATALOG_RBAC_FOR_FEDERATED_CATALOGS to allow owner to disable catalog level setting polaris.config.enable-sub-catalog-rbac-for-federated-catalogs * Fix `delegationModes` parameter propagation in `createTableStaged()` (apache#2713) This is follow-up bugfix for apache#2589 The bugfix part apache#2711 is extracted here since apache#2711 proved to be non-trivial and may require extra time. * Use the `delegationModes` method parameter as intended (as opposed to a local constant). * Generate Request IDs (if not specified); Return Request ID as a Header (apache#2602) * fix(deps): update dependency org.junit:junit-bom to v5.14.0 (apache#2715) * NoSQL persistence: add Java/Vert.X executor abstraction layer (apache#2527) Provides an abstraction to submit asynchronous tasks, optionally with a delay or delay + repetition and implementations based on Java's `ThreadPoolExecutor` and Vert.X. * Fix RDS devservices config + adopt for `:polaris-admin:test` (apache#2723) Changes: * Disables devservices for `:polaris-admin` tests as well, which is necessary to _not_ spin up test containers. * Use the explicit devservices-config as everywhere else. The first bullet point can cause excessive memory usage, especially with more test classes, eventually killing the whole GH runner. * fix(deps): update dependency io.smallrye:jandex to v3.5.0 (apache#2722) * fix(deps): update dependency org.jboss.weld:weld-junit5 to v5.0.2.final (apache#2721) * chore(deps): update quay.io/keycloak/keycloak docker tag to v26.4.0 (apache#2719) * Last merged commit 4024557 * NoSQL: Minor-ish changes to "nodes" projects Adopt nodes projects to OSS PR content * NoSQL: adapt to async package rename * Build: remove unnecessary explicit vertx-core dependency The async-vertx implementation should not propagate a different Vert.X dependency than Quarkus provides. This wouldn't be an issue if we could just use `enforcedPlatform()` for all Quarkus-builds, but sadly we cannot for the spark-plugin-inttests. --------- Co-authored-by: Honah (Jonas) J. <[email protected]> Co-authored-by: Pooja Nilangekar <[email protected]> Co-authored-by: Prashant Singh <[email protected]> Co-authored-by: JB Onofré <[email protected]> Co-authored-by: Mend Renovate <[email protected]> Co-authored-by: Alexandre Dutra <[email protected]> Co-authored-by: fabio-rizzo-01 <[email protected]> Co-authored-by: Dennis Huo <[email protected]> Co-authored-by: Yong Zheng <[email protected]> Co-authored-by: Dmitri Bourlatchkov <[email protected]> Co-authored-by: olsoloviov <[email protected]> Co-authored-by: Eric Maynard <[email protected]> Co-authored-by: Adnan Hemani <[email protected]>
Provides an abstraction to submit asynchronous tasks, optionally with a delay or delay + repetition and implementations based on Java's
ThreadPoolExecutorand Vert.X.