API Spec: Add ConnectionConfigInfo to ExternalCatalog #1026
Move the currently unused remoteUrl field from the top-level ExternalCatalog into ConnectionConfigInfo as remoteUri, for better consistency; remote catalogs in the future may be defined by arbitrary URIs that are not, for example, http(s) URLs. This is just the spec definition for now, so it is not yet wired into the internal entity layer or persistence objects. The spec allows for different connection types in the future even though we start with only an ICEBERG_REST type. Similarly, it provides extensibility for the different authn mechanisms to use with the connection.
connectionConfigInfo:
  $ref: "#/components/schemas/ConnectionConfigInfo"

ConnectionConfigInfo:
Should we allow updating the ConnectionConfigInfo on catalog entity? See UpdateCatalogRequest.
Good catch, yes, I'll add it to the update as well.
Thanks! For updating connection configs, I think we should only allow modifying the secret info so that users can rotate the secrets. For other connection configs, if we allow customers to modify them, customers may point to another remote catalog.
Thinking about it more, I replied to @dimas-b 's comment here as well: https://github.com/apache/polaris/pull/1026/files#r1978040280
I think we might want to rework how we express updates so that it's less confusing. Right now it almost looks like the API takes a strict REST-style "replace entire object" approach, but it's already subtly conditioned on which fields are "special" to be ignored vs deleted vs modified.
If we do it Iceberg's updateTable-style, we'd flatten out verbs to take update objects like UpdateCatalogConnectionConfigRemoteUri and UpdateCatalogConnectionConfigSecrets.
This is possibly going to be a bit complicated to hash out, so maybe it's best to leave ConnectionConfigInfo out of UpdateCatalog for now.
Agreed, we’ve run into a lot of issues with updating the storage config. For connection config, it would be helpful to define which properties are alterable and establish a fine-grained update spec.
description: An externally managed catalog
type: object
allOf:
  - $ref: "#/components/schemas/Catalog"
For an external catalog, is the storage config required if Polaris just passes through the response sent from the remote catalog and does not generate the subscoped creds?
Eventually, StorageConfig might become more optional. However, this is actually an important design point about whether we're willing to return the remote catalog's subscoped creds.
At least some of the known use cases explicitly want Polaris to be the one responsible for access control and credential vending, while the remote catalog does not perform credential vending. So we want the ability for Polaris to mix-in vended credentials.
Returning the remote catalog's vended credentials will probably need to be configurable. For most real use cases we'd probably want some formal protocol for declaring the "on-behalf-of delegation chain"; e.g. the ConnectionConfig contains a "system identity" but we'd want a way to declare the identity of the calling Principal in the request to the remote catalog.
Agree that we should provide a configuration for customers!
The vended credentials sent back from the remote catalog represent the system identity, which is very powerful, so we need some sort of request-scoped identity. This could be achieved by passing an HTTP header to Polaris.
  type: string
  description: URL to the remote catalog API
connectionConfigInfo:
  $ref: "#/components/schemas/ConnectionConfigInfo"
We need to make ConnectionConfigInfo a required property for external catalogs, just like StorageConfigInfo is for internal catalogs.
For current backwards-compatibility, I was actually thinking of it like:
if (ExternalCatalog.getConnectionConfigInfo() == null) { internalSubtype = STATIC_FACADE; }
else { internalSubtype = PASSTHROUGH_FACADE; }
Admittedly that might be kind of a hack to only use the presence of connection config to determine static vs passthrough facade, but conceptually, it makes sense that an ExternalCatalog that can't dial out to a remote catalog fundamentally must behave as a "static facade" where content is "pushed" into the ExternalCatalog.
Or should we introduce a new catalog type like federated?
I'd actually prefer to move away from more "type-differentiation" in favor of "capability differentiation", since we've talked about wanting the ability to convert from an external catalog to an internal catalog in the future someday.
Then, a catalog simply may or may not have a ConnectionConfigInfo; during a migration, it doesn't exactly matter what "type" we call the catalog, but it should be able to start functioning like a normal INTERNAL catalog at some point, while still using the ConnectionConfigInfo to detect updates in the remote catalog.
Aside from conversion to INTERNAL, there's also a close relationship between EXTERNAL catalogs with and without ConnectionConfigInfo; maybe people start out with a typical migration tool and plain notification-based EXTERNAL catalog, but then want to enable federation on the existing catalog that might already have grants defined. The entity metadata we might have "cached" locally would then be able to be "verified" during loadTable from the remote catalog, and then updateTable requests could be accepted as well.
It might still be good to have an enum of some sort so that we're not just inferring a mode-of-operation based on what fields are present, but maybe that would be better as a separate modeOfOperation enum than type, with key differences being:
- The mode of operation doesn't necessarily strictly define the set of attributes in the catalog object
- There would be no API-spec "discriminator" for sub-objects based on the modeOfOperation; the discriminator-style inheritance is honestly quite painful to work with
- The mode of operation is more fluid, intended to be able to be changed over time
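To make the "capability differentiation" idea concrete, here is a minimal Java sketch. It assumes a hypothetical `ModeOfOperation` enum and a simplified `CatalogSketch` class; neither name exists in the Polaris spec, and the fallback rule simply mirrors the backwards-compatibility inference described earlier in this thread (no connection config means static facade).

```java
import java.util.Optional;

// Hypothetical names for illustration only; not part of the Polaris API spec.
enum ModeOfOperation { STATIC_FACADE, PASSTHROUGH_FACADE, INTERNAL }

final class CatalogSketch {
    private String connectionUri;          // null means "no ConnectionConfigInfo"
    private ModeOfOperation explicitMode;  // null means "infer from capabilities"

    CatalogSketch(String connectionUri) {
        this.connectionUri = connectionUri;
    }

    Optional<String> connectionUri() {
        return Optional.ofNullable(connectionUri);
    }

    void setConnectionUri(String uri) {
        this.connectionUri = uri;
    }

    // The mode is fluid: it can change without changing the set of attributes.
    void setMode(ModeOfOperation mode) {
        this.explicitMode = mode;
    }

    // An explicit mode wins; otherwise infer from whether the catalog can dial out.
    ModeOfOperation effectiveMode() {
        if (explicitMode != null) {
            return explicitMode;
        }
        return connectionUri == null
            ? ModeOfOperation.STATIC_FACADE
            : ModeOfOperation.PASSTHROUGH_FACADE;
    }
}

final class ModeDemo {
    public static void main(String[] args) {
        CatalogSketch catalog = new CatalogSketch(null);
        System.out.println(catalog.effectiveMode());
        // Enabling federation on an existing catalog only adds a capability.
        catalog.setConnectionUri("https://remote.example.com/api/catalog");
        System.out.println(catalog.effectiveMode());
        // During a migration the mode changes, but the connection config stays.
        catalog.setMode(ModeOfOperation.INTERNAL);
        System.out.println(catalog.effectiveMode());
    }
}
```

Note that the connection config survives the switch to `INTERNAL`, which matches the migration scenario where it is still used to detect updates in the remote catalog.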
Your comment makes sense to me, @dennishuo !
However, if we follow that path to the logical end, I think we're looking at a full redesign of the Management REST API 😅 This is actually fine from my POV. We can have APIs v1 and v2 co-existing for a while.
spec/polaris-management-service.yml
Outdated
  enum:
    - ICEBERG_REST
  description: The type of remote catalog service represented by this connection
remoteUri:
nit: remote for a URI is superfluous, IMHO
Changed to uri
OauthRestAuthenticationInfo:
  type: object
  description: OAuth authentication based on client_id/client_secret
If it's just about the client credentials flow, maybe we can rename the type to something like OAuthClientCredentialsParameters? Iceberg REST servers do not have to be restricted to client credentials; other flows may be available (e.g. delegation), which are still OAuth2 but will require different parameters.
Done, including the casing OAuth to match conventions elsewhere.
spec/polaris-management-service.yml
Outdated
  type: string
  description: oauth scopes to specify when exchanging for a short-lived access token

BearerRestAuthenticationInfo:
Bearer auth is not specific to Iceberg REST. How about BearerAuthenticationParameters?
Agreed! We might need to create an authentication config specifically for ConnectionConfig. For other connections, like IcebergHiveConnectionConfig, we can still reuse the existing authentication parameters.
Also, using Parameters as part of the name is a great approach. Since we’ll be generating some classes based on the spec, suffixing them with Parameters makes sense as they are part of requests. This also provides more flexibility in naming our internal classes.
For example, we currently have AwsStorageConfigInfo (generated by OpenAPI) and AwsStorageConfigurationInfo (internal representation of the AWS storage config), which can be quite confusing. Using a clearer naming convention will help distinguish between generated request models and internal representations.
Maybe we can use IcebergRestConnectionConfigInfo in our spec and IcebergRestConnectionConfig as the name of the internal class, or another suffix like Model. If we follow the same pattern as the storage config, the internal name would be very long, e.g. IcebergRestConnectionConfigurationInfo.
The suggestion above makes sense to me (my first comment was just a minor naming concern)
Done, renamed to BearerAuthenticationParameters
remoteUrl:
  type: string
  description: URL to the remote catalog API
connectionConfigInfo:
How do updates happen? It looks like changing any config property in an ExternalCatalog requires re-submitting the whole object... Is that so?
Having the client re-submit credentials on every config change is probably not ideal 🤔 WDYT?
Yeah, the way we express updates probably needs some rework anyways. Even though the original structure of UpdateCatalogRequest kind of looks like a "replace all", it also says:
Any fields which are required in the Catalog will remain unaltered if omitted from the contents of this Update request.
Which is a bit ambiguous, especially when it comes to partially specifying optional portions of a required outer struct (e.g. if StorageConfigInfo is being partially updated).
In practice, there's basically currently some very specialized logic for deciding which fields are supposed to be "total replace", or "delete through omission" or "ignore if omitted" which is definitely confusing when using the API.
On the plus side at least we didn't just make the body of UpdateCatalogRequest be the full Catalog like a naive REST pattern would do. Probably we need to switch to a more imperative/verb-style like used in the Iceberg updateTable API and basically just flatten out all the possible individual-field updates, maybe accepting a list of updates in a single request.
In theory the HTTP verb should've been PATCH to most correctly match "partial-update" semantics, but PATCH is not consistently supported. It's also interesting that Iceberg's updateTable is a POST, and createTable is a POST as well (on the parent namespace), which is presumably why they needed to make namespace-update a POST on /v1/{prefix}/namespaces/{namespace}/properties rather than on /v1/{prefix}/namespaces/{namespace}, instead of using PUT as the usual update verb.
I'm starting to think I'll leave ConnectionConfigInfo out of scope of the current UpdateCatalog definition so we can put more thought into updates overall without blocking basic progress on federation.
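As a rough illustration of the verb-style approach, the following Java sketch applies a list of single-purpose update objects in order. The update class names are the ones floated earlier in this thread (`UpdateCatalogConnectionConfigRemoteUri`, `UpdateCatalogConnectionConfigSecrets`); `CatalogUpdate`, `ConnectionConfigState` and `UpdateCatalogSketch` are hypothetical, and none of this is the actual Polaris or Iceberg API.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model: only the fields needed to show the update style.
final class ConnectionConfigState {
    String uri;
    String clientSecret;

    ConnectionConfigState(String uri, String clientSecret) {
        this.uri = uri;
        this.clientSecret = clientSecret;
    }
}

// Each update names exactly one change; nothing is inferred from omitted fields.
interface CatalogUpdate {
    void applyTo(ConnectionConfigState state);
}

final class UpdateCatalogConnectionConfigRemoteUri implements CatalogUpdate {
    private final String newUri;

    UpdateCatalogConnectionConfigRemoteUri(String newUri) {
        this.newUri = newUri;
    }

    @Override
    public void applyTo(ConnectionConfigState state) {
        state.uri = newUri;
    }
}

final class UpdateCatalogConnectionConfigSecrets implements CatalogUpdate {
    private final String newSecret;

    UpdateCatalogConnectionConfigSecrets(String newSecret) {
        this.newSecret = newSecret;
    }

    @Override
    public void applyTo(ConnectionConfigState state) {
        state.clientSecret = newSecret;
    }
}

final class UpdateCatalogSketch {
    // Applies the updates in order, as a single request carrying a list might.
    static ConnectionConfigState apply(ConnectionConfigState state, List<CatalogUpdate> updates) {
        for (CatalogUpdate update : updates) {
            update.applyTo(state);
        }
        return state;
    }

    public static void main(String[] args) {
        ConnectionConfigState state = new ConnectionConfigState("https://old.example.com", "s1");
        // Rotating the secret does not require re-submitting the URI.
        apply(state, Arrays.<CatalogUpdate>asList(new UpdateCatalogConnectionConfigSecrets("s2")));
        System.out.println(state.uri + " " + state.clientSecret);
    }
}
```

With this shape, "ignore if omitted" versus "delete through omission" stops being a question: a field changes only when an update object for it is present.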
…arerAuthenticationParameters. Also rename 'remoteUri' to just 'uri'
@dennishuo: WDYT about forking the Java change for remote URL into a separate PR? I think those can be merged right now.
allOf:
  - $ref: '#/components/schemas/AuthenticationParameters'
properties:
  bearerToken:
I'm not sure storing credentials inside catalog objects is a good idea in general. I believe we briefly discussed it elsewhere.
Also apache/iceberg#12197 opens a lot of other authentication possibilities.
I suppose we have these options to proceed:
1. Merge "as is" but treat the external catalogs feature as "alpha", "subject to change", then incrementally improve connection auth and secret management.
2. Hold this PR and work on improving those related areas, then re-do this API.
I'm personally fine either way, but I want to emphasize that with option 1 the API will go through a series of non-backward-compatible changes as we approach the finish line.
WDYT?
Yeah, the intent here is to support a variety of authentication models, this could be something we hash out more on the dev mailing list as well.
I agree we could iterate quickly by marking the feature as alpha initially, as I suspect the end-to-end interaction will make the considerations more obvious as we iterate on it.
As for "storing", I think there are two aspects:
1. Expressing the secret directly in the ConnectionConfig struct of the API object
2. Persisting the secret somewhere
Are you concerned about the API structure for (1)?
I agree we probably don't want (2) to default to storing plaintext in the persistence DB at any point in time. There are a few different models that could coexist for (2):
1. Fully require an external "secret" to already exist as a reference, e.g. to a Vault URI, and specify that Vault URI in the config info -- but this only shifts the problem to the meta-secret for accessing Vault, so at some point the caller needs to configure some secret on the Polaris side
2. Require a separate SecretIntegration type of Polaris entity to already be created whose sole purpose is plumbing the secrets into some other configurable secret store (e.g. Polaris would manage Vault in this case), but this is somewhat complex for the API and ultimately the same kinds of internal secrets plumbing would need to be built.
3. (Plan to do this): Within Polaris's logic for handling entities that define secrets, we can have a SecretsManager class whose purpose is to take secrets from API models and actually store them somewhere else that's safe (Vault, KMS, some other keystore, etc.), and then add internal references to the stored secret inside the Polaris entity so that the secret can be found when needed. Call sites that need the unpacked secret again go through the SecretsManager to "extract" the secret from the Polaris entity; the implementation of the SecretsManager gets to decide for itself how it wants to store a reference and then unpack it again.
4. (Should consider also supporting this model): Similar to (3), but if we want non-uniform types of secrets with a uniform secrets-management layer, the SecretsManager could, instead of only storing a reference to an external resource, lease a new encryption key from the external system and then embed the encrypted secret in some field within the Polaris entity. Then, when the call site again needs the unpacked secret, this SecretsManager would know to decrypt the encrypted blob in order to retrieve the original secret.
The nice thing about such a model is that it's fairly flexible for different secrets-management flows under a single interface. For example, to add direct sigv4-based auth to AWS Glue or S3Tables, the part where the SecretsManager transforms the entity would not actually be handling a secret directly, but instead performing the assumeRole relationship like how the StorageConfiguration does it; in this world, true "secrets" are only implicitly handled within the environment, but the flow looks the same -- embed the userArn and externalId into the Polaris entity, and then later when you want to unpack a secret, you use something externally-managed to become that userArn before doing an assumeRole to mint a new subscoped token.
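A minimal Java sketch of the reference-based model (the "plan to do this" option) might look like the following. Everything here is an assumption made for illustration: the `SecretsManagerSketch` interface, its `store`/`resolve` methods, the `secret-ref:` reference format, and the in-memory map standing in for Vault or KMS are all hypothetical and are not the SecretsManager design that Polaris eventually adopts.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical interface; the real design was still under discussion in this thread.
interface SecretsManagerSketch {
    // Stores the raw secret elsewhere and returns an opaque reference to persist in the entity.
    String store(String rawSecret);

    // Resolves a previously returned reference back into the raw secret.
    String resolve(String reference);
}

// Stand-in for Vault/KMS: secrets live in this map, never in the persisted entity.
final class InMemorySecretsManager implements SecretsManagerSketch {
    private final Map<String, String> backend = new ConcurrentHashMap<>();

    @Override
    public String store(String rawSecret) {
        String reference = "secret-ref:" + UUID.randomUUID();
        backend.put(reference, rawSecret);
        return reference;
    }

    @Override
    public String resolve(String reference) {
        String secret = backend.get(reference);
        if (secret == null) {
            throw new IllegalArgumentException("Unknown secret reference: " + reference);
        }
        return secret;
    }
}

// What would be persisted: the connection URI plus a reference, not the plaintext secret.
final class PersistedConnectionEntity {
    final String uri;
    final String secretReference;

    PersistedConnectionEntity(String uri, String secretReference) {
        this.uri = uri;
        this.secretReference = secretReference;
    }
}

final class SecretsDemo {
    public static void main(String[] args) {
        SecretsManagerSketch secrets = new InMemorySecretsManager();
        // API layer: take the secret from the request model and swap it for a reference.
        PersistedConnectionEntity entity =
            new PersistedConnectionEntity(
                "https://remote.example.com", secrets.store("my-client-secret"));
        // Call site: unpack the secret only when dialing the remote catalog.
        System.out.println(secrets.resolve(entity.secretReference).length());
    }
}
```

The encrypted-blob variant would keep the same two-method interface and change only what the returned string contains, which is the flexibility argued for above.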
I believe even passing secrets through the Management API is a security risk.
Ideally, the act of configuring an external catalog would reference secrets (e.g. by URN) as opposed to submitting them directly. It might be best to start another dev ML thread and doc for this, though. I'm sure there are a lot of details to iron out.
Iterative approach LGTM.
remoteUrl:
  type: string
  description: URL to the remote catalog API
connectionConfigInfo:
@dennishuo @sfc-gh-rxing this change seems to remove an existing field called remoteUrl. I am wondering whether that would break existing users?
Yeah normally I'd be more hesitant to make breaking changes, but in this case there wasn't any code in Polaris actually using it.
It's conceivable that someone who customized Polaris for their own services uses it, so maybe we could see if anyone vetoes it, but in general it doesn't really make sense to have a remoteUrl without connection authentication settings anyways. At best it would be used as a cosmetic string to display somewhere.
I've at least checked that one of the large stakeholders of a customized Polaris deployment (Snowflake OpenCatalog) does not use it.
As a project, I think we have to allow some leeway in breaking API changes. It may be worth marking certain areas as "alpha" / "beta" even when they are merged, until the API stabilizes.
I do not think we can assume that every API change results in a "final" API contract at this stage of the project.
I see. I am not against making this change; as long as we have thought through the consequences, I think we should be fine.
  mapping:
    ICEBERG_REST: "#/components/schemas/IcebergRestConnectionConfigInfo"

IcebergRestConnectionConfigInfo:
Is there any particular reason to put Iceberg in the name? RestConnectionConfigInfo seems like something that could be used for non-Iceberg REST services as well. In your opinion, what would it look like if we wanted to generalize this to support non-Iceberg REST services?
This one is specifically for the Iceberg REST-specific subclass, where the discriminator connectionType is ICEBERG_REST. I suppose if all different connection types end up needing the same set of parameters then we don't need to use discriminator-based subclasses and could just make the enum function by itself. Though if we then find out there are type-specific fields we'd need to basically add an "optional" of each possible specialized connection type if we didn't start out with the discriminator approach.
Anyways, right now the Iceberg REST connection does have the remoteCatalogName which is intended to be used in a way that is somewhat specific to Iceberg REST -- in particular, to pass it in as the warehouse property when calling getConfig and possibly expecting an override of the URL PREFIX. If this old-fashioned handshake is deprecated someday, then PREFIX would need to become first-class and again it would be specific to the way Iceberg REST constructs paths.
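To illustrate the handshake described here, the sketch below builds the Iceberg REST config request with the remote catalog name as the `warehouse` query parameter, and shows how a returned `prefix` override would change later paths. The class and method names and the example URIs are hypothetical, no HTTP call is made, and this is not how Polaris implements the connection.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLEncoder;
import java.util.Collections;
import java.util.Map;

final class IcebergRestHandshakeSketch {
    // Builds the config-endpoint URI, passing remoteCatalogName as the "warehouse" parameter.
    static URI configUri(String baseUri, String remoteCatalogName) {
        try {
            String warehouse = URLEncoder.encode(remoteCatalogName, "UTF-8");
            return URI.create(stripTrailingSlash(baseUri) + "/v1/config?warehouse=" + warehouse);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Builds the namespaces path, inserting the "prefix" override when the server returned one.
    static String namespacesPath(Map<String, String> overrides) {
        String prefix = overrides.get("prefix");
        return prefix == null || prefix.isEmpty()
            ? "/v1/namespaces"
            : "/v1/" + prefix + "/namespaces";
    }

    private static String stripTrailingSlash(String s) {
        return s.endsWith("/") ? s.substring(0, s.length() - 1) : s;
    }

    public static void main(String[] args) {
        System.out.println(configUri("https://remote.example.com/api/catalog/", "analytics"));
        System.out.println(namespacesPath(Collections.singletonMap("prefix", "analytics")));
    }
}
```

Because both the `warehouse` parameter and the path prefix are Iceberg REST conventions, a field like remoteCatalogName fits more naturally on the Iceberg-specific subtype than on the base connection config.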
Thanks for the explanation! I was thinking more of the case where we might want to enable connections to other catalog services like Glue/Unity Catalog/Hive Metastore in general, not just Iceberg endpoints. I haven't looked into how the discriminator works, so one quick question: if we remove the discriminator in the future, will there be any user-facing impact?
Right, each other connectionType should have a corresponding type-specific struct defined.
discriminator:
  propertyName: connectionType
  mapping:
    ICEBERG_REST: "#/components/schemas/IcebergRestConnectionConfigInfo"
    HIVE: "#/components/schemas/HiveConnectionConfigInfo"
HiveConnectionConfigInfo:
  ..
If we remove the discriminator, the JSON on the wire is still generally compatible, provided we flatten the fields of all possible subtypes into the base type. The internal autogenerated Java classes will not be compatible, but can be rewritten if we want that.
spec/polaris-management-service.yml
Outdated
ConnectionConfigInfo:
  type: object
  description: A connection configuration representing a remote catalog service
WDYT about adding a note here that this is an "experimental API" and "subject to change"?
Good idea, added.
dimas-b
left a comment
Merging this PR in order to allow backend implementation / POC for external catalogs to proceed SGTM.
However, as discussed, I think we need to revisit the handling of secrets in Polaris holistically (including External Catalogs, but not limiting the discussion just to them).
…the base ConnectionConfigInfo
XJDKC
left a comment
LGTM! Thanks Dennis!
Vote passes on the mailing list: https://lists.apache.org/thread/gys4hbqrzwffl0rlrhr6gv4c3119tz8l
+1: 5 (binding), 4 (non-binding)
Commencing with merge
Compliant ETag and IfNoneMatch representations and separated persistence from etag logic * Changed ETag to be a record and improved semantics of IfNoneMatch * Fixed semantics of if none match * Removed ETag representation, consolidated in IfNoneMatch * fixed if none match parsing * Added table entity retrieval method to table operations * removed accidental commit of pycache folders * Fixed formatting * Changed to use metadata location hash * Ran formatting * use sha256 * Moved out ETag functions to utility class and removed ETaggedLoadTableResponse * Addressed comments * Fixed IcebergTableLikeEntity package rename * main: Update dependency io.opentelemetry.semconv:opentelemetry-semconv to v1.31.0 (apache#1288) * Update LICENSE and NOTICE in the distributions (admin and server) (apache#1258) * Gradle/Quarkus: make imageBuild task depend on jandex (apache#1290) * Core: Clarify the atomicity of BasePersistence methods (apache#1274) * Implement GenericTableCatalogAdapter (apache#1264) * rebase * more fixes * autolint * working on tests * stable test * autolint * polish * changes per review * some changes per review * grants * autolint * changes per review * changes per review * typofix * Improve code-containment and efficiency of etag-aware loading (apache#1296) * Improve code-containment and efficiency of etag-aware loading -Make the hash generation resilient against null metadataLocation -Use getResolvedPath instead of getPassthroughResolvedPath to avoid redundant persistence round-trip -Only try to calculate the etag for comparison against ifNoneMatch if the ifNoneMatch is actually provided * Add strict null-checking at callsites to generateETag, disallow passing null to generator * Add TODO to refactor shared logic for etag generation * Core: Add Endpoints and resource paths for Generic Table (apache#1286) * main: Update dependency com.nimbusds:nimbus-jose-jwt to v10.1 (apache#1299) * [JDBC] Part1 : ADD SQL script for Polaris setup (apache#1276) * main: Update 
registry.access.redhat.com/ubi9/openjdk-21-runtime Docker tag to v1.22-1.1743605859 (apache#1300) * done (apache#1297) * Add Polaris Community Meeting for April 3, 2025 (apache#1304) * Use config-file to define errorprone rule (apache#1233) Also enabled a couple more simple rules, and adding suppressions/fixes for/to the code. The two rules `EqualsGetClass` and `UnusedMethod`, which I think are useful, are not enabled yet, because that would mean actual code changes, which I do not want to do in this PR. The rule `PatternMatchingInstanceof`, introduced in apache#393, is disabled in this PR. It does not work before errorprone 2.37.0 (via apache#1213) - requires additional changes to enable the rule (see apache#1215). * Add Yun as a contributor (apache#1310) * Refactor CatalogHandler to comply with ErrorProne rules (apache#1312) Fix the CI error after apache#1233 * Implement PolicyCatalog Stage 1: CRUD + ListPolicies (apache#1294) * main: Update dependency io.opentelemetry:opentelemetry-bom to v1.49.0 (apache#1316) * main: Update docker.io/jaegertracing/all-in-one Docker tag to v1.68.0 (apache#1317) * main: Update dependency boto3 to v1.37.28 (apache#1328) * main: Update dependency software.amazon.awssdk:bom to v2.31.16 (apache#1329) * Make `BasePolaritsMetaStoreManagerTest` and `(Base)ResolverTest` reusable (apache#1308) Moves the test cases into the `Base*` classes and makes sure the classes can be reused by other persistence implementations. 
* main: Update dependency io.opentelemetry.semconv:opentelemetry-semconv to v1.32.0 (apache#1293) * main: Update mockito monorepo to v5.17.0 (apache#1311) * PySpark Update AWS Region (apache#1302) Co-authored-by: Travis Bowen <[email protected]> * main: Update dependency com.nimbusds:nimbus-jose-jwt to v10.2 (apache#1334) * main: Update dependency com.diffplug.spotless:spotless-plugin-gradle to v7.0.3 (apache#1335) * Maven publication: Produce correct `<scm><tag>` in `pom.xml` (apache#1330) `project.scm.tag` in a Maven pom is intended to refer to the SCM (Git) tag. We currently publish `main`, which is incorrect. This change omits the SCM tag for snapshot builds, but emits the Git tag for releases. * Remove `@StaticInitSafe` annotation (apache#1331) There was an issue around mapped configurations having the `@StaticInitSafe` annotation that led to _two_ instances (a "static" one and a "somewhat application-scoped" one) - this was fixed in Quarkus 3.21. One bug in smallrye-config is fixed for Quarkus > 3.21.0, another issue however remains. Since `@StaticInitSafe` annotated configs seem to cause some weird issues, it seems legit to remove that annotation altogether. This approach was [taken in Nessie](projectnessie/nessie#10606) as well. Investigations (via practical experiments) have proven that there's no measurable impact (runtime + heap) when doing this - and that's also been confirmed by Quarkus + Smallrye-config maintainers. Hence this change removes that annotation from the code base. 
* Build/Release: Add a "generate digest" task and use for source tarball and Quarkus distributables (apache#1271) * Ensure that digest and signature are generated for both Polaris-Server and admin tar/zip distribution * Move "generate digest" functionality to a Gradle task * main: Update dependency com.google.errorprone:error_prone_core to v2.37.0 (apache#1213) * main: Update Quarkus Platform and Group to v3.21.1 (apache#1291) * main: Update dependency io.netty:netty-codec-http2 to v4.2.0.Final (apache#1301) * Remove unnecessary `clean` and `--no-build-cache` from Gradle invocations (apache#1338) `quarkusAppPartsBuild --rerun` is the right way to force a Docker image build. * Generalize bootstrapping in servers (apache#1313) * Remove `instanceof` checks from `QuarkusProducers`. * Remove the now unused `onStartup` method from `InMemoryPolarisMetaStoreManagerFactory`. * Instead, call the good old `bootstrapRealms` method from `QuarkusProducers`. * Add new config property to control which MetaStore types are bootstrapped automatically (defaults to `in-memory` as before). * There is no bootstrap behaviour change in this PR, only refactorings to simplify code. * Add info log message to indicate when a realm is bootstrapped in runtime using preset credentials. Future enhancements may include pulling preset credentials from a secret manager like Vault for bootstrapping (as discussed in comments on apache#1228). 
* main: Update actions/stale digest to 816d9db (apache#1341) * main: Update dependency com.adobe.testing:s3mock-testcontainers to v4 (apache#1342) * main: Update dependency org.eclipse.persistence:eclipselink to v4.0.6 (apache#1343) * main: Update dependency io.quarkus to v3.21.2 (apache#1344) * main: Update dependency com.google.guava:guava to v33.4.7-jre (apache#1340) Co-authored-by: Robert Stupp <[email protected]> * Spark: Add Namespaces and View support for SparkCatalog (apache#1332) * Demote technical log messages to DEBUG in PolarisCallContextCatalogFactory (apache#1346) These messages appear to be logging low-level technical details about what is going on in the factory and are not likely to be of interest to most users on a daily basis. * Core/Service: Implement PolicyCatalog Stage 2: detach/attach/getApplicablePolicies (apache#1314) * Spec: Add 'inherited' and 'namespace' Fields to GetApplicablePolicies API Response (apache#1277) * Properly track bootstrappedRealms in InMemoryPolarisMetaStoreManagerFactory (apache#1352) Fixes apache#1351 * Implement GenericTableCatalogAdapter; admin-related fixes (apache#1298) * initial commit: * debugging * some polish * autolint * spec change * bugfix * bugfix * various fixes * another missing admin location * autolint * false by default * fixes per review * autolint * more fixes * DRY * revert small change for a better error * integration test * extra test * autolint * stable * wip * rework subtypes a bit * stable again * autolint * apply new lint rule * errorprone again * adjustments per review * update golden files * add another test * clean up logic in PolarisAdminService * autolint * more fixes per review * format * Update versions in distribution LICENSE and NOTICE (apache#1350) * Spark: Add CreateTable and LoadTable implementation for SparkCatalog (apache#1303) * Add a weigher to the EntityCache based on approximate entity size (apache#490) * initial commit * autolint * resolve conflicts * autolint * pull main * 
Add multiplier * account for name, too * adjust multiplier * add config * autolint * remove old cast * more tests, fixes per review * add precise weight test * autolint * populate credentials field for loadTableResponse (apache#1225) * populate credentials field for loadTableResponse * spotless * spotless * remove unused hashset * fix merge * fix empty credential case * spotlessApply --------- Co-authored-by: David Lu <[email protected]> * main: Update dependency io.smallrye.common:smallrye-common-annotation to v2.12.0 (apache#1355) * Build: Avoid adding duplicated projects for Intelij IDE usage (apache#1333) * main: Update dependency org.junit:junit-bom to v5.12.2 (apache#1354) * main: Update dependency org.apache.commons:commons-text to v1.13.1 (apache#1358) * main: Update dependency boto3 to v1.37.33 (apache#1360) * main: Update dependency software.amazon.awssdk:bom to v2.31.21 (apache#1361) * main: Update dependency io.micrometer:micrometer-bom to v1.14.6 (apache#1362) * main: Update dependency com.google.guava:guava to v33.4.8-jre (apache#1366) * Update LICENSE/NOTICE with latest versions (apache#1364) * Use "clean" LICENSE and NOTICE in published jar artifacts (apache#1292) * main: Update dependency io.projectreactor.netty:reactor-netty-http to v1.2.5 (apache#1372) * Add `Varint` type for variable-length integer encoding (apache#1229) * main: Update docker.io/prom/prometheus Docker tag to v3.3.0 (apache#1375) * Set version to 0.10.0-beta in prepaaration for the next release (apache#1370) * Update the link to OpenAPI in the documentation (apache#1379) * Integration test for Spark Client (apache#1349) * add integration test * add change * add comments * rebase main * update class comments * add base integration * clean up comments * main: Update dependency net.ltgt.gradle:gradle-errorprone-plugin to v4.2.0 (apache#1392) * Add generic table documentations (apache#1374) * add generic table documentation (incomplete) * fix table and spacing * remove documentation 
in client api since there is no implementation yet * remove spacing * minor fix - proof read * review fix, wording * add generic table documentation (incomplete) * fix table and spacing * remove documentation in client api since there is no implementation yet * remove spacing * minor fix - proof read * review fix, wording * proof read - punctuation fix * change table privilege reference * Unblock test `listNamespacesWithEmptyNamespace` (apache#1289) * Unblock test `listNamespacesWithEmptyNamespace` * Use `containsExactly` to simplify the test * Fix empty namespace behavior * Address comments * Block dropping empty namespace * Improve error messages * Revamp the Quick Start page (apache#1367) * First Draft with AWS * try again * try again * try again * try again * try again * try now * should work * AWS First Draft Complete * ensure file changed * Azure First Draft Complete * Azure First Draft, pt. 2 * Azure Completed * GCP First Draft * GCP Verified * File structure fixed * Remove Trino-specific tutorial * Restructured Quick Start * Addresses minor comments from @eric-maynard * Added reference to Deploying Polaris in Production * Fix MD Link Checker --------- Co-authored-by: Adnan Hemani <[email protected]> * Update README with links to new Quickstart experience (apache#1393) * Update the StorageConfiguration to invoke singleton client objects, a… (apache#1386) * Update the StorageConfiguration to invoke singleton client objects, and add a test * Fix formatting * using guava suppliers * Add aws region * Cleanup and mock test * Spark: Add rest table operations (drop, list, purge and rename etc) for Spark Client (apache#1368) * Initial MVP implementation of Catalog Federation to remote Iceberg REST Catalogs (apache#1305) * Initial prototype of catalog federation just passing special properties into internal properties. Make Resolver federation-aware to properly handle "best-effort" resolution of passthrough facade entities. 
Targets will automatically reflect the longest-path that we happen to have stored locally and resolve grants against that path (including the degenerate case where the longest-path is just the catalog itself). This provides Catalog-level RBAC for passthrough federation. Sketch out persistence-layer flow for how connection secrets might be pushed down into a secrets-management layer. * Defined internal representation classes for connection config * Construct and initialize federated iceberg catalog based on connection config * Apply the same spec renames to the internal ConnectionConfiguration representations. * Manually pick @XJDKC fixes for integration tests and omittign secrets in response objects * Fix internal connection structs with updated naming from spec PR * Push CreateCatalogRequest down to PolarisAdminService::createCatalog just like UpdateCatalogRequest in updateCatalog. This is needed if we're going to make PolarisAdminService handle secrets management without ever putting the secrets into a CatalogEntity. * Add new interface UserSecretsManager along with a default implementation The default UnsafeInMemorySecretsManager just uses an inmemory ConcurrentHashMap to store secrets, but structurally illustrates the full flow of intended implementations. For mutual protection against a compromise of a secret store or the core persistence store, the default implementation demonstrates storing only an encrypted secret in the secret store, and a one-time-pad key in the returned referencePayload; other implementations using standard crypto protocols may choose to instead only utilize the remote secret store as the encryption keystore while storing the ciphertext in the referencePayload (e.g. using a KMS engine with Vault vs using a KV engine). Additionally, it demonstrates the use of an integrity check by storing a basic hashCode in the referencePayload as well. 
* Wire in UserSecretsManager to createCatalog and federated Iceberg API handlers Update the internal DPOs corresponding to the various ConnectionConfigInfo API objects to no longer contain any possible fields for inline secrets, instead holding the JSON-serializable UserSecretReference corresponding to external/offloaded secrets. CreateCatalog for federated catalogs containing secrets will now first extract UserSecretReferences from the CreateCatalogRequest, and the CatalogEntity will populate the DPOs corresponding to ConnectionConfigInfos in a secondary pass by pulling out the relevant extracted UserSecretReferences. For federated catalog requests, when reconstituting the actual sensitive secret configs, the UserSecretsManager will be used to obtain the secrets by using the stored UserSecretReferences. Remove vestigial internal properties from earlier prototypes. * Since we already use commons-codec DigestUtils.sha256Hex, use that for the hash in UnsafeInMemorySecretsManager just for consistency and to illustrate a typical scenario using a cryptographic hash. * Rename the persistence-objects corresponding to API model objects with a new naming convention that just takes the API model object name and appends "Dpo" as a suffix; * Use UserSecretsManagerFactory to Produce the UserSecretsManager (#1) * Move PolarisAuthenticationParameters to a top-level property according to the latest spec * Create a Factory for UserSecretsManager * Fix a typo in UnsafeInMemorySecretsManagerFactory * Gate all federation logic behind a new FeatureConfiguration - ENABLE_CATALOG_FEDERATION * Also rename some variables and method names to be consistent with prior rename to ConnectionConfigInfoDpo * Change ConnectionType and AuthenticationType to be stored as int codes in persistence objects. Address PR feedback for various nits and javadoc comments. 
* Add javadoc comment to IcebergCatalogPropertiesProvider * Add some constraints on the expected format of the URN in UserSecretReference and placeholders for next steps where we'd provide a ResolvingUserSecretsManager for example if the runtime ever needs to delegate to two different implementations of UserSecretsManager for different entities. Reduce the `forEntity` argument to just PolarisEntityCore to make it more clear that the implementation is supposed to extract the necessary identifier info from forEntity for backend cleanup and tracking purposes. --------- Co-authored-by: Rulin Xing <[email protected]> Co-authored-by: Rulin Xing <[email protected]> * Add Adnan and Neelesh to collaborators list (apache#1396) * Replace authentication filters with Quarkus Security (apache#1373) * Implement PolicyCatalogHandler and Add Policy Privileges Stage 1: CRUD + ListPolicies (apache#1357) * Add PolicyCatalogHandler and tests * Fix style * Address review comments * Address review comments 2 * fix nit * Remove CallContext.getAuthenticatedPrincipal() (apache#1400) * main: Update dependency info.picocli:picocli-codegen to v4.7.7 (apache#1408) * main: Update dependency com.google.errorprone:error_prone_core to v2.38.0 (apache#1404) * Add Polaris Community Meeting 2025-04-17 (apache#1409) * main: Update dependency boto3 to v1.37.37 (apache#1412) * EclipseLink: add PrimaryKey to policy mapping records JPA model (apache#1403) * Re-instate dependencies between Docker Compose services (apache#1407) * Do not rotate bootstrapped root credentials (apache#1414) * Add Getting Started Button to the Apache Polaris Webshite Homepage (apache#1406) * Core: change to return ApplicablePolicies (apache#1415) * Rename the Snapshot Retention policy (apache#1284) * Rename the Snapshot Retention policy * Resolve comments * Resolve comments --------- Co-authored-by: Yufei Gu <yufei.apache.org> * main: Update dependency com.adobe.testing:s3mock-testcontainers to v4.1.0 (apache#1419) * rename 
snapshotRetention to snashotExpiry (apache#1420) * main: Update registry.access.redhat.com/ubi9/openjdk-21-runtime Docker tag to v1.22-1.1744796716 (apache#1394) * main: Update dependency software.amazon.awssdk:bom to v2.31.26 (apache#1413) * main: Update dependency com.adobe.testing:s3mock-testcontainers to v4.1.1 (apache#1425) * Fix releaseEmailTemplate task (apache#1384) * Update distributions LICENSE and NOTICE with AWS SDK 2.31.26 update (apache#1423) * Support snapshots=refs (apache#1405) * initial commit * autolint * small revert * rebase * autolint * simpler * autolint * tests * autolint * stable * fix leak * ready for review * improved test * autolint * logic flip again * Update service/common/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalogHandler.java Co-authored-by: Alexandre Dutra <[email protected]> * Update integration-tests/src/main/java/org/apache/polaris/service/it/env/CatalogApi.java Co-authored-by: Alexandre Dutra <[email protected]> * adjustments for committed suggestions * autolint --------- Co-authored-by: Alexandre Dutra <[email protected]> * Remove activatedPrincipalRoles property from AuthenticatedPolarisPrincipal (apache#1410) This seems to be a leftover from when ActiveRolesProvider was introduced. The setter was still used, but the getter wasn't, which hints at the fact that this property can be safely removed. As a bonus, AuthenticatedPolarisPrincipal now becomes immutable, which is imho a very good thing. 
* Implement PolicyCatalogHandler and Add Policy Privileges Stage 2: AttachPolicy + DetachPolicy (apache#1416) * add auth test for attach/detach * apply formatter * refactor authorizePolicyAttachmentOperation * address comment * better naming * Ship eclipselink and PostgreSQL JDBC driver by default in Polaris distribution (apache#1411) * Fix Connection Config DPOs (apache#1422) * Fix connection config dpos * Run spotlessApply * Doc: Fix the issue that html tags are not working in Hugo (apache#1382) * Implement PolicyCatalogHandler Stage 3: GetApplicablePolicies (apache#1421) * [JDBC] Part2: Add Relational JDBC module (apache#1287) * Bump version to 0.11.0-beta-incubating-SNAPSHOT (apache#1429) * Make entity lookups by id honor the specified entity type (apache#1401) * Make entity lookups by id honor the specified entity type All implementations of `TransactionalPersistence.lookupEntityInCurrentTxn()` are currently ignoring the `typeCode` parameter completely and could potentially return an entity of the wrong type. This can become very concerning during authentication, since a principal lookup could return some entity that is not a principal, and that would be considered a successful authentication. * review * Remove "test" Authenticator (apache#1399) * Propagate SQLException as "caused by" (apache#1430) * Remove logging for DbOps (apache#1433) * Spark: Add regtests for Spark client to test built jars (apache#1402) * main: Update dependency com.google.cloud:google-cloud-storage-bom to v2.51.0 (apache#1436) * main: Update dependency org.testcontainers:testcontainers-bom to v1.21.0 (apache#1437) * main: Update actions/setup-python digest to a26af69 (apache#1440) * Spark-IT: use correct configurations (apache#1444) ... 
do not let Spark leak into Quarkus * PolarisRestCatalogIntegrationTest: Always purge generic tables (apache#1443) * Add missing Postgresql dependency (apache#1447) * Add Request Timeouts (apache#1431) * add timeout * add iceberg exception mapping * dont use quarkus bom, disable timeout * nits * Fix sparks sql regtests with up to date config (apache#1454) * Refactor BasePolarisTableOperations & BasePolarisViewOperations (apache#1426) * initial copy paste * Reorder * view copy paste * fixes, polish * stable * yank * CODE_COPIED_TO_POLARIS comments * autolint * update license * typofix * update comments * autolint * Use .sha512 extension instead of -sha512 (apache#1449) * main: Update dependency org.eclipse.microprofile.fault-tolerance:microprofile-fault-tolerance-api to v4.1.2 (apache#1451) * Doc: Update Local Root Principal Credentials in Quickstart (apache#1452) * Update the Getting Started Workflow with each Cloud Provider's Blob Storage (apache#1435) * AWS First Draft * Debug * revert typo * Add JQ to docker runtime * Debug, pt2 * debug * debug * Allow Instance Profile Roles * change random suffix * change instance profile to regular IAM roles * AWS Final Draft * Azure First Draft * debug * Azure First Draft * debug * typo * GCP First Try * GCP Complete * GCP Final * add all jars to Spark * refactor * Implement PolicyCatalogAdapter (apache#1438) * Generic Table/Policy Store: Move feature config check to Adapter and some small refactoring (apache#1465) * update refs (apache#1464) * [JDBC] Part3: Plumb JDBC module to Quarkus (apache#1371) * Allow BasePolarisTableOperations to skip refreshing metadata after a commit (apache#1456) * initial commit * fix another test * changes per comments * visibility * changes per review * autolint * oops * main: Update dependency com.fasterxml.jackson:jackson-bom to v2.19.0 (apache#1455) * Doc: Added set custom credentials instruction in README (apache#1461) * Doc: Add policy documentation (apache#1460) * main: Update dependency 
software.amazon.awssdk:bom to v2.31.30 (apache#1475) * main: Update dependency gradle to v8.14 (apache#1459) * main: Update dependency gradle to v8.14 * fix PR --------- Co-authored-by: Robert Stupp <[email protected]> * Remove unused class TokenInfoExchangeResponse (apache#1479) This is an oversight from apache#1399. * Upgrade Polaris to Iceberg 1.9.0 (apache#1309) * Doc: Update on access-control policy docs (apache#1472) * main: Update Quarkus Platform and Group (apache#1381) * Added link to the Spark-Jupyter Notebook Getting Started from the main Getting Started Page (apache#1453) * Added link to the Spark-Jupyter Notebook Getting Started from the main Quickstart page * Typo Co-authored-by: Eric Maynard <[email protected]> * Suggestions as per @eric-maynard's review * Fix Typo --------- Co-authored-by: Eric Maynard <[email protected]> * [JDBC] Support Policy (apache#1468) * Refactor EntityCache into an interface (apache#1193) * Refactor EntityCache to an interface * fix * spotless * Remove unused PolarisCredentialVendor.validateAccessToLocations() (apache#1480) * Remove unused PolarisCredentialVendor.validateAccessToLocations() * review: remove ValidateAccessResult and comments * Policy Store: Check whether Policy is in use before dropping and support `detach-all` flag (apache#1467) * fix error (apache#1492) * Ensure writeToPolicyMappingRecord update existing record if primary key equals in EclipseLink Persistence Impl (apache#1469) * update PolicyMappingRecord if not exists * update test * add TODO * Eliminate getCurrentContext() call in PolarisAuthorizerImpl (apache#1494) * Add getting-started for Polaris Spark Client with Delta tables (apache#1488) * Fix: Pull Postgres image automatically (apache#1495) * Fix Outdated Information and add Information regarding `docker compose down` to Quickstart (apache#1497) * Fix Outdated Information and Add Information regarding docker compose down to Quickstart * Revision 2 * Remove shutdown from README * typo * Upgrade 
Iceberg REST Spec to match Iceberg 1.8 (apache#1283) * prep for review * reset * more changes * fixes * github action change * another build change * try api revert * re-all * custom type mappings, rebuild * autolint * polish * yank custom types * update * autolint * wip * Revert build changes * example * autolint * Fix FileIOExceptionsTest to conform to new Iceberg 1.8 API (apache#1501) It looks like after apache#1283, this test no longer compiles as the Iceberg API has changed. I'm not sure how this wasn't caught by CI on that PR itself. * JDBC: Optimize writeEntity calls (apache#1496) * Remove transaction from atomic writes * remove if-else * main: Update registry.access.redhat.com/ubi9/openjdk-21-runtime Docker tag to v1.22-1.1745840590 (apache#1499) * Support for external identity providers (apache#1397) * JDBC: create objects without reflection (apache#1434) * Include quarkus-container-image and README in the binary distributions (apache#1493) * Site: Fix Management and Catalog Spec links (apache#1507) * Lazy iteration over JDBC ResultSet (apache#1487) * refactor * autolint * polish * autolint * changes per review * autolint * unwrapping caller * changes per review * Update distributions LICENSE and NOTICE with artifacts and versions sync (apache#1509) * Avoid using deprecated `NestedField.of()` (apache#1514) * Fix compile warning: unknown enum constant Id.NAME (apache#1513) * Doc: Add getting started with JDBC source (apache#1470) * Site: Add Polaris Spark client webpage under unreleased (apache#1503) * fix merge error * retrigger test * Fix test failure (apache#1541) * mitigate .snyk issue * revert file in this pr * add .snyk file * retrigger * move snyk file * retrigger * resolve conflict * retrigger * Revert "resolve conflict" This reverts commit 5d6427150cab67aad7a4eca37142e87316f514fc. 
* repick the change --------- Co-authored-by: Dmitri Bourlatchkov <[email protected]> Co-authored-by: danielhumanmod <[email protected]> Co-authored-by: Mend Renovate <[email protected]> Co-authored-by: JB Onofré <[email protected]> Co-authored-by: Robert Stupp <[email protected]> Co-authored-by: MonkeyCanCode <[email protected]> Co-authored-by: Honah (Jonas) J. <[email protected]> Co-authored-by: Eric Maynard <[email protected]> Co-authored-by: Liam Bao <[email protected]> Co-authored-by: Yufei Gu <[email protected]> Co-authored-by: Alexandre Dutra <[email protected]> Co-authored-by: Dennis Huo <[email protected]> Co-authored-by: Travis Bowen <[email protected]> Co-authored-by: Travis Bowen <[email protected]> Co-authored-by: gh-yzou <[email protected]> Co-authored-by: Mansehaj Singh <[email protected]> Co-authored-by: Prashant Singh <[email protected]> Co-authored-by: Juichang Lu <[email protected]> Co-authored-by: David Lu <[email protected]> Co-authored-by: gfakbar20 <[email protected]> Co-authored-by: Adnan Hemani <[email protected]> Co-authored-by: Adnan Hemani <[email protected]> Co-authored-by: Neelesh Salian <[email protected]> Co-authored-by: Rulin Xing <[email protected]> Co-authored-by: Rulin Xing <[email protected]> Co-authored-by: fabio-rizzo-01 <[email protected]> Co-authored-by: Pierre Laporte <[email protected]> Co-authored-by: Richard Liu <[email protected]> Co-authored-by: Michael Collado <[email protected]> Co-authored-by: Owen Lin (You-Cheng Lin) <[email protected]> Co-authored-by: Eric Maynard <[email protected]> Co-authored-by: Andrew Guterman <[email protected]>
Move the currently unused remoteUrl field out of the top-level ExternalCatalog and into the ConnectionConfigInfo as remoteUri for better consistency; in the future, remote catalogs may be defined by arbitrary URIs that are not, for example, http(s) URLs.
This is just the spec definition for now, so it's not yet wired into the internal entity layer or persistence objects.
Allow extensibility of different connection types in the future even if we start with only an ICEBERG_REST type.
Similarly, provide extensibility for different authn mechanisms to use with the connection.
Relates to #540
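
As a rough illustration of the extensibility described above, the schema could be shaped along these lines. This is only a sketch under assumptions, not the spec text from this PR: `ConnectionConfigInfo`, `ICEBERG_REST` and `remoteUri` come from the description, while `connectionType`, `authenticationParameters`, `AuthenticationParameters`, `authenticationType`, `IcebergRestConnectionConfigInfo` and the `OAUTH` value are hypothetical names chosen for the example.

```yaml
# Hypothetical sketch; names not mentioned in the PR description are assumptions.
ConnectionConfigInfo:
  type: object
  description: Connection settings for reaching a remote catalog
  properties:
    connectionType:                 # assumed discriminator property name
      type: string
      enum:
        - ICEBERG_REST              # the only type to start with; new types are appended here
    remoteUri:
      type: string
      description: URI of the remote catalog; not necessarily an http(s) URL
    authenticationParameters:       # assumed property name
      $ref: "#/components/schemas/AuthenticationParameters"
  required:
    - connectionType
  discriminator:
    propertyName: connectionType
    mapping:
      ICEBERG_REST: "#/components/schemas/IcebergRestConnectionConfigInfo"

IcebergRestConnectionConfigInfo:    # assumed subtype name
  allOf:
    - $ref: "#/components/schemas/ConnectionConfigInfo"

AuthenticationParameters:           # assumed schema name
  type: object
  properties:
    authenticationType:             # assumed discriminator property name
      type: string
      enum:
        - OAUTH                     # assumed example value
  required:
    - authenticationType
  discriminator:
    propertyName: authenticationType
```

With a discriminator on both objects, supporting a new connection type or authentication mechanism would mean adding an enum value and a subtype schema, without changing the shape of ExternalCatalog itself.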