@eric-maynard
Contributor

Since #1942, persistence implementations (particularly the JDBC implementation) have a concept of a schema "version", and Polaris versions are not necessarily coupled to schema versions. As a result, we can dynamically load the schema version at runtime to determine whether certain features are safe to enable.

However, some users may be running a schema from before schema versioning was introduced, and those users might currently see an error when Polaris attempts to read from the version table. This PR adds a best-effort fallback that mitigates this error by defaulting to schema version 0 when the version table is not present.
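A minimal sketch of the fallback idea, for illustration only: `SchemaVersionFallback`, `VersionQuery`, and `loadSchemaVersion` are hypothetical names, not the actual Polaris code. It assumes the caller supplies a predicate that recognizes a "table does not exist" error for its database.

```java
import java.sql.SQLException;
import java.util.function.Predicate;

// Hypothetical sketch, not the actual Polaris implementation.
final class SchemaVersionFallback {
  // Schema version assumed for metastores created before the VERSION table existed.
  static final int PRE_VERSIONING_SCHEMA = 0;

  // Hypothetical stand-in for the JDBC query that reads the VERSION table.
  @FunctionalInterface
  interface VersionQuery {
    int run() throws SQLException;
  }

  static int loadSchemaVersion(VersionQuery query, Predicate<SQLException> isMissingTable)
      throws SQLException {
    try {
      return query.run();
    } catch (SQLException e) {
      if (isMissingTable.test(e)) {
        // Best effort: no VERSION table means the schema predates versioning.
        return PRE_VERSIONING_SCHEMA;
      }
      // Any other failure (connectivity, permissions, ...) is rethrown unchanged.
      throw e;
    }
  }
}
```

For PostgreSQL, the predicate could be `e -> "42P01".equals(e.getSQLState())` (SQLSTATE `42P01` is `undefined_table`). Falling back only on a recognized missing-table error keeps real failures from being silently reported as a pre-1 schema.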

```java
import org.apache.polaris.persistence.relational.jdbc.DatabaseType;

public class SchemaVersion implements Converter<SchemaVersion> {
  public static final SchemaVersion MINIMUM = new SchemaVersion(0);
  // ... (remainder of the class is not part of this diff excerpt)
}
```
Contributor

Contributor Author

I believe that's why we want to use 0 -- the v1 schema file includes the VERSION table, so the lack of a VERSION table indicates an earlier (pre-1) schema.

We don't use this MINIMUM number to choose a schema file, only the reverse: to detect what schema version an already-running metastore was bootstrapped with.

Contributor

The v1 schema released with 1.0.x doesn't include the version table.
Also, do we want to distinguish v0 and v1 in the code base? I'd prefer not.

Contributor Author

I think we do need to distinguish them. For a given Polaris version, e.g. 1.1.0, we have a notion of a v1 schema as defined by the v1 schema file, and that file contains the version table. If a metastore has no version table, it definitely can't be v1.

Contributor

I might be missing something. What would be the behavior difference between a 1.0 v1 schema and a 1.1 v1 schema? I think they should be identical. The problem we are trying to resolve is that the 1.0 v1 schema doesn't have a version table when Polaris 1.1 is deployed. The system should behave the same as if there were a version table containing version 1.

Contributor Author

You can have two different schema versions without any behavior difference between them, so I'm not sure I follow your question.

The problem we are trying to solve is identifying the schema version for a metastore without a schema version table. The schema version table starts at version 1. A metastore without this table is therefore a pre-1 version.

Contributor

@dimas-b Sep 25, 2025

I do not think we have any formal requirement for the schema versions to be comparable to each other numerically.

Current JDBC Persistence code makes some numerical comparisons, but I tend to think that this is an implementation concern. As long as the interpretation of schema version numbers is consistent with the schemas themselves, the code is fine.

With that in mind, using 0 for missing schema information should produce correct runtime behaviour, as far as I can tell.

Contributor

We will need to use the schema version for things proposed here, https://lists.apache.org/thread/5d9rl1l2jflbbnrl12ofmczjbcw8qv89. "Pre-1" and "1" are going to have the same behavior in terms of how we handle the missing column location_without_schema.

Contributor

It's a minor comment for me. Feel free to merge it as is.

@dimas-b added this to the 1.2.0 milestone Sep 23, 2025
Contributor

@dimas-b left a comment

I see the main point in this PR as allowing users to run new Polaris code with old database schemas. I think the PR is mostly good to merge in its current form.

The point on database-specific error parsing is nice to resolve, though, as it is likely to be helpful later, IMHO.
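One way the database-specific error parsing could look, sketched under assumptions: `MissingTableErrors` and its `Db` enum are hypothetical stand-ins (not Polaris's `DatabaseType`), and only PostgreSQL's `undefined_table` SQLSTATE (`42P01`) is filled in.

```java
import java.sql.SQLException;

// Hypothetical sketch of database-specific "table does not exist" detection.
final class MissingTableErrors {
  // Hypothetical stand-in for the supported database types.
  enum Db { POSTGRES, OTHER }

  // PostgreSQL SQLSTATE for undefined_table.
  private static final String POSTGRES_UNDEFINED_TABLE = "42P01";

  static boolean isMissingTable(Db db, Throwable error) {
    if (db != Db.POSTGRES) {
      // Unknown dialect: don't guess, so real errors are not mistaken for a pre-1 schema.
      return false;
    }
    // The SQLException may be wrapped (e.g. by a pool or framework), so walk the cause chain.
    for (Throwable t = error; t != null; t = t.getCause()) {
      if (t instanceof SQLException
          && POSTGRES_UNDEFINED_TABLE.equals(((SQLException) t).getSQLState())) {
        return true;
      }
    }
    return false;
  }
}
```

Returning `false` for an unrecognized database keeps the fallback conservative: the original error surfaces instead of being mistaken for a missing version table.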

@dimas-b
Contributor

dimas-b commented Sep 25, 2025

@eric-maynard : please rebase to fix markdown CI 🙂

@github-project-automation bot moved this from PRs In Progress to Ready to merge in Basic Kanban Board Sep 26, 2025
Contributor

@singhpk234 left a comment

I also have a slight preference for 1 over 0, since 1 is the starting version number, but I'm totally fine if we want to go forward with 0. The check I think we might want to add later is `< 2`, if we want to not project the column location_without_schema, and that works fine with 0 too!
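A rough sketch of the `< 2` check described here, assuming for illustration that `location_without_schema` first exists in schema version 2; `VersionGatedProjection` and `projectedColumns` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: gate a column projection on the detected schema version.
final class VersionGatedProjection {
  // Assumption for illustration: the column is introduced in schema version 2,
  // so both the "pre-1" fallback (0) and v1 fall below the threshold.
  static final int LOCATION_WITHOUT_SCHEMA_SINCE = 2;

  static List<String> projectedColumns(int schemaVersion, List<String> baseColumns) {
    List<String> columns = new ArrayList<>(baseColumns);
    if (schemaVersion >= LOCATION_WITHOUT_SCHEMA_SINCE) {
      // Only project the column when the schema is known to have it.
      columns.add("location_without_schema");
    }
    return columns;
  }
}
```

Because this is a threshold check, a metastore without a version table behaves the same whether it is mapped to 0 or 1.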

I will merge this PR by EOD today if no blocking comments come up, to keep the ball rolling for 1.2.

Thank you so much @eric-maynard !

@singhpk234 merged commit 5ea215a into apache:main Sep 30, 2025
14 checks passed
@github-project-automation bot moved this from Ready to merge to Done in Basic Kanban Board Sep 30, 2025
snazy added a commit to snazy/polaris that referenced this pull request Nov 20, 2025
* (Based on PR#2223)Support Namespace/Table level RBAC for external passthrough catalogs (apache#2673)

Creates missing synthetic entities for securables in external passthrough catalogs.
Based on Option 1 discussed in the RBAC section of catalog federation design doc.

In the future, we could remove calls to PolarisEntity.Builder() and replace them with entities fetched from the remote catalog (enabling Option 2).

---------

Co-authored-by: Pooja Nilangekar <[email protected]>

* Docs: Add more details about v1 schema user to upgrade from 1.0 to 1.1 (apache#2674)

* Site: The link https://iceberg.apache.org/concepts/catalog/ doesn't exist anymore. (apache#2683)

* Docs: Add analytics for polaris.apache.org (apache#2676)

* Make ENABLE_SUB_CATALOG_RBAC_FOR_FEDERATED_CATALOGS configurable per catalog (apache#2688)

* Update ENABLE_SUB_CATALOG_RBAC_FOR_FEDERATED_CATALOGS to be configurable per catalog

* chore(deps): update postgres docker tag to v18 (apache#2692)

* fix(deps): update dependency org.eclipse.persistence:eclipselink to v4.0.8 (apache#2682)

* fix(deps): update dependency org.apache.logging.log4j:log4j-core to v2.25.2 (apache#2646)

* chore(deps): update dependency openapi-generator-cli to v7.15.0 (apache#2410)

* chore(deps): update dependency io.quarkus to v3.27.0 (apache#2663)

Co-authored-by: Mend Renovate <[email protected]>

* Publish Develocity builds scans for PRs and local use (apache#2596)

This PR enables Develocity build scans for all PRs and contributors w/o an Apache account.

CI build scans in the `apache/polaris` repo against branches and tags and having access to the ASF's Develocity secret continue to publish to the ASF's Develocity instance (no behavioral change).

All other build scans are published to Gradle's public Develocity instance:
- Build scans from local developer (non-CI) runs are only published if Gradle is invoked with the `--scan` option.
- Build scans from, or targeting, a repository other than `apache/polaris` need to be enabled explicitly by accepting Gradle's terms of service via a repository variable, because this is a decision for the owner of that repository.

Advanced options to configure another Develocity server or project-ID are available (for non-`apache/polaris` repositories).

Detailed instructions in the `README.md`.

* Fix & enhancements to the Events API hierarchy (apache#2629)

Summary of changes:

- Turned `PolarisEventListener` into an interface to facilitate implementation / mocking
- Added missing `implements PolarisEvent` to many event records
- Removed unused method overrides
- Added missing method overrides to `TestPolarisEventListener`

* fix(deps): update dependency org.kordamp.gradle:jandex-gradle-plugin to v2.3.0 (apache#2694)

* Auth: reorganize internal authentication components (apache#2634)

This PR contains no functional and no user-facing change. It is merely a refactor to better organize auth code.

Summary of changes:

- Moved all internal authentication components to the `org.apache.polaris.service.auth.internal` package and subpackages
- Reduced visibility of utility classes
- Renamed `TokenBroker` class hierarchy to stick to the naming standard: `<Algorithm>JWTBroker`
- Introduced `@PolarisImmutable` whenever appropriate
- Removed unused `NoneTokenBrokerFactory` (we already have `DisabledOAuth2ApiService`)
- Removed unused `TokenBrokerFactoryConfig`

* Enhancement : adding support for Aurora postgres AWS IAM authentication (apache#2650)

Add support for postgres AWS IAM authentication using the `apache-client` lib.

* Remove unused `name` arg from findCatalogByName in PolarisAdminService (apache#2691)

* remove unused name param

* Rename for better readability

* Fix a race condition in sendNotification where concurrent parent-namespace creation causes failures (apache#2693)

* Fix a race condition in sendNotification where concurrent parent-namespace creation causes failures

The semantics of the createNonExistingNamespaces method used during sendNotification were supposed
to be "create if needed". However, the behavior ended up surfacing an AlreadyExistsException
if multiple concurrent sendNotification attempts were made for a brand-new namespace (where
the notifications may be different tables). This would cause a table sync to fail if a sibling
table was being synced at the same time, even though the new table should successfully get created
under the shared namespace.

* Also better future-proof the createNamespaceInternal logic by explicitly
checking for ENTITY_ALREADY_EXISTS, per review suggestion.

Log a less scary message since it's not an error scenario type of race
condition, per review suggestion

* Client: add credential reset option (apache#2698)

* Client: add credential reset option

* Add integration testing

* Fix lint

* fix(deps): update dependency software.amazon.awssdk:bom to v2.34.5 (apache#2702)

* fix(deps): update dependency com.gradleup.shadow:shadow-gradle-plugin to v9.2.2 (apache#2661)

* Support S3 storage that does not have STS (apache#2672)

* Support S3 storage that does not have STS

This change is backward compatible with old catalogs that have storage configuration for S3 systems with STS.

* Add new property to S3 storage config: `stsUnavailable` (defaults to "available").

* Do not call STS when unavailable in `AwsCredentialsStorageIntegration`, but still put other properties (e.g. s3.endpoint) into `AccessConfig`

Relates to apache#2615
Relates to apache#2207

* Docs/improve idp documentation (apache#2695)

* Fix Github links in IDP documentation

* Separate IDP docs for usage and development

* - Add telemetry config example
- Fix link to getting started from landing page
- Fix mentioning role-arn as required

* Fix some relative links (local Hugo resolves them properly, but PR auto checks still fail)

* Docs: narrow down --role-arn usage for AWS S3 only; fix a link in keycloak guide.

* Docs: fix a link in keycloak guide.

* chore(deps): update gradle/actions digest to 748248d (apache#2708)

* Client: fix integration testing (apache#2700)

* Add fallback in case the VERSION table is not present (apache#2653)

* initial commit

* wire up

* pastefix

* change to postgres specific code

* [Catalog Federation] Add feature flag to disallow setting sub-RBAC for federated catalog at catalog level (apache#2696)

In apache#2688 (comment), we've identified that configuring polaris.config.enable-sub-catalog-rbac-for-federated-catalogs at catalog level should not be allowed in all cases, especially when the owner is not the same subject as the catalog user or admin.

This PR adds a feature flag, ALLOW_SETTING_SUB_CATALOG_RBAC_FOR_FEDERATED_CATALOGS, that lets the owner disable setting polaris.config.enable-sub-catalog-rbac-for-federated-catalogs at the catalog level.

* Fix `delegationModes` parameter propagation in `createTableStaged()` (apache#2713)

This is a follow-up bugfix for apache#2589

The bugfix part apache#2711 is extracted here since apache#2711 proved to be
non-trivial and may require extra time.

* Use the `delegationModes` method parameter as intended (as opposed
  to a local constant).

* Generate Request IDs (if not specified); Return Request ID as a Header (apache#2602)

* fix(deps): update dependency org.junit:junit-bom to v5.14.0 (apache#2715)

* NoSQL persistence: add Java/Vert.X executor abstraction layer (apache#2527)

Provides an abstraction to submit asynchronous tasks, optionally with a delay or delay + repetition and implementations based on Java's `ThreadPoolExecutor` and Vert.X.

* Fix RDS devservices config + adopt for `:polaris-admin:test` (apache#2723)

Changes:
* Disables devservices for `:polaris-admin` tests as well, which is necessary to _not_ spin up test containers.
* Use the explicit devservices-config as everywhere else.

The issue addressed by the first bullet point (test containers being spun up) can cause excessive memory usage, especially with more test classes, eventually killing the whole GH runner.

* fix(deps): update dependency io.smallrye:jandex to v3.5.0 (apache#2722)

* fix(deps): update dependency org.jboss.weld:weld-junit5 to v5.0.2.final (apache#2721)

* chore(deps): update quay.io/keycloak/keycloak docker tag to v26.4.0 (apache#2719)

* Last merged commit 4024557

* NoSQL: Minor-ish changes to "nodes" projects

Adopt nodes projects to OSS PR content

* NoSQL: adapt to async package rename

* Build: remove unnecessary explicit vertx-core dependency

The async-vertx implementation should not propagate a different Vert.X dependency than Quarkus provides. This wouldn't be an issue if we could just use `enforcedPlatform()` for all Quarkus-builds, but sadly we cannot for the spark-plugin-inttests.

---------

Co-authored-by: Honah (Jonas) J. <[email protected]>
Co-authored-by: Pooja Nilangekar <[email protected]>
Co-authored-by: Prashant Singh <[email protected]>
Co-authored-by: JB Onofré <[email protected]>
Co-authored-by: Mend Renovate <[email protected]>
Co-authored-by: Alexandre Dutra <[email protected]>
Co-authored-by: fabio-rizzo-01 <[email protected]>
Co-authored-by: Dennis Huo <[email protected]>
Co-authored-by: Yong Zheng <[email protected]>
Co-authored-by: Dmitri Bourlatchkov <[email protected]>
Co-authored-by: olsoloviov <[email protected]>
Co-authored-by: Eric Maynard <[email protected]>
Co-authored-by: Adnan Hemani <[email protected]>