Skip to content

Conversation

@dennishuo
Copy link
Contributor

The semantics of the createNonExistingNamespaces method used during sendNotification were supposed to be "create if needed". However, the behavior ended up surfacing an AlreadyExistsException if multiple concurrent sendNotification attempts were made for a brand-new namespace (where the notifications may be different tables). This would cause a table sync to fail if a sibling table was being synced at the same time, even though the new table should successfully get created under the shared namespace.

…space creation causes failures

The semantics of the createNonExistingNamespaces method used during sendNotification were supposed
to be "create if needed". However, the behavior ended up surfacing an AlreadyExistsException
if multiple concurrent sendNotification attempts were made for a brand-new namespace (where
the notifications may be different tables). This would cause a table sync to fail if a sibling
table was being synced at the same time, even though the new table should successfully get created
under the shared namespace.
HonahX
HonahX previously approved these changes Sep 26, 2025
Copy link
Contributor

@HonahX HonahX left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I have one comment about an improvement we could make to prevent potential NPE in future development.

createNamespaceInternal(nsLevel, Collections.emptyMap(), resolvedParent);
try {
createNamespaceInternal(nsLevel, Collections.emptyMap(), resolvedParent);
} catch (AlreadyExistsException aee) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently we throw AlreadyExistsException if the result is not success without checking whether the return status is ENTITY_ALREADY_EXIST. Despite ENTITY_ALREADY_EXIST is the only error status thrown in existing metastore manager impl, furture development and other implementation may include other error status like UNEXPECTED_ERROR, where we do not want to ignore and log here.

I think we could add an additional check

if (returnedEntity == null) {
throw new AlreadyExistsException(
"Cannot create namespace %s. Namespace already exists", namespace);
}

to only throw AlreadyExistsException when return status is that, otherwise throw a RuntimeException/IllegalState/...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, updated to throw ServiceFailureException if returnStatus isn't ENTITY_ALREADY_EXISTS.

@github-project-automation github-project-automation bot moved this from PRs In Progress to Ready to merge in Basic Kanban Board Sep 26, 2025
Copy link
Contributor

@dimas-b dimas-b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with some non-blocking comments.

LOGGER
.atInfo()
.setCause(aee)
.addKeyValue("namespace", namespace)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it supposed to be referenced in the message?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I actually normally default to the traditional format-string with curly-braces, but I think others have been pushing for the more modern structured logging convention so I'm been forcing myself to use it :)

Structured logging is indeed more beneficial for cases where logs sinks are good at having structured-log elements pulled out into separate semi-structured fields for efficient aggregations, etc.

The convention as I understand it is to make the main log message avoid any direct mention of the structured params: https://www.slf4j.org/manual.html

// using classical API
logger.debug("oldT={} newT={} Temperature changed.", oldT, newT);

// using fluent API 
logger.atDebug().setMessage("Temperature changed.").addKeyValue("oldT", oldT).addKeyValue("newT", newT).log();

As opposed to addArgument which is for the traditional inline string-replacement into the human-friendly string, the addKeyvalue is standalone, and usually log outputters still show the addKeyValue fields conveniently in the log message, but for machine readers they are independent from the natural-language message.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also added a comment above the log statement better elucidating the considerations

createNamespaceInternal(nsLevel, Collections.emptyMap(), resolvedParent);
} catch (AlreadyExistsException aee) {
LOGGER
.atInfo()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why INFO? If we expect it to be a normal and expected call path, "informing" log readers every time it happens seems to be an overkill from the operations perspective. I propose debug (non-blocking)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I was on the fence. It's one of those cases where it's "normal" as something that will happen time-to-time, but usually isn't deterministically triggered since here the attempt to create the namespace is only made if the immediate preceding call thinks the namespace is missing. So it would be fairly infrequent, only correlating with periods of high parent-creation contention (not just normal periods of high concurrency on an already-existing namespace).

I had even started atWarn but figured that's too alarming, so downgraded to info.

My thought is that if there are unforeseen bugs relating to the state of these "create if needed" namespaces they're more likely to be related to these collisions than to normal operation, so even if someone sets their log-viewer to exclude DEBUG level it'd be easy to eyeball a correlation between this log statement and whatever other buggy behavior comes up.

I also don't feel too strongly though so I'm happy to downgrade it if there's a preference for DEBUG despite this rationale .

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. No need to change :)

.setCause(aee)
.addKeyValue("namespace", namespace)
.log(
"Namespace already exists in createNonExistingNamespace; possible race condition");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

possible race condition sounds like it is a problem to address, but the intention of this PR seems to be to treat this case as a "normal" situation 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah good point, removed the second clause to make it less scary.

checking for ENTITY_ALREADY_EXISTS, per review suggestion.

Log a less scary message since it's not an error scenario type of race
condition, per review suggestion
@HonahX HonahX merged commit f2e3848 into apache:main Sep 27, 2025
14 checks passed
@github-project-automation github-project-automation bot moved this from Ready to merge to Done in Basic Kanban Board Sep 27, 2025
fabio-rizzo-01 added a commit to fabio-rizzo-01/polaris that referenced this pull request Sep 29, 2025
This PR contains no functional and no user-facing change. It is merely a refactor to better organize auth code.

Summary of changes:

- Moved all internal authentication components to the `org.apache.polaris.service.auth.internal` package and subpackages
- Reduced visibility of utility classes
- Renamed `TokenBroker` class hierarchy to stick to the naming standard: `<Algorithm>JWTBroker`
- Introduced `@PolarisImmutable` whenever appropriate
- Removed unused `NoneTokenBrokerFactory` (we already have `DisabledOAuth2ApiService`)
- Removed unused `TokenBrokerFactoryConfig`

Enhancement : adding support for Aurora postgres AWS IAM authentication (apache#2650)

Add support for postgres AWS IAM authentication using the `apache-client` lib.

Remove unused `name` arg from findCatalogByName in PolarisAdminService (apache#2691)

* remove unused name param

* Rename for better readability

Fix a race condition in sendNotification where concurrent parent-namespace creation causes failures (apache#2693)

* Fix a race condition in sendNotification where concurrent parent-namespace creation causes failures

The semantics of the createNonExistingNamespaces method used during sendNotification were supposed
to be "create if needed". However, the behavior ended up surfacing an AlreadyExistsException
if multiple concurrent sendNotification attempts were made for a brand-new namespace (where
the notifications may be different tables). This would cause a table sync to fail if a sibling
table was being synced at the same time, even though the new table should successfully get created
under the shared namespace.

* Also better future-proof the createNamespaceInternal logic by explicitly
checking for ENTITY_ALREADY_EXISTS, per review suggestion.

Log a less scary message since it's not an error scenario type of race
condition, per review suggestion

Client: add credential reset option (apache#2698)

* Client: add credential reset option

* Client: add credential reset option

* Client: add credential reset option

* Add integration testing

* Fix lint

fix(deps): update dependency software.amazon.awssdk:bom to v2.34.5 (apache#2702)

fix(deps): update dependency com.gradleup.shadow:shadow-gradle-plugin to v9.2.2 (apache#2661)

added Aurora postgres to metastore documentation
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants