Skip to content

Conversation

@dennishuo
Copy link
Contributor

This fixes #1076

Explicitly set transaction isolation level to SERIALIZABLE and set eclipselink.transaction.join-existing
Set the Isolation level in both a new EclipseLink SessionCustomizer as well as providing
default postgresql.conf and injecting it in the getting-started docker-compose.

Set eclipselink.transaction.join-existing in persistence.xml to force reads to go through
the same write connection per EntityManager session, otherwise reads are not consistent with
writes in a transaction.

Only call rollback() if the transaction is still active, otherwise we get 500 server errors.

Fix handling of non-SUCCESS ReturnStatus in BasePolarisCatalog for createTableLike/updateTableLike

Without this fix, even if the underlying database layer does the right thing in resolving
concurrency conflicts, the API call will incorrectly return a 200 success.

This isn't too common of a race condition under normal operation since BasePolarisCatalog
does redundant state-checking right before the transaction (which maybe should be removed
in the future for performance), but in theory this bug indeed could've caused dropped writes
under high concurrency.

…lipselink.transaction.join-existing

Set the Isolation level in both a new EclipseLink SessionCustomizer as well as providing
default postgresql.conf and injecting it in the getting-started docker-compose.

Set eclipselink.transaction.join-existing in persistence.xml to force reads to go through
the same write connection per EntityManager session, otherwise reads are not consistent with
writes in a transaction.

Only call rollback() if the transaction is still active, otherwise we get 500 server errors.
…eateTableLike/updateTableLike

Without this fix, even if the underlying database layer does the right thing in resolving
concurrency conflicts, the API call will incorrectly return a 200 success.

This isn't too common of a race condition under normal operation since BasePolarisCatalog
does redundant state-checking right before the transaction (which maybe should be removed
in the future for performance), but in theory this bug indeed could've caused dropped writes
under high concurrency.
@dennishuo
Copy link
Contributor Author

dennishuo commented Mar 2, 2025

@jbonofre I'm trying to fix the license documentation by including the following in the main LICENSE file:

This product includes code from PostgreSQL.

* getting-started/eclipselink/postgresql.conf

Home page: https://www.postgresql.org/
License: https://www.postgresql.org/about/licence/

However, it looks like our main LICENSE file only aggregates Apache 2.0 licensed project inclusions - is that true, is our LICENSE file only for Apache licenses? Also I guess the text of the PostgreSQL license mentions that the copy right notice and the additional two paragraphs themselves must appear in all copies, so should I include the PostgreSQL license file itself directly the polaris repo? Or copy paste it as a header into getting-started/eclipselink/postgresql.conf?

Alternatively, if PostgreSQL license is messy to incorporate especially if it has implications on downstream users of Polaris, I could rewrite the config update to be something where the docker-compose.yaml definition just appends a few extra config keys to the end of the conf file in the postgres image instead of displaying the entire postgres sample conf file in our getting-started directory.

Additionally, the file extension/persistence/eclipselink/src/main/java/org/apache/polaris/extension/persistence/impl/eclipselink/PolarisEclipseLinkSessionCustomizer.java is based on the EclipseLink tutorial from https://eclipse.dev/eclipselink/documentation/2.6/dbws/creating_dbws_services002.htm so should that be documented as code included from EclipseLink as well?

a note that the sample conf file can be found at the postgres github
mirror, and then only specify the settings we immediately care about.
@dennishuo
Copy link
Contributor Author

Okay, thinking about it more, including the whole sample conf file for postgres might be too cluttered anyways, so instead I just added a link to the postgres github repo where the sample conf file can be looked at for anything who wants to customize their postgres settings easily, and then only include the particular config changes we care about.

Comment on lines +31 to +32
max_wal_size = 1GB
min_wal_size = 80MB
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these just defaults, or where did these come from?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah they were in the postgres docker image. Originally I copy pasted the whole file but it was too cluttered so I just literally filtered to keep only the non-commented-out default settings.

Copy link
Member

@snazy snazy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@dimas-b dimas-b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

tr.rollback();
// For some transaction conflict errors, the transaction will already no longer be active;
// if it's still active, explicitly rollback.
if (tr.isActive()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I guess that might be the reason for 500 errors in my old test with Serializable isolation.

@github-project-automation github-project-automation bot moved this from PRs In Progress to Ready to merge in Basic Kanban Board Mar 3, 2025
@eric-maynard eric-maynard merged commit ca38c97 into apache:main Mar 3, 2025
5 checks passed
@github-project-automation github-project-automation bot moved this from Ready to merge to Done in Basic Kanban Board Mar 3, 2025
gh-yzou pushed a commit to gh-yzou/polaris that referenced this pull request Mar 6, 2025
…apache#1092)

* Explicitly set transaction isolation level to SERIALIZABLE and set eclipselink.transaction.join-existing
Set the Isolation level in both a new EclipseLink SessionCustomizer as well as providing
default postgresql.conf and injecting it in the getting-started docker-compose.

Set eclipselink.transaction.join-existing in persistence.xml to force reads to go through
the same write connection per EntityManager session, otherwise reads are not consistent with
writes in a transaction.

Only call rollback() if the transaction is still active, otherwise we get 500 server errors.

* Fix handling of non-SUCCESS ReturnStatus in BasePolarisCatalog for createTableLike/updateTableLike

Without this fix, even if the underlying database layer does the right thing in resolving
concurrency conflicts, the API call will incorrectly return a 200 success.

This isn't too common of a race condition under normal operation since BasePolarisCatalog
does redundant state-checking right before the transaction (which maybe should be removed
in the future for performance), but in theory this bug indeed could've caused dropped writes
under high concurrency.

* Instead of including the entire sample postgres.conf file, just include
a note that the sample conf file can be found at the postgres github
mirror, and then only specify the settings we immediately care about.

* Add a link to the EclipseLink tutorial for injecting a SessionCustomizer for
attribution of the PolarisEclipseLinkSessionCustomizer.java code.

* Fix format

* Add some other required default postgres.conf settings so that it can start successfully.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Duplicate tables with same name in a namespace

4 participants