Persistence implementations for list pagination #1555
snazy
left a comment
I don't think that the approach implemented here yields correct results. The code assumes strict ordering of integer IDs, which, from general experience with relational DBs and in particular from looking at org.apache.polaris.extension.persistence.relational.jdbc.IdGenerator, is not the case.
String query = QueryGenerator.generateSelectQuery(new ModelEntity(), params);

if (pageToken instanceof EntityIdPageToken entityIdPageToken) {
  query += String.format(" AND id > %d ORDER BY id ASC", entityIdPageToken.getId());
How would this work with org.apache.polaris.extension.persistence.relational.jdbc.IdGenerator?
I think you're right that it won't; this logic is copied from EclipseLink where IDs are always increasing but does not work with the current way that the JDBC metastore creates IDs.
I'd propose to change the Page/PageToken contract in a way to push the parameter "as is" down to the persistence layer and let the persistence implementation deal with it.
I spoke with @singhpk234 who noted this is probably the same discussion as here on the old PR. With that context, I think we might be OK here.
IMO it's alright that the list ordering you'd get across metastores won't be the same. Other than that difference, seems like everything should work with JDBC's IdGenerator. Although the IDs aren't generated sequentially, pagination only uses the entity ID as an essentially arbitrary consistent ordering.
The key implication here is that if an entity gets created in the middle of a listing operation (e.g. between list calls 2 and 3) it may or may not show up in the next page. An alternative would be to try to filter it out so that the behavior is more obvious & consistent, but I think the simple approach that ultimately gives the user a chance to see these new entities is good.
Losing new entities that are stored after pagination start is fine from my POV. The JDBC persistence does not implement catalog-level versioning, so this is unavoidable, I guess.
Agreed that we will naturally lose some entities, the question is whether we are OK with entities stored after pagination start being lost nondeterministically rather than always. Right now, whether the new entity is lost or not depends on what entity ID it gets. If it gets a high entity ID you might see it in a later page and if it gets a low ID you might not.
My thought on this question is "yes", because it's better to show the entity if we can and it simplifies the code.
But if we feel like this is too unintuitive, we can add a secondary filter on the entity's creation time to try and get rid of these entities (on a best-effort basis, since clocks are not perfect).
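To make the trade-off above concrete, here is a minimal in-memory sketch, not Polaris code. A `TreeMap` stands in for the entities table (key is the entity ID, value is a creation timestamp), and `pageAfter`, `createdNoLaterThan`, and the class name are hypothetical names chosen for illustration. It shows that an entity created mid-listing appears only if its ID lands above the cursor, and how a best-effort creation-time filter would hide it.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Illustrative only; none of these names come from Polaris. */
final class ConcurrentInsertSketch {

  /**
   * Next page of IDs strictly after the cursor, in ascending ID order.
   * Entries created after createdNoLaterThan are skipped; pass Long.MAX_VALUE to disable.
   */
  static List<Long> pageAfter(
      TreeMap<Long, Long> createdAtById, long afterId, int pageSize, long createdNoLaterThan) {
    return createdAtById.tailMap(afterId, false).entrySet().stream()
        .filter(e -> e.getValue() <= createdNoLaterThan)
        .map(Map.Entry::getKey)
        .limit(pageSize)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    TreeMap<Long, Long> entities = new TreeMap<>();
    for (long id : new long[] {10, 20, 30, 40}) {
      entities.put(id, 1_000L); // all created at t=1000, before the listing starts
    }
    long listingStart = 2_000L;

    // IDs in this example are positive, so 0 works as the initial cursor.
    List<Long> first = pageAfter(entities, 0L, 2, Long.MAX_VALUE);

    // Two entities are created between the first and second list calls.
    entities.put(15L, 3_000L); // lands below the cursor (20): never returned
    entities.put(35L, 3_000L); // lands above the cursor: returned by the simple approach

    List<Long> simple = pageAfter(entities, 20L, 10, Long.MAX_VALUE);
    List<Long> filtered = pageAfter(entities, 20L, 10, listingStart);
    System.out.println(first + " " + simple + " " + filtered);
    // prints "[10, 20] [30, 35, 40] [30, 40]"
  }
}
```

The filtered variant is only as reliable as the timestamps it compares, which is why the comment above calls it best-effort.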
I think current pagination behaviour wrt concurrent changes is fine.
Making it deterministic would be a great addition to Polaris, but that, I think, has a much broader scope. For example, if an entry is deleted after pagination starts, but a client re-submits a page request using an old token, the new response would still be inconsistent with the old response.
From my POV a complete and deterministic pagination solution implies catalog-level versioning.
On this note, I think it's not true actually. The code assumes that IDs are sortable but it doesn't rely on any kind of semantic meaning behind this comparison. So IDs can be created totally randomly and you can still paginate simply by breaking that random key space into pages of some size. There's no assumption that new entries will appear at the end, either.
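The point about sortable but otherwise meaningless IDs can be sketched as follows. This is an in-memory stand-in for a query of the shape `id > ? ORDER BY id ASC` with a page-size limit, assuming a static data set; the class and method names are hypothetical and do not come from the PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical sketch of cursor-based paging over arbitrarily assigned IDs. */
final class KeysetPagingSketch {

  /** Returns up to pageSize IDs strictly greater than afterId (null means "from the start"). */
  static List<Long> listPage(TreeSet<Long> ids, Long afterId, int pageSize) {
    Iterable<Long> candidates = afterId == null ? ids : ids.tailSet(afterId, false);
    List<Long> page = new ArrayList<>();
    for (Long id : candidates) {
      if (page.size() == pageSize) {
        break;
      }
      page.add(id);
    }
    return page;
  }

  /** Walks every page, resuming each time from the last ID of the previous page. */
  static List<Long> listAll(TreeSet<Long> ids, int pageSize) {
    List<Long> seen = new ArrayList<>();
    Long cursor = null;
    while (true) {
      List<Long> page = listPage(ids, cursor, pageSize);
      if (page.isEmpty()) {
        return seen;
      }
      seen.addAll(page);
      cursor = page.get(page.size() - 1);
    }
  }
}
```

With IDs inserted in the order 907, -13, 42, 5000, 7 and a page size of 2, `listAll` visits each ID exactly once in ascending order, regardless of the order in which the IDs were generated.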
adnanhemani
left a comment
Small comments, overall LGTM
try {
  String[] parts = token.split("/");
  if (parts.length < 1) {
    throw new IllegalArgumentException("Invalid token format: " + token);
  } else if (parts[0].equals(EntityIdPageToken.PREFIX)) {
    int resolvedPageSize = pageSize == null ? Integer.parseInt(parts[2]) : pageSize;
    return new EntityIdPageToken(Long.parseLong(parts[1]), resolvedPageSize);
  } else {
    throw new IllegalArgumentException("Unrecognized page token: " + token);
  }
} catch (NumberFormatException | IndexOutOfBoundsException e) {
  LOGGER.debug(e.getMessage());
  throw new IllegalArgumentException("Invalid token format: " + token);
}
I, personally, find this fragment a bit more complex than it may need to be. Is there a reason why we cannot defensively check for the right array length right after splitting? Same for the NumberFormatException?
I wanted to structure the code in a way that obviously leaves the door open for other PageToken implementations -- those would have different array length expectations. So we check the prefix first, and then parse the token using the logic appropriate for the PageToken implementation that the prefix corresponds to. Ideally we could even push this parsing logic down into some method in the PageToken.
In the old PR, there were 2 parseable PageToken implementations. I do agree that it looks a little clunky with the single PageToken implementation we have now. If this really is too confusing I can simplify this logic and then we can re-complicate it if/when we add a new PageToken.
> If this really is too confusing I can simplify this logic and then we can re-complicate it if/when we add a new PageToken.

I'd personally prefer this, but I don't feel strongly about whether we do it; I can understand the reasoning.
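For reference, the simplified variant discussed in this thread might look roughly like the sketch below. It is not the PR's code: the class names and the `"entity-id"` prefix value are assumptions (the actual value of `EntityIdPageToken.PREFIX` is not shown here), and the `prefix/id/pageSize` layout is inferred from the fragment under review.

```java
/** Hypothetical simplification; class names and the prefix value are assumptions. */
final class PageTokenParsingSketch {
  static final String ENTITY_ID_PREFIX = "entity-id"; // assumed, not taken from the PR

  /** Minimal stand-in for the PR's EntityIdPageToken. */
  static final class EntityIdToken {
    final long id;
    final int pageSize;

    EntityIdToken(long id, int pageSize) {
      this.id = id;
      this.pageSize = pageSize;
    }
  }

  /** Parses "prefix/id/pageSize"; a non-null requestedPageSize overrides the token's page size. */
  static EntityIdToken parse(String token, Integer requestedPageSize) {
    String[] parts = token.split("/");
    // Validate the shape up front so no IndexOutOfBoundsException can occur below.
    if (parts.length != 3 || !parts[0].equals(ENTITY_ID_PREFIX)) {
      throw new IllegalArgumentException("Invalid token format: " + token);
    }
    try {
      long id = Long.parseLong(parts[1]);
      int pageSize = requestedPageSize != null ? requestedPageSize : Integer.parseInt(parts[2]);
      return new EntityIdToken(id, pageSize);
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException("Invalid token format: " + token, e);
    }
  }
}
```

One behavioral difference from the reviewed fragment: this sketch always requires three parts, whereas the original only reads `parts[2]` when no page size is supplied. Supporting a second token type would mean reintroducing a dispatch on the prefix, which is the extensibility argument made above.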
Following up on apache#1555

* Refactor pagination code to delineate API-level page tokens and internal "pointers to data"
* Requests deal with the "previous" token, user-provided page size (optional) and the previous request's page size.
* Concentrate the logic of combining page size requests and previous tokens in PageTokenUtil
* PageToken subclasses are no longer necessary. EntityIdPaging handles pagination over ordered result sets with static helper methods.

Co-authored-by: Eric Maynard
This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
Based on apache#1838, following up on apache#1555

* Allows multiple implementations of `Token` referencing the "next page", encapsulated in `PageToken`. No changes to `polaris-core` needed to add custom `Token` implementations.
* Extensible to (later) support (cryptographic) signatures to prevent tampered page-token
* Refactor pagination code to delineate API-level page tokens and internal "pointers to data"
* Requests deal with the "previous" token, user-provided page size (optional) and the previous request's page size.
* Concentrate the logic of combining page size requests and previous tokens in `PageTokenUtil`
* `PageToken` subclasses are no longer necessary.
* Serialization of `PageToken` uses Jackson serialization (smile format)

Since no (metastore level) implementation handling pagination existed before, no backwards compatibility is needed.

Co-authored-by: Dmitri Bourlatchkov
Co-authored-by: Eric Maynard
In #1528, we introduced the interface changes necessary to paginate requests to listTables, listViews, and listNamespaces. This PR adds the persistence-level logic for pagination and a new PageToken type, EntityIdPageToken, used to paginate requests based on entity ID.