-
Notifications
You must be signed in to change notification settings - Fork 330
Extensible pagination token implementation #1938
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
12ab744 to
1b42120
Compare
|
This approach also works for NoSQL #1189 (last commit in this branch) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we can avoid it, I think it's better not to change this. This will be a breaking change for any persistence implementation depending on wire-compatibility of the result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @dennishuo for visibility as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a private constructor... @eric-maynard @dennishuo : could you give some more details about possible compatibility issues? What's the affected workflow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not see any roundtrips through JSON for EntitiesResult in Apache Polaris code. If this is a use case downstream, it would be nice to have tests in this repo to assert the expected behaviour.
That said, I think this specific change is not critical to pagination. EntitiesResult is not involved handling paginated data. So, I suppose this change can be rolled back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not see a "wire" either.
BTW: The same change wasn't considered an issue in 1838.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto the comment on wire-compatibility with previous versions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eric-maynard : It's good that you're able to identify this change as a regression, but what if you're not available ? :) WDYT about encoding this expectation in a unit test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would we use the property name i?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe short names are meant to reduce the side of the token on the wire.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, as explained in Token
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason is explained as doc in Token
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this meant to be short for EntityId? Why not use an enum?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That contradicts the intent of being extensible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be expected that the iterator has no more items if the limit has been pushed down into whatever provided the items, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! It also looks like we do not push limits down to the database.
I propose to remove !it.hasNext() for the sake of simplicity, accept possible empty last pages, and work on exact page boundaries later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a test for this - and it passes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder whether the test passes because limits are not pushed down to Persistence queries anywhere 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comments from the previous PR -- encoding/decoding a given PageToken clearly seems to be within the purview of a given PageToken implementation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is only one PageToken.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would rather avoid these if possible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto the comment on i -- this is not a good property name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is value? This seems a bit abstract.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My take: value is the implementation-specific token data (e.g. the entity ID in sorted SQL result sets).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's possible per the spec to request pagination by providing a page token without page size. This logic is not correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic looks correct to me. Propagating page size from the previous token in handled in PageTokenUtil.decodePageRequest()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is READ_EVERYTHING a Util? Seems like it would belong in PageToken
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PageToken provides a convenience getting for "read everything". PageTokenUtil groups several implementations.
|
Thanks for taking a crack at this. I appreciate the extra extensibility compared to #1838, but considering this diff is twice the size of #1555, I'm not sure there's enough functionality here to justify all the changes. I took a quick pass for now, but will try to circle back and review the rest of the PR in more detail later. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why break this into two calls?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I think it came from my PR) No particular reason. Would you prefer a chained call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes LGTM 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Token$TokenType looks awkward as a file name ($ and double Token)... WDYT about making TokenType a top-level class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the class name.
I'd really prefer keeping these very related things (Token + TokenType) together - you have to implement both.
polaris-core/src/main/java/org/apache/polaris/core/persistence/pagination/PageToken.java
Outdated
Show resolved
Hide resolved
191d99d to
6cdb73c
Compare
d072060 to
88ae32d
Compare
09b4b76 to
5ff8b7c
Compare
Based on apache#1838, following up on apache#1555 * Allows multiple implementations of `Token` referencing the "next page", encapsulated in `PageToken`. No changes to `polaris-core` needed to add custom `Token` implementations. * Extensible to (later) support (cryptographic) signatures to prevent tampered page-token * Refactor pagination code to delineate API-level page tokens and internal "pointers to data" * Requests deal with the "previous" token, user-provided page size (optional) and the previous request's page size. * Concentrate the logic of combining page size requests and previous tokens in `PageTokenUtil` * `PageToken` subclasses are no longer necessary. * Serialzation of `PageToken` uses Jackson serialization (smile format) Since no (metastore level) implementation handling pagination existed before, no backwards compatibility is needed. Co-authored-by: Dmitri Bourlatchkov <[email protected]> Co-authored-by: Eric Maynard <[email protected]>
5ff8b7c to
b67b787
Compare
|
Thanks all! |
|
This seems to have caused a regression: |
This reverts commit fb418a2.
|
|
||
| final class PageTokenUtil { | ||
|
|
||
| private static final ObjectMapper SMILE_MAPPER = new SmileMapper().findAndRegisterModules(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the issue I mentioned is likely happening when there are multiple Scala dependencies on the classpath?
If we are calling findAndRegisterModules, I wonder if we should be explicitly adding a dependency for Scala, e.g.:
implementation("org.scala-lang:scala-library:2.12.15")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so. That code is for Polaris server, which doesn't have anything from Scala.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it might, depending on the extensions you have. From what I can tell this actually functions without findAndRegisterModules -- why do we need it?
The enforcement of the LIST_PAGINATION_ENABLED flag was missed in apache#1938. This change make the flag effective as discussed in apache#2296. Note: this causes a change in the default Polaris behaviour (no pagination by default) with respect to the previous state of `main`. However, there is no behaviour change with respect to 1.0.0 or 1.0.1 as previous releases did not have apache#1938.
The enforcement of the LIST_PAGINATION_ENABLED flag was missed in #1938. This change make the flag effective as discussed in #2296. Note: this causes a change in the default Polaris behaviour (no pagination by default) with respect to the previous state of `main`. However, there is no behaviour change with respect to 1.0.0 or 1.0.1 as previous releases did not have #1938.
The enforcement of the LIST_PAGINATION_ENABLED flag was missed in apache#1938. This change make the flag effective as discussed in apache#2296. Note: this causes a change in the default Polaris behaviour (no pagination by default) with respect to the previous state of `main`. However, there is no behaviour change with respect to 1.0.0 or 1.0.1 as previous releases did not have apache#1938.
* feat: enforce LIST_PAGINATION_ENABLED The enforcement of the LIST_PAGINATION_ENABLED flag was missed in #1938. This change make the flag effective as discussed in #2296. Note: this causes a change in the default Polaris behaviour (no pagination by default) with respect to the previous state of `main`. However, there is no behaviour change with respect to 1.0.0 or 1.0.1 as previous releases did not have #1938.
Based on #1838, following up on #1555
Tokenreferencing the "next page", encapsulated inPageToken. No changes topolaris-coreneeded to add customTokenimplementations.PageTokenUtilPageTokensubclasses are no longer necessary.PageTokenuses Jackson serialization (smile format)Since no (metastore level) implementation handling pagination existed before, no backwards compatibility is needed.