Member

@snazy snazy commented Jun 25, 2025

Based on #1838, following up on #1555

  • Allows multiple implementations of Token referencing the "next page", encapsulated in PageToken. No changes to polaris-core needed to add custom Token implementations.
  • Extensible to (later) support (cryptographic) signatures to prevent tampering with page tokens
  • Refactor pagination code to delineate API-level page tokens and internal "pointers to data"
  • Requests deal with the "previous" token, user-provided page size (optional) and the previous request's page size.
  • Concentrate the logic of combining page size requests and previous tokens in PageTokenUtil
  • PageToken subclasses are no longer necessary.
  • Serialization of PageToken uses Jackson serialization (Smile format)

Since no (metastore level) implementation handling pagination existed before, no backwards compatibility is needed.

Member Author

snazy commented Jun 25, 2025

This approach also works for NoSQL #1189 (last commit in this branch)

@dimas-b dimas-b requested a review from eric-maynard June 25, 2025 16:25
Contributor

If we can avoid it, I think it's better not to change this. This will be a breaking change for any persistence implementation depending on wire-compatibility of the result.

Contributor

cc @dennishuo for visibility as well

Contributor

It is a private constructor... @eric-maynard @dennishuo : could you give some more details about possible compatibility issues? What's the affected workflow?

Contributor

I do not see any roundtrips through JSON for EntitiesResult in Apache Polaris code. If this is a use case downstream, it would be nice to have tests in this repo to assert the expected behaviour.

That said, I think this specific change is not critical to pagination. EntitiesResult is not involved in handling paginated data, so I suppose this change can be rolled back.

Member Author

I do not see a "wire" either.
BTW: The same change wasn't considered an issue in #1838.

Contributor

ditto the comment on wire-compatibility with previous versions

Contributor

@eric-maynard: It's good that you're able to identify this change as a regression, but what if you're not available? :) WDYT about encoding this expectation in a unit test?

Contributor

Why would we use the property name i?

Contributor

I believe short names are meant to reduce the size of the token on the wire.

Member Author

Yes, as explained in Token
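
To make the size argument concrete, here is a minimal sketch that compares the encoded length of the same payload with short and long property names. It uses JSON text and URL-safe Base64 as a stand-in for the Smile encoding used in the PR; the class name, the payloads, and the long property names are assumptions for illustration only.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical illustration, not part of the Polaris code base. */
final class TokenSizeSketch {

  /** Length of the URL-safe Base64 form of a payload, without padding characters. */
  static int encodedLength(String payload) {
    byte[] bytes = payload.getBytes(StandardCharsets.UTF_8);
    return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes).length();
  }

  public static void main(String[] args) {
    // Both payloads carry the same made-up data; only the property names differ.
    String shortNames = "{\"t\":\"e\",\"i\":12345}";
    String longNames = "{\"tokenType\":\"e\",\"entityId\":12345}";
    System.out.println("short names: " + encodedLength(shortNames) + " chars");
    System.out.println("long names:  " + encodedLength(longNames) + " chars");
  }
}
```

Every extra character in a property name is carried in each token a client sends back, which is the trade-off against readability raised in this thread.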

Contributor

getT?

Member Author

The reason is explained in the docs on Token

Contributor

Is this meant to be short for EntityId? Why not use an enum?

Member Author

That contradicts the intent of being extensible.

Contributor

It should be expected that the iterator has no more items if the limit has been pushed down into whatever provided the items, no?

Contributor

@dimas-b dimas-b Jun 26, 2025

Good point! It also looks like we do not push limits down to the database.

I propose to remove !it.hasNext() for the sake of simplicity, accept possible empty last pages, and work on exact page boundaries later.

Member Author

There is a test for this - and it passes

Contributor

@dimas-b dimas-b Jun 26, 2025

I wonder whether the test passes because limits are not pushed down to Persistence queries anywhere 🤔
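
The concern in this thread can be sketched in a few lines. The following is a hypothetical stand-in, not the Polaris code: `takeWithPeek` decides whether a next-page token is needed by calling `it.hasNext()` after filling the page, while `takeWithoutPeek` assumes more data whenever the page is full. All names are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

/** Hypothetical illustration of the two page-boundary strategies discussed above. */
final class PageSliceSketch {

  /** One page of items plus a flag saying whether a next-page token would be emitted. */
  record Slice(List<Integer> items, boolean hasNextToken) {}

  /** Fills the page, then peeks at the iterator to decide whether more data exists. */
  static Slice takeWithPeek(Iterator<Integer> it, int pageSize) {
    List<Integer> items = new ArrayList<>();
    while (items.size() < pageSize && it.hasNext()) {
      items.add(it.next());
    }
    return new Slice(items, it.hasNext());
  }

  /** Fills the page and assumes more data exists whenever the page is full. */
  static Slice takeWithoutPeek(Iterator<Integer> it, int pageSize) {
    List<Integer> items = new ArrayList<>();
    while (items.size() < pageSize && it.hasNext()) {
      items.add(it.next());
    }
    return new Slice(items, items.size() == pageSize);
  }

  public static void main(String[] args) {
    List<Integer> rows = List.of(1, 2, 3, 4);

    // No limit pushdown: the iterator sees all rows, so peeking works.
    System.out.println(takeWithPeek(rows.iterator(), 2));

    // Limit pushed down to the source: the iterator ends after the page,
    // so peeking wrongly reports "no next page" although rows 3 and 4 exist.
    System.out.println(takeWithPeek(rows.stream().limit(2).iterator(), 2));

    // Without peeking the next-page token is still emitted ...
    System.out.println(takeWithoutPeek(rows.stream().limit(2).iterator(), 2));

    // ... at the cost of one empty page when the data ends exactly on a page boundary.
    System.out.println(takeWithoutPeek(Collections.emptyIterator(), 2));
  }
}
```

Under these assumptions, the peek variant is only correct while the source is not truncated by a pushed-down limit, which would explain a passing test today; the no-peek variant stays correct under pushdown but can return an empty last page.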

Contributor

Same comments from the previous PR -- encoding/decoding a given PageToken clearly seems to be within the purview of a given PageToken implementation

Member Author

There is only one PageToken.

Comment on lines +30 to +32
Contributor

Would rather avoid these if possible

Contributor

@eric-maynard eric-maynard Jun 25, 2025

ditto the comment on i -- this is not a good property name.

Contributor

ditto

Contributor

What is value? This seems a bit abstract.

Contributor

My take: value is the implementation-specific token data (e.g. the entity ID in sorted SQL result sets).

Contributor

It's possible per the spec to request pagination by providing a page token without page size. This logic is not correct.

Contributor

The logic looks correct to me. Propagating the page size from the previous token is handled in PageTokenUtil.decodePageRequest()
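
As a rough sketch of the behaviour described here (not the actual `PageTokenUtil.decodePageRequest()` code), the effective page size could be resolved as follows. The class and method names are hypothetical, and the precedence of an explicit page size over the one carried in the previous token is an assumption.

```java
import java.util.OptionalInt;

/** Hypothetical illustration of page-size propagation; not the Polaris implementation. */
final class PageSizeSketch {

  /**
   * Resolves the page size for a request.
   *
   * @param requested page size given explicitly by the client, if any
   * @param fromPreviousToken page size recorded in the previous page token, if a token was sent
   * @return the page size to use, or empty if the request is not paginated
   */
  static OptionalInt effectivePageSize(OptionalInt requested, OptionalInt fromPreviousToken) {
    if (requested.isPresent()) {
      // Assumption: an explicit page size wins over the one from the previous token.
      return requested;
    }
    // A token without an explicit page size reuses the previous request's page size.
    return fromPreviousToken;
  }

  public static void main(String[] args) {
    System.out.println(effectivePageSize(OptionalInt.empty(), OptionalInt.of(50)));
    System.out.println(effectivePageSize(OptionalInt.of(10), OptionalInt.of(50)));
    System.out.println(effectivePageSize(OptionalInt.empty(), OptionalInt.empty()));
  }
}
```

With this shape, a request that supplies only a page token is still paginated, which is the case raised in the comment above.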

Contributor

Is READ_EVERYTHING a Util? Seems like it would belong in PageToken

Contributor

PageToken provides a convenience getter for "read everything". PageTokenUtil groups several implementations.

@eric-maynard
Contributor

Thanks for taking a crack at this. I appreciate the extra extensibility compared to #1838, but considering this diff is twice the size of #1555, I'm not sure there's enough functionality here to justify all the changes. I took a quick pass for now, but will try to circle back and review the rest of the PR in more detail later.

Comment on lines +377 to +385
Contributor

Why break this into two calls?

Contributor

(I think it came from my PR) No particular reason. Would you prefer a chained call?

Contributor

@dimas-b dimas-b left a comment

Changes LGTM 👍

Contributor

Token$TokenType looks awkward as a file name ($ and double Token)... WDYT about making TokenType a top-level class?

Member Author

It's the class name.
I'd really prefer keeping these very related things (Token + TokenType) together - you have to implement both.

Member Author

snazy commented Jun 26, 2025

This change enables pagination also for #1189, whereas #1555 does not.

@snazy snazy force-pushed the pagination-alt-flex branch 3 times, most recently from 191d99d to 6cdb73c Compare July 1, 2025 04:56
@snazy snazy force-pushed the pagination-alt-flex branch 3 times, most recently from d072060 to 88ae32d Compare July 7, 2025 08:49
@snazy snazy force-pushed the pagination-alt-flex branch 5 times, most recently from 09b4b76 to 5ff8b7c Compare July 14, 2025 19:26
Co-authored-by: Dmitri Bourlatchkov <[email protected]>
Co-authored-by: Eric Maynard <[email protected]>
@snazy snazy force-pushed the pagination-alt-flex branch from 5ff8b7c to b67b787 Compare July 15, 2025 15:55
@github-project-automation github-project-automation bot moved this from PRs In Progress to Ready to merge in Basic Kanban Board Jul 15, 2025
@snazy snazy merged commit fb418a2 into apache:main Jul 16, 2025
12 checks passed
@snazy snazy deleted the pagination-alt-flex branch July 16, 2025 05:35
@github-project-automation github-project-automation bot moved this from Ready to merge to Done in Basic Kanban Board Jul 16, 2025
Member Author

snazy commented Jul 16, 2025

Thanks all!

@eric-maynard
Contributor

This seems to have caused a regression:

2025-07-16 10:01:26,449 ERROR [io.qua.ver.htt.run.QuarkusErrorHandler] [,POLARIS] [,,,] (executor-thread-1) HTTP Request to /api/management/v1/catalogs failed, error id: afec5ba7-078c-4917-8d4d-2feaf219fe13-1: java.util.ServiceConfigurationError: com.fasterxml.jackson.databind.Module: Provider com.fasterxml.jackson.module.scala.DefaultScalaModule could not be instantiated
	at java.base/java.util.ServiceLoader.fail(ServiceLoader.java:586)
	at java.base/java.util.ServiceLoader$ProviderImpl.newInstance(ServiceLoader.java:813)
	at java.base/java.util.ServiceLoader$ProviderImpl.get(ServiceLoader.java:729)
	at java.base/java.util.ServiceLoader$3.next(ServiceLoader.java:1403)
	at com.fasterxml.jackson.databind.ObjectMapper.findModules(ObjectMapper.java:1161)
	at com.fasterxml.jackson.databind.ObjectMapper.findModules(ObjectMapper.java:1145)
	at com.fasterxml.jackson.databind.ObjectMapper.findAndRegisterModules(ObjectMapper.java:1195)
	at org.apache.polaris.core.persistence.pagination.PageTokenUtil.<clinit>(PageTokenUtil.java:46)
	at org.apache.polaris.core.persistence.pagination.PageToken.readEverything(PageToken.java:70)
	at org.apache.polaris.service.admin.PolarisAdminService.listCatalogsUnsafe(PolarisAdminService.java:964)


final class PageTokenUtil {

private static final ObjectMapper SMILE_MAPPER = new SmileMapper().findAndRegisterModules();
Contributor

I think the issue I mentioned is likely happening when there are multiple Scala dependencies on the classpath?

If we are calling findAndRegisterModules, I wonder if we should be explicitly adding a dependency for Scala, e.g.:

implementation("org.scala-lang:scala-library:2.12.15")

Member Author

I don't think so. That code is for Polaris server, which doesn't have anything from Scala.

Contributor

Well, it might, depending on the extensions you have. From what I can tell this actually functions without findAndRegisterModules -- why do we need it?
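
The failure mode in the stack trace comes from classpath-based service discovery: `findAndRegisterModules` goes through `java.util.ServiceLoader`, so any provider registered by any jar on the classpath gets instantiated. The sketch below reproduces that mechanism with the JDK alone. `Plugin`, `ExplicitPlugin`, and the broken provider name are hypothetical stand-ins, not Jackson or Polaris types.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.ServiceConfigurationError;
import java.util.ServiceLoader;

/** Hypothetical illustration of discovery versus explicit registration. */
final class ModuleDiscoverySketch {

  /** Stand-in for a discoverable module type. */
  interface Plugin {}

  /** A plug-in the application registers by hand. */
  static final class ExplicitPlugin implements Plugin {}

  /** Returns true if discovery fails because an unrelated provider entry is broken. */
  static boolean discoveryFailsWithBrokenProvider() {
    try {
      // Simulate a jar on the classpath that registers a provider which cannot be loaded.
      Path root = Files.createTempDirectory("discovery-sketch");
      Path services = Files.createDirectories(root.resolve("META-INF").resolve("services"));
      Files.writeString(services.resolve(Plugin.class.getName()), "missing.BrokenPlugin\n");

      try (URLClassLoader loader =
          new URLClassLoader(new URL[] {root.toUri().toURL()}, Plugin.class.getClassLoader())) {
        for (Plugin ignored : ServiceLoader.load(Plugin.class, loader)) {
          // Not reached: iteration fails on the broken provider entry.
        }
        return false;
      } catch (ServiceConfigurationError e) {
        return true;
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("discovery failed: " + discoveryFailsWithBrokenProvider());

    // Explicit registration never scans the classpath, so unrelated providers cannot break it.
    List<Plugin> explicit = List.of(new ExplicitPlugin());
    System.out.println("explicitly registered plug-ins: " + explicit.size());
  }
}
```

If the token mapper needs no extra modules, dropping discovery (or registering the required modules explicitly) would make it independent of whatever extensions a deployment adds to the classpath.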

dimas-b added a commit to dimas-b/polaris that referenced this pull request Aug 19, 2025
The enforcement of the LIST_PAGINATION_ENABLED flag was missed in apache#1938.
This change makes the flag effective as discussed in apache#2296.

Note: this causes a change in the default Polaris behaviour (no pagination
by default) with respect to the previous state of `main`. However, there
is no behaviour change with respect to 1.0.0 or 1.0.1 as previous releases
did not have apache#1938.
dimas-b added a commit that referenced this pull request Aug 20, 2025
dimas-b added a commit to dimas-b/polaris that referenced this pull request Aug 20, 2025
dimas-b added a commit that referenced this pull request Aug 20, 2025
* feat: enforce LIST_PAGINATION_ENABLED