@XN137
Contributor

@XN137 XN137 commented Aug 7, 2025

this is a followup to #2261

currently PolarisMetaStoreManager.listEntities only exposes a limited
subset of the underlying BasePersistence.listEntities functionality.

most of the callers have to post-process the EntityNameLookupRecord of
ListEntitiesResult and call PolarisMetaStoreManager.loadEntity
on the individual items sequentially to transform and filter them.

this is bad for the following reasons:

  • suboptimal performance as we run N+1 queries to basically load every
    entity twice from the persistence backend
  • suffering from race-conditions when entities get dropped between the
    listEntities and loadEntity call
  • a lot of repeated code in all the callers (of which only some are
    dealing with the race-condition by filtering out null values)

as a solution we add PolarisMetaStoreManager.loadEntities that takes
advantage of the already existing BasePersistence methods.
we rename one of the listEntities methods to loadEntities for
consistency.

since many callers dont need paging and want the result as a list, we
add PolarisMetaStoreManager.loadEntitiesAll as a convenient wrapper.

we also remove the PolarisEntity.nameAndId method as callers who only
need name and id should not be loading the full entity to begin with.

note we rework testCatalogNotReturnedWhenDeletedAfterListBeforeGet
from ManagementServiceTest because the simulated race-condition
scenario can no longer happen.
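
To make the N+1 and race-condition points above concrete, here is a minimal, self-contained sketch. The `Store`, `NameRecord` and `Entity` types are hypothetical stand-ins invented for illustration; they are not the Polaris classes, and the real `PolarisMetaStoreManager` methods take additional arguments (call context, catalog path, entity type, page token) that are omitted here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ListVsLoadSketch {
  // Hypothetical stand-ins for EntityNameLookupRecord and PolarisBaseEntity.
  record NameRecord(long id, String name) {}

  record Entity(long id, String name, String properties) {}

  // Toy in-memory backend that counts how many "queries" it serves.
  static class Store {
    final Map<Long, Entity> rows = new LinkedHashMap<>();
    int queries = 0;

    // Returns only names and ids, like a listEntities call.
    List<NameRecord> listEntities() {
      queries++;
      List<NameRecord> result = new ArrayList<>();
      for (Entity e : rows.values()) {
        result.add(new NameRecord(e.id(), e.name()));
      }
      return result;
    }

    // Returns null when the entity was dropped after it was listed.
    Entity loadEntity(long id) {
      queries++;
      return rows.get(id);
    }

    // Returns full entities in a single call, like the proposed loadEntities.
    List<Entity> loadEntities() {
      queries++;
      return new ArrayList<>(rows.values());
    }
  }

  // The caller-side pattern the PR removes: one list call plus one load per item.
  static List<Entity> listThenLoad(Store store) {
    List<Entity> result = new ArrayList<>();
    for (NameRecord r : store.listEntities()) {
      Entity e = store.loadEntity(r.id());
      if (e != null) { // guard against the list/load race
        result.add(e);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    Store store = new Store();
    for (long id = 1; id <= 3; id++) {
      store.rows.put(id, new Entity(id, "catalog" + id, "{}"));
    }

    List<Entity> viaList = listThenLoad(store);
    System.out.println(viaList.size() + " entities, queries=" + store.queries); // 3 entities, queries=4

    store.queries = 0;
    List<Entity> viaLoad = store.loadEntities();
    System.out.println(viaLoad.size() + " entities, queries=" + store.queries); // 3 entities, queries=1
  }
}
```

In this toy model the single-call variant also has no window in which an entity can disappear between listing and loading, which is why the null filter is only needed in `listThenLoad`.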

@github-project-automation github-project-automation bot moved this to PRs In Progress in Basic Kanban Board Aug 7, 2025
@XN137 XN137 force-pushed the Rework-PolarisMetaStoreManager.listEntities branch 3 times, most recently from b3363a7 to e10c856 Compare August 7, 2025 08:31
@XN137 XN137 marked this pull request as ready for review August 7, 2025 08:57
@XN137 XN137 force-pushed the Rework-PolarisMetaStoreManager.listEntities branch 3 times, most recently from c4acd84 to 4a62e67 Compare August 8, 2025 16:06
@XN137 XN137 marked this pull request as draft August 8, 2025 19:21
@XN137 XN137 force-pushed the Rework-PolarisMetaStoreManager.listEntities branch from 4a62e67 to 9793869 Compare August 11, 2025 07:27
@XN137 XN137 changed the title Rework PolarisMetaStoreManager.listEntities Add PolarisMetaStoreManager.loadEntities Aug 11, 2025
@XN137 XN137 force-pushed the Rework-PolarisMetaStoreManager.listEntities branch from 9793869 to f960edd Compare August 11, 2025 07:53
PolarisEntitySubType.NULL_SUBTYPE,
PolicyEntity::of)
.stream()
.filter(policyEntity -> policyType == null || policyEntity.getPolicyType() == policyType)
Contributor Author

side note:
if the required policyType is null we could in theory use the optimized listEntities call, as we only need the name of the entity to build the PolicyIdentifier, but for filtering by policyType we need to load the full entity.

Contributor

@dennishuo dennishuo Aug 16, 2025

> side note:
> if the required policyType is null we could in theory use the optimized listEntities call, as we only need the name of the entity to build the PolicyIdentifier, but for filtering by policyType we need to load the full entity.

This could be worth adding in a code comment here as a // TODO either as an optional followup task or if nothing else, calling attention to the different flavors of listing so that future code changes will put more thought into the choice of listing methods.

Ideally any use of these open-ended filters will either be a short-term crutch or a proven "small" use case where we think pushdown isn't worth the complexity. Longer term we could define a more structured definition of pushdown predicates that is still extensible but communicates filter semantics down to the BasePersistence layer enough to work with different database-level indexes.

Contributor Author

i've added a TODO and pushed a follow-up to #2370

Contributor

@dennishuo dennishuo left a comment

This will be a big improvement, thanks for taking this on!

I can do a more detailed dive, but at a high level, I'd like to see if we can better sort out the responsibilities of the MetaStoreManager layer vs the BasePersistence layer here. Specifically, if we can pull up the entityFilter/transformer from BasePersistence to only live in the MetaStoreManager level, just as your #2317 is a push down of the entitySubType filter. Essentially:

  1. We want the [Base]Persistence layer to most closely represent what a lower-level database is directly capable of doing
  2. We put more "advanced" logic in the MetaStoreManager layer
  3. In the filter/transformer scenario, what we're really saying is "BasePersistence needs to give back complete PolarisBaseEntities instead of only EntityNameLookupRecords", and then something might run Polaris-side filtering/transforms -- logically this would be in MetaStoreManager or higher.

This also tells us why it made sense for the loadTasks to use a Predicate<> as the filter, but why subTypeCode should not be an opaque Predicate<> -- subTypeCode filtering is something the lower-level database is capable of understanding so therefore it is pushed down to the layer that represents the raw database.

Whereas advanced timestamp comparison, leasing, etc., of TaskEntities is a Polaris concept that the database isn't designed to understand directly (for now) so we fallback to the opaque Predicate.

Overall, I'd like to see if we can:

  1. Also rename the filter/transformer variations of listEntities within the BasePersistence, similar to how you introduced a totally different method name in the MetaStoreManager layer -- because now it's really not an overload of the same method, but really a different action entirely. I think loadEntities is reasonable to push down as the method name, but we just might want to consider the difference between "list and load full entities under a parent" and "load the full entities provided in this Collection" that we might need in the future
  2. Pull filter/transformer evaluation out of BasePersistence into MetaStoreManager
  3. Reconsider whether we actually need a Function<PolarisBaseEntity, T> generic type return value -- as far as I recall, originally we just abused that to "transform" a PolarisBaseEntity into a trimmed EntityNameLookupRecord, which was an antipattern as you identified, and then nearly every other case was just the "identity transformation". Now that we clarified the "EntityNameLookup" case, perhaps callsites never actually use the transformer anymore? I didn't have time to dig deeper yet into all the transformer use cases.

@XN137
Contributor Author

XN137 commented Aug 13, 2025

thanks for the feedback, since the other PR is somewhat merge-ready i will wait for that and rebase this one afterwards.

> Also rename the filter/transformer variations of listEntities within the BasePersistence

yeah we can use the same loadEntities name in BasePersistence imo

> Pull filter/transformer evaluation out of BasePersistence into MetaStoreManager

i havent investigated this in detail yet but my current guess is that the filter needs to remain "pushed down" in order for pagination to work correctly, but will double check that later.

> Reconsider whether we actually need a Function<PolarisBaseEntity, T> generic type return value

afaict the transformer is used heavily by all callers of load(All)Entities in this PR i.e. to turn base entities into their more specific type (for example via CatalogRoleEntity::of).
so i dont see that going away.

unless you mean that this transformation can also happen on the "outside" ?
this might be possible however one idea I had was that one might not need both the filter and the transformer but we could let the transformer return null for entities the caller does not need (or that cant be converted to the more specific type).
but even then the transformer needs to remain "Pushed down" to work well with pagination most likely.

@XN137 XN137 force-pushed the Rework-PolarisMetaStoreManager.listEntities branch 2 times, most recently from 18e3529 to 9c7d42d Compare August 14, 2025 09:52
@XN137
Contributor Author

XN137 commented Aug 14, 2025

rebased on latest main due to a few merge conflicts. also included the rename of one of the BasePersistence.listEntities methods

adutra previously approved these changes Aug 14, 2025
@github-project-automation github-project-automation bot moved this from PRs In Progress to Ready to merge in Basic Kanban Board Aug 14, 2025
@XN137 XN137 force-pushed the Rework-PolarisMetaStoreManager.listEntities branch from 9c7d42d to 36657d7 Compare August 15, 2025 08:39
@XN137
Contributor Author

XN137 commented Aug 15, 2025

rebased after trivial merge-conflict in PolarisAdminService.java

@XN137 XN137 force-pushed the Rework-PolarisMetaStoreManager.listEntities branch from 36657d7 to ab37780 Compare August 15, 2025 15:48
Contributor

@dennishuo dennishuo left a comment

I think this is on the right track, thanks!

> i havent investigated this in detail yet but my current guess is that the filter needs to remain "pushed down" in order for pagination to work correctly, but will double check that later.

Good point, I wonder if we've sufficiently defined the expected semantics of inline-filtering vs post-filtering, especially in how it interacts with pagination. But indeed we shouldn't break any existing behavior. Is there anything other than loadTasks right now that relies on the inline-filtering semantic? I believe the PolicyCatalog filter scenario is a post-filter right now. I'm now wondering if the Filter can just remain only in the BasePersistence to avoid breaking changes, and opting not to add a filter/transformer for now in the PolarisMetaStoreManager layer.

For defining the PolarisMetaStoreManager interface, as the PolarisMetaStoreManager is one of the main SPI components it's worth being somewhat more conservative in adding methods.

> unless you mean that this transformation can also happen on the "outside" ?

Yeah, if possible, all the syntactic sugar type-cases like PrincipalEntity.of I'd prefer to just move out to the caller for now.

The more I mull it over the more it seems like the previous additions of filter and transformer to BasePersistence were a mistake in hindsight, especially as long as it was used simply as syntactic sugar rather than defining any true pushdown semantic, except for the filtering in loadTasks (which we want to rework anyways), and we might want to avoid perpetuating it for now until we:

  1. Have a concrete use case that we're sure needs to be truly pushed-down
  2. Have a plan with guardrails in place to carefully define what's allowed to be pushed down

Right now, every method signature in PolarisMetaStoreManager is PolarisCallContext plus JSON-serializable Plain Old Data types, and this was intentional historically, but maybe never fully documented in developer guidelines. This means PolarisMetaStoreManager is a runtime-configurable SPI type that is capable of being an RPC implementation. (Incidentally, for full disclosure I'm mentioning this not just as a philosophical observation, but can vouch for the RPC pattern being an actual stakeholder use case in production).

It's technically possible to add basic loadEntities that preserves this invariant for now without the filter/transformer arguments, while still providing the benefit of the more fundamental functionality of fetching entire entities all at once.

It's also certainly conceivable to intentionally evolve the interface to no longer satisfy the Plain Old Data types constraint, but would simply be a longer discussion.

How would you feel about these next steps?

  1. Remove filter and transformer from the new PolarisMetaStoreManager::loadEntities method for now
  2. No longer have the different loadEntities vs loadEntitiesAll for now since we'll deal with filters/transformers later, and even if filter/transformer use cases are really added, the base method remains useful for existing callsites (I appreciate you breaking these out separately though! Very helpful to start distinguishing callsites that wanted syntactic-sugar transformations without filters, vs those that actually need filters)
  3. Relegate "syntactic-sugar" transformation conversions to be the responsibility of the caller for now -- AFAICT in the current code all of the transformers just mean moving a FooEntity::of out of the method argument down two lines into a stream().map(...) call -- I know technically this is still less efficient than pushdown stream-transformation, but the size of the immediate members of a PolarisBaseEntity are small compared to the actual internalProperties/properties which would be copied by reference to the type-specific wrapper
  4. Followup discussion on mailing list for how to properly think about "filter pushdown", with PolicyCatalog likely being the primary driving use case for PolicyType filtering.

For filtering, it seems to me:

  1. Use cases that are fine with strict post-filtering remain at the caller
  2. Partial-pushdown "stream filtering" is basically what we have for loadTasks today and really nothing else. This doesn't help much with efficiency of what's loaded from the database, but is helpful for already-filtered pagination. Whether we want to generalize this partial-pushdown scenario needs to be discussed
  3. For critical high-scale filtering use cases in the future, we really need a structure that allows more extensible "full pushdown" to the database. This means expressing the filters in well-structured/enumerated ways that cooperate with definitions of secondary indexes on the database. This means a departure from just taking a java Predicate. Structured predicate pushdown here would not only be higher-performance, but would also allow preserving having only Plain Old Data interfaces in PolarisMetaStoreManager.
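
As one possible reading of the "structured predicate" idea in item 3, the sketch below models a filter as plain data (enums plus an `int`) that a persistence layer could either translate into a SQL fragment or evaluate in memory. Everything here is hypothetical: `FieldFilter`, the `Field` and `Op` enums, and the column names are invented for illustration and are not part of Polaris.

```java
import java.util.Map;

public class StructuredFilterSketch {
  // Hypothetical enumeration of fields the database layer knows how to index.
  enum Field { SUB_TYPE_CODE, POLICY_TYPE_CODE }

  // Hypothetical enumeration of supported comparison operators.
  enum Op { EQ, NE }

  // Plain-old-data filter: trivially JSON-serializable, unlike a java Predicate.
  record FieldFilter(Field field, Op op, int value) {

    // A SQL-backed persistence could push the filter down as a WHERE fragment.
    // Column names are assumptions made up for this sketch.
    String toSqlFragment() {
      String column = switch (field) {
        case SUB_TYPE_CODE -> "sub_type_code";
        case POLICY_TYPE_CODE -> "policy_type_code";
      };
      String operator = (op == Op.EQ) ? "=" : "<>";
      return column + " " + operator + " " + value;
    }

    // A backend without index support could still evaluate the same filter in memory.
    boolean matches(Map<Field, Integer> row) {
      Integer actual = row.get(field);
      boolean equal = actual != null && actual == value;
      return (op == Op.EQ) ? equal : !equal;
    }
  }

  public static void main(String[] args) {
    FieldFilter filter = new FieldFilter(Field.POLICY_TYPE_CODE, Op.EQ, 2);
    System.out.println(filter.toSqlFragment()); // policy_type_code = 2
    System.out.println(filter.matches(Map.of(Field.POLICY_TYPE_CODE, 2))); // true
    System.out.println(filter.matches(Map.of(Field.POLICY_TYPE_CODE, 3))); // false
  }
}
```

Because the filter carries only enum constants and an integer, it could cross an RPC boundary unchanged, which is the property the Plain Old Data constraint is meant to protect.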

null,
PolarisEntityType.CATALOG,
PolarisEntitySubType.ANY_SUBTYPE,
CatalogEntity::of)
Contributor

@dennishuo dennishuo Aug 16, 2025

Are these type-wrappers from https://github.com/apache/polaris/pull/2261/files the only usage of transformer that aren't Identity transformations now? Alongside the PolicyEntity.of in PolicyCatalog?

I guess I can see how it's a bit cleaner being able to pass these into the metaStoreManager and if callers don't end up doing yet another .stream().map(...).toList() maybe there's avoidance of a bit of memory copying between lists, but as we're also starting to clarify better in the mailing list discussion about Polaris SPIs, the PolarisMetaStoreManager is one critical interface boundary, and having the open-ended transformer introduces some subtle pitfalls.

Previously, it was already a bit dicey having the Transformer in the BasePersistence::listEntities method but at least it was contained because BasePersistence has been de-facto package-private in its usage (i.e. IcebergCatalog/PolarisAdminService interacts with MetaStoreManager, not BasePersistence), so the blast radius has luckily remained somewhat contained.

One example pitfall is when considering how this pushdown benefits from now having the ability to achieve SNAPSHOT_READ isolation for list contents, and how that translates into running inside the runInTransaction for the TransactionalMetaStoreManager branch, then if we're willing to run arbitrary transformer functions inside the transactional critical block we need callers to "promise" they'll only provide "trivial" transformers.

Otherwise, I could imagine someday a "convenient" heavyweight transformation, such as for example, a remote-catalog entity resolver that uses a PolarisBaseEntity as just a passthrough facade, leaking in and causing problems by making slow external remote-catalog network requests within the transactional/snapshot read section.

Also, right now the interfaces permit the MetaStoreManager implementation and/or BasePersistence implementation to potentially perform concurrent underlying operations if desired to parallelize i.e. sharded list results; depending on whether the filter/transformer are applied inline or as post-processing, we'd also need to know if the provided functions are thread-safe.

If we really want to expose it, we should at least document some initial basic constraints/guidelines in the method javadocs:

  1. Must be "lightweight" without network/IO dependencies
  2. Must not cause re-entrant transactions in the database layer
  3. Must be threadsafe

These probably need to apply to both the Transformer and the Filter.

Otherwise, my preferred approach is to change all of these FooEntity::of transformer use cases to simply pop them out to the caller like:

return metaStoreManager
    .loadEntitiesAll(
        getCurrentPolarisContext(),
        null,
        PolarisEntityType.CATALOG,
        PolarisEntitySubType.ANY_SUBTYPE) // No 'transformer' arg
    .stream()
    .map(CatalogEntity::of);

Contributor Author

i've pushed a review commit that removes filter+transformer from the metastoremanager api, as currently its not necessary indeed.

in a followup we should revisit whether the BasePersistence api really needs those params or not but i would prefer to not handle it in this PR.

@collado-mike
Contributor

> The more I mull it over the more it seems like the previous additions of filter and transformer to BasePersistence were a mistake in hindsight, especially as long as it was used simply as syntactic sugar rather than defining any true pushdown semantic, except for the filtering in loadTasks (which we want to rework anyways),

the filter/transformer arguments were specifically added to support the loadTasks API 😅. Personally, I see a lot of potential usefulness in the ability to do pushdown filtering especially with regard to pagination (If I specify a page of 10 tables, don't give me back 8 and then tell me there are more pages because you did the filter after the list). But you're totally right regarding the serialization problem. Without a filter/transform argument, we'd have to add a new method for every list* method that needed to support a new filter predicate. Admittedly, the in-memory filter isn't as efficient as it would be if we, say, expressed the predicate in SQL and pushed down into the actual database call, but it's still useful for iterating over statement results and terminating the results iteration after the correct number of results have passed the filter.
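
The pagination concern (asking for a page of 10 and getting back fewer because the filter ran after the page was cut) can be sketched as follows. This is a simplified in-memory illustration with hypothetical helper names; it is not how `BasePersistence` is implemented, and it ignores page tokens entirely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.IntStream;

public class PaginationFilterSketch {

  // Post-filtering: cut the page first, then filter. The page may come back short.
  static <T> List<T> postFilterFirstPage(List<T> rows, int pageSize, Predicate<T> filter) {
    return rows.subList(0, Math.min(pageSize, rows.size())).stream()
        .filter(filter)
        .toList();
  }

  // Inline filtering: keep iterating until pageSize rows have passed the filter,
  // then stop reading further rows.
  static <T> List<T> inlineFilterFirstPage(List<T> rows, int pageSize, Predicate<T> filter) {
    List<T> page = new ArrayList<>();
    for (int i = 0; i < rows.size() && page.size() < pageSize; i++) {
      T row = rows.get(i);
      if (filter.test(row)) {
        page.add(row);
      }
    }
    return page;
  }

  public static void main(String[] args) {
    List<Integer> rows = IntStream.rangeClosed(1, 20).boxed().toList();
    Predicate<Integer> isEven = n -> n % 2 == 0;

    System.out.println(postFilterFirstPage(rows, 10, isEven).size());   // 5
    System.out.println(inlineFilterFirstPage(rows, 10, isEven).size()); // 10
  }
}
```

Neither variant reduces what the backend has to read in this sketch; the inline form only guarantees full pages and early termination, which matches the "still useful for iterating over statement results" point above.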

> It's technically possible to add basic loadEntities that preserves this invariant for now without the filter/transformer arguments, while still providing the benefit of the more fundamental functionality of fetching entire entities all at once.

I think we need to add this anyway. I've been planning on some proposed changes for fetching principal roles during authentication and improving our list iteration. Both loadEntitiesByName and loadEntitiesById seem like they solve the N+1 problem and also avoid trying to serialize lambdas over the wire.

@XN137 XN137 force-pushed the Rework-PolarisMetaStoreManager.listEntities branch from ab37780 to e099796 Compare August 16, 2025 08:07
@XN137
Contributor Author

XN137 commented Aug 18, 2025

> Both loadEntitiesByName and loadEntitiesById seem like they solve the N+1 problem and also avoid trying to serialize lambdas over the wire.

we could use those but we still would be running 2 queries instead of 1 i.e. for getting all principals.

note that implementations of PolarisMetaStoreManager.loadEntities are free to use whatever mechanism they think is right, i.e. they can continue to use the current anti-pattern or use listEntities + loadEntitiesById ... the important bit is that we can present a uniform view to the callers (of which there are many).

yet while BasePersistence.loadEntities exists, this seems to be most natural/efficient implementation for PolarisMetaStoreManager.loadEntities.

@XN137 XN137 force-pushed the Rework-PolarisMetaStoreManager.listEntities branch from e099796 to 3cc904d Compare August 19, 2025 15:46
dennishuo previously approved these changes Aug 20, 2025
Contributor

@dennishuo dennishuo left a comment

Thanks for all the updates! LGTM other than a minor comment about expanding the javadoc description.

@Nonnull PageToken pageToken);

/**
* Load entities with pagination
Contributor

Let's expand this javadoc description just a bit to make it more obvious that this method is effectively a "list and load entities under a parent", and maybe include some statement pointing at {@link #listEntities} to make the intention clear of making this listing/pagination behavior here match that of listEntities.

Michael's mention of future additions of loadEntitiesById and loadEntitiesByName had just reminded me that the naming could feel ambiguous, so javadocs can save developers time.

Contributor Author

i've added another commit to try and clarify the differences in the javadoc, hope that matches your suggestion

@dennishuo
Contributor

> It's technically possible to add basic loadEntities that preserves this invariant for now without the filter/transformer arguments, while still providing the benefit of the more fundamental functionality of fetching entire entities all at once.

> Both loadEntitiesByName and loadEntitiesById seem like they solve the N+1 problem and also avoid trying to serialize lambdas over the wire.

> we could use those but we still would be running 2 queries instead of 1 i.e. for getting all principals.

@collado-mike I interpreted your statement to just be an observation that such "batch load" methods would indeed be a useful option while my mention may have been ambiguous between "load entities under parent", "load batch by id", and "load batch by name", rather than a statement against a "load entities under parent", is that right?

adutra previously approved these changes Aug 20, 2025
XN137 added 3 commits August 20, 2025 10:48
snazy previously approved these changes Aug 20, 2025
Member

@snazy snazy left a comment

Looks like this PR is ready to be merged.

@XN137 XN137 dismissed stale reviews from snazy, adutra, and dennishuo via 597e81b August 20, 2025 09:09
@XN137 XN137 force-pushed the Rework-PolarisMetaStoreManager.listEntities branch from 3cc904d to 597e81b Compare August 20, 2025 09:09
@dennishuo dennishuo merged commit b49cbc5 into apache:main Aug 20, 2025
12 checks passed
@github-project-automation github-project-automation bot moved this from Ready to merge to Done in Basic Kanban Board Aug 20, 2025
@XN137 XN137 deleted the Rework-PolarisMetaStoreManager.listEntities branch August 21, 2025 06:07
XN137 added a commit to XN137/polaris that referenced this pull request Aug 21, 2025
this is a follow-up to apache#2290

the optimization is to use `listEntities` instead of `loadEntities` when
there is no `policyType` filter to apply
dimas-b pushed a commit that referenced this pull request Aug 26, 2025
this is a follow-up to #2290

the optimization is to use `listEntities` instead of `loadEntities` when
there is no `policyType` filter to apply

5 participants