Skip to content

Conversation

@XN137
Copy link
Contributor

@XN137 XN137 commented Aug 5, 2025

PolarisAdminService has multiple spots where it is working around the sub-optimal PolarisMetaStoreManager APIs.

This results in multiple fixes like:
#1949
#2258

While eventually the underlying APIs should be improved, for now we can make a single central workaround and clean up some redundant code. Also we can improve the return types as callers are not interested in details of the entity layer.

Copy link
Contributor

@tmater tmater left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @XN137! Nifty change, overall LGTM!

catalogPath = null;
catalogId = 0;
} else {
catalogPath = PolarisEntity.toCoreList(List.of(catalogEntity));
Copy link
Contributor

@tmater tmater Aug 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could be simplified, PolarisEntity.toCoreList() returns null when the input is null or empty.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i had noticed that toCoreList has some null handling but if we passed PolarisEntity.toCoreList(List.of(null)) i think it would still result in an NPE ?

imo having the explicit "if null" makes the overall flow clearer either way

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yeah, makes sense, forgot about that NPE.

.map(
nameAndId ->
metaStoreManager.loadEntity(
getCurrentPolarisContext(), catalogId, nameAndId.getId(), nameAndId.getType()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a quick question, is there a specific reason we need to save the catalogId earlier? I'm asking because nameAndId already provides getCatalogId().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it might be true but i am only following what the existing code was doing... whether getCatalogId() always returns the right value for all types of entities idk, so i was just sticking to the existing code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to elaborate: in listCatalogsUnsafe the returned entity should likely return the same value for getId and getCatalogId (not sure if it does) but it seems like the loadEntity api still requires us to pass 0 since catalogs are top level entites

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for clarifying it!

`PolarisAdminService` has multiple spots where it is working around the
sub-optimal `PolarisMetaStoreManager` APIs.

This results in multiple fixes like:
PR-1949
PR-2258

While eventually the underlying APIs should be improved, for now we can
make a single central workaround and clean up some redundant code.
Also we can improve the return types as callers are not interested in
details of the entity layer.
@XN137 XN137 force-pushed the Add-PolarisAdminService.loadEntites branch from 7075d13 to 529774f Compare August 5, 2025 13:57
Copy link
Member

@snazy snazy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The culprit here is that although all matching entities are known during the list-operation, all you get from it is this EntityNameLookupRecord, so you have to load the same entities again.
Having a counterpart of PolarisMetaStoreManager#listEntities that doesn't yield the unnecessarily reduced result but the actual entities would help.
WDYT?

Copy link
Contributor

@flyrain flyrain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @XN137 for doing this. LGTM!

catalogPath = PolarisEntity.toCoreList(List.of(catalogEntity));
catalogId = catalogEntity.getId();
}
// TODO: add loadEntities method to PolarisMetaStoreManager
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: We will need a discussion on how persistence layer is going to support it, as it affects all types of persistence. Loading everything in one call can provide a consistent view, which is nice, but there are some caveats that the uber call may be too large, so that it hits the limits(e.g., memory limit). With that, I think it's premature to consider this as a TODO item.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

afaict under the hood listEntities is already fetching the fully fledged PolarisBaseEntity instances from the database (for the jdbc case):

datasourceOperations.executeSelectOverStream(
query,
new ModelEntity(),
stream -> {
var data = stream.filter(entityFilter);
results.set(Page.mapped(pageToken, data, transformer, EntityIdToken::fromEntity));
});

it includes streaming/pagination.

it just happens that the given transformer turns PolarisBaseEntity into EntityNameLookupRecord... and then later the record is used to look up the full entity again.
so afaict, the memory footprint would not be very different if we had a loadEntities function (or changed listEntities to not only return a EntityNameLookupRecord).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so afaict, the memory footprint would not be very different if we had a loadEntities function

I agree. I also think the overall load is lower.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, I'm not against the idea, but we will need a discussion.

@github-project-automation github-project-automation bot moved this from PRs In Progress to Ready to merge in Basic Kanban Board Aug 5, 2025
@XN137
Copy link
Contributor Author

XN137 commented Aug 6, 2025

The culprit here is that although all matching entities are known during the list-operation, all you get from it is this EntityNameLookupRecord, so you have to load the same entities again.
Having a counterpart of PolarisMetaStoreManager#listEntities that doesn't yield the unnecessarily reduced result but the actual entities would help.
WDYT?

yeah this is mentioned in the commit message and implied by the added TODO.
but a fix/change at the persistence layer is more involved, so as a first step and immediate replacement for PR-2258 i thought centralizing the workaround is useful, as it doesnt prevent or delay the proper fix.

@snazy snazy merged commit c2b5de1 into apache:main Aug 7, 2025
12 checks passed
@github-project-automation github-project-automation bot moved this from Ready to merge to Done in Basic Kanban Board Aug 7, 2025
@XN137 XN137 deleted the Add-PolarisAdminService.loadEntites branch August 7, 2025 07:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants