Add PolarisAdminService.loadEntities helper #2261
tmater
left a comment
Thanks @XN137! Nifty change, overall LGTM!
    catalogPath = null;
    catalogId = 0;
  } else {
    catalogPath = PolarisEntity.toCoreList(List.of(catalogEntity));
I think this could be simplified, since PolarisEntity.toCoreList() returns null when the input is null or empty.
I had noticed that toCoreList has some null handling, but if we passed PolarisEntity.toCoreList(List.of(null)), I think it would still result in an NPE?
IMO, having the explicit "if null" makes the overall flow clearer either way.
Ah, yeah, makes sense, forgot about that NPE.
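A minimal, self-contained sketch of why the explicit null check matters: `List.of` rejects null elements with a `NullPointerException`, so any null handling inside `toCoreList` is never reached. Here `toCoreList` is a hypothetical stand-in that only mimics the behavior described in this thread (null for null or empty input); it is not the real Polaris method, and plain strings stand in for entities.

```java
import java.util.List;

public class ToCoreListNullDemo {

  // Hypothetical stand-in for PolarisEntity.toCoreList: returns null for null/empty input.
  static List<String> toCoreList(List<String> entities) {
    if (entities == null || entities.isEmpty()) {
      return null;
    }
    return List.copyOf(entities);
  }

  // Mirrors the reviewed branch: guard explicitly before building the list.
  static List<String> catalogPath(String catalogEntity) {
    if (catalogEntity == null) {
      return null;
    }
    return toCoreList(List.of(catalogEntity));
  }

  public static void main(String[] args) {
    String catalogEntity = null;
    try {
      // List.of throws before toCoreList ever sees its argument.
      toCoreList(List.of(catalogEntity));
      System.out.println("no exception");
    } catch (NullPointerException e) {
      System.out.println("NPE from List.of");
    }
    System.out.println(catalogPath(null)); // prints "null"
    System.out.println(catalogPath("my_catalog")); // prints "[my_catalog]"
  }
}
```

The guard therefore is not redundant: it is the only thing standing between a null catalog entity and the exception thrown by `List.of`.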
.map(
    nameAndId ->
        metaStoreManager.loadEntity(
            getCurrentPolarisContext(), catalogId, nameAndId.getId(), nameAndId.getType()))
Just a quick question: is there a specific reason we need to save the catalogId earlier? I'm asking because nameAndId already provides getCatalogId().
It might be true, but I am only following what the existing code was doing. Whether getCatalogId() always returns the right value for all types of entities, I don't know, so I was just sticking to the existing code.
To elaborate: in listCatalogsUnsafe the returned entity should likely return the same value for getId and getCatalogId (not sure if it does), but it seems like the loadEntity API still requires us to pass 0, since catalogs are top-level entities.
Thanks for clarifying it!
`PolarisAdminService` has multiple spots where it is working around the sub-optimal `PolarisMetaStoreManager` APIs.
This results in multiple fixes like:
PR-1949
PR-2258
While eventually the underlying APIs should be improved, for now we can make a single central workaround and clean up some redundant code. Also we can improve the return types as callers are not interested in details of the entity layer.
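A minimal sketch of the list-then-load workaround that a central helper would consolidate. Everything here is hypothetical: `NameAndId`, `Entity`, `InMemoryStore` and `loadEntities` are toy stand-ins, not the real `PolarisMetaStoreManager` API, and skipping entries whose load returns null is a choice made for this sketch only. It shows that each listed record costs one extra `loadEntity` call, and that top-level catalogs are looked up with catalogId 0, as described in the review thread. Records and `Stream.toList()` assume Java 16 or later.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class LoadEntitiesSketch {

  // Hypothetical stand-ins for the lookup record and the full entity.
  record NameAndId(long catalogId, long id, String type, String name) {}

  record Entity(long catalogId, long id, String type, String name, Map<String, String> properties) {}

  // Toy in-memory store: listing returns only lookup records, so callers
  // must load each entity again to see its properties.
  static class InMemoryStore {
    private final Map<Long, Entity> byId = new HashMap<>();
    int loadCalls = 0;

    void put(Entity e) {
      byId.put(e.id(), e);
    }

    List<NameAndId> listEntities(long catalogId, String type) {
      return byId.values().stream()
          .filter(e -> e.catalogId() == catalogId && e.type().equals(type))
          .map(e -> new NameAndId(e.catalogId(), e.id(), e.type(), e.name()))
          .toList();
    }

    Entity loadEntity(long catalogId, long id, String type) {
      loadCalls++;
      Entity e = byId.get(id);
      return (e != null && e.catalogId() == catalogId && e.type().equals(type)) ? e : null;
    }
  }

  // Hypothetical central helper: list once, then load each listed entity.
  static List<Entity> loadEntities(InMemoryStore store, Entity catalogEntity, String type) {
    // Top-level entities (catalogs) are looked up with catalogId 0 in this sketch.
    long catalogId = (catalogEntity == null) ? 0 : catalogEntity.id();
    return store.listEntities(catalogId, type).stream()
        .map(nameAndId -> store.loadEntity(catalogId, nameAndId.id(), nameAndId.type()))
        .filter(Objects::nonNull) // sketch-only choice: skip entities that could not be loaded
        .toList();
  }

  public static void main(String[] args) {
    InMemoryStore store = new InMemoryStore();
    Entity catalog = new Entity(0, 1, "CATALOG", "my_catalog", Map.of());
    store.put(catalog);
    store.put(new Entity(1, 2, "CATALOG_ROLE", "admin", Map.of("owner", "alice")));
    store.put(new Entity(1, 3, "CATALOG_ROLE", "reader", Map.of()));

    List<Entity> catalogs = loadEntities(store, null, "CATALOG");
    List<Entity> roles = loadEntities(store, catalog, "CATALOG_ROLE");
    System.out.println(catalogs.size() + " catalog(s), " + roles.size() + " role(s)");
    System.out.println("loadEntity calls: " + store.loadCalls); // prints "loadEntity calls: 3"
  }
}
```

Keeping the catalogId selection and the per-record load in one place means callers only ever see full entities, which is the return-type cleanup the description mentions.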
7075d13 to 529774f
snazy
left a comment
The culprit here is that although all matching entities are known during the list operation, all you get from it is this EntityNameLookupRecord, so you have to load the same entities again.
A counterpart of PolarisMetaStoreManager#listEntities that yields the actual entities, rather than the unnecessarily reduced result, would help.
WDYT?
flyrain
left a comment
Thanks @XN137 for doing this. LGTM!
    catalogPath = PolarisEntity.toCoreList(List.of(catalogEntity));
    catalogId = catalogEntity.getId();
  }
  // TODO: add loadEntities method to PolarisMetaStoreManager
Minor: We will need a discussion on how the persistence layer is going to support it, as it affects all types of persistence. Loading everything in one call can provide a consistent view, which is nice, but there is a caveat: the uber call may be so large that it hits limits (e.g., memory limits). Given that, I think it's premature to treat this as a TODO item.
AFAICT, under the hood listEntities already fetches the fully fledged PolarisBaseEntity instances from the database (in the JDBC case):
Lines 507 to 513 in af69d9f
datasourceOperations.executeSelectOverStream(
    query,
    new ModelEntity(),
    stream -> {
      var data = stream.filter(entityFilter);
      results.set(Page.mapped(pageToken, data, transformer, EntityIdToken::fromEntity));
    });
It includes streaming/pagination.
It just happens that the given transformer turns PolarisBaseEntity into EntityNameLookupRecord, and later the record is used to look up the full entity again.
So AFAICT, the memory footprint would not be very different if we had a loadEntities function (or changed listEntities to return the full entity rather than only an EntityNameLookupRecord).
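A toy sketch of the point above, assuming a simplified list call that receives rows as full entities, filters them, and then applies a caller-supplied transformer (loosely modeled on the executeSelectOverStream snippet; `FullEntity`, `LookupRecord` and `list` are hypothetical names, not Polaris APIs). The same rows are read either way; only the transformer decides whether callers get reduced records back or the full entities, and with `Function.identity()` no second round of loads is needed.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Stream;

public class TransformerSketch {

  // Hypothetical stand-ins for PolarisBaseEntity and EntityNameLookupRecord.
  record FullEntity(long id, String name, Map<String, String> properties) {}

  record LookupRecord(long id, String name) {}

  // Toy list call: rows arrive as full entities, are filtered,
  // and only then reduced (or not) by the caller-supplied transformer.
  static <T> List<T> list(
      Stream<FullEntity> rows, Predicate<FullEntity> entityFilter, Function<FullEntity, T> transformer) {
    return rows.filter(entityFilter).map(transformer).toList();
  }

  public static void main(String[] args) {
    List<FullEntity> table =
        List.of(new FullEntity(1, "a", Map.of("k", "v")), new FullEntity(2, "b", Map.of()));

    // Today's shape: full rows are reduced to lookup records ...
    List<LookupRecord> records =
        list(table.stream(), e -> true, e -> new LookupRecord(e.id(), e.name()));
    // ... while a loadEntities-style variant could simply keep the rows.
    List<FullEntity> entities = list(table.stream(), e -> true, Function.identity());

    System.out.println(records.size() + " " + entities.size()); // prints "2 2"
    System.out.println(entities.get(0).properties()); // prints "{k=v}"
  }
}
```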
> so afaict, the memory footprint would not be very different if we had a loadEntities function
I agree. I also think the overall load is lower.
To be clear, I'm not against the idea, but we will need a discussion.
Yeah, this is mentioned in the commit message and implied by the added TODO.