-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Synthetic source numbers in columns #88025
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Synthetic source numbers in columns #88025
Conversation
|
Pinging @elastic/es-analytics-geo (Team:Analytics) |
romseygeek
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, one nit and one question left.
| /** | ||
| * Load all values for all docs up front. This should be much more | ||
| * disk and cpu-friendly than {@link ImmediateLeaf} because it resolves | ||
| * the values all at once, keeping the disk . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
keeping the disk ... in suspense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| @Override | ||
| public boolean advanceToDoc(int docId) throws IOException { | ||
| return hasValue = leaf.advanceExact(docId); | ||
| idx = Arrays.binarySearch(docIdsInLeaf, docId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to do a binary search here, given that we know we're visiting the docs from docIdsInLeaf in-order?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you saying everytime folks call this we just advance idx and check? I think that's fine, yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, exactly. Won't make a big difference for small sets of documents, but with size=1000 and lots of fields then I think avoiding the binary search is a win.
We always call in increasing order
|
Pinging @elastic/es-search (Team:Search) |
This speeds up synthetic source for numbers and dates by loading them
column by column. On cached disk blocks, on average it's only 1ms per
for 1k documents, but it seems to help a fair bit in the worst case
and I expect it'll help much more on non-cached disk blocks.