Skip to content

Conversation

@nik9000
Copy link
Member

@nik9000 nik9000 commented Jun 24, 2022

This speeds up synthetic source for numbers and dates by loading them
column by column. On cached disk blocks, on average it's only 1ms per
for 1k documents, but it seems to help a fair bit in the worst case
and I expect it'll help much more on non-cached disk blocks.

|   50th percentile service time | default_1k | 32.9131 | 31.6141 | ms |  -3.95% |
|   90th percentile service time | default_1k | 34.937  | 34.8247 | ms |  -0.32% |
|   99th percentile service time | default_1k | 42.2246 | 40.0853 | ms |  -5.07% |
| 99.9th percentile service time | default_1k | 54.0964 | 41.993  | ms | -22.37% |
|  100th percentile service time | default_1k | 55.2969 | 53.4642 | ms |  -3.31% |

@nik9000 nik9000 requested a review from romseygeek June 24, 2022 16:51
@nik9000 nik9000 marked this pull request as ready for review June 24, 2022 16:53
@elasticmachine elasticmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jun 24, 2022
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

Copy link
Contributor

@romseygeek romseygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, one nit and one question left.

/**
* Load all values for all docs up front. This should be much more
* disk and cpu-friendly than {@link ImmediateLeaf} because it resolves
* the values all at once, keeping the disk .
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keeping the disk ... in suspense?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Override
public boolean advanceToDoc(int docId) throws IOException {
return hasValue = leaf.advanceExact(docId);
idx = Arrays.binarySearch(docIdsInLeaf, docId);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to do a binary search here, given that we know we're visiting the docs from docIdsInLeaf in-order?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you saying everytime folks call this we just advance idx and check? I think that's fine, yes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exactly. Won't make a big difference for small sets of documents, but with size=1000 and lots of fields then I think avoiding the binary search is a win.

@nik9000 nik9000 mentioned this pull request Jun 27, 2022
50 tasks
@nik9000 nik9000 merged commit bd14930 into elastic:master Jun 27, 2022
@nik9000 nik9000 added the :Search/Search Search-related issues that do not fall into other categories label Jun 27, 2022
@elasticmachine elasticmachine added the Team:Search Meta label for search team label Jun 27, 2022
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>non-issue :Search/Search Search-related issues that do not fall into other categories :StorageEngine/TSDB You know, for Metrics Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:Search Meta label for search team v8.4.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants