Synthetic source numbers in columns #88025

nik9000 · 2022-06-24T16:51:57Z

This speeds up synthetic source for numbers and dates by loading them
column by column. On cached disk blocks it saves only about 1ms per 1k
documents on average, but it seems to help a fair bit in the worst case,
and I expect it'll help much more on non-cached disk blocks.

| Metric                         | Task       | Before  | After   | Unit | Change  |
|--------------------------------|------------|---------|---------|------|---------|
|   50th percentile service time | default_1k | 32.9131 | 31.6141 | ms   |  -3.95% |
|   90th percentile service time | default_1k | 34.937  | 34.8247 | ms   |  -0.32% |
|   99th percentile service time | default_1k | 42.2246 | 40.0853 | ms   |  -5.07% |
| 99.9th percentile service time | default_1k | 54.0964 | 41.993  | ms   | -22.37% |
|  100th percentile service time | default_1k | 55.2969 | 53.4642 | ms   |  -3.31% |
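
For readers unfamiliar with the idea, here is a minimal sketch of what "loading column by column" could look like. It is not the code from this PR: `ColumnAtATimeLoader`, `Column`, `LoadedColumn` and `inMemory` are hypothetical names invented for illustration, and `Column` is only loosely modelled on a forward-only doc-values reader. The point is that each field is read for every requested doc in one pass, rather than visiting every field once per doc.

```java
import java.util.Map;

final class ColumnAtATimeLoader {
    /** Hypothetical forward-only column reader (an assumption, not a real Lucene or Elasticsearch type). */
    interface Column {
        boolean advanceExact(int docId);

        long longValue();
    }

    /** Values of one field for a batch of docs; hasValue[i] is false when the i-th doc has no value. */
    static final class LoadedColumn {
        final boolean[] hasValue;
        final long[] values;

        LoadedColumn(int size) {
            hasValue = new boolean[size];
            values = new long[size];
        }
    }

    /** Reads one whole column for every requested doc before any other field is touched. */
    static LoadedColumn load(Column column, int[] docIdsInLeaf) {
        LoadedColumn loaded = new LoadedColumn(docIdsInLeaf.length);
        for (int i = 0; i < docIdsInLeaf.length; i++) {
            if (column.advanceExact(docIdsInLeaf[i])) {
                loaded.hasValue[i] = true;
                loaded.values[i] = column.longValue();
            }
        }
        return loaded;
    }

    /** Illustration helper: an in-memory column backed by a docId -> value map. */
    static Column inMemory(Map<Integer, Long> data) {
        return new Column() {
            private Long current;

            @Override
            public boolean advanceExact(int docId) {
                current = data.get(docId);
                return current != null;
            }

            @Override
            public long longValue() {
                return current;
            }
        };
    }
}
```

Calling `load` once per field yields one `LoadedColumn` per field, and the per-document source can then be assembled from those arrays by position without going back to disk.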

elasticmachine · 2022-06-24T16:53:09Z

Pinging @elastic/es-analytics-geo (Team:Analytics)

romseygeek

LGTM, one nit and one question left.

romseygeek · 2022-06-27T08:27:39Z

server/src/main/java/org/elasticsearch/index/mapper/NumberFieldMapper.java

+        /**
+         * Load all values for all docs up front. This should be much more
+         * disk and cpu-friendly than {@link ImmediateLeaf} because it resolves
+         * the values all at once, keeping the disk .


keeping the disk ... in suspense?

https://www.youtube.com/watch?v=wlwnbcxBuzI

romseygeek · 2022-06-27T08:30:23Z

server/src/main/java/org/elasticsearch/index/mapper/NumberFieldMapper.java

                @Override
                public boolean advanceToDoc(int docId) throws IOException {
-                    return hasValue = leaf.advanceExact(docId);
+                    idx = Arrays.binarySearch(docIdsInLeaf, docId);


Do we need to do a binary search here, given that we know we're visiting the docs from docIdsInLeaf in-order?

Are you saying every time folks call this we just advance idx and check? I think that's fine, yes.

Yes, exactly. Won't make a big difference for small sets of documents, but with size=1000 and lots of fields then I think avoiding the binary search is a win.

We always call in increasing order
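
To make the trade-off in this thread concrete, here is a rough sketch of the two lookups. It is not the code that was merged: `InOrderDocCursor` and its methods are hypothetical, and only `docIdsInLeaf`, `idx` and `advanceToDoc` are names borrowed from the diff. It assumes `docIdsInLeaf` is sorted ascending and that callers pass increasing doc IDs, as stated above.

```java
import java.util.Arrays;

/** Hypothetical stand-in for the per-leaf lookup discussed above. */
final class InOrderDocCursor {
    private final int[] docIdsInLeaf; // assumed sorted ascending, no duplicates
    private int idx = -1;             // position of the last doc we advanced to

    InOrderDocCursor(int[] docIdsInLeaf) {
        this.docIdsInLeaf = docIdsInLeaf;
    }

    /** Binary-search variant: O(log n) per call, valid for any call order. */
    int indexByBinarySearch(int docId) {
        return Arrays.binarySearch(docIdsInLeaf, docId);
    }

    /**
     * In-order variant: only moves forward, so a full pass over the leaf costs
     * O(n) in total. Correct only if docId never decreases between calls.
     */
    boolean advanceToDoc(int docId) {
        while (idx + 1 < docIdsInLeaf.length && docIdsInLeaf[idx + 1] <= docId) {
            idx++;
        }
        return idx >= 0 && docIdsInLeaf[idx] == docId;
    }

    /** Index into the preloaded arrays for the current doc, or -1 before the first match. */
    int index() {
        return idx;
    }
}
```

With size=1000 and many fields, the forward-only cursor does one comparison or so per call, where the binary search does roughly ten per field per doc.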

elasticmachine · 2022-06-27T17:33:57Z

Pinging @elastic/es-search (Team:Search)

nik9000 added 7 commits June 24, 2022 10:58

WIP

847ea80

This?

d15f19e

asfd

849ef5a

Better?

b55988a

Merge branch 'master' into synthetic_source_numbers_in_columns

92662a0

So busted

a2aa322

ADSF

69d44a9

nik9000 added >non-issue :StorageEngine/TSDB You know, for Metrics v8.4.0 labels Jun 24, 2022

nik9000 requested a review from romseygeek June 24, 2022 16:51

nik9000 marked this pull request as ready for review June 24, 2022 16:53

elasticmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jun 24, 2022

nik9000 added 2 commits June 24, 2022 13:03

Fixup

97e83dd

What

77fc9a9

romseygeek approved these changes Jun 27, 2022


nik9000 added 3 commits June 27, 2022 08:10

Merge branch 'master' into synthetic_source_numbers_in_columns

02293f0

words are hard

077fbbb

Remove a binary search

3e994d4

We always call in increasing order

nik9000 mentioned this pull request Jun 27, 2022

Synthetic Source #86603

Closed

50 tasks

Merge branch 'master' into synthetic_source_numbers_in_columns

c17a7dd

nik9000 merged commit bd14930 into elastic:master Jun 27, 2022

nik9000 added the :Search/Search Search-related issues that do not fall into other categories label Jun 27, 2022

elasticmachine added the Team:Search Meta label for search team label Jun 27, 2022
