TSDB: Speed up _id query #84928
Conversation
@henningandersen told me that he and @jpountz had talked about using …
This speeds up the `term` query on `_id` in time series indices by skipping segments that don't contain any matching `@timestamp`s.
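The core idea can be sketched in isolation: if the timestamp embedded in the queried `_id` falls outside a segment's `@timestamp` min/max range, that segment cannot contain the document, so the term lookup can be skipped entirely. This is a hedged, self-contained sketch of the decision, not the actual Elasticsearch code; the names `SegmentStats` and `shouldSearch` are illustrative.

```java
// Illustrative sketch of the segment-skipping check described above.
// A real implementation would read the per-segment min/max of the
// @timestamp point values from Lucene; here we fake them with a record.
public class IdQuerySkipSketch {
    // Hypothetical holder for a segment's @timestamp bounds.
    record SegmentStats(long minTimestamp, long maxTimestamp) {}

    // Only search the segment if the id's timestamp is inside its range.
    static boolean shouldSearch(SegmentStats segment, long idTimestamp) {
        return idTimestamp >= segment.minTimestamp
            && idTimestamp <= segment.maxTimestamp;
    }

    public static void main(String[] args) {
        SegmentStats seg = new SegmentStats(1_000L, 2_000L);
        System.out.println(shouldSearch(seg, 1_500L)); // prints true
        System.out.println(shouldSearch(seg, 5_000L)); // prints false: skip
    }
}
```

The check is O(1) per segment, which is why it pays off on time series indices where most segments cover disjoint time ranges.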
LUCENE-8980 is the Lucene JIRA that added the early exit when the term being looked up isn't in the range covered by a segment. You can see the associated speedup in the benchmark annotation.
@nik9000 I'm not familiar with the index-time logic of deduplication for TSDB. Would this change only result in a search-time speedup for `term` queries, or would it also speed up ingestion by more efficiently skipping irrelevant segments?
We'd have to plumb it into the deduplication logic.
The speedup I saw in my tests was fairly in line with the benchmark. I'd have to use Rally to have a ton of confidence in the numbers, but it's in that ballpark.
```java
    return new MatchNoDocsQuery();
}
long timestamp = ByteUtils.readLongLE(suffix, 8);
return new TermQuery(new Term(NAME, new BytesRef(id))) {
```
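For readers unfamiliar with the `ByteUtils.readLongLE(suffix, 8)` call in the snippet, this is a sketch of what a little-endian long read does: assemble eight bytes starting at the given offset, least significant byte first. The helper below is illustrative, not the actual `ByteUtils` implementation.

```java
// Self-contained sketch of a little-endian long read, mirroring the
// shape of ByteUtils.readLongLE(bytes, offset) used in the snippet above.
public class ReadLongLESketch {
    static long readLongLE(byte[] bytes, int offset) {
        long value = 0;
        // Walk from the most significant byte down, shifting as we go.
        for (int i = 7; i >= 0; i--) {
            value = (value << 8) | (bytes[offset + i] & 0xFFL);
        }
        return value;
    }

    public static void main(String[] args) {
        byte[] buf = new byte[16];
        long ts = 0x0102030405060708L;
        // Write the timestamp little-endian at offset 8, then read it back.
        for (int i = 0; i < 8; i++) {
            buf[8 + i] = (byte) (ts >>> (8 * i));
        }
        System.out.println(Long.toHexString(readLongLE(buf, 8))); // prints 102030405060708
    }
}
```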
Would it be an option to encode the timestamp as a prefix of the id in big-endian order, so that the optimization would work out of the box without needing a custom query?
Yeah. I'm certainly thinking about it.
I put the timestamp last to get the most shared prefixes. Maybe a bad choice, but it's what I was doing then. We can change it.
I encoded the timestamp in little-endian because we had a little-endian method. I still have an open follow-up to evaluate flipping it.
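The byte-order question above matters because the terms dictionary compares `_id` terms as unsigned bytes: a big-endian timestamp prefix sorts in timestamp order, while a little-endian one can flip the ordering. A small sketch (illustrative, not Elasticsearch code) that demonstrates this with `java.util.Arrays.compareUnsigned`:

```java
import java.util.Arrays;

// Demonstrates why big-endian encoding is needed for lexicographic
// byte order to agree with numeric timestamp order.
public class TimestampOrderSketch {
    static byte[] bigEndian(long v) {
        byte[] b = new byte[8];
        for (int i = 0; i < 8; i++) b[i] = (byte) (v >>> (8 * (7 - i)));
        return b;
    }

    static byte[] littleEndian(long v) {
        byte[] b = new byte[8];
        for (int i = 0; i < 8; i++) b[i] = (byte) (v >>> (8 * i));
        return b;
    }

    public static void main(String[] args) {
        long smaller = 0x0102L, larger = 0x0201L;
        // Big-endian: unsigned byte comparison matches numeric order.
        System.out.println(Arrays.compareUnsigned(
            bigEndian(smaller), bigEndian(larger)) < 0);   // prints true
        // Little-endian: the low byte comes first, so the order flips.
        System.out.println(Arrays.compareUnsigned(
            littleEndian(smaller), littleEndian(larger)) < 0); // prints false
    }
}
```

This is why the prefix idea only gives segment skipping "for free" if the timestamp bytes are laid out big-endian.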
A (very late) update: we care a lot about the size of the `_id` now, and we think that an encoding scheme like the one @jpountz mentioned would make it much bigger. So we're likely going to need this query.
It looks correct to me. The thing I'm unclear about is whether the complexity is worth the benefits, as I can't think of many use cases for doing ID queries on time series data. It feels to me like the important thing would be to have this skipping logic for index-time deduplication.
Same. I'll try to pick this up at some point and rig up indexing deduplication. And I'll also see what it'd cost us to get the deduplication for free by putting the timestamp at the front of the id.