@nik9000 commented Sep 13, 2021

I found myself needing support for something like `filter_path` on
`XContentParser`. It was simple enough to plug in, so I did. Then I
realized that it might offer more memory-efficient source filtering
(elastic#25168), so I put together a quick benchmark comparing it with the
source filtering that we do in `_search`.

Filtering using the parser is about 33% faster than how we filter now
when you select a single field from a 300-byte document:
```
Benchmark                                          (excludes)  (includes)  (source)  Mode  Cnt     Score    Error  Units
FetchSourcePhaseBenchmark.filterObjects                           message     short  avgt    5  2360.342 ±  4.715  ns/op
FetchSourcePhaseBenchmark.filterXContentOnBuilder                 message     short  avgt    5  2010.278 ± 15.042  ns/op
FetchSourcePhaseBenchmark.filterXContentOnParser                  message     short  avgt    5  1588.446 ± 18.593  ns/op
```

The top line is the way we filter now. The middle line adds a filter
to `XContentBuilder`, something we can do right now without any of my
plumbing work. The bottom line filters on the parser, which requires
all the new plumbing.
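The difference between the top and bottom lines can be sketched in plain Java. This is a minimal sketch under loud assumptions. The `Token`/`Type` model and every method name below are hypothetical stand-ins, not Elasticsearch's `XContentParser` or `XContentBuilder` API, and arrays are left out. `materializeThenFilter` builds the whole document as a `Map` before picking fields. `filterWhileParsing` decides per top-level field as it reads and skips excluded subtrees by counting nesting depth, so it never allocates them.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical token model for illustration; NOT Elasticsearch's XContentParser API.
// Arrays are omitted to keep the sketch short.
public class SourceFilterSketch {
    enum Type { START_OBJECT, END_OBJECT, FIELD_NAME, VALUE }

    static final class Token {
        final Type type;
        final String text; // field name or scalar value; null for object delimiters
        Token(Type type, String text) { this.type = type; this.text = text; }
    }

    static Token tok(Type type, String text) { return new Token(type, text); }

    // Token stream for {"message": "hi", "big": {"a": "1", "b": {"c": "2"}}}
    static List<Token> sampleDoc() {
        return List.of(
            tok(Type.START_OBJECT, null),
            tok(Type.FIELD_NAME, "message"), tok(Type.VALUE, "hi"),
            tok(Type.FIELD_NAME, "big"), tok(Type.START_OBJECT, null),
            tok(Type.FIELD_NAME, "a"), tok(Type.VALUE, "1"),
            tok(Type.FIELD_NAME, "b"), tok(Type.START_OBJECT, null),
            tok(Type.FIELD_NAME, "c"), tok(Type.VALUE, "2"),
            tok(Type.END_OBJECT, null),
            tok(Type.END_OBJECT, null),
            tok(Type.END_OBJECT, null));
    }

    // Like the top line: build the whole document as a Map, then keep the included keys.
    static Map<String, Object> materializeThenFilter(List<Token> tokens, Set<String> includes) {
        Iterator<Token> it = tokens.iterator();
        it.next(); // outer START_OBJECT
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : readObject(it).entrySet()) {
            if (includes.contains(e.getKey())) out.put(e.getKey(), e.getValue());
        }
        return out;
    }

    // Reads fields up to the matching END_OBJECT, allocating every value.
    static Map<String, Object> readObject(Iterator<Token> it) {
        Map<String, Object> map = new LinkedHashMap<>();
        for (Token t = it.next(); t.type != Type.END_OBJECT; t = it.next()) {
            Token value = it.next(); // t is a FIELD_NAME; its value follows it
            map.put(t.text, value.type == Type.START_OBJECT ? readObject(it) : value.text);
        }
        return map;
    }

    // Like the bottom line: decide per top-level field while reading.
    // Excluded subtrees are skipped, never built.
    static Map<String, Object> filterWhileParsing(List<Token> tokens, Set<String> includes) {
        Iterator<Token> it = tokens.iterator();
        it.next(); // outer START_OBJECT
        Map<String, Object> out = new LinkedHashMap<>();
        for (Token t = it.next(); t.type != Type.END_OBJECT; t = it.next()) {
            Token value = it.next();
            boolean nested = value.type == Type.START_OBJECT;
            if (includes.contains(t.text)) {
                out.put(t.text, nested ? readObject(it) : value.text);
            } else if (nested) {
                skipObject(it); // excluded scalars were already consumed above
            }
        }
        return out;
    }

    // Advances past a nested object by counting depth; allocates nothing.
    static void skipObject(Iterator<Token> it) {
        for (int depth = 1; depth > 0; ) {
            Type type = it.next().type;
            if (type == Type.START_OBJECT) depth++;
            else if (type == Type.END_OBJECT) depth--;
        }
    }

    public static void main(String[] args) {
        Set<String> includes = Set.of("message");
        System.out.println(materializeThenFilter(sampleDoc(), includes)); // {message=hi}
        System.out.println(filterWhileParsing(sampleDoc(), includes));    // {message=hi}
    }
}
```

Both methods return the same result. In this toy model, the streaming version's cost for an excluded subtree is only walking its tokens. That is consistent with the gap widening as the unwanted part of the source grows.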

This isn't particularly impressive. 33% *sounds* great! But about 770
nanoseconds per document isn't going to cut into anyone's search times.
If you fetch a thousand documents, that's about .77 milliseconds of savings.
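The percentages and savings in this thread are simple arithmetic on the `Score` column. Here "X% faster" means the time per operation shrank by X%, not that throughput grew by X%. A small sketch of that arithmetic follows. The class and method names are hypothetical, for illustration only.

```java
// Hypothetical helpers for the arithmetic behind "33% faster" and the per-document savings.
public class FilterSavings {
    // Share of per-operation time removed, e.g. 0.327 means 32.7% less time per document.
    static double fractionSaved(double baselineNsPerOp, double candidateNsPerOp) {
        return (baselineNsPerOp - candidateNsPerOp) / baselineNsPerOp;
    }

    // Total time saved, in milliseconds, across `docs` documents.
    static double msSavedFor(long docs, double baselineNsPerOp, double candidateNsPerOp) {
        return (baselineNsPerOp - candidateNsPerOp) * docs / 1_000_000.0;
    }

    public static void main(String[] args) {
        double filterObjects = 2360.342;          // ns/op, "short" source, current approach
        double filterXContentOnParser = 1588.446; // ns/op, "short" source, filtering on the parser
        System.out.printf("%.1f%% less time per document%n",
                100 * fractionSaved(filterObjects, filterXContentOnParser));
        System.out.printf("%.0f ns saved per document%n", filterObjects - filterXContentOnParser);
        System.out.printf("%.2f ms saved per 1000 documents%n",
                msSavedFor(1000, filterObjects, filterXContentOnParser));
    }
}
```

For the 300-byte document this works out to about 32.7% and about 0.77 ms per thousand documents. The same helpers apply unchanged to the 4.3KB and 4MB rows.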

But we mostly advise folks to use source filtering on fetch when the
source is large and you only want a small part of it. So I tried a
source of about 4.3KB where you want a single field:
```
Benchmark                                          (excludes)  (includes)      (source)  Mode  Cnt     Score     Error  Units
FetchSourcePhaseBenchmark.filterObjects                           message  one_4k_field  avgt    5  5957.128 ± 117.402  ns/op
FetchSourcePhaseBenchmark.filterXContentOnBuilder                 message  one_4k_field  avgt    5  4999.073 ±  96.003  ns/op
FetchSourcePhaseBenchmark.filterXContentonParser                  message  one_4k_field  avgt    5  3261.478 ±  48.879  ns/op
```

That's 45% faster. Put another way, about 2.7 microseconds saved per
document. Not bad!

But have a look at how things come out when you want a single field from
a 4 *megabyte* document:
```
Benchmark                                          (excludes)  (includes)      (source)  Mode  Cnt        Score        Error  Units
FetchSourcePhaseBenchmark.filterObjects                           message  one_4m_field  avgt    5  8266343.036 ± 176197.077  ns/op
FetchSourcePhaseBenchmark.filterXContentOnBuilder                 message  one_4m_field  avgt    5  6227560.013 ±  68306.318  ns/op
FetchSourcePhaseBenchmark.filterXContentonParser                  message  one_4m_field  avgt    5  1617153.472 ±  80164.547  ns/op
```

These documents are very large. I've encountered documents like them in
real life, but they've always been outliers for me. Still, saving more
than 6.5 milliseconds per document ain't anything to sneeze at.

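Take a look at what you get when I turn on GC metrics. Filtering on the
parser allocates about 1.8 MB/sec, while the other two approaches
allocate more than 2 GB/sec: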
```
FetchSourcePhaseBenchmark.filterObjects                          message  one_4m_field  avgt    5   7036097.561 ±  84721.312   ns/op
FetchSourcePhaseBenchmark.filterObjects:·gc.alloc.rate           message  one_4m_field  avgt    5      2166.613 ±     25.975  MB/sec
FetchSourcePhaseBenchmark.filterXContentOnBuilder                message  one_4m_field  avgt    5   6104595.992 ±  55445.508   ns/op
FetchSourcePhaseBenchmark.filterXContentOnBuilder:·gc.alloc.rate message  one_4m_field  avgt    5      2496.978 ±     22.650  MB/sec
FetchSourcePhaseBenchmark.filterXContentonParser                 message  one_4m_field  avgt    5   1614980.846 ±  31716.956   ns/op
FetchSourcePhaseBenchmark.filterXContentonParser:·gc.alloc.rate  message  one_4m_field  avgt    5         1.755 ±      0.035  MB/sec
```
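The `gc.alloc.rate` lines are rates, not per-document figures. You can get a rough per-operation estimate by multiplying the rate by the time per operation. This sketch makes two assumptions: JMH's "MB" means 2^20 bytes, and the average rate applies uniformly across the run. JMH's `gc.alloc.rate.norm` metric, which the runs above don't show, reports bytes per operation directly and would be the authoritative number.

```java
// Rough bytes-per-operation estimate from JMH's gc.alloc.rate (MB/sec) and average time (ns/op).
public class AllocPerOp {
    static final double BYTES_PER_MB = 1024.0 * 1024.0; // assumption: JMH's "MB" is 2^20 bytes

    // bytes/op ~= (MB/sec * bytes per MB) * (ns/op / 1e9 ns per sec)
    static double bytesPerOp(double allocRateMbPerSec, double nsPerOp) {
        return allocRateMbPerSec * BYTES_PER_MB * (nsPerOp / 1_000_000_000.0);
    }

    public static void main(String[] args) {
        String[] names = {"filterObjects", "filterXContentOnBuilder", "filterXContentonParser"};
        double[] mbPerSec = {2166.613, 2496.978, 1.755};          // gc.alloc.rate rows above
        double[] nsPerOp = {7036097.561, 6104595.992, 1614980.846}; // matching ns/op rows
        for (int i = 0; i < names.length; i++) {
            System.out.printf("%-25s ~%.0f bytes/op%n", names[i], bytesPerOp(mbPerSec[i], nsPerOp[i]));
        }
    }
}
```

By this estimate, the first two approaches each allocate roughly 16 MB (about 15 MiB) per 4MB document, and filtering on the parser allocates roughly 3 KB.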
@nik9000 added the `auto-merge-without-approval` label Sep 13, 2021
@elasticsearchmachine merged commit 05af243 into elastic:7.x Sep 13, 2021

Labels: `auto-merge-without-approval`, `backport`, `v7.16.0`
