Give significance lookups their own home #57903

nik9000 · 2020-06-09T20:12:25Z

This moves the code to look up significance heuristics information like
background frequency and superset size out of
`SignificantTermsAggregatorFactory` and into its own home so that it is
easier to pass around. This will:

1. Make us feel better about ourselves for not passing around the
   factory, which is really *supposed* to be a throw away thing.
2. Abstract the significance lookup logic so we can reuse it for the
   `significant_text` aggregation.
3. Make it very simple to cache the background frequencies, which should
   speed things up when the agg is a sub-agg. We had done this for numerics
   but not string-shaped significant terms.
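
To illustrate the caching idea in the last point, here is a minimal sketch. It is not the code from this PR: `BackgroundFrequency`, `Cached`, and the in-memory `fakeIndex` are hypothetical names made up for the example, and a plain `HashMap` stands in for whatever structure the real lookup uses. The point is only that when the agg runs as a sub-agg, the same term shows up under many parent buckets, so remembering its background frequency avoids repeating the expensive lookup.

```java
import java.util.HashMap;
import java.util.Map;

public class CachedBackgroundFrequencySketch {
    /** Hypothetical stand-in for an index lookup: how many background docs contain the term. */
    public interface BackgroundFrequency {
        long freq(String term);
    }

    /** Wraps another lookup and remembers each term's answer (assumption: a HashMap is good enough here). */
    public static final class Cached implements BackgroundFrequency {
        private final BackgroundFrequency delegate;
        private final Map<String, Long> cache = new HashMap<>();
        private int delegateCalls = 0;

        public Cached(BackgroundFrequency delegate) {
            this.delegate = delegate;
        }

        @Override
        public long freq(String term) {
            Long cached = cache.get(term);
            if (cached != null) {
                return cached; // seen before: skip the expensive lookup
            }
            delegateCalls++;
            long value = delegate.freq(term);
            cache.put(term, value);
            return value;
        }

        /** Number of times the wrapped lookup was actually consulted. */
        public int delegateCalls() {
            return delegateCalls;
        }
    }

    public static void main(String[] args) {
        // Fake background counts; a real lookup would query the index.
        Map<String, Long> fakeIndex = Map.of("error", 120L, "timeout", 45L);
        Cached lookup = new Cached(term -> fakeIndex.getOrDefault(term, 0L));

        // Simulate three parent buckets that each ask about the same two terms.
        for (int bucket = 0; bucket < 3; bucket++) {
            lookup.freq("error");
            lookup.freq("timeout");
        }
        System.out.println("lookups against the fake index: " + lookup.delegateCalls());
    }
}
```

Wrapping the lookup instead of building the cache into the aggregator keeps caching optional, which matches the PR's goal of having a small object that is easy to pass around.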

elasticmachine · 2020-06-09T20:12:27Z

Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)

nik9000 · 2020-06-10T17:26:18Z

This improves performance of string significant_terms by about 15%. [Images: before, after]

nik9000 · 2020-06-10T17:33:11Z

This improves performance of string significant_terms by about 15%. [Images: before, after]

And for that reason I've marked this an >enhancement

not-napoleon

Looks good

not-napoleon · 2020-06-10T18:01:06Z

server/src/main/java/org/elasticsearch/search/aggregations/bucket/terms/SignificanceLookup.java

+    }
+
+    /**
+     * Get the background frequency of a {@link BytesRef} term.


Is this javadoc correct? Looks like it's operating on a long term below.

nik9000 · Jun 10, 2020

It wasn't! I fixed it.

nik9000 added >non-issue :Analytics/Aggregations Aggregations v8.0.0 v7.9.0 labels Jun 9, 2020

nik9000 requested a review from not-napoleon June 9, 2020 20:12

elasticmachine added the Team:Analytics label Jun 9, 2020

nik9000 requested a review from polyfractal June 9, 2020 20:12

Merge branch 'master' into significance_lookup

7852439

nik9000 mentioned this pull request Jun 10, 2020

Multi-bucket aggregator wrapper is slow and uses a ton of memory #56487

Closed

16 tasks

nik9000 added >enhancement and removed >non-issue labels Jun 10, 2020

not-napoleon approved these changes Jun 10, 2020

Javadoc!

dc1b8a8

nik9000 merged commit 62e2d85 into elastic:master Jun 10, 2020

nik9000 added the backport pending label Jun 10, 2020

nik9000 removed the backport pending label Jun 12, 2020

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
