
@csoulios (Contributor) commented Dec 3, 2020

Backports #65776 to 7.x

A while back, Lucene introduced the ability to index custom term frequencies, i.e. giving users the ability to provide a numeric value that is indexed as a term frequency, rather than letting Lucene compute the term frequency itself from the number of occurrences of a term.

This PR modifies the _doc_count field so that it is stored as a Lucene custom term frequency.
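To make the distinction concrete, here is a toy Python model of a term frequency that is computed from occurrences versus one that is supplied directly, as _doc_count now is. This is not Lucene or Elasticsearch code: computed_term_freqs, custom_term_freq and read_doc_count are hypothetical names, and the lookup assumes (as the Elasticsearch _doc_count documentation states) that a document without the field counts as 1.

```python
from collections import Counter
from typing import Dict, List

# Toy model only: these names are hypothetical, not Lucene or Elasticsearch APIs.
DOC_COUNT_FIELD = "_doc_count"


def computed_term_freqs(tokens: List[str]) -> Dict[str, int]:
    """Default behaviour: a term's frequency is how often it occurs in the field."""
    return dict(Counter(tokens))


def custom_term_freq(doc_count: int) -> Dict[str, int]:
    """Custom behaviour: a single term whose frequency is the supplied value."""
    # A term frequency cannot be zero or negative, so reject values below 1.
    if doc_count < 1:
        raise ValueError(f"{DOC_COUNT_FIELD} must be a positive integer")
    return {DOC_COUNT_FIELD: doc_count}


def read_doc_count(postings: Dict[int, int], doc_id: int) -> int:
    """Look up one document's count; docs without the term count as 1 (assumption)."""
    return postings.get(doc_id, 1)


if __name__ == "__main__":
    print(computed_term_freqs(["rollup", "index", "rollup"]))  # {'rollup': 2, 'index': 1}
    print(custom_term_freq(42))  # {'_doc_count': 42}
    postings = {0: 42}  # doc 0 was indexed with _doc_count = 42, doc 1 without it
    print(read_doc_count(postings, 0), read_doc_count(postings, 1))  # 42 1
```

In this model the whole field collapses to a single term per document, so the stored frequency is the document's count and nothing else needs to be read.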

A benefit of moving to custom term frequencies is that Lucene automatically computes global term statistics such as totalTermFreq, which gives us the sum of the _doc_count values across an entire shard. This could in turn be useful for generalizing optimizations to rollup indices, e.g. bucket aggregations where all documents fall into the same bucket.
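A toy sketch of why the shard-level statistic helps, under stated assumptions: build_postings, total_term_freq, doc_freq and single_bucket_doc_count are hypothetical names loosely mirroring Lucene's totalTermFreq and docFreq statistics; documents without _doc_count count as 1; and there are no deleted documents (Lucene's term statistics still include deleted documents until their segments are merged). Under those assumptions, the count of a bucket that contains every document can be derived without visiting any document.

```python
from typing import Dict, List, Optional

# Toy model of one shard; all names are hypothetical, not Lucene APIs.


def build_postings(doc_counts: List[Optional[int]]) -> Dict[int, int]:
    """Doc id -> custom term frequency, for docs that carry a _doc_count value."""
    return {doc_id: c for doc_id, c in enumerate(doc_counts) if c is not None}


def total_term_freq(postings: Dict[int, int]) -> int:
    """Analogue of totalTermFreq: sum of the term's frequencies over all docs."""
    return sum(postings.values())


def doc_freq(postings: Dict[int, int]) -> int:
    """Analogue of docFreq: number of docs that contain the term."""
    return len(postings)


def single_bucket_doc_count(postings: Dict[int, int], num_docs: int) -> int:
    """Doc count of a bucket that every doc in the shard falls into.

    Uses only shard statistics: docs with _doc_count contribute totalTermFreq,
    and each doc without it contributes 1. Deleted docs are ignored here.
    """
    return total_term_freq(postings) + (num_docs - doc_freq(postings))


if __name__ == "__main__":
    # Three pre-aggregated (rollup) docs and one plain doc without _doc_count.
    doc_counts = [5, None, 3, 10]
    postings = build_postings(doc_counts)
    print(total_term_freq(postings))  # 18
    print(single_bucket_doc_count(postings, len(doc_counts)))  # 19
```

Tracking docFreq alongside totalTermFreq is what lets the sketch account for documents that never set _doc_count; on an index where every document is a rollup document, the two terms reduce to totalTermFreq alone.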

Relates to #64503

@csoulios csoulios added :Search Foundations/Mapping and backport labels Dec 3, 2020
@elasticmachine elasticmachine added the Team:Search label Dec 3, 2020
@elasticmachine (Collaborator)

Pinging @elastic/es-search (Team:Search)

@csoulios csoulios merged commit c3ff707 into elastic:7.x Dec 3, 2020
@csoulios csoulios deleted the doc_count_term_freq_7.x branch December 3, 2020 15:31