Minhash token filter needs better documentation #20757

@rpedela

Description

The MinHash token filter documentation describes only the filter's interface. That is enough for most token filters, but this one is more complicated and needs more explanation.

  1. It should list possible use cases, such as serving as an alternative to the "more like this" query.
  2. It should state the recommended shingle size: 5.
  3. It should give small but complete examples for 1 and 2.
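
To make the request concrete, here is a rough sketch of the kind of example the docs could include, written as a Python function that builds an index-creation body. It assumes the `shingle` filter options `min_shingle_size`, `max_shingle_size`, and `output_unigrams`, and the `min_hash` filter options `hash_count`, `bucket_count`, `hash_set_size`, and `with_rotation`. The names `word_shingles`, `fingerprint_min_hash`, `minhash_analyzer`, and the `fingerprint` field are made up for illustration, the numeric values are placeholders rather than tuned recommendations, and the typeless `mappings` layout assumes a recent Elasticsearch version.

```python
import json


def minhash_index_settings(shingle_size=5, hash_count=1, bucket_count=512,
                           hash_set_size=1, with_rotation=True):
    """Build a hypothetical index body that shingles text, then min-hashes it."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # Fixed-size word shingles, no single-word tokens.
                    "word_shingles": {
                        "type": "shingle",
                        "min_shingle_size": shingle_size,
                        "max_shingle_size": shingle_size,
                        "output_unigrams": False,
                    },
                    # MinHash over the shingle stream (values are placeholders).
                    "fingerprint_min_hash": {
                        "type": "min_hash",
                        "hash_count": hash_count,
                        "bucket_count": bucket_count,
                        "hash_set_size": hash_set_size,
                        "with_rotation": with_rotation,
                    },
                },
                "analyzer": {
                    "minhash_analyzer": {
                        "tokenizer": "standard",
                        # Order matters: shingle first, then hash the shingles.
                        "filter": ["lowercase", "word_shingles",
                                   "fingerprint_min_hash"],
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                "fingerprint": {"type": "text", "analyzer": "minhash_analyzer"}
            }
        },
    }


if __name__ == "__main__":
    # Print the JSON body that would be sent when creating the index.
    print(json.dumps(minhash_index_settings(), indent=2))
```

A similarity search could then run an ordinary `match` query against the `fingerprint` field, which is where the comparison with "more like this" would come in.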

The Lucene issue discusses both Jaccard and cosine similarity. Did that choice make it into the final patch? If so, should it be exposed as a setting?
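
For readers unfamiliar with the Jaccard side of that question, the following is a minimal, generic sketch of how MinHash signatures estimate Jaccard similarity between two sets of word shingles. It is not the Lucene implementation: the seeded `blake2b` hashing, the function names, and the signature length are all assumptions chosen for illustration, and it treats the recommendation of 5 as 5-word shingles.

```python
import hashlib


def shingles(text, size=5):
    """Return the set of `size`-word shingles in `text` (lowercased)."""
    words = text.lower().split()
    if len(words) < size:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + size]) for i in range(len(words) - size + 1)}


def _hash(seed, token):
    """64-bit hash of `token`; a different seed simulates a different hash function."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(tokens, num_hashes=128):
    """For each seeded hash function, keep the minimum hash over all tokens."""
    if not tokens:
        raise ValueError("cannot build a signature from an empty token set")
    return [min(_hash(seed, t) for t in tokens) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree, an estimate of Jaccard similarity."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def exact_jaccard(set_a, set_b):
    """Exact Jaccard similarity: |A intersect B| / |A union B|."""
    return len(set_a & set_b) / len(set_a | set_b)


if __name__ == "__main__":
    a = shingles("the quick brown fox jumps over the lazy dog near the river bank")
    b = shingles("the quick brown fox jumps over the lazy dog near the old barn")
    est = estimate_jaccard(minhash_signature(a), minhash_signature(b))
    print(f"exact={exact_jaccard(a, b):.3f} estimated={est:.3f}")
```

The estimate tightens as `num_hashes` grows, at the cost of a larger signature per document.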

Metadata

Labels

:Search Relevance/Analysis (How text is split into tokens)

>docs (General docs changes)

Team:Search Relevance (Meta label for the Search Relevance team in Elasticsearch)
