Description
We have a number of filters that can help make search faster:
- shingles for faster phrases
- n-grams for infix search
- edge n-grams for prefix/suffix search
Yet leveraging them to improve search speed typically makes Elasticsearch much harder to use since query parsers are not aware of whether these filters are in use.
To give the example of prefix search, I'm wondering whether we should add a MappedFieldType.prefixQuery factory method that would be called by query parsers. Regular text fields would still create a PrefixQuery, but we could have a new field type optimized for prefix search that would automatically add a filter to the analysis chain at index time. It would be like the edge n-gram filter, except that it would add a marker to differentiate prefixes from actual terms. For instance, if we want to optimize prefix queries for prefixes of up to 4 chars, we could analyze foobar as [foobar, \0f, \0fo, \0foo, \0foob]. I'm using \0 here, but anything that differentiates prefixes from the original term while preventing collisions would work.
Then at search time, MappedFieldType.prefixQuery would look at the length of the term: if it has 4 chars or fewer, it would prepend a \0 and run a term query; otherwise it would run a regular PrefixQuery.
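To make the idea concrete, here is a rough, dependency-free sketch of both halves: the index-time token expansion and the search-time decision. It does not use the Lucene or Elasticsearch APIs. The class PrefixMarkerSketch, the Plan type, and the methods indexTokens, prefixQuery and matches are all hypothetical names made up for illustration, and the limit of 4 is simply the value from the example above. Length is counted in Java chars (UTF-16 units), which is an assumption; a real implementation might count code points instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only: models the proposal with plain strings,
// not with real Lucene TokenFilter / Query classes.
final class PrefixMarkerSketch {
    // Marker that separates indexed prefixes from real terms (assumed: \0).
    static final char MARKER = '\0';
    // Longest prefix that is indexed as its own term (assumed: 4).
    static final int MAX_PREFIX_LENGTH = 4;

    enum Kind { TERM, PREFIX }

    // Describes which kind of query would be run, and on which value.
    static final class Plan {
        final Kind kind;
        final String value;

        Plan(Kind kind, String value) {
            this.kind = kind;
            this.value = value;
        }
    }

    // Index time: emit the original term plus one marked token per prefix
    // of length 1..MAX_PREFIX_LENGTH (capped at the term's own length).
    static List<String> indexTokens(String term) {
        List<String> tokens = new ArrayList<>();
        tokens.add(term);
        int limit = Math.min(MAX_PREFIX_LENGTH, term.length());
        for (int len = 1; len <= limit; len++) {
            tokens.add(MARKER + term.substring(0, len));
        }
        return tokens;
    }

    // Search time: short prefixes become an exact lookup on the marked token,
    // longer ones fall back to a regular prefix scan. An empty prefix also
    // falls back, since no marked token exists for it.
    static Plan prefixQuery(String prefix) {
        if (!prefix.isEmpty() && prefix.length() <= MAX_PREFIX_LENGTH) {
            return new Plan(Kind.TERM, MARKER + prefix);
        }
        return new Plan(Kind.PREFIX, prefix);
    }

    // Toy stand-in for query execution against one document's tokens.
    static boolean matches(List<String> indexedTokens, Plan plan) {
        for (String token : indexedTokens) {
            boolean hit = plan.kind == Kind.TERM
                    ? token.equals(plan.value)
                    : token.startsWith(plan.value);
            if (hit) {
                return true;
            }
        }
        return false;
    }
}
```

With this sketch, indexTokens("foobar") yields the five tokens from the example, prefixQuery("foo") resolves to an exact lookup on \0foo, and prefixQuery("fooba") falls back to a prefix scan on the unmarked term.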
We could do the same for infix search or phrase queries using similar ideas.