Open
Labels
:Search Relevance/Analysis, >bug, Team:Search Relevance, priority:normal
Description
Benchmarks on real data have steered me towards the `minimal_english` stemmer token filter, because other stemmers are generally too aggressive for e-commerce (e.g. "loafers" is reduced to "loaf").
What is needed is good plural stemming, because most user searches are plural while product descriptions are singular (e.g. a search for "dresses" should match the product "red dress").
Good examples of plural stemming by this existing filter include:
| Search string | Good stemmed form |
|---|---|
| cases | case |
| shades | shade |
| bottles | bottle |
However, these terms fail to match because of bad stemming:
| Search string | Bad stemmed form |
|---|---|
| dresses | dresse |
| watches | watche |
| brushes | brushe |
| boxes | boxe |
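Both tables are consistent with a plain "S-stemmer" rule set, which only ever rewrites a trailing "ies" or drops a trailing "s". The Python sketch below is an approximation of those rules written for illustration, not a copy of the Lucene implementation behind `minimal_english`; the function name `s_stem` and the exact rule ordering are assumptions.

```python
def s_stem(word: str) -> str:
    """Approximate S-stemmer: handles only a trailing 's' (illustrative sketch)."""
    if len(word) < 3 or not word.endswith("s"):
        return word
    penult = word[-2]
    # Words ending in "us" or "ss" are left alone (e.g. "dress" itself).
    if penult in "us":
        return word
    if penult == "e":
        # "ies" -> "y", unless preceded by 'a' or 'e'.
        if len(word) > 3 and word[-3] == "i" and word[-4] not in "ae":
            return word[:-3] + "y"
        # "ies", "aes", "oes", "ees" that did not match above are left alone.
        if word[-3] in "iaoe":
            return word
    # Everything else just loses the final "s", so "-es" plurals keep the "e".
    return word[:-1]


for term in ["cases", "shades", "bottles", "dresses", "watches", "brushes", "boxes"]:
    print(term, "->", s_stem(term))
```

Under these assumed rules, there is no branch that removes "es" as a unit, which is why a plural formed with "-es" after a sibilant ends up with a stray "e".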
Example reproduction:
DELETE test
PUT test
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "filter_english_minimal"
          ]
        }
      },
      "filter": {
        "filter_english_minimal": {
          "type": "stemmer",
          "name": "minimal_english"
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "name": {
          "type": "text",
          "analyzer": "my_analyzer"
        }
      }
    }
  }
}
POST test/_doc/1
{
  "name": "red dress"
}
# Does not match (search stems to "dresse")
GET test/_search
{
  "query": {
    "match": {
      "name": "dresses"
    }
  }
}
Solution
It would be good to fix these poor stemming cases, but any change would need to account for backwards compatibility with existing indices.
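As a rough illustration of what a fix could look like, the Python sketch below strips "es" as a unit after a sibilant ending before any other plural handling runs. It is a hypothetical rule, not a proposed patch: the function name `strip_sibilant_es` and the suffix list are assumptions, and the rule over-strips some words (for example "caches" becomes "cach").

```python
# Hypothetical suffix list: plurals formed by adding "es" after a sibilant.
SIBILANT_PLURALS = ("sses", "ches", "shes", "xes", "zes")


def strip_sibilant_es(word: str) -> str:
    """Drop a trailing 'es' after a sibilant ending; leave other words unchanged."""
    for suffix in SIBILANT_PLURALS:
        # Require at least two characters to remain so very short tokens are untouched.
        if word.endswith(suffix) and len(word) - 2 >= 2:
            return word[:-2]
    return word


for term in ["dresses", "watches", "brushes", "boxes", "cases", "shades"]:
    print(term, "->", strip_sibilant_es(term))
```

Terms that are already stemmed well, such as "cases" and "shades", do not match any suffix in the list and pass through unchanged, so they would still be handled by the existing rules. Because the stemmed output of the affected terms changes, a rule like this would likely need to be opt-in or version-gated to keep existing indices consistent.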