Context: logstash-plugins/logstash-output-elasticsearch#462
The new keyword type introduced by #12394 feels very confusing to me.
I often index data that has multiple words in it, such as sports player names, location names, server names, etc., and have historically mapped this data in Elasticsearch as a "string which is not_analyzed". This model made sense to me, as it expressed pretty much exactly what I wanted -- a string where the entire contents of the string is the "term" (in ES/Lucene terminology).
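For reference, here is a minimal sketch of the two mappings being compared, written as Python dicts that mirror the JSON mapping bodies. The field name `team_name` is hypothetical and chosen only for illustration; the settings that matter are `type` and `index`.

```python
# Older style: a string field whose whole value is indexed as one term.
# "team_name" is a hypothetical field name used only for this example.
old_mapping = {
    "properties": {
        "team_name": {"type": "string", "index": "not_analyzed"}
    }
}

# Newer style: the same intent expressed with the keyword type.
new_mapping = {
    "properties": {
        "team_name": {"type": "keyword"}
    }
}
```

Both are meant to express the same thing: the entire field value is a single term.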
The name chosen for the keyword type feels wrong to me. "United States" is two words. "San Jose Sharks" is three words. "www.google.com" is a network address with 3 parts. For all of these examples, I have wanted to do terms aggregations or similar operations which require the whole text to be considered a single term. Under this new keyword label, these multi-part values are all now called a single word, and that very much confuses me.
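To make the distinction concrete, here is a rough sketch in plain Python of what a terms-style count looks like when each value is one term versus when it is split into words. This is only a stand-in for illustration: `split_terms` is a simple lowercase-and-split-on-whitespace helper, not Lucene's actual analysis, and all function names are hypothetical.

```python
from collections import Counter

values = ["San Jose Sharks", "United States", "San Jose Sharks"]

def whole_value_terms(value):
    # The entire string is the single term (the behavior I want).
    return [value]

def split_terms(value):
    # Simplified stand-in for analyzed text: lowercase, split on whitespace.
    return value.lower().split()

def terms_counts(items, to_terms):
    # Count how often each term occurs across all values.
    counts = Counter()
    for item in items:
        counts.update(to_terms(item))
    return counts

whole = terms_counts(values, whole_value_terms)
split = terms_counts(values, split_terms)
```

With whole-value terms, "San Jose Sharks" is counted as one bucket; with split terms, the counts land on "san", "jose", and "sharks" separately, which is not useful for this kind of data.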
As a note, I understand the keyword type uses only the keyword tokenizer, and one could argue that this makes keyword the correct name for the type. My counter to this is that the keyword tokenizer is incorrectly named for the same reason.
Can we rename the keyword type to something that better reflects what it is?