-
Notifications
You must be signed in to change notification settings - Fork 25.6k
[DOCS] Reformat apostrophe token filter docs #48076
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Pinging @elastic/es-search (:Search/Analysis) |
|
Pinging @elastic/es-docs (>docs) |
| The `apostrophe` token filter produces the following tokens: | ||
|
|
||
| [source,text] | ||
| --------------------------- | ||
| [ Istanbul, veya, Istanbul ] | ||
| --------------------------- | ||
|
|
||
| ///////////////////// | ||
| [source,console-result] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would love any feedback here.
The token example is based on the one from the simple analyzer:
https://www.elastic.co/guide/en/elasticsearch/reference/master/analysis-simple-analyzer.html#_example_7
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my understanding, this token filter was built to be used with Turkish. For clarity, it could be worth mentioning that explicitly, perhaps linking to the Turkish analyzer (which makes use of this filter).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a great idea. I've added this information to the description at the top with 04a33a6.
jtibshirani
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The overall format seems like a nice improvement to me. It's great to give an example of how the text gets analyzed, and I think using the _analyze API will help more users become aware of this helpful endpoint.
One thought about the documentation format -- we could consider linking to the Lucene documentation when it exists, as it often contains more detailed information or paper references. This also gives more clarity around where the implementation lives, so users know where to go to dig into code or file a bug.
| The `apostrophe` token filter produces the following tokens: | ||
|
|
||
| [source,text] | ||
| --------------------------- | ||
| [ Istanbul, veya, Istanbul ] | ||
| --------------------------- | ||
|
|
||
| ///////////////////// | ||
| [source,console-result] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my understanding, this token filter was built to be used with Turkish. For clarity, it could be worth mentioning that explicitly, perhaps linking to the Turkish analyzer (which makes use of this filter).
| "settings" : { | ||
| "analysis" : { | ||
| "analyzer" : { | ||
| "default" : { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small comment, maybe we don't want to always call the analyzer default. We could use a more specific name like apostrophe or standard_apostrophe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the analyzer name to standard_apostrophe with 04a33a6.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To add to my comment: when experimenting with token filters, it seems more likely that the user will want to create a custom analyzer as opposed to overriding the default one. Maybe we could make a few small tweaks to clarify this, including using a specific analyzer name other than defaultand linking to the custom analyzer docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, I had a race condition with my comment. This looks good to me, we could also link to the custom analyzer docs if you think it's helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
Thanks for your feedback @jtibshirani. I made a few changes based on comments with 04a33a6. This includes linking to Lucene's docs, which is a good idea! |
| ==== Add to an analyzer | ||
|
|
||
| The following <<indices-create-index,create index API>> request adds the | ||
| apostrophe token filter to an analyzer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One last small suggestion -- instead of 'adding to an an analyzer' it could be clearer/ more precise to say 'uses the token filter to configure a new analyzer'.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the clarification! Reworded with a847eeb..
jtibshirani
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me.
Reformats the
apostrophetoken filter docs. This PR adds:I hope to re-use this format for other token filter docs. All feedback is welcome!