Conversation

@jrodewig
Contributor

Reformats the apostrophe token filter docs. This PR adds:

  • A title abbreviation
  • An analyze API example with resulting tokens
  • An example adding the token filter to an analyzer

I hope to re-use this format for other token filter docs. All feedback is welcome!
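For context, the analyze API example described above would look roughly like this (a sketch of the request shape, not copied verbatim from the diff; the sample text matches the tokens discussed later in the thread):

```console
GET /_analyze
{
  "tokenizer": "standard",
  "filter": ["apostrophe"],
  "text": "Istanbul'a veya Istanbul'dan"
}
```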

@jrodewig jrodewig added >docs General docs changes :Search Relevance/Analysis How text is split into tokens v8.0.0 v7.5.0 v7.4.1 labels Oct 15, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-search (:Search/Analysis)

@elasticmachine
Collaborator

Pinging @elastic/es-docs (>docs)

Comment on lines 25 to 33
The `apostrophe` token filter produces the following tokens:

[source,text]
---------------------------
[ Istanbul, veya, Istanbul ]
---------------------------

/////////////////////
[source,console-result]
Contributor Author

I would love any feedback here.

The token example is based on the one from the simple analyzer:
https://www.elastic.co/guide/en/elasticsearch/reference/master/analysis-simple-analyzer.html#_example_7
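As background for reviewers, the filter's behavior can be sketched in a few lines of Python (a toy illustration of what Lucene's `ApostropheFilter` does to each token, not the actual implementation; the whitespace split stands in for the standard tokenizer):

```python
def apostrophe_filter(tokens):
    """Drop the apostrophe and everything after it from each token."""
    return [token.split("'", 1)[0] for token in tokens]

# Whitespace-split stand-in for the standard tokenizer on this input.
tokens = "Istanbul'a veya Istanbul'dan".split()
print(apostrophe_filter(tokens))  # ['Istanbul', 'veya', 'Istanbul']
```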

Contributor

From my understanding, this token filter was built to be used with Turkish. For clarity, it could be worth mentioning that explicitly, perhaps linking to the Turkish analyzer (which makes use of this filter).

Contributor Author

That's a great idea. I've added this information to the description at the top with 04a33a6.

Contributor

@jtibshirani jtibshirani left a comment

The overall format seems like a nice improvement to me. It's great to give an example of how the text gets analyzed, and I think using the _analyze API will help more users become aware of this helpful endpoint.

One thought about the documentation format -- we could consider linking to the Lucene documentation when it exists, as it often contains more detailed information or paper references. This also gives more clarity about where the implementation lives, so users know where to go to dig into the code or file a bug.

"settings" : {
  "analysis" : {
    "analyzer" : {
      "default" : {
Contributor

Small comment: maybe we don't want to always call the analyzer `default`. We could use a more specific name like `apostrophe` or `standard_apostrophe`.

Contributor Author

I changed the analyzer name to standard_apostrophe with 04a33a6.
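For readers following along, a create index request using the renamed analyzer would look something like this (a sketch; the index name is hypothetical and the exact request lives in the PR diff):

```console
PUT /apostrophe_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "standard_apostrophe": {
          "tokenizer": "standard",
          "filter": ["apostrophe"]
        }
      }
    }
  }
}
```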

Contributor

To add to my comment: when experimenting with token filters, it seems more likely that the user will want to create a custom analyzer as opposed to overriding the default one. Maybe we could make a few small tweaks to clarify this, including using a specific analyzer name other than `default` and linking to the custom analyzer docs.

Contributor

Oops, I had a race condition with my comment. This looks good to me, we could also link to the custom analyzer docs if you think it's helpful.

Contributor Author

No worries! That makes sense to me.

I added a link to the custom analyzer docs with 701a349 and reworded this paragraph with a847eeb.

@jrodewig
Contributor Author

Thanks for your feedback @jtibshirani. I made a few changes based on comments with 04a33a6. This includes linking to Lucene's docs, which is a good idea!

==== Add to an analyzer

The following <<indices-create-index,create index API>> request adds the
apostrophe token filter to an analyzer.
Contributor

One last small suggestion -- instead of "adding to an analyzer", it could be clearer/more precise to say "uses the token filter to configure a new analyzer".

Contributor Author

Thanks for the clarification! Reworded with a847eeb.

Contributor

@jtibshirani jtibshirani left a comment

This looks good to me.

@jrodewig jrodewig merged commit c367c5c into elastic:master Oct 16, 2019
@jrodewig jrodewig deleted the reformat.apos-token-filter branch October 16, 2019 12:50
@tomcallahan tomcallahan added v7.4.2 and removed v7.4.1 labels Oct 22, 2019