[[analysis-elision-tokenfilter]]
=== Elision token filter
++++
<titleabbrev>Elision</titleabbrev>
++++

Removes specified https://en.wikipedia.org/wiki/Elision[elisions] from
the beginning of tokens. For example, you can use this filter to change
`l'avion` to `avion`.

When not customized, the filter removes the following French elisions by default:

`l'`, `m'`, `t'`, `qu'`, `n'`, `s'`, `j'`, `d'`, `c'`, `jusqu'`, `quoiqu'`,
`lorsqu'`, `puisqu'`

Customized versions of this filter are included in several of {es}'s built-in
<<analysis-lang-analyzer,language analyzers>>:

* <<catalan-analyzer, Catalan analyzer>>
* <<french-analyzer, French analyzer>>
* <<irish-analyzer, Irish analyzer>>
* <<italian-analyzer, Italian analyzer>>

This filter uses Lucene's
https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/util/ElisionFilter.html[ElisionFilter].

[[analysis-elision-tokenfilter-analyze-ex]]
==== Example

The following <<indices-analyze,analyze API>> request uses the `elision`
filter to remove `j'` from `j’examine près du wharf`:

[source,console]
--------------------------------------------------
GET _analyze
{
  "tokenizer" : "standard",
  "filter" : ["elision"],
  "text" : "j’examine près du wharf"
}
--------------------------------------------------

The filter produces the following tokens:

[source,text]
--------------------------------------------------
[ examine, près, du, wharf ]
--------------------------------------------------

/////////////////////
[source,console-result]
--------------------------------------------------
{
  "tokens" : [
    {
      "token" : "examine",
      "start_offset" : 0,
      "end_offset" : 9,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "près",
      "start_offset" : 10,
      "end_offset" : 14,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "du",
      "start_offset" : 15,
      "end_offset" : 17,
      "type" : "<ALPHANUM>",
      "position" : 2
    },
    {
      "token" : "wharf",
      "start_offset" : 18,
      "end_offset" : 23,
      "type" : "<ALPHANUM>",
      "position" : 3
    }
  ]
}
--------------------------------------------------
/////////////////////

[[analysis-elision-tokenfilter-analyzer-ex]]
==== Add to an analyzer

The following <<indices-create-index,create index API>> request uses the
`elision` filter to configure a new
<<analysis-custom-analyzer,custom analyzer>>.

[source,console]
--------------------------------------------------
PUT /elision_example
{
  "settings" : {
    "analysis" : {
      "analyzer" : {
        "whitespace_elision" : {
          "tokenizer" : "whitespace",
          "filter" : ["elision"]
        }
      }
    }
  }
}
--------------------------------------------------
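
If you want to check the analyzer, you can run it against some sample text with
the <<indices-analyze,analyze API>>. For example, assuming the `elision_example`
index above has been created, the following request uses the
`whitespace_elision` analyzer and should return a single `avion` token for the
text `l'avion`:

[source,console]
--------------------------------------------------
GET /elision_example/_analyze
{
  "analyzer" : "whitespace_elision",
  "text" : "l'avion"
}
--------------------------------------------------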

[[analysis-elision-tokenfilter-configure-parms]]
==== Configurable parameters

[[analysis-elision-tokenfilter-articles]]
`articles`::
+
--
(Required+++*+++, array of string)
List of elisions to remove.

To be removed, the elision must be at the beginning of a token and be
immediately followed by an apostrophe. Both the elision and apostrophe are
removed.

For custom `elision` filters, either this parameter or `articles_path` must be
specified.
--

`articles_path`::
+
--
(Required+++*+++, string)
Path to a file that contains a list of elisions to remove.

This path must be absolute or relative to the `config` location, and the file
must be UTF-8 encoded. Each elision in the file must be separated by a line
break.

To be removed, the elision must be at the beginning of a token and be
immediately followed by an apostrophe. Both the elision and apostrophe are
removed.

For custom `elision` filters, either this parameter or `articles` must be
specified.
--

`articles_case`::
(Optional, boolean)
If `true`, the filter treats any provided elisions as case sensitive.
Defaults to `false`.

[[analysis-elision-tokenfilter-customize]]
==== Customize

To customize the `elision` filter, duplicate it to create the basis
for a new custom token filter. You can modify the filter using its configurable
parameters.

For example, the following request creates a custom case-sensitive `elision`
filter that removes the `l'`, `m'`, `t'`, `qu'`, `n'`, `s'`,
and `j'` elisions:

[source,console]
--------------------------------------------------
PUT /elision_case_sensitive_example
{
  "settings" : {
    "analysis" : {
      "analyzer" : {
        "default" : {
          "tokenizer" : "whitespace",
          "filter" : ["elision_case_sensitive"]
        }
      },
      "filter" : {
        "elision_case_sensitive" : {
          "type" : "elision",
          "articles" : ["l", "m", "t", "qu", "n", "s", "j"],
          "articles_case": true
        }
      }
    }
  }
}
--------------------------------------------------
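
You can also load the elisions from a file rather than listing them inline in
the filter definition. The following request is a minimal sketch of that
approach. It assumes a hypothetical UTF-8 encoded file,
`analysis/example_elisions.txt`, exists in the {es} config directory and
contains one elision per line, here assumed to use the same form as the
`articles` parameter (for example `l`, `m`, `t`); the index and filter names are
illustrative only:

[source,console]
--------------------------------------------------
PUT /elision_articles_path_example
{
  "settings" : {
    "analysis" : {
      "analyzer" : {
        "default" : {
          "tokenizer" : "whitespace",
          "filter" : ["elision_from_file"]
        }
      },
      "filter" : {
        "elision_from_file" : {
          "type" : "elision",
          "articles_path" : "analysis/example_elisions.txt" <1>
        }
      }
    }
  }
}
--------------------------------------------------
<1> Hypothetical path, relative to the {es} `config` directory. The file must
exist before the index is created.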