  [[analysis-elision-tokenfilter]]
- === Elision Token Filter
+ === Elision token filter
+ ++++
+ <titleabbrev>Elision</titleabbrev>
+ ++++

- A token filter which removes elisions. For example, "l'avion" (the
- plane) will tokenized as "avion" (plane).
+ Removes specified https://en.wikipedia.org/wiki/Elision[elisions] from
+ the beginning of tokens. For example, you can use this filter to change
+ `l'avion` to `avion`.

- Requires either an `articles` parameter which is a set of stop word articles, or
- `articles_path` which points to a text file containing the stop set. Also optionally
- accepts `articles_case`, which indicates whether the filter treats those articles as
- case sensitive.
+ When not customized, the filter removes the following French elisions by default:

- For example:
+ `l'`, `m'`, `t'`, `qu'`, `n'`, `s'`, `j'`, `d'`, `c'`, `jusqu'`, `quoiqu'`,
+ `lorsqu'`, `puisqu'`
+
+ Customized versions of this filter are included in several of {es}'s built-in
+ <<analysis-lang-analyzer,language analyzers>>:
+
+ * <<catalan-analyzer, Catalan analyzer>>
+ * <<french-analyzer, French analyzer>>
+ * <<irish-analyzer, Irish analyzer>>
+ * <<italian-analyzer, Italian analyzer>>
+
+ This filter uses Lucene's
+ https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/util/ElisionFilter.html[ElisionFilter].
+
+ [[analysis-elision-tokenfilter-analyze-ex]]
+ ==== Example
+
+ The following <<indices-analyze,analyze API>> request uses the `elision`
+ filter to remove `j'` from `j’examine près du wharf`:
+
+ [source,console]
+ --------------------------------------------------
+ GET _analyze
+ {
+   "tokenizer" : "standard",
+   "filter" : ["elision"],
+   "text" : "j’examine près du wharf"
+ }
+ --------------------------------------------------
+
+ The filter produces the following tokens:
+
+ [source,text]
+ --------------------------------------------------
+ [ examine, près, du, wharf ]
+ --------------------------------------------------
+
+ /////////////////////
+ [source,console-result]
+ --------------------------------------------------
+ {
+   "tokens" : [
+     {
+       "token" : "examine",
+       "start_offset" : 0,
+       "end_offset" : 9,
+       "type" : "<ALPHANUM>",
+       "position" : 0
+     },
+     {
+       "token" : "près",
+       "start_offset" : 10,
+       "end_offset" : 14,
+       "type" : "<ALPHANUM>",
+       "position" : 1
+     },
+     {
+       "token" : "du",
+       "start_offset" : 15,
+       "end_offset" : 17,
+       "type" : "<ALPHANUM>",
+       "position" : 2
+     },
+     {
+       "token" : "wharf",
+       "start_offset" : 18,
+       "end_offset" : 23,
+       "type" : "<ALPHANUM>",
+       "position" : 3
+     }
+   ]
+ }
+ --------------------------------------------------
+ /////////////////////
+
+ [[analysis-elision-tokenfilter-analyzer-ex]]
+ ==== Add to an analyzer
+
+ The following <<indices-create-index,create index API>> request uses the
+ `elision` filter to configure a new
+ <<analysis-custom-analyzer,custom analyzer>>.

  [source,console]
  --------------------------------------------------
@@ -18,16 +99,85 @@ PUT /elision_example
    "settings" : {
      "analysis" : {
        "analyzer" : {
-         "default" : {
-           "tokenizer" : "standard",
+         "whitespace_elision" : {
+           "tokenizer" : "whitespace",
            "filter" : ["elision"]
          }
+       }
+     }
+   }
+ }
+ --------------------------------------------------
+
+ [[analysis-elision-tokenfilter-configure-parms]]
+ ==== Configurable parameters
+
+ [[analysis-elision-tokenfilter-articles]]
+ `articles`::
+ +
+ --
+ (Required+++*+++, array of string)
+ List of elisions to remove.
+
+ To be removed, the elision must be at the beginning of a token and be
+ immediately followed by an apostrophe. Both the elision and apostrophe are
+ removed.
+
+ For custom `elision` filters, either this parameter or `articles_path` must be
+ specified.
+ --
+
+ `articles_path`::
+ +
+ --
+ (Required+++*+++, string)
+ Path to a file that contains a list of elisions to remove.
+
+ This path must be absolute or relative to the `config` location, and the file
+ must be UTF-8 encoded. Each elision in the file must be separated by a line
+ break.
+
+ To be removed, the elision must be at the beginning of a token and be
+ immediately followed by an apostrophe. Both the elision and apostrophe are
+ removed.
+
+ For custom `elision` filters, either this parameter or `articles` must be
+ specified.
+ --
+
+ `articles_case`::
+ (Optional, boolean)
+ If `true`, the filter treats any provided elisions as case sensitive.
+ Defaults to `false`.
+
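As a rough illustration of the matching rule described for `articles` and `articles_case`, the following Python sketch strips a leading elision from a single token. It is not Lucene's `ElisionFilter`, and the names `strip_elision`, `DEFAULT_ARTICLES`, and `APOSTROPHES` are hypothetical. Treating the typographic apostrophe (`’`) the same as a straight one is an assumption based on the `j’examine` example.

```python
# Illustrative sketch only: approximates the documented behaviour of the
# `articles` and `articles_case` parameters. Not Lucene's ElisionFilter.

# Assumption: both the straight and the typographic apostrophe count.
APOSTROPHES = ("'", "\u2019")

# Default French elisions listed in the documentation.
DEFAULT_ARTICLES = frozenset({
    "l", "m", "t", "qu", "n", "s", "j", "d", "c",
    "jusqu", "quoiqu", "lorsqu", "puisqu",
})


def strip_elision(token, articles=DEFAULT_ARTICLES, articles_case=False):
    """Return `token` without a leading elision and its apostrophe."""
    # Locate the first apostrophe of either kind.
    positions = [i for i in (token.find(a) for a in APOSTROPHES) if i >= 0]
    if not positions:
        return token  # no apostrophe, nothing to strip
    cut = min(positions)
    prefix = token[:cut]
    if articles_case:
        # Case sensitive: the prefix must match an article exactly.
        matched = prefix in articles
    else:
        # Case insensitive: compare lowercased forms.
        matched = prefix.lower() in {a.lower() for a in articles}
    # Drop both the elision and the apostrophe when the prefix matches.
    return token[cut + 1:] if matched else token


print(strip_elision("l'avion"))                                   # avion
print(strip_elision("L'avion", {"l"}, articles_case=True))        # L'avion
```

With `articles_case` left at `false`, `L'avion` and `l'avion` both reduce to `avion`. A token such as `aujourd'hui` is left unchanged because the text before its apostrophe is not in the list.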
+ [[analysis-elision-tokenfilter-customize]]
+ ==== Customize
+
+ To customize the `elision` filter, duplicate it to create the basis
+ for a new custom token filter. You can modify the filter using its configurable
+ parameters.
+
+ For example, the following request creates a custom case-sensitive `elision`
+ filter that removes the `l'`, `m'`, `t'`, `qu'`, `n'`, `s'`,
+ and `j'` elisions:
+
+ [source,console]
+ --------------------------------------------------
+ PUT /elision_case_sensitive_example
+ {
+   "settings" : {
+     "analysis" : {
+       "analyzer" : {
+         "default" : {
+           "tokenizer" : "whitespace",
+           "filter" : ["elision_case_sensitive"]
+         }
        },
        "filter" : {
-         "elision" : {
+         "elision_case_sensitive" : {
            "type" : "elision",
-           "articles_case": true,
-           "articles" : ["l", "m", "t", "qu", "n", "s", "j"]
+           "articles" : ["l", "m", "t", "qu", "n", "s", "j"],
+           "articles_case": true
          }
        }
      }