[[analysis-asciifolding-tokenfilter]]
- === ASCII Folding Token Filter
+ === ASCII folding token filter
+ ++++
+ <titleabbrev>ASCII folding</titleabbrev>
+ ++++

- A token filter of type `asciifolding` that converts alphabetic, numeric,
- and symbolic Unicode characters which are not in the first 127 ASCII
- characters (the "Basic Latin" Unicode block) into their ASCII
- equivalents, if one exists. Example:
+ Converts alphabetic, numeric, and symbolic characters that are not in the Basic
+ Latin Unicode block (first 127 ASCII characters) to their ASCII equivalent, if
+ one exists. For example, the filter changes `à` to `a`.
+
+ This filter uses Lucene's
+ https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/miscellaneous/ASCIIFoldingFilter.html[ASCIIFoldingFilter].
+
+ [[analysis-asciifolding-tokenfilter-analyze-ex]]
+ ==== Example
+
+ The following <<indices-analyze,analyze API>> request uses the `asciifolding`
+ filter to drop the diacritical marks in `açaí à la carte`:
+
+ [source,console]
+ --------------------------------------------------
+ GET /_analyze
+ {
+   "tokenizer" : "standard",
+   "filter" : ["asciifolding"],
+   "text" : "açaí à la carte"
+ }
+ --------------------------------------------------
+
+ The filter produces the following tokens:
+
+ [source,text]
+ --------------------------------------------------
+ [ acai, a, la, carte ]
+ --------------------------------------------------
+
+ /////////////////////
+ [source,console-result]
+ --------------------------------------------------
+ {
+   "tokens" : [
+     {
+       "token" : "acai",
+       "start_offset" : 0,
+       "end_offset" : 4,
+       "type" : "<ALPHANUM>",
+       "position" : 0
+     },
+     {
+       "token" : "a",
+       "start_offset" : 5,
+       "end_offset" : 6,
+       "type" : "<ALPHANUM>",
+       "position" : 1
+     },
+     {
+       "token" : "la",
+       "start_offset" : 7,
+       "end_offset" : 9,
+       "type" : "<ALPHANUM>",
+       "position" : 2
+     },
+     {
+       "token" : "carte",
+       "start_offset" : 10,
+       "end_offset" : 15,
+       "type" : "<ALPHANUM>",
+       "position" : 3
+     }
+   ]
+ }
+ --------------------------------------------------
+ /////////////////////
+
+ [[analysis-asciifolding-tokenfilter-analyzer-ex]]
+ ==== Add to an analyzer
+
+ The following <<indices-create-index,create index API>> request uses the
+ `asciifolding` filter to configure a new
+ <<analysis-custom-analyzer,custom analyzer>>.

[source,console]
--------------------------------------------------
@@ -13,7 +86,7 @@ PUT /asciifold_example
  "settings" : {
    "analysis" : {
      "analyzer" : {
-       "default" : {
+       "standard_asciifolding" : {
          "tokenizer" : "standard",
          "filter" : ["asciifolding"]
        }
@@ -23,9 +96,23 @@ PUT /asciifold_example
}
--------------------------------------------------

- Accepts `preserve_original` setting which defaults to false but if true
- will keep the original token as well as emit the folded token. For
- example:
+ [[analysis-asciifolding-tokenfilter-configure-parms]]
+ ==== Configurable parameters
+
+ `preserve_original`::
+ (Optional, boolean)
+ If `true`, emit both original tokens and folded tokens.
+ Defaults to `false`.
+
+ [[analysis-asciifolding-tokenfilter-customize]]
+ ==== Customize
+
+ To customize the `asciifolding` filter, duplicate it to create the basis
+ for a new custom token filter. You can modify the filter using its configurable
+ parameters.
+
+ For example, the following request creates a custom `asciifolding` filter with
+ `preserve_original` set to true:

[source,console]
--------------------------------------------------
@@ -34,7 +121,7 @@ PUT /asciifold_example
  "settings" : {
    "analysis" : {
      "analyzer" : {
-       "default" : {
+       "standard_asciifolding" : {
          "tokenizer" : "standard",
          "filter" : ["my_ascii_folding"]
        }
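
The last two hunks stop at the edge of their change context, so the second example's request body is cut off before the `my_ascii_folding` filter definition that `preserve_original` refers to. As a hedged sketch (not part of this diff), the complete request described by the new text would look roughly like the following; the `standard_asciifolding` and `my_ascii_folding` names come from the hunks above, and `type` and `preserve_original` are the filter's documented parameters:

[source,console]
--------------------------------------------------
PUT /asciifold_example
{
  "settings" : {
    "analysis" : {
      "analyzer" : {
        "standard_asciifolding" : {
          "tokenizer" : "standard",
          "filter" : ["my_ascii_folding"]
        }
      },
      "filter" : {
        "my_ascii_folding" : {
          "type" : "asciifolding",
          "preserve_original" : true
        }
      }
    }
  }
}
--------------------------------------------------

With `preserve_original` set to `true`, the filter emits the folded token and keeps the original at the same position, so `açaí` would yield both `acai` and `açaí`.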
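
Also not shown in the diff: once an index like the one above exists, the new analyzer can be exercised with the analyze API. A minimal check, assuming the index and analyzer names used in the examples:

[source,console]
--------------------------------------------------
GET /asciifold_example/_analyze
{
  "analyzer" : "standard_asciifolding",
  "text" : "açaí à la carte"
}
--------------------------------------------------

Depending on which of the two example configurations the index uses, the response contains either just the folded tokens or both the folded and the original tokens.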