Commit 31fc615

[DOCS] Reformat ASCII folding token filter docs (#48143)
1 parent df83eb9 commit 31fc615

2 files changed, +98 -11 lines changed

docs/reference/analysis/tokenfilters/apostrophe-tokenfilter.asciidoc

+1 -1
@@ -8,7 +8,7 @@ Strips all characters after an apostrophe, including the apostrophe itself.
 
 This filter is included in {es}'s built-in <<turkish-analyzer,Turkish language
 analyzer>>. It uses Lucene's
-https://lucene.apache.org/core/4_8_0/analyzers-common/org/apache/lucene/analysis/tr/ApostropheFilter.html[ApostropheFilter],
+https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/tr/ApostropheFilter.html[ApostropheFilter],
 which was built for the Turkish language.
 
 

docs/reference/analysis/tokenfilters/asciifolding-tokenfilter.asciidoc

+97 -10
@@ -1,10 +1,83 @@
 [[analysis-asciifolding-tokenfilter]]
-=== ASCII Folding Token Filter
+=== ASCII folding token filter
+++++
+<titleabbrev>ASCII folding</titleabbrev>
+++++
 
-A token filter of type `asciifolding` that converts alphabetic, numeric,
-and symbolic Unicode characters which are not in the first 127 ASCII
-characters (the "Basic Latin" Unicode block) into their ASCII
-equivalents, if one exists. Example:
+Converts alphabetic, numeric, and symbolic characters that are not in the Basic
+Latin Unicode block (first 127 ASCII characters) to their ASCII equivalent, if
+one exists. For example, the filter changes `à` to `a`.
+
+This filter uses Lucene's
+https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/miscellaneous/ASCIIFoldingFilter.html[ASCIIFoldingFilter].
+
+[[analysis-asciifolding-tokenfilter-analyze-ex]]
+==== Example
+
+The following <<indices-analyze,analyze API>> request uses the `asciifolding`
+filter to drop the diacritical marks in `açaí à la carte`:
+
+[source,console]
+--------------------------------------------------
+GET /_analyze
+{
+  "tokenizer" : "standard",
+  "filter" : ["asciifolding"],
+  "text" : "açaí à la carte"
+}
+--------------------------------------------------
+
+The filter produces the following tokens:
+
+[source,text]
+--------------------------------------------------
+[ acai, a, la, carte ]
+--------------------------------------------------
+
+/////////////////////
+[source,console-result]
+--------------------------------------------------
+{
+  "tokens" : [
+    {
+      "token" : "acai",
+      "start_offset" : 0,
+      "end_offset" : 4,
+      "type" : "<ALPHANUM>",
+      "position" : 0
+    },
+    {
+      "token" : "a",
+      "start_offset" : 5,
+      "end_offset" : 6,
+      "type" : "<ALPHANUM>",
+      "position" : 1
+    },
+    {
+      "token" : "la",
+      "start_offset" : 7,
+      "end_offset" : 9,
+      "type" : "<ALPHANUM>",
+      "position" : 2
+    },
+    {
+      "token" : "carte",
+      "start_offset" : 10,
+      "end_offset" : 15,
+      "type" : "<ALPHANUM>",
+      "position" : 3
+    }
+  ]
+}
+--------------------------------------------------
+/////////////////////
+
+[[analysis-asciifolding-tokenfilter-analyzer-ex]]
+==== Add to an analyzer
+
+The following <<indices-create-index,create index API>> request uses the
+`asciifolding` filter to configure a new
+<<analysis-custom-analyzer,custom analyzer>>.
 
 [source,console]
 --------------------------------------------------
@@ -13,7 +86,7 @@ PUT /asciifold_example
   "settings" : {
     "analysis" : {
       "analyzer" : {
-        "default" : {
+        "standard_asciifolding" : {
           "tokenizer" : "standard",
           "filter" : ["asciifolding"]
         }
@@ -23,9 +96,23 @@ PUT /asciifold_example
 }
 --------------------------------------------------
 
-Accepts `preserve_original` setting which defaults to false but if true
-will keep the original token as well as emit the folded token. For
-example:
+[[analysis-asciifolding-tokenfilter-configure-parms]]
+==== Configurable parameters
+
+`preserve_original`::
+(Optional, boolean)
+If `true`, emit both original tokens and folded tokens.
+Defaults to `false`.
+
+[[analysis-asciifolding-tokenfilter-customize]]
+==== Customize
+
+To customize the `asciifolding` filter, duplicate it to create the basis
+for a new custom token filter. You can modify the filter using its configurable
+parameters.
+
+For example, the following request creates a custom `asciifolding` filter with
+`preserve_original` set to true:
 
 [source,console]
 --------------------------------------------------
@@ -34,7 +121,7 @@ PUT /asciifold_example
   "settings" : {
     "analysis" : {
       "analyzer" : {
-        "default" : {
+        "standard_asciifolding" : {
           "tokenizer" : "standard",
           "filter" : ["my_ascii_folding"]
         }
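
To give a feel for the behavior these docs describe, here is a minimal Python sketch of ASCII folding with an optional `preserve_original` flag. It is not part of this commit and it is not Lucene's `ASCIIFoldingFilter`: it approximates folding with Unicode NFKD decomposition, so it only covers characters that decompose into a base letter plus combining marks (for example `à`, `ç`, `í`). Characters that need a dedicated mapping (such as `ø` or `ß`) pass through unchanged. The function names `fold_to_ascii` and `asciifold_tokens` are hypothetical, and the order in which the folded and original tokens are emitted is an assumption.

```python
import unicodedata


def fold_to_ascii(token: str) -> str:
    """Approximate ASCII folding: decompose, then drop combining marks.

    Assumption: NFKD decomposition stands in for Lucene's mapping table.
    """
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def asciifold_tokens(tokens, preserve_original=False):
    """Fold each token; optionally keep the original next to the folded form.

    Assumption: the original is emitted only when folding changed the token,
    and it is placed right after the folded form.
    """
    out = []
    for token in tokens:
        folded = fold_to_ascii(token)
        out.append(folded)
        if preserve_original and folded != token:
            out.append(token)
    return out


# Whitespace splitting stands in for the `standard` tokenizer here.
tokens = "a\u00e7a\u00ed \u00e0 la carte".split()
print(asciifold_tokens(tokens))
print(asciifold_tokens(tokens, preserve_original=True))
```

With `preserve_original=False` the sketch returns only folded tokens, matching the `[ acai, a, la, carte ]` output shown in the diff. With `preserve_original=True` it also keeps `açaí` and `à`, which is the kind of result the `preserve_original` parameter is documented to allow.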
