Commit 92abe04

jrodewig authored and SivagurunathanV committed
[DOCS] Reformat truncate token filter docs (elastic#50687)
* Updates the description and adds a Lucene link
* Adds analyze, custom analyzer, and custom filter snippets
* Adds parameter documentation
1 parent eb7bf44 commit 92abe04

1 file changed, +141 -4 lines changed

docs/reference/analysis/tokenfilters/truncate-tokenfilter.asciidoc

Lines changed: 141 additions & 4 deletions
@@ -4,8 +4,145 @@
 <titleabbrev>Truncate</titleabbrev>
 ++++
 
-The `truncate` token filter can be used to truncate tokens into a
-specific length.
+Truncates tokens that exceed a specified character limit. This limit defaults to
+`10` but can be customized using the `length` parameter.
 
-It accepts a `length` parameter which control the number of characters
-to truncate to, defaults to `10`.
+For example, you can use the `truncate` filter to shorten all tokens to
+`3` characters or fewer, changing `jumping fox` to `jum fox`.
+
+This filter uses Lucene's
+https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/miscellaneous/TruncateTokenFilter.html[TruncateTokenFilter].
+
+[[analysis-truncate-tokenfilter-analyze-ex]]
+==== Example
+
+The following <<indices-analyze,analyze API>> request uses the `truncate` filter
+to shorten tokens that exceed 10 characters in
+`the quinquennial extravaganza carried on`:
+
+[source,console]
+--------------------------------------------------
+GET _analyze
+{
+  "tokenizer" : "whitespace",
+  "filter" : ["truncate"],
+  "text" : "the quinquennial extravaganza carried on"
+}
+--------------------------------------------------
+
+The filter produces the following tokens:
+
+[source,text]
+--------------------------------------------------
+[ the, quinquenni, extravagan, carried, on ]
+--------------------------------------------------
+
+/////////////////////
+[source,console-result]
+--------------------------------------------------
+{
+  "tokens" : [
+    {
+      "token" : "the",
+      "start_offset" : 0,
+      "end_offset" : 3,
+      "type" : "word",
+      "position" : 0
+    },
+    {
+      "token" : "quinquenni",
+      "start_offset" : 4,
+      "end_offset" : 16,
+      "type" : "word",
+      "position" : 1
+    },
+    {
+      "token" : "extravagan",
+      "start_offset" : 17,
+      "end_offset" : 29,
+      "type" : "word",
+      "position" : 2
+    },
+    {
+      "token" : "carried",
+      "start_offset" : 30,
+      "end_offset" : 37,
+      "type" : "word",
+      "position" : 3
+    },
+    {
+      "token" : "on",
+      "start_offset" : 38,
+      "end_offset" : 40,
+      "type" : "word",
+      "position" : 4
+    }
+  ]
+}
+--------------------------------------------------
+/////////////////////
+
+[[analysis-truncate-tokenfilter-analyzer-ex]]
+==== Add to an analyzer
+
+The following <<indices-create-index,create index API>> request uses the
+`truncate` filter to configure a new
+<<analysis-custom-analyzer,custom analyzer>>.
+
+[source,console]
+--------------------------------------------------
+PUT custom_truncate_example
+{
+  "settings" : {
+    "analysis" : {
+      "analyzer" : {
+        "standard_truncate" : {
+          "tokenizer" : "standard",
+          "filter" : ["truncate"]
+        }
+      }
+    }
+  }
+}
+--------------------------------------------------
+
+[[analysis-truncate-tokenfilter-configure-parms]]
+==== Configurable parameters
+
+`length`::
+(Optional, integer)
+Character limit for each token. Tokens exceeding this limit are truncated.
+Defaults to `10`.
+
+[[analysis-truncate-tokenfilter-customize]]
+==== Customize
+
+To customize the `truncate` filter, duplicate it to create the basis
+for a new custom token filter. You can modify the filter using its configurable
+parameters.
+
+For example, the following request creates a custom `truncate` filter,
+`5_char_trunc`, that shortens tokens to a `length` of `5` or fewer characters:
+
+[source,console]
+--------------------------------------------------
+PUT 5_char_words_example
+{
+  "settings": {
+    "analysis": {
+      "analyzer": {
+        "lowercase_5_char": {
+          "tokenizer": "lowercase",
+          "filter": [ "5_char_trunc" ]
+        }
+      }
+    },
+    "filter": {
+      "5_char_trunc": {
+        "type": "truncate",
+        "length": 5
+      }
+    }
+  }
+}
+--------------------------------------------------
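
The truncation behavior documented in this diff can be approximated locally with a short Python sketch. This is not Elasticsearch or Lucene code: `truncate_tokens` is a hypothetical helper, it stands in for the `whitespace` tokenizer with a plain `str.split()`, and it counts Python characters (code points), which may differ from Lucene's character counting for text outside the Basic Multilingual Plane. Rejecting a non-positive `length` is a choice made for this sketch.

```python
from typing import List


def truncate_tokens(text: str, length: int = 10) -> List[str]:
    """Split text on whitespace and cut each token to at most `length` characters.

    Hypothetical helper that mimics the documented effect of the `truncate`
    filter applied after a whitespace tokenizer. The default of 10 matches the
    documented default of the `length` parameter.
    """
    if length < 1:
        # Assumption of this sketch: a non-positive limit is treated as an error.
        raise ValueError("length must be a positive integer")
    # Tokens shorter than the limit are returned unchanged by the slice.
    return [token[:length] for token in text.split()]


# Default limit of 10, using the sentence from the analyze API example.
print(truncate_tokens("the quinquennial extravaganza carried on"))
# → ['the', 'quinquenni', 'extravagan', 'carried', 'on']

# Custom limit of 3, using the example from the filter description.
print(truncate_tokens("jumping fox", length=3))
# → ['jum', 'fox']
```

The sketch only reproduces token text. The analyze API response in the diff also reports offsets that still span the original, untruncated tokens (for example, `quinquenni` keeps `end_offset` 16), which this helper does not model.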
