
Commit d336faa

[DOCS] Reformat trim token filter docs (#51649)
Makes the following changes to the `trim` token filter docs:

* Updates description
* Adds a link to the related Lucene filter
* Adds tip about removing whitespace using tokenizers
* Adds detailed analyze snippets
* Adds custom analyzer snippet
1 parent f5bccad commit d336faa

File tree

1 file changed: +104 -1 lines

docs/reference/analysis/tokenfilters/trim-tokenfilter.asciidoc

@@ -4,4 +4,107 @@
<titleabbrev>Trim</titleabbrev>
++++

-The `trim` token filter trims the whitespace surrounding a token.
+Removes leading and trailing whitespace from each token in a stream.

The `trim` filter uses Lucene's
https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/miscellaneous/TrimFilter.html[TrimFilter].

[TIP]
====
Many commonly used tokenizers, such as the
<<analysis-standard-tokenizer,`standard`>> or
<<analysis-whitespace-tokenizer,`whitespace`>> tokenizer, remove whitespace by
default. When using these tokenizers, you don't need to add a separate `trim`
filter.
====
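
For example, the following analyze API request, shown here only as an illustration (it is not part of this commit's change), uses the `whitespace` tokenizer with no `trim` filter and still returns a single `fox` token with the surrounding whitespace already removed:

[source,console]
----
GET _analyze
{
  "tokenizer" : "whitespace",
  "text" : " fox "
}
----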

[[analysis-trim-tokenfilter-analyze-ex]]
==== Example

To see how the `trim` filter works, you first need to produce a token
containing whitespace.

The following <<indices-analyze,analyze API>> request uses the
<<analysis-keyword-tokenizer,`keyword`>> tokenizer to produce a token for
`" fox "`.

[source,console]
----
GET _analyze
{
  "tokenizer" : "keyword",
  "text" : " fox "
}
----

The API returns the following response. Note the `" fox "` token contains
the original text's whitespace.

[source,console-result]
----
{
  "tokens": [
    {
      "token": " fox ",
      "start_offset": 0,
      "end_offset": 5,
      "type": "word",
      "position": 0
    }
  ]
}
----

To remove the whitespace, add the `trim` filter to the previous analyze API
request.

[source,console]
----
GET _analyze
{
  "tokenizer" : "keyword",
  "filter" : ["trim"],
  "text" : " fox "
}
----

The API returns the following response. The returned `fox` token does not
include any leading or trailing whitespace.

[source,console-result]
----
{
  "tokens": [
    {
      "token": "fox",
      "start_offset": 0,
      "end_offset": 5,
      "type": "word",
      "position": 0
    }
  ]
}
----

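Note that the trimmed `fox` token keeps the original `start_offset` of `0` and
`end_offset` of `5`: the `trim` filter changes the token's text but does not
adjust its offsets.
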
[[analysis-trim-tokenfilter-analyzer-ex]]
==== Add to an analyzer

The following <<indices-create-index,create index API>> request uses the `trim`
filter to configure a new <<analysis-custom-analyzer,custom analyzer>>.

[source,console]
----
PUT trim_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "keyword_trim": {
          "tokenizer": "keyword",
          "filter": [ "trim" ]
        }
      }
    }
  }
}
----
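
To try the new analyzer, you could run an index-scoped analyze API request against `trim_example`. The following request is a quick sketch, not part of this commit's change; it should return the same trimmed `fox` token shown in the earlier example:

[source,console]
----
GET trim_example/_analyze
{
  "analyzer": "keyword_trim",
  "text": " fox "
}
----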
