Commit cd04021

[DOCS] Reformat token count limit filter docs (elastic#49835)

1 parent 4637610 commit cd04021
1 file changed (+118, -16 lines)

docs/reference/analysis/tokenfilters/limit-token-count-tokenfilter.asciidoc
@@ -1,32 +1,134 @@
 [[analysis-limit-token-count-tokenfilter]]
-=== Limit Token Count Token Filter
+=== Limit token count token filter
+++++
+<titleabbrev>Limit token count</titleabbrev>
+++++
 
-Limits the number of tokens that are indexed per document and field.
+Limits the number of output tokens. The `limit` filter is commonly used to limit
+the size of document field values based on token count.
 
-[cols="<,<",options="header",]
-|=======================================================================
-|Setting |Description
-|`max_token_count` |The maximum number of tokens that should be indexed
-per document and field. The default is `1`
+By default, the `limit` filter keeps only the first token in a stream. For
+example, the filter can change the token stream `[ one, two, three ]` to
+`[ one ]`.
 
-|`consume_all_tokens` |If set to `true` the filter exhaust the stream
-even if `max_token_count` tokens have been consumed already. The default
-is `false`.
-|=======================================================================
+This filter uses Lucene's
+https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/miscellaneous/LimitTokenCountFilter.html[LimitTokenCountFilter].
 
-Here is an example:
+[TIP]
+====
+If you want to limit the size of field values based on
+_character length_, use the <<ignore-above,`ignore_above`>> mapping parameter.
+====
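
As an aside on that tip: `ignore_above` is a mapping parameter, not a token
filter, so it is set per field rather than in the analysis chain. A minimal
sketch of its use (the index and field names here are illustrative, not part of
this commit):

[source,console]
--------------------------------------------------
PUT ignore_above_example
{
  "mappings": {
    "properties": {
      "tags": {
        "type": "keyword",
        "ignore_above": 20
      }
    }
  }
}
--------------------------------------------------

With this mapping, `tags` values longer than 20 characters remain in `_source`
but are not indexed.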
+
+[[analysis-limit-token-count-tokenfilter-configure-parms]]
+==== Configurable parameters
+
+`max_token_count`::
+(Optional, integer)
+Maximum number of tokens to keep. Once this limit is reached, any remaining
+tokens are excluded from the output. Defaults to `1`.
+
+`consume_all_tokens`::
+(Optional, boolean)
+If `true`, the `limit` filter exhausts the token stream, even if the
+`max_token_count` has already been reached. Defaults to `false`.
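
To make `consume_all_tokens` concrete, here is a sketch of an analyze request
that keeps two tokens but still reads the stream to its end (this parameter
combination is our illustration and does not appear in the commit):

[source,console]
--------------------------------------------------
GET _analyze
{
  "tokenizer": "standard",
  "filter": [
    {
      "type": "limit",
      "max_token_count": 2,
      "consume_all_tokens": true
    }
  ],
  "text": "quick fox jumps over lazy dog"
}
--------------------------------------------------

The output is still `[ quick, fox ]`; the flag only controls whether the
remaining tokens are read and discarded, which can matter when downstream
components expect the full stream to have been consumed.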
+
+[[analysis-limit-token-count-tokenfilter-analyze-ex]]
+==== Example
+
+The following <<indices-analyze,analyze API>> request uses the `limit`
+filter to keep only the first two tokens in `quick fox jumps over lazy dog`:
+
+[source,console]
+--------------------------------------------------
+GET _analyze
+{
+  "tokenizer": "standard",
+  "filter": [
+    {
+      "type": "limit",
+      "max_token_count": 2
+    }
+  ],
+  "text": "quick fox jumps over lazy dog"
+}
+--------------------------------------------------
+
+The filter produces the following tokens:
+
+[source,text]
+--------------------------------------------------
+[ quick, fox ]
+--------------------------------------------------
+
+/////////////////////
+[source,console-result]
+--------------------------------------------------
+{
+  "tokens": [
+    {
+      "token": "quick",
+      "start_offset": 0,
+      "end_offset": 5,
+      "type": "<ALPHANUM>",
+      "position": 0
+    },
+    {
+      "token": "fox",
+      "start_offset": 6,
+      "end_offset": 9,
+      "type": "<ALPHANUM>",
+      "position": 1
+    }
+  ]
+}
+--------------------------------------------------
+/////////////////////
+
+[[analysis-limit-token-count-tokenfilter-analyzer-ex]]
+==== Add to an analyzer
+
+The following <<indices-create-index,create index API>> request uses the
+`limit` filter to configure a new
+<<analysis-custom-analyzer,custom analyzer>>.
 
 [source,console]
 --------------------------------------------------
-PUT /limit_example
+PUT limit_example
 {
   "settings": {
     "analysis": {
       "analyzer": {
-        "limit_example": {
-          "type": "custom",
+        "standard_one_token_limit": {
           "tokenizer": "standard",
-          "filter": ["lowercase", "five_token_limit"]
+          "filter": [ "limit" ]
+        }
+      }
+    }
+  }
+}
+--------------------------------------------------
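
Once the index exists, the new analyzer can be exercised directly with the
analyze API (a usage sketch, not part of the commit):

[source,console]
--------------------------------------------------
GET limit_example/_analyze
{
  "analyzer": "standard_one_token_limit",
  "text": "quick fox jumps over lazy dog"
}
--------------------------------------------------

Because `limit` defaults to `max_token_count: 1`, this returns only the token
`quick`.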
+
+[[analysis-limit-token-count-tokenfilter-customize]]
+==== Customize
+
+To customize the `limit` filter, duplicate it to create the basis
+for a new custom token filter. You can modify the filter using its configurable
+parameters.
+
+For example, the following request creates a custom `limit` filter that keeps
+only the first five tokens of a stream:
+
+[source,console]
+--------------------------------------------------
+PUT custom_limit_example
+{
+  "settings": {
+    "analysis": {
+      "analyzer": {
+        "whitespace_five_token_limit": {
+          "tokenizer": "whitespace",
+          "filter": [ "five_token_limit" ]
         }
       },
       "filter": {

0 commit comments