Commit 6c5c41e

[DOCS] Reformats analyze API (elastic#45986)

jrodewig authored and jkakavas committed
1 parent 1c0cc85 commit 6c5c41e

1 file changed: docs/reference/indices/analyze.asciidoc (+171, -27)
@@ -1,41 +1,177 @@
 [[indices-analyze]]
-=== Analyze
+=== Analyze API
+++++
+<titleabbrev>Analyze</titleabbrev>
+++++
 
-Performs the analysis process on a text and return the tokens breakdown
-of the text.
+Performs <<analysis,analysis>> on a text string
+and returns the resulting tokens.
 
-Can be used without specifying an index against one of the many built in
-analyzers:
+[source,js]
+--------------------------------------------------
+GET /_analyze
+{
+  "analyzer" : "standard",
+  "text" : "Quick Brown Foxes!"
+}
+--------------------------------------------------
+// CONSOLE
+
+
+[[analyze-api-request]]
+==== {api-request-title}
+
+`GET /_analyze`
+
+`POST /_analyze`
+
+`GET /<index>/_analyze`
+
+`POST /<index>/_analyze`
+
+
+[[analyze-api-path-params]]
+==== {api-path-parms-title}
+
+`<index>`::
++
+--
+(Optional, string)
+Index used to derive the analyzer.
+
+If specified,
+the `analyzer` or `<field>` parameter overrides this value.
+
+If no analyzer or field are specified,
+the analyze API uses the default analyzer for the index.
+
+If no index is specified
+or the index does not have a default analyzer,
+the analyze API uses the <<analysis-standard-analyzer,standard analyzer>>.
+--
+
+
+[[analyze-api-query-params]]
+==== {api-query-parms-title}
+
+`analyzer`::
++
+--
+(Optional, string or <<analysis-custom-analyzer,custom analyzer object>>)
+Analyzer used to analyze the provided `text`.
+
+See <<analysis-analyzers>> for a list of built-in analyzers.
+You can also provide a <<analysis-custom-analyzer,custom analyzer>>.
+
+If this parameter is not specified,
+the analyze API uses the analyzer defined in the field's mapping.
+
+If no field is specified,
+the analyze API uses the default analyzer for the index.
+
+If no index is specified,
+or the index does not have a default analyzer,
+the analyze API uses the <<analysis-standard-analyzer,standard analyzer>>.
+--
+
+`attributes`::
+(Optional, array of strings)
+Array of token attributes used to filter the output of the `explain` parameter.
+
+`char_filter`::
+(Optional, array of strings)
+Array of character filters used to preprocess characters before the tokenizer.
+See <<analysis-charfilters>> for a list of character filters.
+
+`explain`::
+(Optional, boolean)
+If `true`, the response includes token attributes and additional details.
+Defaults to `false`.
+experimental:[The format of the additional detail information is labelled as experimental in Lucene and it may change in the future.]
+
+`field`::
++
+--
+(Optional, string)
+Field used to derive the analyzer.
+To use this parameter,
+you must specify an index.
+
+If specified,
+the `analyzer` parameter overrides this value.
+
+If no field is specified,
+the analyze API uses the default analyzer for the index.
+
+If no index is specified
+or the index does not have a default analyzer,
+the analyze API uses the <<analysis-standard-analyzer,standard analyzer>>.
+--
+
+`filter`::
+(Optional, array of strings)
+Array of token filters applied after the tokenizer.
+See <<analysis-tokenfilters>> for a list of token filters.
+
+`normalizer`::
+(Optional, string)
+Normalizer used to convert text into a single token.
+See <<analysis-normalizers>> for a list of normalizers.
+
+`text`::
+(Required, string or array of strings)
+Text to analyze.
+If an array of strings is provided, it is analyzed as a multi-value field.
+
+`tokenizer`::
+(Optional, string)
+Tokenizer used to convert text into tokens.
+See <<analysis-tokenizers>> for a list of tokenizers.
+
+[[analyze-api-example]]
+==== {api-examples-title}
+
+[[analyze-api-no-index-ex]]
+===== No index specified
+
+You can apply any of the built-in analyzers to the text string without
+specifying an index.
 
 [source,js]
 --------------------------------------------------
-GET _analyze
+GET /_analyze
 {
   "analyzer" : "standard",
   "text" : "this is a test"
 }
 --------------------------------------------------
 // CONSOLE
 
-If text parameter is provided as array of strings, it is analyzed as a multi-valued field.
+[[analyze-api-text-array-ex]]
+===== Array of text strings
+
+If the `text` parameter is provided as an array of strings, it is analyzed as a multi-value field.
 
 [source,js]
 --------------------------------------------------
-GET _analyze
+GET /_analyze
 {
   "analyzer" : "standard",
   "text" : ["this is a test", "the second text"]
 }
 --------------------------------------------------
 // CONSOLE
 
-Or by building a custom transient analyzer out of tokenizers,
-token filters and char filters. Token filters can use the shorter 'filter'
-parameter name:
+[[analyze-api-custom-analyzer-ex]]
+===== Custom analyzer
+
+You can use the analyze API to test a custom transient analyzer built from
+tokenizers, token filters, and char filters. Token filters use the `filter`
+parameter:
 
 [source,js]
 --------------------------------------------------
-GET _analyze
+GET /_analyze
 {
   "tokenizer" : "keyword",
   "filter" : ["lowercase"],
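The hunk above documents a new `char_filter` query parameter but shows no request that uses it. As an illustrative sketch that is not part of this commit (the sample text is invented; `html_strip` is a built-in Elasticsearch character filter), such a request might look like:

[source,js]
--------------------------------------------------
GET /_analyze
{
  "tokenizer" : "keyword",
  "char_filter" : ["html_strip"],
  "text" : "<p>this is a <b>test</b></p>"
}
--------------------------------------------------
// CONSOLE

Here the character filter removes the HTML markup before the `keyword` tokenizer sees the text, so the emitted token contains no tags.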
@@ -46,7 +182,7 @@ GET _analyze
 
 [source,js]
 --------------------------------------------------
-GET _analyze
+GET /_analyze
 {
   "tokenizer" : "keyword",
   "filter" : ["lowercase"],
@@ -62,7 +198,7 @@ Custom tokenizers, token filters, and character filters can be specified in the
 
 [source,js]
 --------------------------------------------------
-GET _analyze
+GET /_analyze
 {
   "tokenizer" : "whitespace",
   "filter" : ["lowercase", {"type": "stop", "stopwords": ["a", "is", "this"]}],
@@ -71,11 +207,14 @@ GET _analyze
 --------------------------------------------------
 // CONSOLE
 
-It can also run against a specific index:
+[[analyze-api-specific-index-ex]]
+===== Specific index
+
+You can also run the analyze API against a specific index:
 
 [source,js]
 --------------------------------------------------
-GET analyze_sample/_analyze
+GET /analyze_sample/_analyze
 {
   "text" : "this is a test"
 }
@@ -89,7 +228,7 @@ can also be provided to use a different analyzer:
 
 [source,js]
 --------------------------------------------------
-GET analyze_sample/_analyze
+GET /analyze_sample/_analyze
 {
   "analyzer" : "whitespace",
   "text" : "this is a test"
@@ -98,11 +237,14 @@ GET analyze_sample/_analyze
 // CONSOLE
 // TEST[setup:analyze_sample]
 
-Also, the analyzer can be derived based on a field mapping, for example:
+[[analyze-api-field-ex]]
+===== Derive analyzer from a field mapping
+
+The analyzer can be derived based on a field mapping, for example:
 
 [source,js]
 --------------------------------------------------
-GET analyze_sample/_analyze
+GET /analyze_sample/_analyze
 {
   "field" : "obj1.field1",
   "text" : "this is a test"
@@ -114,11 +256,14 @@ GET analyze_sample/_analyze
 Will cause the analysis to happen based on the analyzer configured in the
 mapping for `obj1.field1` (and if not, the default index analyzer).
 
+[[analyze-api-normalizer-ex]]
+===== Normalizer
+
 A `normalizer` can be provided for a keyword field with a normalizer associated with the `analyze_sample` index.
 
 [source,js]
 --------------------------------------------------
-GET analyze_sample/_analyze
+GET /analyze_sample/_analyze
 {
   "normalizer" : "my_normalizer",
   "text" : "BaR"
@@ -131,7 +276,7 @@ Or by building a custom transient normalizer out of token filters and char filters
 
 [source,js]
 --------------------------------------------------
-GET _analyze
+GET /_analyze
 {
   "filter" : ["lowercase"],
   "text" : "BaR"
@@ -140,7 +285,7 @@ GET _analyze
 // CONSOLE
 
 [[explain-analyze-api]]
-==== Explain Analyze
+===== Explain analyze
 
 If you want to get more advanced details, set `explain` to `true` (defaults to `false`). It will output all token attributes for each token.
 You can filter the token attributes you want to output by setting the `attributes` option.
@@ -149,7 +294,7 @@ NOTE: The format of the additional detail information is labelled as experimental
 
 [source,js]
 --------------------------------------------------
-GET _analyze
+GET /_analyze
 {
   "tokenizer" : "standard",
   "filter" : ["snowball"],
@@ -210,8 +355,7 @@ The request returns the following result:
 <1> Output only the "keyword" attribute, since "attributes" was specified in the request.
 
 [[tokens-limit-settings]]
-[float]
-=== Settings to prevent tokens explosion
+===== Setting a token limit
 Generating an excessive amount of tokens may cause a node to run out of memory.
 The following setting allows you to limit the number of tokens that can be produced:
 
@@ -225,7 +369,7 @@ The following setting allows to limit the number of tokens that can be produced:
 
 [source,js]
 --------------------------------------------------
-PUT analyze_sample
+PUT /analyze_sample
 {
   "settings" : {
     "index.analyze.max_token_count" : 20000
@@ -237,7 +381,7 @@ PUT analyze_sample
 
 [source,js]
 --------------------------------------------------
-GET analyze_sample/_analyze
+GET /analyze_sample/_analyze
 {
   "text" : "this is a test"
 }
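The `explain` and `attributes` parameters documented in this commit work together: `explain` turns on detailed token output, and `attributes` narrows which token attributes are returned. As a hedged sketch that is not part of the diff above (the sample text and attribute choice are invented for illustration), filtering the detailed output to only the `keyword` attribute might look like:

[source,js]
--------------------------------------------------
GET /_analyze
{
  "tokenizer" : "standard",
  "filter" : ["snowball"],
  "text" : "detailed output",
  "explain" : true,
  "attributes" : ["keyword"]
}
--------------------------------------------------
// CONSOLE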
