[[indices-analyze]]
- === Analyze
+ === Analyze API
+ ++++
+ <titleabbrev>Analyze</titleabbrev>
+ ++++

- Performs the analysis process on a text and return the tokens breakdown
- of the text.
+ Performs <<analysis,analysis>> on a text string
+ and returns the resulting tokens.

- Can be used without specifying an index against one of the many built in
- analyzers:
+ [source,js]
+ --------------------------------------------------
+ GET /_analyze
+ {
+   "analyzer" : "standard",
+   "text" : "Quick Brown Foxes!"
+ }
+ --------------------------------------------------
+ // CONSOLE
+
+
+ [[analyze-api-request]]
+ ==== {api-request-title}
+
+ `GET /_analyze`
+
+ `POST /_analyze`
+
+ `GET /<index>/_analyze`
+
+ `POST /<index>/_analyze`
+
+
+ [[analyze-api-path-params]]
+ ==== {api-path-parms-title}
+
+ `<index>`::
+ +
+ --
+ (Optional, string)
+ Index used to derive the analyzer.
+
+ If specified,
+ the `analyzer` or `<field>` parameter overrides this value.
+
+ If no analyzer or field is specified,
+ the analyze API uses the default analyzer for the index.
+
+ If no index is specified
+ or the index does not have a default analyzer,
+ the analyze API uses the <<analysis-standard-analyzer,standard analyzer>>.
+ --
+
+
+ [[analyze-api-query-params]]
+ ==== {api-query-parms-title}
+
+ `analyzer`::
+ +
+ --
+ (Optional, string or <<analysis-custom-analyzer,custom analyzer object>>)
+ Analyzer used to analyze the provided `text`.
+
+ See <<analysis-analyzers>> for a list of built-in analyzers.
+ You can also provide a <<analysis-custom-analyzer,custom analyzer>>.
+
+ If this parameter is not specified,
+ the analyze API uses the analyzer defined in the field's mapping.
+
+ If no field is specified,
+ the analyze API uses the default analyzer for the index.
+
+ If no index is specified,
+ or the index does not have a default analyzer,
+ the analyze API uses the <<analysis-standard-analyzer,standard analyzer>>.
+ --
+
+ `attributes`::
+ (Optional, array of strings)
+ Array of token attributes used to filter the output of the `explain` parameter.
+
+ `char_filter`::
+ (Optional, array of strings)
+ Array of character filters used to preprocess characters before the tokenizer.
+ See <<analysis-charfilters>> for a list of character filters.
+
+ `explain`::
+ (Optional, boolean)
+ If `true`, the response includes token attributes and additional details.
+ Defaults to `false`.
+ experimental:[The format of the additional detail information is labelled as experimental in Lucene and it may change in the future.]
+
+ `field`::
+ +
+ --
+ (Optional, string)
+ Field used to derive the analyzer.
+ To use this parameter,
+ you must specify an index.
+
+ If specified,
+ the `analyzer` parameter overrides this value.
+
+ If no field is specified,
+ the analyze API uses the default analyzer for the index.
+
+ If no index is specified
+ or the index does not have a default analyzer,
+ the analyze API uses the <<analysis-standard-analyzer,standard analyzer>>.
+ --
+
+ `filter`::
+ (Optional, array of strings)
+ Array of token filters to apply after the tokenizer.
+ See <<analysis-tokenfilters>> for a list of token filters.
+
+ `normalizer`::
+ (Optional, string)
+ Normalizer to use to convert text into a single token.
+ See <<analysis-normalizers>> for a list of normalizers.
+
+ `text`::
+ (Required, string or array of strings)
+ Text to analyze.
+ If an array of strings is provided, it is analyzed as a multi-value field.
+
+ `tokenizer`::
+ (Optional, string)
+ Tokenizer to use to convert text into tokens.
+ See <<analysis-tokenizers>> for a list of tokenizers.
+
+ [[analyze-api-example]]
+ ==== {api-examples-title}
+
+ [[analyze-api-no-index-ex]]
+ ===== No index specified
+
+ You can apply any of the built-in analyzers to the text string without
+ specifying an index.

[source,js]
--------------------------------------------------
- GET _analyze
+ GET /_analyze
{
  "analyzer" : "standard",
  "text" : "this is a test"
}
--------------------------------------------------
// CONSOLE

- If text parameter is provided as array of strings, it is analyzed as a multi-valued field.
+ [[analyze-api-text-array-ex]]
+ ===== Array of text strings
+
+ If the `text` parameter is provided as an array of strings, it is analyzed as a multi-value field.

[source,js]
--------------------------------------------------
- GET _analyze
+ GET /_analyze
{
  "analyzer" : "standard",
  "text" : ["this is a test", "the second text"]
}
--------------------------------------------------
// CONSOLE

- Or by building a custom transient analyzer out of tokenizers,
- token filters and char filters. Token filters can use the shorter 'filter'
- parameter name:
+ [[analyze-api-custom-analyzer-ex]]
+ ===== Custom analyzer
+
+ You can use the analyze API to test a custom transient analyzer built from
+ tokenizers, token filters, and char filters. Token filters use the `filter`
+ parameter:

[source,js]
--------------------------------------------------
- GET _analyze
+ GET /_analyze
{
  "tokenizer" : "keyword",
  "filter" : ["lowercase"],
@@ -46,7 +182,7 @@ GET _analyze

[source,js]
--------------------------------------------------
- GET _analyze
+ GET /_analyze
{
  "tokenizer" : "keyword",
  "filter" : ["lowercase"],
@@ -62,7 +198,7 @@ Custom tokenizers, token filters, and character filters can be specified in the

[source,js]
--------------------------------------------------
- GET _analyze
+ GET /_analyze
{
  "tokenizer" : "whitespace",
  "filter" : ["lowercase", {"type": "stop", "stopwords": ["a", "is", "this"]}],
@@ -71,11 +207,14 @@ GET _analyze
--------------------------------------------------
// CONSOLE
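+
+ A transient analyzer can also include character filters, which run before the
+ tokenizer. The following hypothetical request uses the built-in `html_strip`
+ character filter to remove HTML markup (the sample text is illustrative):
+
+ [source,js]
+ --------------------------------------------------
+ GET /_analyze
+ {
+   "tokenizer" : "keyword",
+   "char_filter" : ["html_strip"],
+   "text" : "<p>this is a <b>test</b></p>"
+ }
+ --------------------------------------------------
+ // CONSOLE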

- It can also run against a specific index:
+ [[analyze-api-specific-index-ex]]
+ ===== Specific index
+
+ You can also run the analyze API against a specific index:

[source,js]
--------------------------------------------------
- GET analyze_sample/_analyze
+ GET /analyze_sample/_analyze
{
  "text" : "this is a test"
}
@@ -89,7 +228,7 @@ can also be provided to use a different analyzer:

[source,js]
--------------------------------------------------
- GET analyze_sample/_analyze
+ GET /analyze_sample/_analyze
{
  "analyzer" : "whitespace",
  "text" : "this is a test"
@@ -98,11 +237,14 @@ GET analyze_sample/_analyze
// CONSOLE
// TEST[setup:analyze_sample]

- Also, the analyzer can be derived based on a field mapping, for example:
+ [[analyze-api-field-ex]]
+ ===== Derive analyzer from a field mapping
+
+ The analyzer can be derived based on a field mapping, for example:

[source,js]
--------------------------------------------------
- GET analyze_sample/_analyze
+ GET /analyze_sample/_analyze
{
  "field" : "obj1.field1",
  "text" : "this is a test"
@@ -114,11 +256,14 @@ GET analyze_sample/_analyze
Will cause the analysis to happen based on the analyzer configured in the
mapping for `obj1.field1` (and if not, the default index analyzer).

+ [[analyze-api-normalizer-ex]]
+ ===== Normalizer
+
A `normalizer` can be provided for a keyword field with a normalizer associated with the `analyze_sample` index.

[source,js]
--------------------------------------------------
- GET analyze_sample/_analyze
+ GET /analyze_sample/_analyze
{
  "normalizer" : "my_normalizer",
  "text" : "BaR"
@@ -131,7 +276,7 @@ Or by building a custom transient normalizer out of token filters and char filte

[source,js]
--------------------------------------------------
- GET _analyze
+ GET /_analyze
{
  "filter" : ["lowercase"],
  "text" : "BaR"
@@ -140,7 +285,7 @@ GET _analyze
// CONSOLE
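+
+ A hypothetical transient normalizer could also include a character filter;
+ here the built-in `mapping` character filter (with an illustrative mapping)
+ runs before the `lowercase` token filter:
+
+ [source,js]
+ --------------------------------------------------
+ GET /_analyze
+ {
+   "char_filter" : [{"type": "mapping", "mappings": ["- => _"]}],
+   "filter" : ["lowercase"],
+   "text" : "Ba-R"
+ }
+ --------------------------------------------------
+ // CONSOLE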

[[explain-analyze-api]]
- ==== Explain Analyze
+ ===== Explain analyze

If you want to get more advanced details, set `explain` to `true` (defaults to `false`). It will output all token attributes for each token.
You can filter the token attributes you want to output by setting the `attributes` option.
@@ -149,7 +294,7 @@ NOTE: The format of the additional detail information is labelled as experimenta

[source,js]
--------------------------------------------------
- GET _analyze
+ GET /_analyze
{
  "tokenizer" : "standard",
  "filter" : ["snowball"],
@@ -210,8 +355,7 @@ The request returns the following result:
<1> Output only the "keyword" attribute, since "attributes" was specified in the request.

[[tokens-limit-settings]]
- [float]
- === Settings to prevent tokens explosion
+ ===== Setting a token limit
Generating an excessive amount of tokens may cause a node to run out of memory.
The following setting allows you to limit the number of tokens that can be produced:

@@ -225,7 +369,7 @@ The following setting allows to limit the number of tokens that can be produced:

[source,js]
--------------------------------------------------
- PUT analyze_sample
+ PUT /analyze_sample
{
  "settings" : {
    "index.analyze.max_token_count" : 20000
@@ -237,7 +381,7 @@ PUT analyze_sample

[source,js]
--------------------------------------------------
- GET analyze_sample/_analyze
+ GET /analyze_sample/_analyze
{
  "text" : "this is a test"
}