1
1
[[docs-bulk]]
2
2
=== Bulk API
3
+ ++++
4
+ <titleabbrev>Bulk</titleabbrev>
5
+ ++++
3
6
4
- The bulk API makes it possible to perform many index/delete operations
5
- in a single API call. This can greatly increase the indexing speed.
7
+ Performs multiple indexing or delete operations in a single API call.
8
+ This reduces overhead and can greatly increase indexing speed.
6
9
7
- .Client support for bulk requests
8
- *********************************************
9
-
10
- Some of the officially supported clients provide helpers to assist with
11
- bulk requests and reindexing of documents from one index to another:
10
+ [source,console]
11
+ --------------------------------------------------
12
+ POST _bulk
13
+ { "index" : { "_index" : "test", "_id" : "1" } }
14
+ { "field1" : "value1" }
15
+ { "delete" : { "_index" : "test", "_id" : "2" } }
16
+ { "create" : { "_index" : "test", "_id" : "3" } }
17
+ { "field1" : "value3" }
18
+ { "update" : {"_id" : "1", "_index" : "test"} }
19
+ { "doc" : {"field2" : "value2"} }
20
+ --------------------------------------------------
12
21
13
- Perl::
22
+ [[docs-bulk-api-request]]
23
+ ==== {api-request-title}
14
24
15
- See https://metacpan.org/pod/Search::Elasticsearch::Client::5_0::Bulk[Search::Elasticsearch::Client::5_0::Bulk]
16
- and https://metacpan.org/pod/Search::Elasticsearch::Client::5_0::Scroll[Search::Elasticsearch::Client::5_0::Scroll]
25
+ `POST /_bulk`
17
26
18
- Python::
27
+ `POST /<index>/_bulk`
19
28
20
- See http://elasticsearch-py.readthedocs.org/en/master/helpers.html[elasticsearch.helpers.*]
29
+ [[docs-bulk-api-desc]]
30
+ ==== {api-description-title}
21
31
22
- *********************************************
32
+ Provides a way to perform multiple `index`, `create`, `delete`, and `update` actions in a single request.
23
33
24
- The REST API endpoint is `/_bulk`, and it expects the following newline delimited JSON
25
- (NDJSON) structure:
34
+ The actions are specified in the request body using a newline delimited JSON (NDJSON) structure:
26
35
27
36
[source,js]
28
37
--------------------------------------------------
@@ -36,19 +45,70 @@ optional_source\n
36
45
--------------------------------------------------
37
46
// NOTCONSOLE
38
47
39
- *NOTE*: The final line of data must end with a newline character `\n`. Each newline character
40
- may be preceded by a carriage return `\r`. When sending requests to this endpoint the
41
- `Content-Type` header should be set to `application/x-ndjson`.
48
+ The `index` and `create` actions expect a source on the next line,
49
+ and have the same semantics as the `op_type` parameter in the standard index API:
50
+ `create` fails if a document with the same ID already exists in the index,
51
+ whereas `index` adds or replaces a document as necessary.
52
+
53
+ `update` expects that the partial doc, upsert,
54
+ and script and its options are specified on the next line.
55
+
56
+ `delete` does not expect a source on the next line and
57
+ has the same semantics as the standard delete API.
58
+
59
+ [NOTE]
60
+ ====
61
+ The final line of data must end with a newline character `\n`.
62
+ Each newline character may be preceded by a carriage return `\r`.
63
+ When sending requests to the `_bulk` endpoint,
64
+ the `Content-Type` header should be set to `application/x-ndjson`.
65
+ ====
66
+
67
+ Because this format uses literal `\n` characters as delimiters,
68
+ make sure that the JSON actions and sources are not pretty printed.
69
+
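The formatting rules above can be illustrated with a minimal Python sketch. It assumes the bulk lines are held as plain dictionaries; the helper name `to_ndjson` is hypothetical and not part of any client library. `json.dumps` without `indent` keeps each action or source on one line, and the body ends with the required trailing newline.

```python
import json


def to_ndjson(lines):
    """Serialize each dict on its own line and end the body with a newline."""
    # Without `indent`, json.dumps never emits a raw newline, so every
    # action or source stays on exactly one line.
    return "".join(json.dumps(line, separators=(",", ":")) + "\n" for line in lines)


# Hypothetical sample actions, mirroring the request shown earlier.
body = to_ndjson([
    {"index": {"_index": "test", "_id": "1"}},
    {"field1": "value1"},
    {"delete": {"_index": "test", "_id": "2"}},
])
print(body, end="")
```

A body built this way would be sent with the `Content-Type: application/x-ndjson` header.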
70
+ If you specify an index in the request URI,
71
+ it is used for any actions that don't explicitly specify an index.
72
+
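As a rough illustration of this default, the following Python sketch resolves the index that applies to one action line. It models the rule described above on the client side only; `effective_index` is a hypothetical helper, not an Elasticsearch API.

```python
def effective_index(action, uri_index=None):
    """Return the index an action applies to, given an optional URI index."""
    # An action line has a single key (the action type) mapping to its metadata.
    _op_type, meta = next(iter(action.items()))
    # An explicit `_index` wins; otherwise fall back to the index in the URI.
    return meta.get("_index", uri_index)


print(effective_index({"delete": {"_id": "2"}}, uri_index="test"))
print(effective_index({"index": {"_index": "other", "_id": "1"}}, uri_index="test"))
```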
73
+ A note on the format: it is designed to make processing of a request as
74
+ fast as possible. As some of the actions are redirected to other
75
+ shards on other nodes, only `action_meta_data` is parsed on the
76
+ receiving node side.
77
+
78
+ Client libraries using this protocol should try to do
79
+ something similar on the client side, and reduce buffering as much as
80
+ possible.
81
+
82
+ The response to a bulk action is a large JSON structure with
83
+ the individual results of each action performed,
84
+ in the same order as the actions that appeared in the request.
85
+ The failure of a single action does not affect the remaining actions.
86
+
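Because results come back in request order, a client can pair each result with the action that produced it. The sketch below assumes the response has already been parsed into a dictionary with an `items` list, where each item is keyed by its action type and a failed item carries an `error` object next to its `status`. The sample response values and the `failed_items` helper are illustrative, not output from a real cluster.

```python
def failed_items(actions, response):
    """Pair each response item with the action at the same position."""
    failures = []
    for position, (action, item) in enumerate(zip(actions, response["items"])):
        # Each item has one key, the action type, e.g. {"index": {...}}.
        op_type, result = next(iter(item.items()))
        if "error" in result:
            failures.append((position, op_type, action, result["status"]))
    return failures


# Action metadata lines only (sources omitted), in request order.
actions = [
    {"index": {"_index": "test", "_id": "1"}},
    {"create": {"_index": "test", "_id": "3"}},
]
# Hypothetical parsed response: the second action failed, the first did not.
response = {
    "errors": True,
    "items": [
        {"index": {"_id": "1", "status": 201}},
        {"create": {"_id": "3", "status": 409,
                    "error": {"type": "version_conflict_engine_exception"}}},
    ],
}
print(failed_items(actions, response))
```

Only the failed positions are reported, so the caller can retry or log those actions without resubmitting the ones that succeeded.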
87
+ There is no "correct" number of actions to perform in a single bulk request.
88
+ Experiment with different settings to find the optimal size for your particular workload.
89
+
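One way to experiment with request sizes is to split the full list of actions into fixed-size batches and time each configuration. This is a minimal sketch; `batches` is a hypothetical helper, and the batch size is whatever value you are currently testing.

```python
from itertools import islice


def batches(actions, size):
    """Yield successive lists of at most `size` actions."""
    iterator = iter(actions)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Five placeholder actions split into batches of two.
for batch in batches(range(5), 2):
    print(batch)
```

When batching real bulk lines, keep each action together with its source line so that a pair is never split across two requests.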
90
+ When using the HTTP API, make sure that the client does not send HTTP chunks,
91
+ as this will slow things down.
92
+
93
+ [float]
94
+ [[bulk-clients]]
95
+ ===== Client support for bulk requests
96
+
97
+ Some of the officially supported clients provide helpers to assist with
98
+ bulk requests and reindexing of documents from one index to another:
99
+
100
+ Perl::
101
+
102
+ See https://metacpan.org/pod/Search::Elasticsearch::Client::5_0::Bulk[Search::Elasticsearch::Client::5_0::Bulk]
103
+ and https://metacpan.org/pod/Search::Elasticsearch::Client::5_0::Scroll[Search::Elasticsearch::Client::5_0::Scroll]
104
+
105
+ Python::
106
+
107
+ See http://elasticsearch-py.readthedocs.org/en/master/helpers.html[elasticsearch.helpers.*]
42
108
43
- The possible actions are `index`, `create`, `delete`, and `update`.
44
- `index` and `create` expect a source on the next
45
- line, and have the same semantics as the `op_type` parameter to the
46
- standard index API (i.e. create will fail if a document with the same
47
- index exists already, whereas index will add or replace a
48
- document as necessary). `delete` does not expect a source on the
49
- following line, and has the same semantics as the standard delete API.
50
- `update` expects that the partial doc, upsert and script and its options
51
- are specified on the next line.
109
+ [float]
110
+ [[bulk-curl]]
111
+ ===== Submitting bulk requests with cURL
52
112
53
113
If you're providing text file input to `curl`, you *must* use the
54
114
`--data-binary` flag instead of plain `-d`. The latter doesn't preserve
@@ -65,9 +125,97 @@ $ curl -s -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/_bulk --
65
125
// NOTCONSOLE
66
126
// Not converting to console because this shows how curl works
67
127
68
- Because this format uses literal `\n`'s as delimiters, please be sure
69
- that the JSON actions and sources are not pretty printed. Here is an
70
- example of a correct sequence of bulk commands:
128
+ [float]
129
+ [[bulk-optimistic-concurrency-control]]
130
+ ===== Optimistic Concurrency Control
131
+
132
+ Each `index` and `delete` action within a bulk API call may include the
133
+ `if_seq_no` and `if_primary_term` parameters in their respective action
134
+ and meta data lines. The `if_seq_no` and `if_primary_term` parameters control
135
+ how operations are executed, based on the last modification to existing
136
+ documents. See <<optimistic-concurrency-control>> for more details.
137
+
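As an illustration, the following Python sketch builds the two NDJSON lines for an `index` action that carries both parameters in its action metadata. The helper name `conditional_index` and the sample values are assumptions made for the example.

```python
import json


def conditional_index(index, doc_id, seq_no, primary_term, source):
    """Return the action and source lines for a conditional index action."""
    action = {
        "index": {
            "_index": index,
            "_id": doc_id,
            # Both parameters go in the action metadata, not in the source.
            "if_seq_no": seq_no,
            "if_primary_term": primary_term,
        }
    }
    return json.dumps(action) + "\n" + json.dumps(source) + "\n"


# Hypothetical sequence number and primary term from an earlier read.
print(conditional_index("test", "1", 3, 1, {"field1": "value1"}), end="")
```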
138
+
139
+ [float]
140
+ [[bulk-versioning]]
141
+ ===== Versioning
142
+
143
+ Each bulk item can include the version value using the
144
+ `version` field. It automatically follows the behavior of the
145
+ index / delete operation based on the `_version` mapping. It also
146
+ supports the `version_type` (see <<index-versioning, versioning>>).
147
+
148
+ [float]
149
+ [[bulk-routing]]
150
+ ===== Routing
151
+
152
+ Each bulk item can include the routing value using the
153
+ `routing` field. It automatically follows the behavior of the
154
+ index / delete operation based on the `_routing` mapping.
155
+
156
+ [float]
157
+ [[bulk-wait-for-active-shards]]
158
+ ===== Wait For Active Shards
159
+
160
+ When making bulk calls, you can set the `wait_for_active_shards`
161
+ parameter to require a minimum number of shard copies to be active
162
+ before starting to process the bulk request. See
163
+ <<index-wait-for-active-shards,here>> for further details and a usage
164
+ example.
165
+
166
+ [float]
167
+ [[bulk-refresh]]
168
+ ===== Refresh
169
+
170
+ Control when the changes made by this request are visible to search. See
171
+ <<docs-refresh,refresh>>.
172
+
173
+ NOTE: Only the shards that receive the bulk request will be affected by
174
+ `refresh`. Imagine a `_bulk?refresh=wait_for` request with three
175
+ documents in it that happen to be routed to different shards in an index
176
+ with five shards. The request will only wait for those three shards to
177
+ refresh. The other two shards that make up the index do not
178
+ participate in the `_bulk` request at all.
179
+
180
+ [float]
181
+ [[bulk-security]]
182
+ ===== Security
183
+
184
+ See <<url-access-control>>.
185
+
186
+ [float]
187
+ [[bulk-partial-responses]]
188
+ ===== Partial responses
189
+ To ensure fast responses, the bulk API will respond with partial results if one or more shards fail.
190
+ See <<shard-failures, Shard failures>> for more information.
191
+
192
+ [[docs-bulk-api-path-params]]
193
+ ==== {api-path-parms-title}
194
+
195
+ `<index>`::
196
+ (Optional, string) Name of the index to perform the bulk actions against.
197
+
198
+ [[docs-bulk-api-query-params]]
199
+ ==== {api-query-parms-title}
200
+
201
+ include::{docdir}/rest-api/common-parms.asciidoc[tag=pipeline]
202
+
203
+ include::{docdir}/rest-api/common-parms.asciidoc[tag=refresh]
204
+
205
+ include::{docdir}/rest-api/common-parms.asciidoc[tag=routing]
206
+
207
+ include::{docdir}/rest-api/common-parms.asciidoc[tag=source]
208
+
209
+ include::{docdir}/rest-api/common-parms.asciidoc[tag=source_excludes]
210
+
211
+ include::{docdir}/rest-api/common-parms.asciidoc[tag=source_includes]
212
+
213
+ include::{docdir}/rest-api/common-parms.asciidoc[tag=timeout]
214
+
215
+ include::{docdir}/rest-api/common-parms.asciidoc[tag=wait_for_active_shards]
216
+
217
+ [[docs-bulk-api-example]]
218
+ ==== {api-examples-title}
71
219
72
220
[source,console]
73
221
--------------------------------------------------
@@ -81,7 +229,7 @@ POST _bulk
81
229
{ "doc" : {"field2" : "value2"} }
82
230
--------------------------------------------------
83
231
84
- The result of this bulk operation is :
232
+ The API returns the following result:
85
233
86
234
[source,console-result]
87
235
--------------------------------------------------
@@ -171,85 +319,9 @@ The result of this bulk operation is:
171
319
// TESTRESPONSE[s/"_seq_no" : 3/"_seq_no" : $body.items.3.update._seq_no/]
172
320
// TESTRESPONSE[s/"_primary_term" : 4/"_primary_term" : $body.items.3.update._primary_term/]
173
321
174
- The endpoints are `/_bulk` and `/{index}/_bulk`. When the index is provided, it
175
- will be used by default on bulk items that don't provide it explicitly.
176
-
177
- A note on the format. The idea here is to make processing of this as
178
- fast as possible. As some of the actions will be redirected to other
179
- shards on other nodes, only `action_meta_data` is parsed on the
180
- receiving node side.
181
-
182
- Client libraries using this protocol should try and strive to do
183
- something similar on the client side, and reduce buffering as much as
184
- possible.
185
-
186
- The response to a bulk action is a large JSON structure with the individual
187
- results of each action that was performed in the same order as the actions that
188
- appeared in the request. The failure of a single action does not affect the
189
- remaining actions.
190
-
191
- There is no "correct" number of actions to perform in a single bulk
192
- call. You should experiment with different settings to find the optimum
193
- size for your particular workload.
194
-
195
- If using the HTTP API, make sure that the client does not send HTTP
196
- chunks, as this will slow things down.
197
-
198
- [float]
199
- [[bulk-optimistic-concurrency-control]]
200
- ==== Optimistic Concurrency Control
201
-
202
- Each `index` and `delete` action within a bulk API call may include the
203
- `if_seq_no` and `if_primary_term` parameters in their respective action
204
- and meta data lines. The `if_seq_no` and `if_primary_term` parameters control
205
- how operations are executed, based on the last modification to existing
206
- documents. See <<optimistic-concurrency-control>> for more details.
207
-
208
-
209
- [float]
210
- [[bulk-versioning]]
211
- ==== Versioning
212
-
213
- Each bulk item can include the version value using the
214
- `version` field. It automatically follows the behavior of the
215
- index / delete operation based on the `_version` mapping. It also
216
- support the `version_type` (see <<index-versioning, versioning>>).
217
-
218
- [float]
219
- [[bulk-routing]]
220
- ==== Routing
221
-
222
- Each bulk item can include the routing value using the
223
- `routing` field. It automatically follows the behavior of the
224
- index / delete operation based on the `_routing` mapping.
225
-
226
- [float]
227
- [[bulk-wait-for-active-shards]]
228
- ==== Wait For Active Shards
229
-
230
- When making bulk calls, you can set the `wait_for_active_shards`
231
- parameter to require a minimum number of shard copies to be active
232
- before starting to process the bulk request. See
233
- <<index-wait-for-active-shards,here>> for further details and a usage
234
- example.
235
-
236
- [float]
237
- [[bulk-refresh]]
238
- ==== Refresh
239
-
240
- Control when the changes made by this request are visible to search. See
241
- <<docs-refresh,refresh>>.
242
-
243
- NOTE: Only the shards that receive the bulk request will be affected by
244
- `refresh`. Imagine a `_bulk?refresh=wait_for` request with three
245
- documents in it that happen to be routed to different shards in an index
246
- with five shards. The request will only wait for those three shards to
247
- refresh. The other two shards that make up the index do not
248
- participate in the `_bulk` request at all.
249
-
250
322
[float]
251
323
[[bulk-update]]
252
- ==== Update
324
+ ===== Bulk update example
253
325
254
326
When using the `update` action, `retry_on_conflict` can be used as a field in
255
327
the action itself (not in the extra payload line), to specify how many
@@ -276,13 +348,3 @@ POST _bulk
276
348
--------------------------------------------------
277
349
// TEST[continued]
278
350
279
- [float]
280
- [[bulk-security]]
281
- ==== Security
282
-
283
- See <<url-access-control>>.
284
-
285
- [float]
286
- [[bulk-partial-responses]]
287
- ==== Partial responses
288
- To ensure fast responses, the bulk API will respond with partial results if one or more shards fail. See <<shard-failures, Shard failures>> for more information.