From 862bb2f837cd0ec75da8b6bb22115feb965d1d2c Mon Sep 17 00:00:00 2001 From: Deb Adair Date: Wed, 2 Oct 2019 15:36:01 -0700 Subject: [PATCH 1/5] Reformats bulk API. --- docs/reference/docs/bulk.asciidoc | 296 ++++++++++++++++++------------ 1 file changed, 177 insertions(+), 119 deletions(-) diff --git a/docs/reference/docs/bulk.asciidoc b/docs/reference/docs/bulk.asciidoc index 2bf023045e38d..12bbd161e3548 100644 --- a/docs/reference/docs/bulk.asciidoc +++ b/docs/reference/docs/bulk.asciidoc @@ -1,28 +1,36 @@ [[docs-bulk]] === Bulk API +++++ +Bulk +++++ -The bulk API makes it possible to perform many index/delete operations -in a single API call. This can greatly increase the indexing speed. +Performs multiple indexing or delete operations in a single API call. +This reduces overhead and can greatly increase indexing speed. -.Client support for bulk requests -********************************************* - -Some of the officially supported clients provide helpers to assist with -bulk requests and reindexing of documents from one index to another: - -Perl:: +[source,console] +-------------------------------------------------- +POST _bulk +{ "index" : { "_index" : "test", "_id" : "1" } } +{ "field1" : "value1" } +{ "delete" : { "_index" : "test", "_id" : "2" } } +{ "create" : { "_index" : "test", "_id" : "3" } } +{ "field1" : "value3" } +{ "update" : {"_id" : "1", "_index" : "test"} } +{ "doc" : {"field2" : "value2"} } +-------------------------------------------------- - See https://metacpan.org/pod/Search::Elasticsearch::Client::5_0::Bulk[Search::Elasticsearch::Client::5_0::Bulk] - and https://metacpan.org/pod/Search::Elasticsearch::Client::5_0::Scroll[Search::Elasticsearch::Client::5_0::Scroll] +[[docs-bulk-api-request]] +==== {api-request-title} -Python:: +`POST /_bulk` +`POST //_bulk` - See http://elasticsearch-py.readthedocs.org/en/master/helpers.html[elasticsearch.helpers.*] +[[docs-bulk-api-desc]] +==== {api-description-title} 
-********************************************* +Provides a way to perform multiple `index`, `create`, `delete`, and `update` actions in a single request. -The REST API endpoint is `/_bulk`, and it expects the following newline delimited JSON -(NDJSON) structure: +The actions are specified in the request body using a newline delimited JSON (NDJSON) structure: [source,js] -------------------------------------------------- @@ -36,19 +44,67 @@ optional_source\n -------------------------------------------------- // NOTCONSOLE -*NOTE*: The final line of data must end with a newline character `\n`. Each newline character -may be preceded by a carriage return `\r`. When sending requests to this endpoint the -`Content-Type` header should be set to `application/x-ndjson`. +The `index` and `create` actions expect a source on the next line, +and have the same semantics as the `op_type` parameter in the standard index API: +create fails if a document with the same name already exists in the index, +index adds or replaces a document as necessary. + +`update` expects that the partial doc, upsert, +and script and its options are specified on the next line. + +`delete` does not expect a source on the next line and +has the same semantics as the standard delete API. + +*NOTE*: The final line of data must end with a newline character `\n`. +Each newline character may be preceded by a carriage return `\r`. +When sending requests to the `_bulk` endpoint, + the `Content-Type` header should be set to `application/x-ndjson`. + +Because this format uses literal `\n`'s as delimiters, +make sure that the JSON actions and sources are not pretty printed. + +If you specify an index in the request URI, +it is used for any actions that don't explicitly specify an index. + +A note on the format. The idea here is to make processing of this as +fast as possible. As some of the actions are redirected to other +shards on other nodes, only `action_meta_data` is parsed on the +receiving node side. 
+ +Client libraries using this protocol should try and strive to do +something similar on the client side, and reduce buffering as much as +possible. + +The response to a bulk action is a large JSON structure with +the individual results of each action performed, +in the same order as the actions that appeared in the request. +The failure of a single action does not affect the remaining actions. + +There is no "correct" number of actions to perform in a single bulk request. +Experiment with different settings to find the optimal size for your particular workload. + +When using the HTTP API, make sure that the client does not send HTTP chunks, +as this will slow things down. + +[float] +[[bulk-clients]] +===== Client support for bulk requests + +Some of the officially supported clients provide helpers to assist with +bulk requests and reindexing of documents from one index to another: + +Perl:: + + See https://metacpan.org/pod/Search::Elasticsearch::Client::5_0::Bulk[Search::Elasticsearch::Client::5_0::Bulk] + and https://metacpan.org/pod/Search::Elasticsearch::Client::5_0::Scroll[Search::Elasticsearch::Client::5_0::Scroll] + +Python:: + + See http://elasticsearch-py.readthedocs.org/en/master/helpers.html[elasticsearch.helpers.*] -The possible actions are `index`, `create`, `delete`, and `update`. -`index` and `create` expect a source on the next -line, and have the same semantics as the `op_type` parameter to the -standard index API (i.e. create will fail if a document with the same -index exists already, whereas index will add or replace a -document as necessary). `delete` does not expect a source on the -following line, and has the same semantics as the standard delete API. -`update` expects that the partial doc, upsert and script and its options -are specified on the next line. +[float] +[[bulk-curl]] +===== Submitting bulk requests with cURL If you're providing text file input to `curl`, you *must* use the `--data-binary` flag instead of plain `-d`. 
The latter doesn't preserve @@ -65,9 +121,97 @@ $ curl -s -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/_bulk -- // NOTCONSOLE // Not converting to console because this shows how curl works -Because this format uses literal `\n`'s as delimiters, please be sure -that the JSON actions and sources are not pretty printed. Here is an -example of a correct sequence of bulk commands: +[float] +[[bulk-optimistic-concurrency-control]] +===== Optimistic Concurrency Control + +Each `index` and `delete` action within a bulk API call may include the +`if_seq_no` and `if_primary_term` parameters in their respective action +and meta data lines. The `if_seq_no` and `if_primary_term` parameters control +how operations are executed, based on the last modification to existing +documents. See <> for more details. + + +[float] +[[bulk-versioning]] +===== Versioning + +Each bulk item can include the version value using the +`version` field. It automatically follows the behavior of the +index / delete operation based on the `_version` mapping. It also +support the `version_type` (see <>). + +[float] +[[bulk-routing]] +===== Routing + +Each bulk item can include the routing value using the +`routing` field. It automatically follows the behavior of the +index / delete operation based on the `_routing` mapping. + +[float] +[[bulk-wait-for-active-shards]] +===== Wait For Active Shards + +When making bulk calls, you can set the `wait_for_active_shards` +parameter to require a minimum number of shard copies to be active +before starting to process the bulk request. See +<> for further details and a usage +example. + +[float] +[[bulk-refresh]] +===== Refresh + +Control when the changes made by this request are visible to search. See +<>. + +NOTE: Only the shards that receive the bulk request will be affected by +`refresh`. Imagine a `_bulk?refresh=wait_for` request with three +documents in it that happen to be routed to different shards in an index +with five shards. 
The request will only wait for those three shards to +refresh. The other two shards that make up the index do not +participate in the `_bulk` request at all. + +[float] +[[bulk-security]] +===== Security + +See <>. + +[float] +[[bulk-partial-responses]] +===== Partial responses +To ensure fast responses, the bulk API will respond with partial results if one or more shards fail. +See <> for more information. + +[[docs-bulk-api-path-params]] +==== {api-path-parms-title} + +``:: +(Optional, string) Name of the index to perform the bulk actions against. + +[[docs-bulk-api-query-params]] +==== {api-query-parms-title} + +include::{docdir}/rest-api/common-parms.asciidoc[tag=pipeline] + +include::{docdir}/rest-api/common-parms.asciidoc[tag=refresh] + +include::{docdir}/rest-api/common-parms.asciidoc[tag=routing] + +include::{docdir}/rest-api/common-parms.asciidoc[tag=source] + +include::{docdir}/rest-api/common-parms.asciidoc[tag=source_excludes] + +include::{docdir}/rest-api/common-parms.asciidoc[tag=source_includes] + +include::{docdir}/rest-api/common-parms.asciidoc[tag=timeout] + +include::{docdir}/rest-api/common-parms.asciidoc[tag=wait_for_active_shards] + +[[docs-bulk-api-example]] +==== {api-examples-title} [source,console] -------------------------------------------------- @@ -81,7 +225,7 @@ POST _bulk { "doc" : {"field2" : "value2"} } -------------------------------------------------- -The result of this bulk operation is: +The API returns the following result: [source,console-result] -------------------------------------------------- @@ -171,85 +315,9 @@ The result of this bulk operation is: // TESTRESPONSE[s/"_seq_no" : 3/"_seq_no" : $body.items.3.update._seq_no/] // TESTRESPONSE[s/"_primary_term" : 4/"_primary_term" : $body.items.3.update._primary_term/] -The endpoints are `/_bulk` and `/{index}/_bulk`. When the index is provided, it -will be used by default on bulk items that don't provide it explicitly. - -A note on the format. 
The idea here is to make processing of this as -fast as possible. As some of the actions will be redirected to other -shards on other nodes, only `action_meta_data` is parsed on the -receiving node side. - -Client libraries using this protocol should try and strive to do -something similar on the client side, and reduce buffering as much as -possible. - -The response to a bulk action is a large JSON structure with the individual -results of each action that was performed in the same order as the actions that -appeared in the request. The failure of a single action does not affect the -remaining actions. - -There is no "correct" number of actions to perform in a single bulk -call. You should experiment with different settings to find the optimum -size for your particular workload. - -If using the HTTP API, make sure that the client does not send HTTP -chunks, as this will slow things down. - -[float] -[[bulk-optimistic-concurrency-control]] -==== Optimistic Concurrency Control - -Each `index` and `delete` action within a bulk API call may include the -`if_seq_no` and `if_primary_term` parameters in their respective action -and meta data lines. The `if_seq_no` and `if_primary_term` parameters control -how operations are executed, based on the last modification to existing -documents. See <> for more details. - - -[float] -[[bulk-versioning]] -==== Versioning - -Each bulk item can include the version value using the -`version` field. It automatically follows the behavior of the -index / delete operation based on the `_version` mapping. It also -support the `version_type` (see <>). - -[float] -[[bulk-routing]] -==== Routing - -Each bulk item can include the routing value using the -`routing` field. It automatically follows the behavior of the -index / delete operation based on the `_routing` mapping. 
- -[float] -[[bulk-wait-for-active-shards]] -==== Wait For Active Shards - -When making bulk calls, you can set the `wait_for_active_shards` -parameter to require a minimum number of shard copies to be active -before starting to process the bulk request. See -<> for further details and a usage -example. - -[float] -[[bulk-refresh]] -==== Refresh - -Control when the changes made by this request are visible to search. See -<>. - -NOTE: Only the shards that receive the bulk request will be affected by -`refresh`. Imagine a `_bulk?refresh=wait_for` request with three -documents in it that happen to be routed to different shards in an index -with five shards. The request will only wait for those three shards to -refresh. The other two shards that make up the index do not -participate in the `_bulk` request at all. - [float] [[bulk-update]] -==== Update +===== Bulk update example When using the `update` action, `retry_on_conflict` can be used as a field in the action itself (not in the extra payload line), to specify how many @@ -276,13 +344,3 @@ POST _bulk -------------------------------------------------- // TEST[continued] -[float] -[[bulk-security]] -==== Security - -See <>. - -[float] -[[bulk-partial-responses]] -==== Partial responses -To ensure fast responses, the bulk API will respond with partial results if one or more shards fail. See <> for more information. 
\ No newline at end of file

From 3c70f7b8acc1be5777b5a2945e697d1afd93bb3f Mon Sep 17 00:00:00 2001
From: debadair
Date: Thu, 3 Oct 2019 18:05:09 -0700
Subject: [PATCH 2/5] Update docs/reference/docs/bulk.asciidoc

Co-Authored-By: James Rodewig
---
 docs/reference/docs/bulk.asciidoc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/reference/docs/bulk.asciidoc b/docs/reference/docs/bulk.asciidoc
index 12bbd161e3548..1928fb3b7b897 100644
--- a/docs/reference/docs/bulk.asciidoc
+++ b/docs/reference/docs/bulk.asciidoc
@@ -23,6 +23,7 @@ POST _bulk
 ==== {api-request-title}
 
 `POST /_bulk`
+
 `POST /<index>/_bulk`
 
 [[docs-bulk-api-desc]]

From 703c7e2d3c768ef30dfb4f3c1ad33d9348aaff79 Mon Sep 17 00:00:00 2001
From: debadair
Date: Thu, 3 Oct 2019 18:05:27 -0700
Subject: [PATCH 3/5] Update docs/reference/docs/bulk.asciidoc

Co-Authored-By: James Rodewig
---
 docs/reference/docs/bulk.asciidoc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/docs/reference/docs/bulk.asciidoc b/docs/reference/docs/bulk.asciidoc
index 1928fb3b7b897..5ddad6fd90857 100644
--- a/docs/reference/docs/bulk.asciidoc
+++ b/docs/reference/docs/bulk.asciidoc
@@ -59,7 +59,8 @@ has the same semantics as the standard delete API.
 *NOTE*: The final line of data must end with a newline character `\n`.
 Each newline character may be preceded by a carriage return `\r`.
 When sending requests to the `_bulk` endpoint,
- the `Content-Type` header should be set to `application/x-ndjson`.
+the `Content-Type` header should be set to `application/x-ndjson`.
+====
 
 Because this format uses literal `\n`'s as delimiters,
 make sure that the JSON actions and sources are not pretty printed.

From e2c0f5302029191cdc463cd74706abbabe61cfc0 Mon Sep 17 00:00:00 2001
From: debadair
Date: Thu, 3 Oct 2019 18:07:10 -0700
Subject: [PATCH 4/5] Update docs/reference/docs/bulk.asciidoc

Co-Authored-By: James Rodewig
---
 docs/reference/docs/bulk.asciidoc | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/docs/reference/docs/bulk.asciidoc b/docs/reference/docs/bulk.asciidoc
index 5ddad6fd90857..f25940206d0fd 100644
--- a/docs/reference/docs/bulk.asciidoc
+++ b/docs/reference/docs/bulk.asciidoc
@@ -56,7 +56,9 @@ and script and its options are specified on the next line.
 `delete` does not expect a source on the next line and
 has the same semantics as the standard delete API.
 
-*NOTE*: The final line of data must end with a newline character `\n`.
+[NOTE]
+====
+The final line of data must end with a newline character `\n`.
 Each newline character may be preceded by a carriage return `\r`.
 When sending requests to the `_bulk` endpoint,
 the `Content-Type` header should be set to `application/x-ndjson`.

From e07a85b851f3dd782ff39e87b686c2feee062ce2 Mon Sep 17 00:00:00 2001
From: debadair
Date: Thu, 3 Oct 2019 18:07:55 -0700
Subject: [PATCH 5/5] Update docs/reference/docs/bulk.asciidoc

Co-Authored-By: James Rodewig
---
 docs/reference/docs/bulk.asciidoc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/reference/docs/bulk.asciidoc b/docs/reference/docs/bulk.asciidoc
index f25940206d0fd..6e6b61d73574f 100644
--- a/docs/reference/docs/bulk.asciidoc
+++ b/docs/reference/docs/bulk.asciidoc
@@ -70,7 +70,7 @@ make sure that the JSON actions and sources are not pretty printed.
 If you specify an index in the request URI,
 it is used for any actions that don't explicitly specify an index.
 
-A note on the format. The idea here is to make processing of this as
+A note on the format: The idea here is to make processing of this as
 fast as possible. As some of the actions are redirected to other
 shards on other nodes, only `action_meta_data` is parsed on the
 receiving node side.
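
Reviewer note, not part of the patch series: the NDJSON framing rules documented in this change (one JSON object per line with no pretty printing, a source line after every action except `delete`, a final `\n`, and the `application/x-ndjson` content type) can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions: `build_bulk_body` and the `(action, metadata, source)` tuple layout are hypothetical names introduced here, not part of any Elasticsearch client, and nothing is sent over the network.

```python
import json

# Actions named in the documentation being patched.
ACTIONS = {"index", "create", "update", "delete"}


def build_bulk_body(operations):
    """Serialize (action, metadata, source) tuples into an NDJSON bulk body.

    Hypothetical helper for illustration only; it is not a client library API.
    """
    lines = []
    for action, metadata, source in operations:
        if action not in ACTIONS:
            raise ValueError(f"unknown bulk action: {action!r}")
        # Action and metadata line. json.dumps without `indent` never emits a
        # literal newline, so each object stays on a single line.
        lines.append(json.dumps({action: metadata}, separators=(",", ":")))
        if action == "delete":
            # `delete` does not expect a source on the next line.
            if source is not None:
                raise ValueError("delete takes no source line")
        else:
            # `index`, `create`, and `update` expect a payload on the next line.
            if source is None:
                raise ValueError(f"{action} requires a source line")
            lines.append(json.dumps(source, separators=(",", ":")))
    # Every line, including the final one, must end with a newline character.
    return "".join(line + "\n" for line in lines)


# Same sequence of actions as the example request in the documentation.
operations = [
    ("index", {"_index": "test", "_id": "1"}, {"field1": "value1"}),
    ("delete", {"_index": "test", "_id": "2"}, None),
    ("create", {"_index": "test", "_id": "3"}, {"field1": "value3"}),
    ("update", {"_id": "1", "_index": "test"}, {"doc": {"field2": "value2"}}),
]
body = build_bulk_body(operations)

# Header the documentation asks for when calling the `_bulk` endpoint.
headers = {"Content-Type": "application/x-ndjson"}
```

The resulting string is what would go in the body of `POST /_bulk`. If it is written to a file and sent with `curl`, the `--data-binary` flag mentioned in the patch is what keeps the newlines intact.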