
BulkProcessor no error when dynamic is strict #55043


Closed
moifort opened this issue Apr 10, 2020 · 11 comments
Labels
>docs General docs changes :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch


@moifort

moifort commented Apr 10, 2020

Elasticsearch version (bin/elasticsearch --version): docker image elasticsearch:7.4.2

Plugins installed: [] default

JVM version (java -version): docker image elasticsearch:7.4.2

OS version (uname -a if on a Unix-like system): docker image elasticsearch:7.4.2

elasticsearch-rest-high-level-client : 7.6.2 on JDK 11

Description of the problem including expected versus actual behavior:

When using the dynamic: strict mapping setting:

  • the bulk processor returns success instead of an error when the data to save has a new field.
  • no data is stored in Elasticsearch.

Steps to reproduce:

mapping.json

{
  "dynamic": "strict",
  "properties": {
    "id": {
      "type": "long"
    },
    "name": {
      "type": "text",
      "analyzer": "french_light",
      "boost": 10
    }
  }
}

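StoreData.java

import org.elasticsearch.action.ActionListener;
import org.elasticsearch.action.bulk.BulkProcessor;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;

class StoreData {
  private final RestHighLevelClient client;
  // Hypothetical application types, not shown in this issue
  private final ProductDao productDao;
  private final ElasticMapper elasticMapper;

  StoreData(RestHighLevelClient client, ProductDao productDao, ElasticMapper elasticMapper) {
    this.client = client;
    this.productDao = productDao;
    this.elasticMapper = elasticMapper;
  }

  public void seed() {
    // this::bulkAsync matches the BiConsumer<BulkRequest, ActionListener<BulkResponse>> that BulkProcessor.builder expects
    try (var bulkProcessor = BulkProcessor
            .builder(this::bulkAsync, new BulkListener())
            .setBulkActions(500)
            .setConcurrentRequests(0)
            .setGlobalIndex("myIndex")
            .build()) {
      productDao.listIndexable(product -> {
        var indexProduct = new IndexRequest()
            .id(product.getId().toString())
            .source(elasticMapper.toIndexableProductViewJson(product), XContentType.JSON);
        bulkProcessor.add(indexProduct);
      });
    }
  }

  public void bulkAsync(BulkRequest request, ActionListener<BulkResponse> bulkListener) {
    client.bulkAsync(request, RequestOptions.DEFAULT, bulkListener);
  }
}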

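BulkListener.java

import org.elasticsearch.action.bulk.BulkProcessor;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class BulkListener implements BulkProcessor.Listener {

    // Assumes SLF4J, matching the {} placeholders used below
    private static final Logger LOG = LoggerFactory.getLogger(BulkListener.class);

    @Override
    public void beforeBulk(long executionId, BulkRequest request) {
        var numberOfActions = request.numberOfActions();
        LOG.trace("Executing bulk [{}] with {} requests", executionId, numberOfActions);
    }

    @Override
    public void afterBulk(long executionId, BulkRequest request, BulkResponse response) {
        LOG.trace("Bulk [{}] completed in {} milliseconds", executionId, response.getTook().getMillis());
    }

    @Override
    public void afterBulk(long executionId, BulkRequest request, Throwable failure) {
        LOG.error("Failed to execute bulk", failure);
    }
}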

Provide logs (if relevant):

When I call StoreData.seed() with a field that is not defined in mapping.json:

log

[2020-04-10 07:45:35,522] [elastic-product-indexer-2-thread-1] INFO  o.m.i.e.ElasticClient$BulkListener - Executing bulk [1] with 500 requests
[2020-04-10 07:45:37,236] [I/O dispatcher 1] DEBUG o.e.c.RestClient - request [POST http://localhost:9200/_bulk?timeout=1m] returned [HTTP/1.1 200 OK]
[2020-04-10 07:45:37,649] [I/O dispatcher 1] INFO  o.m.i.e.ElasticClient$BulkListener - Bulk [1] completed in 2106 milliseconds
[2020-04-10 07:45:37,750] [elastic-product-indexer-2-thread-1] INFO  o.m.i.e.ElasticClient$BulkListener - Executing bulk [2] with 500 requests
[2020-04-10 07:45:39,428] [I/O dispatcher 1] DEBUG o.e.c.RestClient - request [POST http://localhost:9200/_bulk?timeout=1m] returned [HTTP/1.1 200 OK]
[2020-04-10 07:45:39,445] [I/O dispatcher 1] INFO  o.m.i.e.ElasticClient$BulkListener - Bulk [2] completed in 1694 milliseconds
[2020-04-10 07:45:39,461] [elastic-product-indexer-2-thread-1] INFO  o.m.i.e.ElasticClient$BulkListener - Executing bulk [3] with 184 requests
[2020-04-10 07:45:40,164] [I/O dispatcher 1] DEBUG o.e.c.RestClient - request [POST http://localhost:9200/_bulk?timeout=1m] returned [HTTP/1.1 200 OK]
[2020-04-10 07:45:40,170] [I/O dispatcher 1] INFO  o.m.i.e.ElasticClient$BulkListener - Bulk [3] completed in 708 milliseconds

kibana

GET myIndex/_search

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}
@jtibshirani jtibshirani added :Core/Features/Java High Level REST Client :Search Foundations/Mapping Index mappings, including merging and defining field types labels Apr 10, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-core-features (:Core/Features/Java High Level REST Client)

@elasticmachine
Collaborator

Pinging @elastic/es-search (:Search/Mapping)

@jtibshirani
Contributor

Thanks @moifort for raising this. To help us with debugging, would you be able to provide a single example document where you expect to see an error, but you see a successful response instead?

@moifort
Author

moifort commented Apr 10, 2020

mapping.json

{
  "dynamic": "strict",
  "properties": {
    "id": {
      "type": "long"
    },
    "name": {
      "type": "text"
    }
  }
}

The bulk request succeeds but no data is stored in Elasticsearch.

The documents below do not respect the mapping:

[
  {
    "id" : 1,
    "name" : "name1-s5",
    "summaryHtml" : "the-summary1-s5",
    "categoryId" : 10004
  },
  {
    "id" : 2,
    "name" : "name2-s5",
    "summaryHtml" : "the-summary2-s5",
    "categoryId" : 10004
  },
  {
    "id" : 3,
    "name" : "name3-s5",
    "summaryHtml" : "the-summary3-s5",
    "categoryId" : 10004
  }
]

The bulk request succeeds and the data is stored in Elasticsearch:

[
  {
    "id" : 1,
    "name" : "name1-s5"
  },
  {
    "id" : 2,
    "name" : "name2-s5"
  },
  {
    "id" : 3,
    "name" : "name3-s5"
  }
]
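Under "dynamic": "strict", each document in the first list above is rejected on its own because summaryHtml and categoryId are not declared under properties. As a minimal client-side sketch of that rule, assuming flat documents represented as Maps, the hypothetical helper StrictMappingCheck.unmappedFields (not part of Elasticsearch or its clients) lists the top-level fields a strict mapping would refuse. It ignores object sub-fields, so it only illustrates the check; the server remains the source of truth.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical client-side sketch: finds top-level fields that a
// "dynamic": "strict" mapping would reject. Nested objects are not inspected.
public class StrictMappingCheck {

    // Returns the document's field names that are missing from the mapped
    // property names, in the document's iteration order.
    static List<String> unmappedFields(Set<String> mappedProperties, Map<String, Object> document) {
        List<String> unmapped = new ArrayList<>();
        for (String field : document.keySet()) {
            if (!mappedProperties.contains(field)) {
                unmapped.add(field);
            }
        }
        return unmapped;
    }

    public static void main(String[] args) {
        // Property names from the second mapping.json in this issue
        Set<String> mapped = Set.of("id", "name");

        // First document from the list that does not respect the mapping
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", 1);
        doc.put("name", "name1-s5");
        doc.put("summaryHtml", "the-summary1-s5");
        doc.put("categoryId", 10004);

        // Prints [summaryHtml, categoryId]
        System.out.println(unmappedFields(mapped, doc));
    }
}
```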

@cbuescher cbuescher self-assigned this Apr 15, 2020
@cbuescher
Member

@moifort thanks for the reproduction. I think the behaviour is expected. In your example you are expecting an error for the whole bulk request if some or all of the operations fail, but that's not how the bulk API works. If you send a "_bulk" request via the REST API with one document that should succeed (it has no unknown fields that would cause the "strict" setting to error) and another that should fail, like this:

POST /index/_bulk
{ "index" : { "_index" : "index", "_id" : "1" } }
{ "id" : "1", "name" :  "name1-s1"}
{ "index" : { "_index" : "index", "_id" : "2" } }
{ "id" : "2", "name" :  "name1-s5", "summaryHtml" : "the-summary2-s5", "categoryId" : 10004}

the response contains the top-level "errors" property set to true to indicate that some bulk actions failed, and the failed action contains a 400 status code and an error message.

{
  "took" : 27,
  "errors" : true,
  "items" : [
    {
      "index" : {
        "_index" : "index",
        "_type" : "_doc",
        "_id" : "1",
        "_version" : 2,
        "result" : "updated",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 1,
        "_primary_term" : 1,
        "status" : 200
      }
    },
    {
      "index" : {
        "_index" : "index",
        "_type" : "_doc",
        "_id" : "2",
        "status" : 400,
        "error" : {
          "type" : "strict_dynamic_mapping_exception",
          "reason" : "mapping set to strict, dynamic introduction of [summaryHtml] within [_doc] is not allowed"
        }
      }
    }
  ]
}

I believe that with the Java client you can check those properties with `BulkResponse#hasFailures()`, `BulkResponse#getItems()` and `BulkItemResponse#getFailureMessage()` respectively. Let me know if this solves your needs.

I will check whether our REST API docs and High Level Client docs have examples for this, because I can imagine this being a frequent ask; maybe we should add a few lines and/or an example to the docs.
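To make the per-item semantics concrete without a running cluster, here is a minimal stdlib-only sketch that models the example response above. The BulkItem class and the hasFailures/failureSummary helpers are hypothetical stand-ins, not the High Level REST Client API; with the real client the corresponding calls are the BulkResponse and BulkItemResponse methods named above. What it shows is that a request can complete normally while individual items carry a 400 status, so the caller has to look at the items.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a bulk response, mirroring the JSON "items" above.
// Not the Elasticsearch client API; it only illustrates per-item status.
public class BulkResultModel {

    static final class BulkItem {
        final String id;
        final int status;          // per-item status, e.g. 200 or 400
        final String errorType;    // null when the item succeeded
        final String errorReason;  // null when the item succeeded

        BulkItem(String id, int status, String errorType, String errorReason) {
            this.id = id;
            this.status = status;
            this.errorType = errorType;
            this.errorReason = errorReason;
        }

        boolean isFailed() {
            return errorType != null;
        }
    }

    // Plays the role of the top-level "errors" flag: true if any item failed
    static boolean hasFailures(List<BulkItem> items) {
        for (BulkItem item : items) {
            if (item.isFailed()) {
                return true;
            }
        }
        return false;
    }

    // One line per failed item with its id, status, error type and reason
    static List<String> failureSummary(List<BulkItem> items) {
        List<String> lines = new ArrayList<>();
        for (BulkItem item : items) {
            if (item.isFailed()) {
                lines.add("id [" + item.id + "] status [" + item.status + "] "
                        + item.errorType + ": " + item.errorReason);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // The two items from the example response
        List<BulkItem> items = List.of(
                new BulkItem("1", 200, null, null),
                new BulkItem("2", 400, "strict_dynamic_mapping_exception",
                        "mapping set to strict, dynamic introduction of [summaryHtml] within [_doc] is not allowed"));

        if (hasFailures(items)) {
            failureSummary(items).forEach(System.out::println);
        }
    }
}
```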

@cbuescher
Member

Re: the docs situation: the High Level REST Client documentation at least covers operations with failures, including examples, in its section about the BulkResponse.
We might want to add a small example of the errors flag in the response and in the items to the REST API docs as well, wdyt @jrodewig?

@cbuescher cbuescher added the >docs General docs changes label Apr 15, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-docs (>docs)

@jrodewig
Contributor

Thanks @cbuescher. I agree.

We do mention failures in the REST Bulk API docs:

The response to a bulk action is a large JSON structure with the individual results of each action performed, in the same order as the actions that appeared in the request. The failure of a single action does not affect the remaining actions.

However, it doesn't look like we document the error type or reason, or include any example responses with the errors flag set to true. Adding those would be helpful. I've created #55237 to track and will take care of the changes.
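Because the quoted docs guarantee that results come back in the same order as the actions in the request, a failed item can be traced back to the document that produced it by position alone. A minimal sketch, assuming the request documents and per-item statuses are available as parallel lists and that a status of 400 or above means the action failed; failedDocuments is a hypothetical helper, not part of Elasticsearch or its clients.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: pairs each bulk response item with the request
// document at the same position, relying on the documented ordering.
public class BulkOrderCorrelation {

    // Returns the request documents whose response item has an error status.
    // Assumes statuses >= 400 mean the action failed.
    static <T> List<T> failedDocuments(List<T> requestDocuments, List<Integer> itemStatuses) {
        if (requestDocuments.size() != itemStatuses.size()) {
            throw new IllegalArgumentException("request and response must have the same number of items");
        }
        List<T> failed = new ArrayList<>();
        for (int i = 0; i < itemStatuses.size(); i++) {
            if (itemStatuses.get(i) >= 400) {
                failed.add(requestDocuments.get(i));
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        // Source documents in request order, with the statuses from the example response
        List<String> docs = List.of(
                "{\"id\":\"1\",\"name\":\"name1-s1\"}",
                "{\"id\":\"2\",\"name\":\"name1-s5\",\"summaryHtml\":\"the-summary2-s5\",\"categoryId\":10004}");
        List<Integer> statuses = List.of(200, 400);

        // Prints only the second document, the one rejected by the strict mapping
        failedDocuments(docs, statuses).forEach(System.out::println);
    }
}
```

Keeping the original documents around this way makes it straightforward to log or retry exactly the actions that failed.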

@cbuescher
Member

Thanks @jrodewig, that should cover the documentation side.
@moifort I will close this issue then since the behaviour is intended, please re-open if you object or have any further questions. For general usage questions please also try the support forums over at https://discuss.elastic.co.

@moifort
Author

moifort commented Apr 15, 2020

Thank you all for your quick responses and answers!

Indeed, I should have checked the BulkResponse in my BulkListener! With the implementation below, it works 🥳!

BulkListener.java

// Lombok's @Slf4j names the generated logger `log` by default; using `LOG` assumes lombok.config sets lombok.log.fieldName = LOG
@Slf4j
public class BulkListener implements BulkProcessor.Listener {

    @Override
    public void beforeBulk(long executionId, BulkRequest request) {
        var numberOfActions = request.numberOfActions();
        LOG.trace("Executing bulk [{}] with {} requests", executionId, numberOfActions);
    }

    @Override
    public void afterBulk(long executionId, BulkRequest request, BulkResponse response) {
        LOG.trace("Bulk [{}] completed in {} milliseconds", executionId, response.getTook().getMillis());
        // Per-item failures such as strict_dynamic_mapping_exception arrive here,
        // inside a completed response, not through the Throwable callback below
        if (response.hasFailures()) {
            LOG.error(response.buildFailureMessage());
        }
    }

    @Override
    public void afterBulk(long executionId, BulkRequest request, Throwable failure) {
        LOG.error("Failed to execute bulk", failure);
    }
}

log

index [local_product_1586964940916], type [_doc], id [1392], message [ElasticsearchException[Elasticsearch exception [type=strict_dynamic_mapping_exception, reason=mapping set to strict, dynamic introduction of [id] within [_doc] is not allowed]]]
[263]: index [local_product_1586964940916], type [_doc], id [1393], message [ElasticsearchException[Elasticsearch exception [type=strict_dynamic_mapping_exception, reason=mapping set to strict, dynamic introduction of [id] within [_doc] is not allowed]]]
[264]: index [local_product_1586964940916], type [_doc], id [1394], message [ElasticsearchException[Elasticsearch exception [type=strict_dynamic_mapping_exception, reason=mapping set to strict, dynamic introduction of [id] within [_doc] is not allowed]]]
[265]: index [local_product_1586964940916], type [_doc], id [1398], message [ElasticsearchException[Elasticsearch exception [type=strict_dynamic_mapping_exception, reason=mapping set to strict, dynamic introduction of [id] within [_doc] is not allowed]]]
[266]: index [local_product_1586964940916], type [_doc], id [1399], message [ElasticsearchException[Elasticsearch exception [type=strict_dynamic_mapping_exception, reason=mapping set to strict, dynamic introduction of [id] within [_doc] is not allowed]]]
...

@jrodewig, where can I make a PR to update the sample and the documentation?

@jrodewig
Contributor

Hi @moifort

A PR would be most welcome!

The source file for the REST Bulk API docs is here:
https://github.com/elastic/elasticsearch/edit/master/docs/reference/docs/bulk.asciidoc

However, please note that the REST API docs are separate from the Java HLRC docs. The Java HLRC docs already include documentation for bulkResponse.hasFailures:
https://www.elastic.co/guide/en/elasticsearch/client/java-rest/master/java-rest-high-document-bulk.html#java-rest-high-document-bulk-response

If you'd like to raise a PR for the Java HLRC docs, the source file is here:
https://github.com/elastic/elasticsearch/edit/master/docs/java-rest/high-level/document/bulk.asciidoc

I'm glad to hear you were able to resolve the issue!

@javanna javanna added the Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch label Jul 16, 2024

6 participants