
Set the chunk size in the bulk helper based on bytes #199


Closed
pmajmudar opened this issue Feb 17, 2015 · 4 comments
Comments

@pmajmudar

Hi,

Is it possible to have an option to set the chunk size in bytes for the bulk indexing helper?

The reason is that we want to bulk index documents, but our documents are not all necessarily the same size. Since we have a mix of document lengths, it makes more sense to bulk index in chunks measured in bytes.

E.g. I might set the chunk size to 1 MB (this could be 100 documents of 10 KB each, or 5 documents of 200 KB each).

This would prevent us from encountering memory issues with Elasticsearch.

Is this an enhancement you would consider?

Thanks,

Prash
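
As an illustration of the request, here is a minimal byte-based chunking sketch. The `chunk_by_bytes` helper is hypothetical (not part of the library) and assumes the actions have already been serialized to JSON strings:

```python
# Hypothetical helper, not part of elasticsearch-py: group pre-serialized
# actions into chunks whose combined size stays under a byte limit.
def chunk_by_bytes(serialized_actions, max_bytes=1024 * 1024):
    chunk, size = [], 0
    for action in serialized_actions:
        action_size = len(action.encode("utf-8"))
        # close the current chunk if adding this action would push it over the limit
        if chunk and size + action_size > max_bytes:
            yield chunk
            chunk, size = [], 0
        chunk.append(action)
        size += action_size
    if chunk:
        yield chunk
```

An action larger than `max_bytes` would still be yielded as a chunk of its own; the helper only guarantees it never combines actions past the limit.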

@honzakral
Contributor

This is definitely something I'd consider. The only reason I didn't include this from the start is that I was trying to find a better way to deal with the serialization; right now it requires accessing client.transport.serializer. But I guess that is OK.

If you want to take a stab at it go ahead, otherwise I am happy to implement it myself.
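
A rough sketch of what that could look like, assuming the helper keeps using client.transport.serializer and receives actions as (action, source) pairs (both assumptions made for illustration only):

```python
# Illustrative only: serialize each action with the client's own serializer
# and report its byte size, so the caller can decide where to cut a chunk.
def serialized_sizes(client, actions):
    serializer = client.transport.serializer
    for action, source in actions:
        lines = [serializer.dumps(action)]
        if source is not None:
            lines.append(serializer.dumps(source))
        payload = "\n".join(lines) + "\n"
        yield payload, len(payload.encode("utf-8"))
```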

@friedmans

In addition, helpers.streaming_bulk() blindly tries to post the entire chunk's worth of data regardless of the maximum size ES will accept (I occasionally get org.elasticsearch.common.netty.handler.codec.frame.TooLongFrameException: HTTP content length exceeded 104857600 bytes). By chunking on bytes, streaming_bulk() would never issue a call to client.bulk() that would raise this exception.
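
For context, 104857600 bytes is Elasticsearch's default http.max_content_length of 100mb (configurable in elasticsearch.yml). A defensive sketch under that assumption, where `safe_bulk` is a hypothetical wrapper over pre-serialized bulk bodies:

```python
# Hypothetical guard, assuming pre-serialized bulk bodies and the default
# server-side limit of http.max_content_length = 100mb.
MAX_CONTENT_LENGTH = 100 * 1024 * 1024

def safe_bulk(client, bodies, max_bytes=MAX_CONTENT_LENGTH):
    for body in bodies:
        if len(body.encode("utf-8")) > max_bytes:
            raise ValueError("bulk body would exceed http.max_content_length")
        client.bulk(body=body)
```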

@kpanic

kpanic commented Sep 10, 2015

@honzakral I would like to give it a try. Should it be chunk_size='100mb' or chunk_size=100, or a different param like bulk_size=100 (in MB)? Your thoughts?

@honzakral
Contributor

I think max_chunk_bytes would be a good name. Then the chunk should be at most that size and contain at most bulk_size documents.
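
Later versions of the helpers do accept a max_chunk_bytes limit alongside chunk_size. A usage sketch: the index name, document shape, and limits below are illustrative, the client is assumed to point at a local cluster, and older Elasticsearch versions would also need a _type in each action:

```python
# Usage sketch: cap each bulk request both by document count and by
# serialized size. Index name, document shape, and limits are illustrative.
from elasticsearch import Elasticsearch, helpers

client = Elasticsearch()  # assumes a cluster on localhost:9200

actions = (
    {"_index": "docs", "_source": {"body": "x" * 10000}}
    for _ in range(1000)
)

helpers.bulk(
    client,
    actions,
    chunk_size=500,                    # at most 500 documents per request
    max_chunk_bytes=10 * 1024 * 1024,  # and at most ~10 MB of serialized data
)
```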

rciorba added a commit to rciorba/elasticsearch-py that referenced this issue Mar 2, 2018