
Cannot reindex from remote even with batch size 1: Remote responded with a chunk that was too large. Use a smaller batch size. #73261


Closed
phinzin opened this issue May 20, 2021 · 7 comments
Labels: >bug, :Distributed Indexing/Reindex, feedback_needed, Team:Distributed (Obsolete)

Comments

@phinzin

phinzin commented May 20, 2021

Elasticsearch version (bin/elasticsearch --version): source (old) cluster 5.6.3, destination (new) cluster 6.8.9

Plugins installed: None

JVM version (java -version): OpenJDK 1.8

OS version (uname -a if on a Unix-like system): Windows 10

Description of the problem including expected versus actual behavior:
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Remote responded with a chunk that was too large. Use a smaller batch size."
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "Remote responded with a chunk that was too large. Use a smaller batch size.",
    "caused_by": {
      "type": "content_too_long_exception",
      "reason": "entity content is too long [255064703] for the configured buffer limit [104857600]"
    }
  },
  "status": 400
}
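The two numbers in the caused_by reason are the size of the remote response and the buffer limit, both in bytes. A minimal Python sketch that pulls them out and compares them; it assumes only that the reason string keeps the wording shown above, and parse_too_long is a hypothetical helper, not part of any Elasticsearch client.

```python
import re

# The caused_by reason copied from the error response above.
REASON = ("entity content is too long [255064703] "
          "for the configured buffer limit [104857600]")

PATTERN = re.compile(
    r"too long \[(\d+)\] for the configured buffer limit \[(\d+)\]"
)


def parse_too_long(reason: str) -> tuple[int, int]:
    """Return (response_bytes, buffer_limit_bytes) from a content_too_long reason.

    Hypothetical helper: relies on the exact message format shown in the issue.
    """
    match = PATTERN.search(reason)
    if match is None:
        raise ValueError(f"unrecognised reason: {reason!r}")
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    size, limit = parse_too_long(REASON)
    mib = 1024 * 1024
    print(f"response: {size / mib:.1f} MiB, limit: {limit / mib:.1f} MiB, "
          f"over by a factor of {size / limit:.2f}")
```

For the reason above this works out to a response of roughly 243 MiB against a limit of exactly 100 MiB (104857600 = 100 * 1024 * 1024 bytes), about 2.4 times over.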

Steps to reproduce:


  1. Reindex with batch size 1:
    POST http://dest_host:9200/_reindex
    {
      "source": {
        "remote": {
          "host": "http://source_host:9200"
        },
        "index": "index name",
        "size": 5000
      },
      "dest": {
        "index": "index name"
      }
    }
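To script the reproduction with batch size 1, here is a minimal Python sketch that builds the same reindex-from-remote request using only the standard library. The hosts dest_host and source_host come from the report; index_name is a hypothetical stand-in for the report's "index name" placeholder, because index names cannot contain spaces. The remote host includes an http:// scheme, which reindex from remote expects. The request is prepared here but not sent.

```python
import json
import urllib.request

# Placeholders taken from the report; replace with real values.
DEST = "http://dest_host:9200"
SOURCE = "http://source_host:9200"
INDEX = "index_name"  # hypothetical stand-in for "index name"


def build_reindex_body(index: str, remote_host: str, batch_size: int) -> dict:
    """Body for POST /_reindex that pulls `index` from `remote_host`.

    source.size is the number of documents requested per scroll batch.
    """
    return {
        "source": {
            "remote": {"host": remote_host},
            "index": index,
            "size": batch_size,
        },
        "dest": {"index": index},
    }


def build_reindex_request(body: dict) -> urllib.request.Request:
    """Prepare (but do not send) the POST /_reindex request on the destination."""
    return urllib.request.Request(
        f"{DEST}/_reindex",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_reindex_request(build_reindex_body(INDEX, SOURCE, batch_size=1))
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(req)
```

Even with batch_size=1, each scroll response still has to carry at least one whole document, so a single document larger than the buffer limit fails no matter how small the batch is.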

Provide logs (if relevant):

@phinzin added the >bug and needs:triage labels May 20, 2021
@pgomulka
Contributor

I don't think we would be able to reproduce this with so little information.
Can you share the logs, the stack trace from the cluster, and your elasticsearch.yml configuration?
What about the mappings of the source and dest indices?

@pgomulka added the :Distributed Indexing/Reindex and feedback_needed labels May 24, 2021
@elasticmachine added the Team:Distributed (Obsolete) label May 24, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@pgomulka removed the needs:triage label May 24, 2021
@nik9000
Member

nik9000 commented Jun 15, 2021

If the document is very large, reindex won't be able to hold it in memory. It looks like the response from the remote host was about 250 MB (255,064,703 bytes), against a buffer limit of 104,857,600 bytes (100 MB). You should probably exclude that document. We should probably also change the error message, because a smaller batch size can't help when a single document is this big. In the past we'd resisted making the buffer size configurable; I wonder if it'd make sense to do so now, or to make it a portion of the heap or something. That buffer is part of the Apache async HTTP client, so I don't think we could easily move it off heap.
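Following the suggestion to exclude the oversized document, a minimal Python sketch of a reindex-from-remote body that skips specific document ids with a bool query whose only clause is must_not plus an ids query. It assumes the offending _id is already known (the id in the example is hypothetical), and the host and index names are placeholders. reindex_body_excluding is a hypothetical helper.

```python
import json


def reindex_body_excluding(index: str, remote_host: str,
                           skip_ids: list[str], batch_size: int) -> dict:
    """Reindex-from-remote body that copies `index` but leaves out `skip_ids`.

    The query runs on the remote (source) cluster, so the skipped documents
    are never included in the scroll responses sent back to the destination.
    """
    return {
        "source": {
            "remote": {"host": remote_host},
            "index": index,
            "size": batch_size,
            # A bool query with only must_not matches every document
            # except the listed ids.
            "query": {"bool": {"must_not": {"ids": {"values": skip_ids}}}},
        },
        "dest": {"index": index},
    }


if __name__ == "__main__":
    # "huge-doc-id" is a hypothetical placeholder for the oversized document's _id.
    body = reindex_body_excluding(
        "index_name", "http://source_host:9200", ["huge-doc-id"], batch_size=100
    )
    print(json.dumps(body, indent=2))
```

This only helps once the oversized document has been identified. Shrinking the batch size alone cannot get a single document past the buffer limit.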

@henningandersen
Contributor

@phinzin in your reproduction example (and title of this issue), you mention:

Reindex with batch size 1

But the batch size is set to 5000:

"size": 5000

Did you also try it with "size": 1?

@phinzin
Author

phinzin commented Jun 24, 2021


@henningandersen Yes, I already tried that. The strange thing is that even after changing the size to 1,
I still got the same error message with the same number: "entity content is too long [255064703] for the configured buffer limit [104857600]"

@henningandersen
Contributor

@phinzin is a document of this size normal for your deployment? Would you be willing to find that document and figure out how it got into ES in the first place?

For one, our HTTP handler has a default request size limit of 100MB (http.max_content_length), so indexing such a document directly should fail unless that setting has been changed.

It could certainly grow that big through enough updates, but I would like to understand the background here before we decide whether and how to address this.

@phinzin
Author

phinzin commented Feb 4, 2025

An index contained data in an incorrect format, and that was causing the issue.

Projects
None yet
Development

No branches or pull requests

5 participants