Reindex - provide option to specify batch size in bytes #90195

Open
mciricean opened this issue Sep 21, 2022 · 1 comment
Labels
:Distributed Indexing/Reindex, >enhancement, Team:Distributed (Obsolete)

Comments

@mciricean

Description

The reindex API currently lets you set the batch size only as a number of documents. Document sizes can vary widely, so for now I have to size batches for the worst case (the largest documents all landing in one batch) to make sure the reindex task completes. The documentation describes the option as follows:

size
(Optional, integer) The number of documents to index per batch. Use when indexing from remote to ensure that the batches fit within the on-heap buffer, which defaults to a maximum size of 100 MB.

By default _reindex uses scroll batches of 1000. You can change the batch size with the size field in the source element.
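For reference, a minimal sketch of a request body that sets the document-count batch size through `size` in the `source` element. The index names and the helper `build_reindex_body` are hypothetical and only for illustration; the resulting JSON is what would be sent to `POST _reindex`.

```python
import json


def build_reindex_body(source_index: str, dest_index: str, batch_docs: int = 1000) -> dict:
    """Build a _reindex request body with an explicit per-batch document count.

    The default of 1000 mirrors the documented default scroll batch size.
    """
    if batch_docs < 1:
        raise ValueError("batch_docs must be at least 1")
    return {
        # "size" inside "source" is the number of documents per batch.
        "source": {"index": source_index, "size": batch_docs},
        "dest": {"index": dest_index},
    }


# Hypothetical index names, chosen only for this example.
body = build_reindex_body("my-source-index", "my-dest-index", batch_docs=100)
print(json.dumps(body, indent=2))
```

Note that the only knob here is a document count; nothing in the body expresses a limit in bytes.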

But I cannot say, for example, that the maximum batch size is 100 MB.
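To make the cost of worst-case sizing concrete, here is a rough calculation. The document sizes are made-up numbers, not measurements, and 100 MB is taken as 100 × 1024 × 1024 bytes; the function name is hypothetical.

```python
def worst_case_batch_docs(buffer_bytes: int, largest_doc_bytes: int) -> int:
    """Largest document count that still fits the buffer if every doc is the largest one."""
    if largest_doc_bytes <= 0:
        raise ValueError("largest_doc_bytes must be positive")
    if largest_doc_bytes > buffer_bytes:
        raise ValueError("a single document already exceeds the buffer")
    return buffer_bytes // largest_doc_bytes


BUFFER_BYTES = 100 * 1024 * 1024      # the 100 MB on-heap buffer from the docs
LARGEST_DOC_BYTES = 5 * 1024 * 1024   # assumed: largest document is 5 MB
AVERAGE_DOC_BYTES = 50 * 1024         # assumed: typical document is 50 KB

docs_per_batch = worst_case_batch_docs(BUFFER_BYTES, LARGEST_DOC_BYTES)
# Fraction of the buffer a typical batch actually uses with that setting.
typical_fill = docs_per_batch * AVERAGE_DOC_BYTES / BUFFER_BYTES

print(f"size = {docs_per_batch} documents per batch")
print(f"typical batch uses {typical_fill:.1%} of the buffer")
```

Under these assumed numbers the safe setting is 20 documents per batch, and a typical batch fills only about 1% of the buffer, which is why a byte-based limit would be more efficient.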

I'm requesting the ability to specify the size of each batch in bytes. The value should also be capped at 10% of the heap so that indexing does not fail with exceptions from exceeding the limit.
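A minimal sketch of the requested behaviour, written as a standalone function rather than anything that exists in Elasticsearch today. The function name, its parameters, and the choice to let a single oversized document form its own batch are all assumptions made for illustration.

```python
from typing import Iterable, Iterator, List


def batches_by_bytes(
    doc_sizes: Iterable[int],
    max_batch_bytes: int,
    heap_bytes: int,
    heap_percent: int = 10,
) -> Iterator[List[int]]:
    """Group document sizes into batches bounded by a byte limit.

    The effective limit is the smaller of the requested batch size and
    heap_percent of the heap (10% by default, as proposed in the issue).
    """
    limit = min(max_batch_bytes, heap_bytes * heap_percent // 100)
    if limit < 1:
        raise ValueError("effective batch limit must be at least 1 byte")

    batch: List[int] = []
    batch_bytes = 0
    for size in doc_sizes:
        # Close the current batch before it would exceed the limit.
        if batch and batch_bytes + size > limit:
            yield batch
            batch, batch_bytes = [], 0
        # Assumption: a document larger than the limit still goes out alone.
        batch.append(size)
        batch_bytes += size
    if batch:
        yield batch


# Hypothetical document sizes in bytes, with a 100-byte limit and a 2000-byte heap.
result = list(batches_by_bytes([40, 30, 50, 10, 120, 20], max_batch_bytes=100, heap_bytes=2000))
print(result)
```

With a byte limit, small documents are packed together while large ones get their own batch, so the document count per batch adapts to the data instead of being fixed for the worst case.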

@mciricean added the >enhancement and needs:triage labels on Sep 21, 2022
@gwbrown added the :Distributed Indexing/Reindex label and removed the needs:triage label on Sep 22, 2022
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@elasticsearchmachine added the Team:Distributed (Obsolete) label on Sep 22, 2022

3 participants