Skip to content

Throttling support for reindex #17039

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 22, 2016
Merged

Conversation

nik9000
Copy link
Member

@nik9000 nik9000 commented Mar 9, 2016

The throttling is done between batches so that we can add the wait time to the scroll timeout.

@nik9000
Copy link
Member Author

nik9000 commented Mar 9, 2016

This throttling support doesn't include throttling a running reindex request. That seems like a wonderful feature but it doesn't need to live in this PR.

@nik9000 nik9000 added the review label Mar 9, 2016
@nik9000 nik9000 mentioned this pull request Mar 9, 2016
7 tasks
@@ -961,6 +961,14 @@ public XContentBuilder dateValueField(XContentBuilderString rawFieldName, XConte
return this;
}

public XContentBuilder timeValueField(String rawFieldName, String readableFieldName, TimeValue timeValue) throws IOException {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's no need to duplicate the functionality here, this can call:

timeValueField(rawFieldName, readableFieldName, timeValue.millis(), TimeUnit.MILLIS);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(or reverse them, so that a new TimeValue is only constructed if a readable rawTime is used)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you had it in the right order the first time.

@nik9000
Copy link
Member Author

nik9000 commented Mar 10, 2016

@dakrone I think this is ready for another round of review. Thanks for the review so far!

throttle the number of requests per second that the reindex issues. The
throttling is done between bulk batches so that it can manipulate the scroll
timeout. At this point it looks like it slightly over-throttles, producing
slightly lower requests per second than you asked for.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should avoid documentation with this sort of voice:

"At this point it looks like it slightly over-throttles, producing slightly lower requests per second than you asked for."

Perhaps instead:

"Ensures that requests never exceed the requests_per_second setting at the cost of potentially over-throttling."

@dakrone
Copy link
Member

dakrone commented Mar 10, 2016

LGTM, left another minor comment. Another thing I'd love to see is an explanation in the docs about exactly how the throttling is implemented so that people monitoring their clusters can understand what the expected behavior is. For example, discussing how the initial scroll is executed, then processing documents is delayed depending on <formula>.

I think it'd be beneficial to have in the documentation, but it's up to you if you don't think it should be there.

@nik9000
Copy link
Member Author

nik9000 commented Mar 10, 2016

@dakrone I've written the throttling docs a bit. Better?

@dakrone
Copy link
Member

dakrone commented Mar 10, 2016

Sure, I still don't care of using terms like "bursty" because they are harder to understand from a non-native English speaker's perspective (as are all colloquialisms)

@nik9000
Copy link
Member Author

nik9000 commented Mar 11, 2016

Since we're done talking about the code I'm going to squash and rebase with master.

@nik9000
Copy link
Member Author

nik9000 commented Mar 11, 2016

Rebase wasn't automatic but was mostly clean. Mostly just a matter of adding both sets of changes. In one case I had to add a timeout parameter to a new test to make it all compile. Small changes.

@nik9000
Copy link
Member Author

nik9000 commented Mar 11, 2016

Rebasing discovered some issues with headers and context. I'll add another test and fix.

@nik9000
Copy link
Member Author

nik9000 commented Mar 11, 2016

This is now blocked on #17077

@nik9000 nik9000 removed the v2.3.0 label Mar 16, 2016
@nik9000
Copy link
Member Author

nik9000 commented Mar 16, 2016

This is now blocked on #17077

And that is now merged. So we can have this now.

@nik9000
Copy link
Member Author

nik9000 commented Mar 18, 2016

@dakrone, do you want to re-review now that #17077 is in? Maybe just look at the last two commits?

@dakrone
Copy link
Member

dakrone commented Mar 18, 2016

LGTM

The throttle is applied when starting the next scroll request so that its
timeout can include the throttle time.
@nik9000 nik9000 merged commit da96b6e into elastic:master Mar 22, 2016
@lcawl lcawl added :Distributed Indexing/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. and removed :Reindex API labels Feb 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Indexing/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. >enhancement v5.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants