-
Notifications
You must be signed in to change notification settings - Fork 25.2k
Reindex search resiliency #45497
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reindex search resiliency #45497
Conversation
Local reindex can now survive loosing data nodes that contain source data. The original query will be restarted with a filter for `_seq_no >= last_seq_no` when a failure is detected. Part of elastic#42612 and split out from elastic#43187
Pinging @elastic/es-distributed |
I will take a look at this tomorrow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is mostly looking good to me. I don't see any issues with the approach or your design. It seems likely that the test failures are related to this change. Do you want to let me know when I should look again to officially approve?
@@ -135,7 +136,9 @@ | |||
this.listener = listener; | |||
BackoffPolicy backoffPolicy = buildBackoffPolicy(); | |||
bulkRetry = new Retry(BackoffPolicy.wrap(backoffPolicy, worker::countBulkRetry), threadPool); | |||
scrollSource = buildScrollableResultSource(backoffPolicy); | |||
// todo: this is trappy, since if a subclass override relies on subclass fields, they are not initialized. We should fix |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. This was causing me problems in a recent PR.
…_search_resiliency
No longer retry/restart the original search request, since this is not what we used to do and it leads to long wait time to get the info back that a search request is bad.
Thanks @tbrooks8 and sorry for not fixing the test failure before now. It should be ready for another round (provided tests succeed). |
No longer fail on the empty index. So far we consider this a workaround/hack more than the solution.
Use unmapped_type to circumvent problem with newly created indices without a mapping, since this is local to reindex. Once type removal is complete, ensuring that the metadata fields are always available likely becomes easier, so we will defer a solution to the search/sort problem.
@elasticmachine run elasticsearch-ci/1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Local reindex can now survive loosing data nodes that contain source
data. The original query will be restarted with a filter for
_seq_no >= last_seq_no
when a failure is detected.Part of #42612 and split out from #43187
A couple of follow-ups will be done (all indicated with todos) in addition to what is in the meta issue:
allowPartialSearchResult
in masterScrollableHitSource
separately rather than in super constructor.