Store reindexing result in reindex index #45260

Merged: 13 commits merged on Aug 14, 2019

Tim-Brooks
Contributor

Currently the result of a reindex persistent task is propagated to and
stored in the cluster state. This commit changes this so that only the
ephemeral task ID, headers, and reindex state are stored in the cluster
state. Any result (exception or response) is stored in the reindex
index.
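To make the split concrete, here is a minimal Java sketch under stated assumptions. The class names ClusterStateEntry and ReindexIndexDoc, their String-typed fields, and the status values are hypothetical and do not come from this PR. The sketch only shows small, ephemeral fields kept in the cluster state, while the result (response or exception, never both) goes into a document in the reindex index.

```java
import java.util.Map;

// Hypothetical sketch: which parts of a reindex job live where after this change.
// None of these class or field names are taken from the PR.

/** Small, ephemeral state kept in the cluster state (persistent task state). */
final class ClusterStateEntry {
    final String taskId;               // ephemeral ID of the running task
    final Map<String, String> headers; // request headers needed to run the task
    final String status;               // illustrative values such as "STARTED" or "DONE"

    ClusterStateEntry(String taskId, Map<String, String> headers, String status) {
        this.taskId = taskId;
        this.headers = headers;
        this.status = status;
    }
}

/** The result, persisted as a document in the reindex index. */
final class ReindexIndexDoc {
    final String request;   // serialized reindex request
    final String response;  // serialized response, or null on failure
    final String exception; // serialized exception, or null on success

    ReindexIndexDoc(String request, String response, String exception) {
        // Mirrors the invariant in the PR: a result is either a response or an exception.
        if (response != null && exception != null) {
            throw new IllegalArgumentException("Either response or exception must be null");
        }
        this.request = request;
        this.response = response;
        this.exception = exception;
    }

    boolean succeeded() {
        return response != null;
    }
}
```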

Relates to #42612.

Tim-Brooks added the >non-issue, v8.0.0, and :Distributed Indexing/Reindex (Issues relating to reindex that are not caused by issues further down) labels on Aug 6, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@Tim-Brooks
Contributor Author

@ywelsch @henningandersen - This work has exposed a few issues. Currently BulkByScrollResponse serializes some failures and search failures using ElasticsearchException#generateThrowableXContent. When the response is deserialized, the exception types are not preserved, but the REST layer needs those types to return an appropriate status code.

Any thoughts on how I should go about fixing this?
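The loss described above can be reproduced in miniature. The following Java sketch is hypothetical: a Map stands in for the XContent rendering, and StatusLookup is an invented stand-in for the REST layer's type-to-status mapping, not the real Elasticsearch code. It shows that once only a type name and a reason survive serialization, the parsed exception no longer maps to the original status.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a REST layer that picks an HTTP status from the exception's Java type.
final class StatusLookup {
    static int statusFor(Throwable t) {
        if (t instanceof IllegalArgumentException) {
            return 400; // bad request
        }
        return 500; // unknown types fall back to "internal server error"
    }
}

// Hypothetical stand-in for rendering an exception to XContent and parsing it back.
final class LossyRoundTrip {
    static Map<String, String> toMap(Throwable t) {
        Map<String, String> m = new HashMap<>();
        m.put("type", t.getClass().getSimpleName()); // only the type *name* survives, as a string
        m.put("reason", t.getMessage());
        return m;
    }

    static Throwable fromMap(Map<String, String> m) {
        // The original class is not reconstructed; the caller gets a generic exception.
        return new RuntimeException(m.get("type") + ": " + m.get("reason"));
    }
}
```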

@Tim-Brooks
Contributor Author

Should I serialize the response as a raw byte field using the internal transport serialization? Or should I add the status codes as a field in the XContent?

@ywelsch
Contributor

ywelsch commented Aug 8, 2019

We discussed this yesterday and decided to go with adding the RestStatus to ScrollableHitSource.SearchFailure, similar to what was done for BulkItemResponse.Failure.
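A minimal sketch of that direction, assuming simplified types: HttpStatus stands in for Elasticsearch's RestStatus, SearchFailureSketch stands in for ScrollableHitSource.SearchFailure, and a Map stands in for XContent. The point is only that the status is written as its own field and read back, so the REST layer no longer depends on the exception type surviving deserialization.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for RestStatus: a small enum of HTTP status codes.
enum HttpStatus {
    BAD_REQUEST(400), NOT_FOUND(404), INTERNAL_SERVER_ERROR(500);

    final int code;

    HttpStatus(int code) {
        this.code = code;
    }

    static HttpStatus fromCode(int code) {
        for (HttpStatus s : values()) {
            if (s.code == code) {
                return s;
            }
        }
        throw new IllegalArgumentException("unknown status " + code);
    }
}

// Hypothetical stand-in for a search failure that carries its own status.
final class SearchFailureSketch {
    final String reason;
    final HttpStatus status;

    SearchFailureSketch(String reason, HttpStatus status) {
        this.reason = reason;
        this.status = status;
    }

    Map<String, Object> toMap() {
        Map<String, Object> m = new HashMap<>();
        m.put("reason", reason);
        m.put("status", status.code); // the status is serialized as an explicit field
        return m;
    }

    static SearchFailureSketch fromMap(Map<String, Object> m) {
        // The status is read back directly instead of being inferred from an exception type.
        return new SearchFailureSketch((String) m.get("reason"),
            HttpStatus.fromCode((Integer) m.get("status")));
    }
}
```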


henningandersen left a comment


LGTM.

Thanks @tbrooks8. I added a few minor comments, plus a couple to handle in follow-ups.

assert (reindexResponse == null) || (jobException == null) : "Either response or exception must be null";
this.reindexResponse = reindexResponse;
this.jobException = jobException;
this.status = status;

Should we assert that status != null? Mostly to ensure that isDone() cannot return true when the status is null.
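A sketch of the suggested check, with hypothetical names: JobStateSketch, its Status values, and the isDone() rule are assumptions modeled on the snippet above, not the PR's code. It uses Objects.requireNonNull rather than a Java assert so the check also runs when assertions are disabled; the snippet under review uses assert, which only fires when the JVM runs with -ea.

```java
import java.util.Objects;

// Hypothetical state class modeled on the reviewed constructor.
final class JobStateSketch {
    enum Status { STARTED, DONE, FAILED_TO_READ_FROM_REINDEX_INDEX }

    private final Object reindexResponse;
    private final Exception jobException;
    private final Status status;

    JobStateSketch(Object reindexResponse, Exception jobException, Status status) {
        if (reindexResponse != null && jobException != null) {
            throw new IllegalArgumentException("Either response or exception must be null");
        }
        // Rejecting a null status up front means isDone() below can never
        // report "done" merely because no status was set.
        this.status = Objects.requireNonNull(status, "status must not be null");
        this.reindexResponse = reindexResponse;
        this.jobException = jobException;
    }

    /** Assumed rule: any status other than STARTED means the job has finished. */
    boolean isDone() {
        return status != Status.STARTED;
    }
}
```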

});
}

public void createReindexTaskDoc(String taskId, ReindexTaskIndexState reindexState, boolean indexExists,

I think it would be nice to make this method private and instead add a method without the indexExists flag. The ReindexClient should then also receive the ClusterService in its constructor, so that it can check whether the index exists itself.
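One way the suggested refactoring could look, as a hypothetical sketch. ReindexIndexClientSketch, the String-typed state, the Consumer<String> callback (standing in for ActionListener), and the BooleanSupplier (standing in for a ClusterService-based existence check) are all illustrative. Callers use the overload without the indexExists flag, and the flag-taking method becomes private.

```java
import java.util.function.BooleanSupplier;
import java.util.function.Consumer;

// Hypothetical client: the index-existence check is injected at construction time.
final class ReindexIndexClientSketch {
    private final BooleanSupplier reindexIndexExists;

    ReindexIndexClientSketch(BooleanSupplier reindexIndexExists) {
        this.reindexIndexExists = reindexIndexExists;
    }

    /** Public entry point: callers no longer pass indexExists. */
    public void createReindexTaskDoc(String taskId, String reindexState, Consumer<String> listener) {
        createReindexTaskDoc(taskId, reindexState, reindexIndexExists.getAsBoolean(), listener);
    }

    /** Formerly public; the flag is now computed internally. */
    private void createReindexTaskDoc(String taskId, String reindexState, boolean indexExists,
                                      Consumer<String> listener) {
        if (indexExists) {
            listener.accept("indexed " + taskId);
        } else {
            // A real client would create the index (with its mappings) before indexing the doc.
            listener.accept("created index, then indexed " + taskId);
        }
    }
}
```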

updateClusterStateToFailed(shouldStoreResult, ReindexJobState.Status.FAILED_TO_WRITE_TO_REINDEX_INDEX, ex);
}
});


nit: superfluous newline.

TaskManager taskManager = getTaskManager();
assert taskManager != null : "TaskManager should have been set before reindex started";

updatePersistentTaskState(new ReindexJobState(taskId, null, wrapException(ex)), new ActionListener<>() {
ReindexTaskIndexState reindexState = new ReindexTaskIndexState(reindexRequest, response, null);
reindexIndexClient.updateReindexTaskDoc(getPersistentTaskId(), reindexState, new ActionListener<>() {

We now store the result in two places (three if we include the .tasks index): the reindex index and the persistent task state in the cluster state. I think we risk storing the index state and then dying, only to wake up unable to read the index. The index would then indicate success while the task indicates failure. We also risk someone seeing success in the index and then, afterwards, a failure (I don't think we can guarantee this 100%, but we can likely handle it better). Resolving these issues is definitely outside the scope of this PR, but I wanted to mention it here for awareness.
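The window described above can be modeled with a toy example. TwoStoreModel and its status strings are hypothetical; only FAILED_TO_READ_FROM_REINDEX_INDEX is borrowed from the diff. The sketch writes the index first, optionally "crashes" before updating the cluster state, and then restarts with an unreadable index, leaving the two stores disagreeing.

```java
// Hypothetical two-store model of the failure window; not the PR's code.
final class TwoStoreModel {
    String indexDocStatus = "RUNNING";     // what the reindex index says
    String clusterStateStatus = "RUNNING"; // what the persistent task state says

    /** Completes the job; crashAfterIndexWrite simulates the node dying between the two writes. */
    void complete(boolean crashAfterIndexWrite) {
        indexDocStatus = "SUCCEEDED"; // step 1: store the result in the reindex index
        if (crashAfterIndexWrite) {
            return;                   // node dies, so step 2 never happens
        }
        clusterStateStatus = "DONE";  // step 2: update the persistent task state
    }

    /** After a restart where the index cannot be read, only the cluster state is consulted. */
    void restartWithUnreadableIndex() {
        if (!clusterStateStatus.equals("DONE")) {
            clusterStateStatus = "FAILED_TO_READ_FROM_REINDEX_INDEX";
        }
    }

    /** The stores agree when both report completion or neither does. */
    boolean consistent() {
        return indexDocStatus.equals("SUCCEEDED") == clusterStateStatus.equals("DONE");
    }
}
```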

}

private void sendStartedNotification(boolean shouldStoreResult, Runnable listener) {
updatePersistentTaskState(new ReindexJobState(taskId, null, null), new ActionListener<>() {
private void updateClusterStateToStarted(boolean shouldStoreResult, Runnable listener) {

I wonder if this is the best name. Wouldn't the status already be STARTED? Or is there initially a null status? I lean towards keeping the old sendStartedNotification name.

} else {
listener.onFailure(reindexState.getException());
}
}

Should we then delete the persistent task, given that we have responded and everything is done? Probably something for a follow-up rather than this PR.
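A hypothetical sketch of that follow-up, in which a Map stands in for the persistent-tasks metadata and a Consumer<String> for the response listener. The ordering is the design choice being illustrated: respond to the caller first, then remove the task entry, on the assumption that the result is already durable in the reindex index.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical registry standing in for persistent-task metadata in the cluster state.
final class PersistentTaskRegistry {
    private final Map<String, String> tasks = new HashMap<>();

    void start(String taskId) {
        tasks.put(taskId, "STARTED");
    }

    void finish(String taskId, String result, Consumer<String> listener) {
        tasks.put(taskId, "DONE");
        listener.accept(result); // respond first...
        tasks.remove(taskId);    // ...then clean up; the result itself lives in the reindex index
    }

    boolean contains(String taskId) {
        return tasks.containsKey(taskId);
    }
}
```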

@@ -128,11 +115,30 @@ private void waitForReindexDone(String taskId, ActionListener<StartReindexJobAct
@Override
public void onResponse(PersistentTasksCustomMetaData.PersistentTask<ReindexJob> task) {
ReindexJobState state = (ReindexJobState) task.getState();
if (state.getJobException() == null) {
listener.onResponse(new StartReindexJobAction.Response(taskId, state.getReindexResponse()));
if (state.getStatus() == ReindexJobState.Status.FAILED_TO_READ_FROM_REINDEX_INDEX) {

Also for a follow-up: it could be legitimate that we cannot read from the reindex index, for instance during a full system restart. I wonder if we should add some kind of retry with backoff to handle this.
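A minimal sketch of retry with exponential backoff, assuming the index read can be wrapped in a Callable. BackoffReader and its parameters are hypothetical, and the attempt count and delays are illustrative rather than values from the PR.

```java
import java.util.concurrent.Callable;

// Hypothetical helper: retries a read, doubling the delay after each failed attempt.
final class BackoffReader {
    static <T> T readWithBackoff(Callable<T> readDoc, int maxAttempts, long initialDelayMillis)
            throws Exception {
        long delay = initialDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return readDoc.call();
            } catch (Exception e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up, e.g. report FAILED_TO_READ_FROM_REINDEX_INDEX
                }
                Thread.sleep(delay); // wait before the next attempt
                delay *= 2;          // exponential backoff
            }
        }
    }
}
```

A production implementation would more likely schedule each retry on a thread pool than block a thread with Thread.sleep, and might add jitter so that many tasks recovering after a restart do not retry in lockstep.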

Tim-Brooks merged commit 9c8143f into elastic:reindex_v2 on Aug 14, 2019
Tim-Brooks deleted the store_response_in_index branch on December 18, 2019, 14:54
Labels: :Distributed Indexing/Reindex (Issues relating to reindex that are not caused by issues further down), >non-issue, v8.0.0-alpha1