Skip to content

Commit 5040bae

Browse files
committed
[ML] Fix race condition between job opening and feature reset
There was a point during the job opening sequence where performing a feature reset could hang. This happened when the kill request issued by feature reset was executed after the job's persistent task was assigned but before the job's native process was started. The persistent task was incorrectly left running in this situation, yet the job opening sequence was aborted which meant the subsequent close request issued by feature reset would wait for a very long time for the persistent task to disappear. The fix is to make the kill process request cancel the persistent task consistently based on its request parameters and not on the current state of the task. Fixes elastic#74141
1 parent a01ba7f commit 5040bae

File tree

1 file changed

+24
-3
lines changed
  • x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/job/process/autodetect

1 file changed

+24
-3
lines changed

x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/job/process/autodetect/ProcessContext.java

Lines changed: 24 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -52,10 +52,14 @@ boolean shouldBeKilled() {
5252
}
5353

5454
void killIt() {
55+
// This method should only be called for a process that's known to be connected
56+
if (autodetectCommunicator == null) {
57+
throw new IllegalArgumentException("Unable to kill job as its communicator is not connected");
58+
}
5559
if (latestKillRequest == null) {
5660
throw new IllegalArgumentException("Unable to kill job as previous request is not completed");
5761
}
58-
latestKillRequest.kill();
62+
latestKillRequest.killConnectedProcess(false);
5963
}
6064

6165
ProcessStateName getState() {
@@ -128,10 +132,27 @@ KillBuilder setShouldFinalizeJob(boolean shouldFinalizeJob) {
128132
}
129133

130134
void kill() {
131-
if (autodetectCommunicator == null) {
135+
if (autodetectCommunicator != null) {
136+
killConnectedProcess(finish);
137+
} else {
132138
latestKillRequest = this;
133-
return;
139+
// Killing a connected process would also complete the persistent task if `finish` was true,
140+
// so we should do the same here even though the process wasn't yet connected at the time of
141+
// the kill
142+
if (finish) {
143+
jobTask.markAsCompleted();
144+
}
134145
}
146+
}
147+
148+
/**
149+
* @param finish This argument overrides the member variable of the same name. Sometimes this method is called
150+
* immediately after a process is started, if it was requested to be killed before it started,
151+
* and in this situation the persistent task has already been completed if this was desired.
152+
*/
153+
private void killConnectedProcess(boolean finish) {
154+
assert autodetectCommunicator != null;
155+
135156
String jobId = jobTask.getJobId();
136157

137158
if (silent == false) {

0 commit comments

Comments
 (0)