You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
[ML] Fix possible race condition when closing an opening job (#42506)
This change fixes a race condition that would result in an
in-memory data structure becoming out-of-sync with persistent
tasks in cluster state.
If repeated often enough this could result in it being
impossible to open any ML jobs on the affected node, as the
master node would think the node had capacity to open another
job but the chosen node would error during the open sequence
due to its in-memory data structure being full.
The race could be triggered by opening a job and then closing
it a tiny fraction of a second later. It is unlikely a user
of the UI could open and close the job that fast, but a script
or program calling the REST API could.
The nasty thing is, from the externally observable states and
stats everything would appear to be fine - the fast open then
close sequence would appear to leave the job in the closed
state. It's only later that the leftovers in the in-memory
data structure might build up and cause a problem.
Copy file name to clipboardExpand all lines: x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/job/process/autodetect/AutodetectProcessManager.java
+19-15Lines changed: 19 additions & 15 deletions
Original file line number
Diff line number
Diff line change
@@ -455,16 +455,12 @@ protected void doRun() {
455
455
logger.debug("Aborted opening job [{}] as it has been closed", jobId);
456
456
return;
457
457
}
458
-
if (processContext.getState() != ProcessContext.ProcessStateName.NOT_RUNNING) {
459
-
logger.debug("Cannot open job [{}] when its state is [{}]",
0 commit comments