Description
Found in 7.7.0-SNAPSHOT: "build_hash" : "2f0aca992bb8c91c17603050807891cad2e41483", "build_date" : "2020-03-16T02:52:34.086738Z"
- 3 node cluster, all nodes acting as data, master and ml
- All nodes are co-located on the same 16GB VM
- "xpack.ml.max_machine_memory_percent" : 16
I have a script that creates 16 jobs in succession. Each job requires 2GB of model memory.
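Each of the 16 jobs is created with a 2GB model memory limit. A sketch of such a job configuration (the job name, bucket span, detector, and time field are illustrative; only the analysis_limits value matters for this report):

```json
PUT _ml/anomaly_detectors/job-01
{
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [ { "function": "count" } ]
  },
  "data_description": { "time_field": "@timestamp" },
  "analysis_limits": { "model_memory_limit": "2gb" }
}
```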
The first 3 jobs open and the datafeeds start.
The 4th job returns opened:false and the datafeed fails to start with the following:
open job {"opened":false}
start datafeed {"error":{"root_cause":[{"type":"status_exception","reason":"Could not start datafeed, allocation explanation []"}],"type":"status_exception","reason":"Could not ...
In the job list, the job state is opening and the datafeed state is stopped. No errors are visible.
When one of the first 3 jobs completes, one of the opening jobs transitions to opened; however, its datafeed remains stopped.
These are the job messages for a job that was opening lazily.

Expected behavior would be for the datafeed state to be starting, and for it to start once resources become available (which, in this scenario, happens when one of the other jobs closes).
Once jobs have completed, I can manually start the datafeed on one of the opened jobs and it completes without on-screen errors. (I cannot start a datafeed for one of the still-opening jobs, which is to be expected.)