[ML] Datafeed does not start when allow_lazy_open is enabled #53763

@sophiec20

Description

Found in 7.7.0-SNAPSHOT ("build_hash" : "2f0aca992bb8c91c17603050807891cad2e41483", "build_date" : "2020-03-16T02:52:34.086738Z").

  • 3 node cluster, all nodes acting as data, master and ml
  • All nodes are co-located on the same 16GB VM
  • "xpack.ml.max_machine_memory_percent" : 16

I have a script that creates 16 jobs in succession. Each job requires 2GB model memory.
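The script itself is not included in this report. A minimal sketch of the sequence of REST calls such a script might issue is shown below, assuming one datafeed per job. The job and datafeed IDs, the source index name, the detector and the bucket span are all hypothetical; only the 2GB model memory limit, the job count and `allow_lazy_open` come from the description.

```python
def build_requests(n_jobs=16, index="my-source-index"):
    """Return the REST calls, in order, as (method, path, body) tuples.

    IDs, index name, detector and bucket span are illustrative assumptions.
    """
    calls = []
    for i in range(1, n_jobs + 1):
        job_id = f"lazy-job-{i}"            # hypothetical job ID
        datafeed_id = f"datafeed-{job_id}"  # hypothetical datafeed ID
        job = {
            "analysis_config": {
                "bucket_span": "15m",
                "detectors": [{"function": "count"}],
            },
            "data_description": {"time_field": "@timestamp"},
            # Each job requires 2GB of model memory, as in the report.
            "analysis_limits": {"model_memory_limit": "2gb"},
            # Let the job wait in the opening state when no node has capacity.
            "allow_lazy_open": True,
        }
        datafeed = {"job_id": job_id, "indices": [index]}
        calls.append(("PUT", f"/_ml/anomaly_detectors/{job_id}", job))
        calls.append(("PUT", f"/_ml/datafeeds/{datafeed_id}", datafeed))
        calls.append(("POST", f"/_ml/anomaly_detectors/{job_id}/_open", None))
        calls.append(("POST", f"/_ml/datafeeds/{datafeed_id}/_start", None))
    return calls
```

Each tuple can be sent with any HTTP client; the list is built first so the order of create, open and start calls is easy to inspect.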

The first 3 jobs open and the datafeeds start.
The 4th job returns opened:false and the datafeed fails to start with the following:

open job        {"opened":false}
start datafeed  {"error":{"root_cause":[{"type":"status_exception","reason":"Could not start datafeed, allocation explanation []"}],"type":"status_exception","reason":"Could not ...
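The three-then-stop pattern is consistent with a rough capacity estimate. The sketch below assumes that each co-located node detects the full 16GB of the VM as its machine memory, and it ignores any per-job overhead beyond the model memory limit; both are assumptions, not details confirmed in this report.

```python
GB = 1024 ** 3


def ml_budget_bytes(machine_memory_bytes, max_machine_memory_percent):
    """Memory a single node may use for ML jobs under the percent setting."""
    return machine_memory_bytes * max_machine_memory_percent // 100


nodes = 3
# Assumption: every node sees the whole 16GB VM as its own machine memory.
budget = ml_budget_bytes(16 * GB, 16)   # about 2.56GB per node
job_requirement = 2 * GB                # model memory only, overhead ignored
jobs_per_node = budget // job_requirement
cluster_capacity = nodes * jobs_per_node

print(jobs_per_node, cluster_capacity)
```

Under these assumptions each node has room for one 2GB job, so the fourth job has nowhere to go until an earlier job closes.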

In the job list, the job state is opening and the datafeed state is stopped. No errors are visible.

When one of the first 3 jobs completes, one of the opening jobs transitions to the opened state. However, its datafeed remains stopped.

These are the job messages for a job that was lazily opening:
[image: job messages for the lazily opening job]

The expected behavior is for the datafeed to enter the starting state and then start once resources become available (which, in this scenario, happens when one of the other jobs closes).

Once jobs have completed, I can manually start the datafeed for one of the opened jobs and it will complete without on-screen errors. (I cannot start the datafeed for a job that is still opening, which is to be expected.)
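Until the datafeed starts on its own, the manual step above could be automated on the client side. The following is only a workaround sketch, not the fix for this issue. `get_job_state` and `start_datafeed` are hypothetical callables: the first is assumed to wrap a call to the get job stats API and return the job's `state`, the second to wrap the start datafeed API.

```python
import time


def start_datafeed_when_opened(get_job_state, start_datafeed,
                               poll_seconds=5.0, max_polls=120,
                               sleep=time.sleep):
    """Poll the job state and start the datafeed once the job is opened.

    Returns True if the datafeed start was requested, or False if the job
    was still not opened after max_polls checks.
    """
    for _ in range(max_polls):
        if get_job_state() == "opened":
            start_datafeed()
            return True
        sleep(poll_seconds)
    return False
```

Passing the two operations in as callables keeps the helper independent of any particular HTTP client and makes it easy to exercise with fakes.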

Metadata

Labels

:ml (Machine learning), >bug
