[ML] Add a response mechanism to ML controller command processing #62823

@droberts195

When the ML Java code needs to start one of the ML native processes (autodetect, normalize or data_frame_analyzer), it sends a command to the controller process telling it to spawn the required process. Currently communication is one-way only: the JVM sends a command to the controller and assumes it will be actioned immediately. There is no mechanism for the controller to respond when it has actioned the command. This seemed reasonable in the initial design because the controller is completely dedicated to starting and killing processes, and these were assumed to be very fast operations.

We have observed that when security software is running on a machine, spawning a new process can take a very long time: over 20 seconds has been observed between the command being received in the controller and the resulting posix_spawn call returning. This invalidates the assumption that commands issued to the controller by the JVM will be near-instantaneous. It causes a problem because the timeout for the named pipes to connect starts immediately after the command is issued, but the process may not actually start until considerably later.

Therefore, the controller needs to be able to report back to the ES JVM when each command sent to it has been actioned. The ES JVM should then not try to connect the named pipes to a process until the controller has reported that it has actually spawned that process. The configured timeout for connecting the named pipes will then be measured from a more appropriate point in time.
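
To make the proposal concrete, here is a minimal Java sketch of how the JVM side might wait for an acknowledgement before it starts connecting the named pipes. Everything in it is an assumption for illustration: the class `ControllerCommandSketch`, the `Response` fields, the per-command numeric id, and the `commandSink` callback standing in for the command pipe are all hypothetical, and the issue does not prescribe a wire format or an API.

```java
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiConsumer;

// Illustrative sketch only: all names and the response shape are hypothetical.
class ControllerCommandSketch {

    /** Hypothetical response: echoes the command id and says whether the command succeeded. */
    static final class Response {
        final int id;
        final boolean success;
        final String reason;

        Response(int id, boolean success, String reason) {
            this.id = id;
            this.success = success;
            this.reason = reason;
        }
    }

    private final AtomicInteger nextId = new AtomicInteger(1);
    private final Map<Integer, CompletableFuture<Response>> pending = new ConcurrentHashMap<>();
    // Stands in for writing "id + command" to the controller's command pipe.
    private final BiConsumer<Integer, String> commandSink;

    ControllerCommandSketch(BiConsumer<Integer, String> commandSink) {
        this.commandSink = commandSink;
    }

    /** Sends a command and blocks until the controller reports that it has actioned it. */
    Response sendAndAwait(String command, Duration ackTimeout) {
        int id = nextId.getAndIncrement();
        CompletableFuture<Response> future = new CompletableFuture<>();
        pending.put(id, future); // register before sending so a fast reply is not lost
        try {
            commandSink.accept(id, command);
            Response response = future.get(ackTimeout.toMillis(), TimeUnit.MILLISECONDS);
            if (!response.success) {
                throw new IllegalStateException("controller failed command " + id + ": " + response.reason);
            }
            return response;
        } catch (TimeoutException e) {
            throw new IllegalStateException("no response to command " + id + " within " + ackTimeout, e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted waiting for command " + id, e);
        } finally {
            pending.remove(id);
        }
    }

    /** Called by whichever thread reads the controller's response stream. */
    void onResponse(Response response) {
        CompletableFuture<Response> future = pending.get(response.id);
        if (future != null) {
            future.complete(response);
        }
    }
}
```

With something along these lines, the caller would only begin connecting the named pipes, and so only start the configured connection timeout, after `sendAndAwait` returns successfully. The separate `ackTimeout` is also an assumption; it would need to be generous enough to cover slow spawns such as the 20-second case described above.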

Labels: :ml (Machine learning)
