Commit e4275f3
Author: David Roberts
[ML] Use utility thread pool for memory estimation (#62314)
The job comms thread pool is intended for the long-running job processes that do anomaly detection or data frame analytics and count towards job count and memory limits. This commit moves the short-lived memory estimation processes to the ML utility thread pool. Although the choice of pool doesn't matter in most cases, at the limits of scale running memory estimations on the job comms pool could mean that they get in the way of starting jobs, or queue up for an excessive period of time while waiting for jobs to finish.
1 parent bf9651c commit e4275f3

1 file changed, +7 -3 lines changed

x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/MachineLearning.java

Lines changed: 7 additions & 3 deletions
@@ -706,7 +706,7 @@ public Collection<Object> createComponents(Client client, ClusterService cluster
                 EsExecutors.allocatedProcessors(settings));
         MemoryUsageEstimationProcessManager memoryEstimationProcessManager =
             new MemoryUsageEstimationProcessManager(
-                threadPool.generic(), threadPool.executor(MachineLearning.JOB_COMMS_THREAD_POOL_NAME), memoryEstimationProcessFactory);
+                threadPool.generic(), threadPool.executor(UTILITY_THREAD_POOL_NAME), memoryEstimationProcessFactory);
         DataFrameAnalyticsConfigProvider dataFrameAnalyticsConfigProvider = new DataFrameAnalyticsConfigProvider(client, xContentRegistry,
             dataFrameAnalyticsAuditor);
         assert client instanceof NodeClient;
@@ -963,11 +963,15 @@ public List<ExecutorBuilder<?>> getExecutorBuilders(Settings settings) {
         // number of jobs per node.

         // 4 threads per job process: for input, c++ logger output, result processing and state processing.
+        // Only use this thread pool for the main long-running process associated with an anomaly detection
+        // job or a data frame analytics job. (Using it for some other purpose could mean that an unrelated
+        // job fails to start or that whatever needed the thread for another purpose has to queue for a very
+        // long time.)
         ScalingExecutorBuilder jobComms = new ScalingExecutorBuilder(JOB_COMMS_THREAD_POOL_NAME,
             4, MAX_MAX_OPEN_JOBS_PER_NODE * 4, TimeValue.timeValueMinutes(1), "xpack.ml.job_comms_thread_pool");

-        // This pool is used by renormalization, plus some other parts of ML that
-        // need to kick off non-trivial activities that mustn't block other threads.
+        // This pool is used by renormalization, data frame analytics memory estimation, plus some other parts
+        // of ML that need to kick off non-trivial activities that mustn't block other threads.
         ScalingExecutorBuilder utility = new ScalingExecutorBuilder(UTILITY_THREAD_POOL_NAME,
             1, MAX_MAX_OPEN_JOBS_PER_NODE * 4, TimeValue.timeValueMinutes(10), "xpack.ml.utility_thread_pool");
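The queuing problem described in the commit message can be illustrated with a small standalone sketch. This is not Elasticsearch code: it uses plain `java.util.concurrent` fixed-size pools in place of `ScalingExecutorBuilder`, and the class name `PoolStarvationSketch`, the pool sizes and the 200 ms timeout are all assumptions chosen for illustration. It fills a "job comms" style pool with long-running tasks, then checks whether a short "memory estimation" style task finishes promptly when submitted to the same pool versus a separate utility pool.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class PoolStarvationSketch {

    /**
     * Occupies every thread of jobPool with a long-running task, then submits one
     * short task to shortTaskPool. Returns true if the short task completes within
     * timeoutMillis, false if it is still queued when the timeout expires.
     */
    static boolean shortTaskCompletes(ExecutorService jobPool, int jobThreads,
                                      ExecutorService shortTaskPool, long timeoutMillis) throws Exception {
        CountDownLatch jobsStarted = new CountDownLatch(jobThreads);
        CountDownLatch jobsMayFinish = new CountDownLatch(1);
        for (int i = 0; i < jobThreads; i++) {
            jobPool.submit(() -> {
                jobsStarted.countDown();
                try {
                    jobsMayFinish.await(); // stands in for a long-running job process
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        jobsStarted.await(); // every job thread is now busy

        // Stands in for a short-lived memory estimation.
        Future<String> estimate = shortTaskPool.submit(() -> "estimate done");
        try {
            estimate.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false;
        } finally {
            jobsMayFinish.countDown(); // let the simulated jobs end
        }
    }

    /** Short task shares the saturated job pool (the behaviour before the change). */
    static boolean sharedPoolRun() throws Exception {
        ExecutorService jobComms = Executors.newFixedThreadPool(4);
        try {
            return shortTaskCompletes(jobComms, 4, jobComms, 200);
        } finally {
            jobComms.shutdownNow();
        }
    }

    /** Short task runs on its own utility pool (the behaviour after the change). */
    static boolean separatePoolRun() throws Exception {
        ExecutorService jobComms = Executors.newFixedThreadPool(4);
        ExecutorService utility = Executors.newFixedThreadPool(1);
        try {
            return shortTaskCompletes(jobComms, 4, utility, 200);
        } finally {
            jobComms.shutdownNow();
            utility.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("shared pool, estimate completed in time:   " + sharedPoolRun());
        System.out.println("separate pool, estimate completed in time: " + separatePoolRun());
    }
}
```

In the shared case the short task sits in the queue behind the long-running tasks and the timeout expires. In the separate case the utility thread is free and the task should complete almost immediately. The real pools scale up to `MAX_MAX_OPEN_JOBS_PER_NODE * 4` threads, so the effect only appears when a node is close to its job limit, as the commit message notes.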
