At present the ML UI has functionality to calculate a rough estimate of the model memory requirement for certain types of anomaly detection jobs. However, it doesn't cover all detector functions and doesn't cover population jobs.
The ML API in Elasticsearch should provide an endpoint that encapsulates the various formulas, can be extended to cover all possible configurations, and can be kept up to date when model sizes change.
The inputs to this endpoint will be:
- An `analysis_config`, in the same format as would be provided to the create job endpoint - documented in https://www.elastic.co/guide/en/elasticsearch/reference/current/ml-put-job.html#ml-put-job-path-parms
- Overall cardinalities for the `by`, `over` and `partition` fields
- Max bucket cardinalities for `influencer` fields that are not also `by`, `over` or `partition` fields
An example of the proposed request format is:
```
POST _ml/anomaly_detectors/_estimate_model_memory
{
  "analysis_config": {
    "bucket_span": "10m",
    "detectors": [
      {
        "function": "sum",
        "field_name": "bytes",
        "partition_field_name": "src_ip"
      }
    ],
    "influencers": [ "src_ip", "dest_ip" ]
  },
  "overall_cardinality": {
    "src_ip": 567483
  },
  "max_bucket_cardinality": {
    "dest_ip": 7456
  }
}
```
An example of the proposed response format is:
```
{
  "model_memory_estimate": "836mb"
}
```