Conversation

@benwtrent
Member

This PR enables stats on inference to be gathered and stored in the .ml-stats-* indices.

Each node + model_id will have its own running stats document and these will later be summed together when returning _stats to the user.
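
A minimal sketch of that summing step (the class, record, and field names here are hypothetical stand-ins, not the PR's actual code): each (node, model_id) pair owns one running stats doc, and the `_stats` response merges them per model.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical illustration of merging per-node stats docs per model_id.
public class StatsSummer {

    // Stand-in for one per-node running stats document.
    public record NodeStats(String nodeId, String modelId, long inferenceCount) {}

    // Group the per-node docs by model and sum their counters.
    public static Map<String, Long> sumByModel(List<NodeStats> docs) {
        return docs.stream().collect(Collectors.groupingBy(
            NodeStats::modelId,
            Collectors.summingLong(NodeStats::inferenceCount)));
    }

    public static void main(String[] args) {
        List<NodeStats> docs = List.of(
            new NodeStats("node-1", "model-a", 10),
            new NodeStats("node-2", "model-a", 5),
            new NodeStats("node-1", "model-b", 7));
        System.out.println(sumByModel(docs).get("model-a")); // 15
    }
}
```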

.ml-stats-* is ILM managed (when possible), so the underlying index can change at any point. A stats document that is read in and later updated may therefore end up as a new doc in a new index. This complicates matters: keeping a running knowledge of seq_no and primary_term is close to impossible, because we don't know the latest index name.

We should also strive for throughput, as this code sits in the middle of an ingest pipeline (or even a query).
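
Since this code sits on the ingest hot path, one throughput-friendly shape (a hypothetical sketch, not the PR's actual classes) is to bump lock-free counters per document and let a background task snapshot them into the stats doc on an interval, so no index round-trip happens per inference.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: lock-free per-node counters that ingest threads can
// bump cheaply; a scheduled persister reads the totals to build the stats doc.
public class InferenceStatsAccumulator {
    private final LongAdder inferenceCount = new LongAdder();
    private final LongAdder failureCount = new LongAdder();

    // Called from the ingest pipeline / query path: cheap, low contention.
    public void incInference() { inferenceCount.increment(); }
    public void incFailure()   { failureCount.increment(); }

    // Called from the periodic persister to snapshot current totals.
    public long inferences() { return inferenceCount.sum(); }
    public long failures()   { return failureCount.sum(); }

    public static void main(String[] args) {
        InferenceStatsAccumulator stats = new InferenceStatsAccumulator();
        for (int i = 0; i < 1000; i++) stats.incInference();
        stats.incFailure();
        System.out.println(stats.inferences() + " " + stats.failures()); // 1000 1
    }
}
```

`LongAdder` is preferred over `AtomicLong` here because it spreads contended updates across cells, which matters when many ingest threads hit the same counter.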

@elasticmachine
Collaborator

Pinging @elastic/ml-core (:ml)

ActionListener.wrap(
    r -> this.loadModel(modelId, r),
    e -> {
        logger.error("[{}] failed to get previous model stats", modelId);
Member Author


We might actually want to fail completely if the unwrapped error is anything other than a ResourceNotFound. If .ml-stats-* exists but has unallocated primary shards, we may want to bail.
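
A hedged sketch of that idea (the exception class here is a stand-in for the real org.elasticsearch one, and the method names are hypothetical): unwrap the failure and only treat a missing stats doc as "start fresh"; anything else propagates.

```java
import java.util.function.Consumer;

// Hypothetical sketch of failure handling for the previous-stats lookup.
public class ErrorHandling {

    // Stand-in for Elasticsearch's resource-not-found exception type.
    public static class ResourceNotFoundException extends RuntimeException {}

    public static void handleStatsFailure(Exception e,
                                          Runnable startFresh,
                                          Consumer<Exception> fail) {
        Throwable cause = e.getCause() != null ? e.getCause() : e;
        if (cause instanceof ResourceNotFoundException) {
            startFresh.run();   // no previous stats doc yet: begin from zero
        } else {
            fail.accept(e);     // e.g. unallocated primary shards: bail
        }
    }

    public static void main(String[] args) {
        handleStatsFailure(new ResourceNotFoundException(),
            () -> System.out.println("fresh"),
            ex -> System.out.println("bail")); // prints "fresh"
    }
}
```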

Member

@davidkyle left a comment


Everything looks good, but I'd like you to reconsider how valuable the time spent metric is.

If only a few documents are inferred, the time spent in nanos will be rounded down to 0 millis.
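
The rounding loss is easy to demonstrate (a standalone sketch, not code from the PR): sub-millisecond per-call timings truncate to zero when converted to millis.

```java
import java.util.concurrent.TimeUnit;

// Demonstrates the truncation concern: fast inferences take well under a
// millisecond each, so per-call nanosecond timings round down to 0 ms.
public class TimingLoss {
    public static void main(String[] args) {
        long perCallNanos = 250_000L;            // 0.25 ms per inference
        long fiveCallsNanos = 5 * perCallNanos;  // 1.25 ms total
        System.out.println(TimeUnit.NANOSECONDS.toMillis(perCallNanos));   // 0
        System.out.println(TimeUnit.NANOSECONDS.toMillis(fiveCallsNanos)); // 1
    }
}
```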

The overhead of measuring the time is tiny compared to the cost of the modelling function, but there is still overhead in calling System.nanoTime().

My main concern is that we give a false impression of accuracy when we know there is rounding error and loss. How valuable will the user find the metric?

Let's open up the discussion. My feeling is that it is not necessary: it is easy to put the timing code back in a later version if required, but difficult to remove.

@bpintea bpintea added v7.8.0 and removed v7.7.0 labels Mar 25, 2020
@benwtrent
Member Author

I'd like you to reconsider how valuable the time spent metric is.

I included it as it is a statistic gathered by ingest. But I do agree: the overhead and loss of accuracy here might not make it worth it.

All the other use cases of inference include timing stats themselves (ingest and search). So information on how long a model takes to infer can be, well... inferred from the time used in search/ingest.

I have no problem removing timing stats right now.

@benwtrent
Member Author

@elasticmachine update branch

@benwtrent
Member Author

@elasticmachine update branch

@benwtrent benwtrent requested a review from davidkyle April 1, 2020 12:57
@benwtrent
Member Author

@elasticmachine update branch

Member

@davidkyle left a comment


LGTM

The persist interval can probably be bumped by a couple of seconds.

Member

@davidkyle left a comment


LGTM

@benwtrent benwtrent merged commit c087ee1 into elastic:master Apr 3, 2020
@benwtrent benwtrent deleted the feature/ml-inference-stats-collection branch April 3, 2020 18:09
benwtrent added a commit to benwtrent/elasticsearch that referenced this pull request Apr 13, 2020
benwtrent added a commit that referenced this pull request Apr 13, 2020
* [ML] Start gathering and storing inference stats (#53429)
