[ML][Inference] separating definition and config object storage #48651
Conversation
Pinging @elastic/ml-core (:ml)
this.trainedModel = ExceptionsHelper.requireNonNull(trainedModel, TRAINED_MODEL);
this.preProcessors = preProcessors == null ? Collections.emptyList() : Collections.unmodifiableList(preProcessors);
this.input = ExceptionsHelper.requireNonNull(input, INPUT);
this.modelId = modelId;
I added the modelId here so that it is trivially queryable on config deletion. That way we can simply run a delete-by-query (DBQ) on the model_id field for the deleted model.
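As a minimal sketch of the delete-by-query idea above (the method and class names here are illustrative; only the `model_id` field name comes from the discussion), the cleanup amounts to issuing one query body that matches every document tagged with the deleted model's id:

```java
// Hypothetical sketch: building the delete-by-query body used to remove all
// documents (config and definition docs) tagged with a deleted model's id.
public class DeleteDefinitionQuery {

    // Returns the JSON body for a term query on the model_id field.
    static String deleteByModelIdBody(String modelId) {
        return "{\"query\":{\"term\":{\"model_id\":\"" + modelId + "\"}}}";
    }

    public static void main(String[] args) {
        // Conceptually: POST /<inference-index>/_delete_by_query with this body
        System.out.println(deleteByModelIdBody("my-model"));
    }
}
```

Because the definition documents carry the same `model_id` as the config, one such request removes both in a single pass.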
  PARSER.declareObject(optionalConstructorArg(), RowResults.PARSER, RowResults.TYPE);
  PARSER.declareInt(optionalConstructorArg(), PROGRESS_PERCENT);
- PARSER.declareObject(optionalConstructorArg(), (p, c) -> TrainedModelDefinition.STRICT_PARSER.apply(p, null).build(),
+ PARSER.declareObject(optionalConstructorArg(), (p, c) -> TrainedModelDefinition.LENIENT_PARSER.apply(p, null),
I changed this to the LENIENT_PARSER because the native code still sends back the input for the definition. Once that is corrected, I will return this to the STRICT_PARSER.
    return PARSER.parse(parser, null);
}

private final List<String> fieldNames;
What is the reason for putting this into a class instead of having the List<String> directly on the config object? Do we envision adding more things here? If so, I would suggest extracting this into its own file.
@dimitris-athanasiou I will extract it to its own file. One could see us possibly having input be a transform instead of just a list of strings.
  private final String modelId;

- TrainedModelDefinition(TrainedModel trainedModel, List<PreProcessor> preProcessors, Input input) {
+ TrainedModelDefinition(TrainedModel trainedModel, List<PreProcessor> preProcessors, @Nullable String modelId) {
Could we make this private?
private boolean processorsInOrder;
private Input input;

private static Builder builderForParser() {
I think this is not used anywhere.
@dimitris-athanasiou it should be :). I will fix that promptly.
private List<PreProcessor> preProcessors;
private TrainedModel trainedModel;
private String modelId;
private boolean processorsInOrder;
Unrelated to this change, but could you please explain the need for processorsInOrder?
Deserializing multiple named objects indicates that they were provided either in a map (out of order) or in an array (in order). We require that processors be provided in order, so if they arrive in a map, we blow up.
This boolean is set by the parser as it handles the named objects. If it is not set, the parser handled them as a map, and if there was more than one processor, we should blow up.
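The check described above can be sketched as follows (this is a conceptual illustration, not the actual parser code; the method name and messages are invented):

```java
import java.util.List;

// Conceptual sketch: processors parsed from a JSON array arrive in a known
// order; processors parsed from a JSON object (map) have undefined order.
// The parser records which form it saw, and validation rejects multiple
// processors whenever order cannot be guaranteed.
public class ProcessorOrderCheck {

    static void validate(List<String> preProcessors, boolean processorsInOrder) {
        if (preProcessors.size() > 1 && processorsInOrder == false) {
            throw new IllegalArgumentException(
                "multiple pre-processors must be provided in an array so their order is preserved");
        }
    }

    public static void main(String[] args) {
        validate(List.of("one_hot"), false);               // a single processor is fine either way
        validate(List.of("one_hot", "target_mean"), true); // array form preserves order: fine
        try {
            validate(List.of("one_hot", "target_mean"), false); // map form with >1: rejected
            throw new AssertionError("expected validation to fail");
        } catch (IllegalArgumentException expected) {
            // order could not be guaranteed, so we blow up
        }
    }
}
```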
  PARSER.declareObject(optionalConstructorArg(), RowResults.PARSER, RowResults.TYPE);
  PARSER.declareInt(optionalConstructorArg(), PROGRESS_PERCENT);
- PARSER.declareObject(optionalConstructorArg(), (p, c) -> TrainedModelDefinition.STRICT_PARSER.apply(p, null).build(),
+ PARSER.declareObject(optionalConstructorArg(), (p, c) -> TrainedModelDefinition.LENIENT_PARSER.apply(p, null),
Now that we no longer need to call build() we can switch to this way of declaring the parser:
PARSER.declareObject(optionalConstructorArg(), TrainedModelDefinition.LENIENT_PARSER, INFERENCE_MODEL);
.startObject(TrainedModelConfig.CREATE_TIME.getPreferredName())
    .field(TYPE, DATE)
.endObject()
.startObject(TrainedModelConfig.DEFINITION.getPreferredName())
Don't we need to disable indexing for the definition somewhere else?
By default, we don't dynamically index new fields (dynamic = false)
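As a sketch of what this means (the field names below are illustrative, not the actual mapping), a static mapping with `dynamic` set to `false` indexes only the fields that are explicitly declared; anything else, such as the large definition content, remains retrievable from `_source` but is never indexed:

```json
{
  "mappings": {
    "dynamic": false,
    "properties": {
      "model_id": { "type": "keyword" },
      "create_time": { "type": "date" }
    }
  }
}
```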
ActionListener<IndexResponse> putConfigListener = ActionListener.wrap(
    r -> {
        if (trainedModelConfig.getDefinition() != null) {
Should we ever allow the definition to be null when we're storing a model? I can't think of a reason to do so. Even if we add support for updates, we should have a different method for that. Unless there's a reason I'm missing, I think we should check early in this method that the definition is not null and throw if it is.
I suppose we can make it more flexible in the future if ever necessary (or provide special handling for internal use cases).
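The fail-fast check suggested above could look something like this (a hypothetical sketch; the method name and message are invented, not the actual implementation):

```java
// Hypothetical sketch: reject a null definition at the top of the store
// method instead of branching on it later in the listener chain.
public class StoreModelCheck {

    static void validateDefinitionPresent(Object definition, String modelId) {
        if (definition == null) {
            throw new IllegalArgumentException(
                "cannot store model [" + modelId + "] without a definition");
        }
    }

    public static void main(String[] args) {
        validateDefinitionPresent(new Object(), "my-model"); // passes
        try {
            validateDefinitionPresent(null, "my-model");
            throw new AssertionError("expected validation to fail");
        } catch (IllegalArgumentException expected) {
            // null definition rejected before any indexing starts
        }
    }
}
```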
    }
);

indexObject(trainedModelConfig.getModelId(), trainedModelConfig, putConfigListener);
I think we should consider indexing both objects in a bulk request. That way we'd be refreshing the index once and we can also handle errors in one place.
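Conceptually, the bulk suggestion amounts to sending one request body carrying both documents (the index name, ids, and sources below are illustrative, not the actual constants):

```java
// Conceptual sketch of indexing the config document and the definition
// document in a single bulk request, so the index is refreshed once and
// failures surface in one response.
public class BulkIndexSketch {

    // Builds an NDJSON bulk body with one index action per document.
    static String bulkBody(String index, String modelId,
                           String configSource, String definitionSource) {
        StringBuilder body = new StringBuilder();
        body.append("{\"index\":{\"_index\":\"").append(index)
            .append("\",\"_id\":\"").append(modelId).append("\"}}\n")
            .append(configSource).append("\n");
        body.append("{\"index\":{\"_index\":\"").append(index)
            .append("\",\"_id\":\"").append(modelId).append("_definition\"}}\n")
            .append(definitionSource).append("\n");
        return body.toString();
    }

    public static void main(String[] args) {
        // Conceptually: POST /_bulk with this body, refresh once afterwards
        System.out.print(bulkBody(".ml-inference", "my-model",
            "{\"model_id\":\"my-model\"}", "{\"compressed_definition\":\"...\"}"));
    }
}
```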
.add(client.prepareSearch(InferenceIndexConstants.INDEX_PATTERN)
    .setQuery(queryBuilder)
    // use sort to get the last
    .addSort("_index", SortOrder.DESC)
Unrelated to this change but why are we sorting on _index here instead of create_time?
I think we should prefer the latest index version regardless of when the model was created. If we support updating, an updated model may go into the new index but have an older create time than another model.
This is an open question for sure, but it is consistent with how we handle transforms.
TrainedModelDefinition definition;
try {
    builder = handleSearchItem(multiSearchResponse.getResponses()[0], modelId, this::parseInferenceDocLenientlyFromSource);
} catch(ResourceNotFoundException ex) {
nit: space after catch
dimitris-athanasiou left a comment
Looks good. Just a comment about whether we should rename Input to TrainedModelInput.
import java.util.Objects;

public class Input implements ToXContentObject, Writeable {
Should we call it TrainedModelInput? Input is too generic and we use the TrainedModel prefix for the config and the definition too.
indexListener.onFailure(ex);
// This should never happen. If we were able to deserialize the object (from Native or REST) and then fail to serialize it again,
// that is not the user's fault. We did something wrong and should throw.
throw new ElasticsearchStatusException("Unexpected serialization exception for [{}]",
nit: Could use ExceptionsHelper.serverError. Not necessary but for future reference.
dimitris-athanasiou left a comment
LGTM
…tic#48651) This separates out the `definition` object from being stored within the configuration object in the index. This allows us to gather the config object without decompressing a potentially large definition. Additionally, `input` is moved to the TrainedModelConfig object and out of the definition. This is so the trained input fields are accessible outside the potentially large model definition.
…#48651) (#48695) * [ML][Inference] separating definition and config object storage (#48651) This separates out the `definition` object from being stored within the configuration object in the index. This allows us to gather the config object without decompressing a potentially large definition. Additionally, `input` is moved to the TrainedModelConfig object and out of the definition. This is so the trained input fields are accessible outside the potentially large model definition.