[ML][Inference] Fix model pagination with models as resources #51573

benwtrent · 2020-01-28T21:33:52Z

This adds logic to handle paging problems when the ID pattern + tags reference models stored as resources.

Most of the complexity comes from the issue where a model stored as a resource could be at the start, or the end of a page or when we are on the last page.

closes #51543

elasticmachine · 2020-01-28T21:33:54Z

Pinging @elastic/ml-core (:ml)

x-pack/plugin/src/test/resources/rest-api-spec/test/ml/inference_crud.yml

przemekwitek · 2020-01-29T07:08:50Z

...rc/test/java/org/elasticsearch/xpack/ml/inference/persistence/TrainedModelProviderTests.java

    }

+    public void testExpandIdsPagination() {
+        //NOTE: these test assume that the query pagination results are "buffered"


Suggested change

//NOTE: these test assume that the query pagination results are "buffered"

// NOTE: these tests assume that the query pagination results are "buffered"

przemekwitek · 2020-01-29T08:22:32Z

.../ml/src/main/java/org/elasticsearch/xpack/ml/inference/persistence/TrainedModelProvider.java

    public void expandIds(String idExpression,
                          boolean allowNoResources,
-                          @Nullable PageParams pageParams,
+                          PageParams pageParams,


So, are pageParams guaranteed to be non-null?

Is the pageParams parameter really necessary? I'd argue that I would expect expandIds to return all the ids rather than a page as the id small compared to a full document.

The default value of PageParams.getSize() is 100. Without setting a page param object you may not realise that limitation is in place and that you are not getting all of your ids. Is there even a mechanism to say you need to get the second page now?

Compare to JobConfigProvider.expandJobIds which gives you AnomalyDetectorsIndex.CONFIG_INDEX_MAX_RESULTS_WINDOW = 10_000 ids.

It would also simplify this code to remove to the pageParams which is a very good reason to remove it.

Is the pageParams parameter really necessary?

Yes, I think it is. The APIs making calls against this function require paging params (default is from: 0, size: 100). I think this is a sane way of handling larger numbers of items.

Compare to JobConfigProvider.expandJobIds which gives you AnomalyDetectorsIndex.CONFIG_INDEX_MAX_RESULTS_WINDOW = 10_000 ids.

I think that this is a bug and shows that we don't support good pagination at all with anomaly detection jobs.

Is there even a mechanism to say you need to get the second page now?

Yes, the total count of all the ids found with id pattern is returned.

So, are pageParams guaranteed to be non-null?

It is not possible to set them as null via the API since you can only set from and size individually.

If it is null from a client, they just don't set those parameters (and thus get the default value).

If an internal caller is making these calls, they should not explicitly set pageParams to null and if they do, they will receive an error pretty quickly.

My concern was that with default paging we open ourselves up to bugs such as elastic/kibana#38559.

I assumed getTrainedModel would take paging parameters in which case paging in expandIds would be redundant but I see from the way the code is used that is not the case and GetTrainedModelsStatsAction explicitly depends on paging ids.

AD jobs are not paged because when they lived in clusterstate all where returned by the GET API. It would have been a breaking change to require paging on that API when we move to the .ml-config index and we could not make a breaking change in that release.

Yes, the apis GET _ml/trained_models and GET _ml/trained_models/_stats accept paging parameters and use those when expanding the IDs.

Since they are stored in indices, paging parameters are a must (even for expanding IDs) and might as well have it handled now instead of worrying about adding them later when we reach scaling issues.

przemekwitek · 2020-01-29T08:27:22Z

.../ml/src/main/java/org/elasticsearch/xpack/ml/inference/persistence/TrainedModelProvider.java

+            if (bufferedFrom > 0) {
+                allFoundIds.remove(allFoundIds.first());
+                bufferedFrom--;
+            }


Suggested change

}

} else {

przemekwitek · 2020-01-29T08:29:37Z

.../ml/src/main/java/org/elasticsearch/xpack/ml/inference/persistence/TrainedModelProvider.java

+        while (allFoundIds.size() > sizeLimit) {
+            // If we are still over limit, and have buffered items, that means the first ids belong on the previous page
+            if (bufferedFrom > 0) {
+                allFoundIds.remove(allFoundIds.first());


Not sure if it makes sense to optimize this, but I could imagine having an iterator going through allFoundIds and calling .remove() on the iterator rather than removing the smallest item from the collection directly.

Ah, popping the iterator, for sure. I actually find it more readable to have a remote(..first()). These sets are usually "small" and right now the highest number that we will "buffer" is the total number of models stored as resources (which is 1).

przemekwitek · 2020-01-29T08:32:10Z

.../ml/src/main/java/org/elasticsearch/xpack/ml/inference/persistence/TrainedModelProvider.java

            client::search);
    }

+    static Set<String> collectIds(PageParams pageParams, Set<String> foundFromResources, Set<String> foundFromDocs, long totalMatchedIds) {


I found this method somewhat hard to follow.
Should the logic be similar to merge sort? I.e. we have two sorted collections and take size elements (in total) from whichever collection has the lowest id at the front. from would be per-collection (so we'd have two froms).

LMK if you think it's possible to rewrite it this way.

It could be rewritten like a merge sort, but the issue still resides "From which side of the page do we remove items?" And we would not know the ends of the pages until the set is fully merged. So, I am not sure how it would be helpful.

przemekwitek

LGTM

davidkyle

It's much tricker than I thought.

davidkyle · 2020-01-29T15:21:55Z

.../ml/src/main/java/org/elasticsearch/xpack/ml/inference/persistence/TrainedModelProvider.java

+        if (tags.isEmpty()) {
+            foundResourceIds.addAll(matchedResourceIds(tokens));
+        } else {
+            for(String resourceId : matchedResourceIds(tokens)) {


Unrelated nit: Can MODELS_STORED_AS_RESOURCE be a Collections.unmodifiableSet then line 520

return new HashSet<>(MODELS_STORED_AS_RESOURCE);

in matchedResourceIds wouldn't have to wrap a set in a set

davidkyle · 2020-01-29T15:32:52Z

.../ml/src/main/java/org/elasticsearch/xpack/ml/inference/persistence/TrainedModelProvider.java

-                    requiredMatches.filterMatchedIds(foundResourceIds);
+                    requiredMatches.filterMatchedIds(allFoundIds);
                    if (requiredMatches.hasUnmatchedIds()) {
                        idsListener.onFailure(ExceptionsHelper.missingTrainedModel(requiredMatches.unmatchedIdsString()));


Could this error in a valid situation?

from = 0, size = 10, idExpression = 'bar*,foo'

I have the models bar-1, bar-2 ... bar-10 and foo.

The first 10 bar-n are returned by the search but foo is unmatched.

@davidkyle i think this was a problem well before I added this logic. Data frame analytics suffers from this as well.

I think it is OK to error in this case.

I may be wrong though (about it being OK) :D.

Certainly it's not part of this change but let's follow up on it. I don't think we should be reporting en error when we don't know if it is an error or not, maybe we can relax the check

davidkyle · 2020-01-29T15:34:21Z

.../ml/src/main/java/org/elasticsearch/xpack/ml/inference/persistence/TrainedModelProvider.java

+            // We "buffer" the from and size to take into account models stored as resources.
+            // This is so we handle the edge cases when the model that is stored as a resource is at the start/end of
+            // a page.
+            .from(Math.max(0, pageParams.getFrom() - foundResourceIds.size()))


So the strategy is to get extra then figure out where the resource model ids would fit in that sorted list 👍

@davidkyle exactly. If we don't have the buffer, AND the resource model ID is at the start of the list, we don't know if it is ACTUALLY at the start of the page, or the END of the previous page.

Hence the need for a buffer.

davidkyle · 2020-01-29T15:46:41Z

.../ml/src/main/java/org/elasticsearch/xpack/ml/inference/persistence/TrainedModelProvider.java

+        int bufferedFrom = Math.min(foundFromResources.size(), from);
+
+        // If size = 10_000 but there aren't that many total IDs, reduce the size here to make following logic simpler
+        int sizeLimit = (int)Math.min(pageParams.getSize(), totalMatchedIds - from);


If the number of search hits and and resource model ids are < pageParams.getSize() then that is the size limit. from isn't a factor here

I think from is a factor here. totalMatchedIds is the total hits, which means the total number of ids that matched the query which could be WAY bigger than the size.

If there are a total of 20 IDs, and the user is doing from: 19, size:200, we should give them the rest of the IDs. Our size limit in this case shouldn't be 200, it should be 1

.../ml/src/main/java/org/elasticsearch/xpack/ml/inference/persistence/TrainedModelProvider.java

davidkyle · 2020-01-29T16:41:13Z

.../ml/src/main/java/org/elasticsearch/xpack/ml/inference/persistence/TrainedModelProvider.java

+                allFoundIds.remove(allFoundIds.last());
+            }
+        }
+        return allFoundIds;


My way of looking at it is this: we have a request for 10 ids and 2 models stored as resource. We query for 14 ids so we can figure out where the resource model ids would fit. We have a result like

AAOOOOOOOOOOBB

where the As and Bs are the padding.

If the model resource Ids come before the As then they are outside the ordering and all the Os are returned. Same if the model resource ids come after the Bs.

If they are inserted middle (ids M & N) we have an ordering like

AAOOOOOMOOONOOBB

we we take the first 10 after the As.

Same for
AAMOOOOOONOOOOBB
MNAAOOOOOOOOOOBB
AMAOOOOOONOOOOBB
AAOOOOOOOOOOBBMN

I think we have to track the insertion position of the model resource Ids. Collections.binarySearch() would give the insert position.

There is a further complication when the search does not return size hits and you have
AAOOOOOOOOOO
or
AAOOOO

There are no BB at the end. moving back from by 2 and increasing size by 2 simply adds two new items to the start because it moves the page view back but it will still include the end of the initially desired page.

If that makes sense.

Let me take your example and run with it. Assuming size is 10.
10 ids and 2 models
Since we decrease from by two, we need to increase the size by two to fill out the page. If we didn't we would end up with
AAOOOOOOOO (2 As, 8 Os)
But, we increase the size to capture those two Os and our search from the docs returns

AAOOOOOOOOOO

If the resource models are inserted middle (ids M & N) we have an ordering like:

AAOOOOOMOOONOO

since our size is 14 in this case, we trim the first two As and then the last two O

Returning to the user

OOOOOMOOON

Here are the results for the rest of your example insertion points

MOOOOOONOO

AAOOOOOOOO

AOOOOOONOO

OOOOOOOOOO

I think 1 and 4 are obvious OK situations.

2 and 3 may seem weird off hand, but they are OK as well, since M or N would have been at the END of the previous page and "pushed" these two buffered IDS onto this page.

I think we have to track the insertion position of the model resource Ids.

I don't think this is possible since we don't know all the page boundries (without the buffers) and knowing the insertion point does not add anything.

Thanks for the explanation. From those examples it seems once you have taken the 2 sorted sets and merged them together there are 2 conditions to consider.

from == 0 in which case we remove from the back of the set until allFoundIds.size <= pageparams.size

In this case there are no As to remove so we trim the set to size from the back

from > 0, remove the number of resource models from the front of the set - this is the As or another id that displaced the As. Then trim the set to size from the back.

With those observations the code could be simplified to

static Set<String> collectIds(PageParams pageParams, Set<String> foundFromResources, Set<String> foundFromDocs) { // If there are no matching resource models, there was no buffering and the models from the docs // are paginated correctly. if (foundFromResources.isEmpty()) { return foundFromDocs; } TreeSet<String> allFoundIds = new TreeSet<>(foundFromDocs); allFoundIds.addAll(foundFromResources); if (pageParams.getFrom() > 0) { // not the first page so there will be extra results at the front to remove int numToTrimFromFront = Math.min(foundFromResources.size(), pageParams.getFrom()); for (int i = 0; i < numToTrimFromFront; i++) { allFoundIds.remove(allFoundIds.first()); } } // trim down to size removing from the rear while (allFoundIds.size() > pageParams.getSize()) { allFoundIds.remove(allFoundIds.last()); } return allFoundIds; }

@davidkyle wrote some more test cases, and this solution works great. I modified my code :D

davidkyle

LGTM

Thanks for iterating on this

…c#51573) This adds logic to handle paging problems when the ID pattern + tags reference models stored as resources. Most of the complexity comes from the issue where a model stored as a resource could be at the start, or the end of a page or when we are on the last page.

#51736) This adds logic to handle paging problems when the ID pattern + tags reference models stored as resources. Most of the complexity comes from the issue where a model stored as a resource could be at the start, or the end of a page or when we are on the last page.

[ML][Inference] Fix model pagination with models as resources

b8c0d2e

benwtrent added >non-issue :ml Machine learning v8.0.0 v7.7.0 v7.6.1 labels Jan 28, 2020

przemekwitek reviewed Jan 29, 2020

View reviewed changes

addressing pr comments

e7e8249

benwtrent requested review from davidkyle and przemekwitek January 29, 2020 15:00

przemekwitek approved these changes Jan 29, 2020

View reviewed changes

davidkyle reviewed Jan 29, 2020

View reviewed changes

benwtrent added 2 commits January 29, 2020 14:37

addressing PR comments

5a7052f

Simplifying paging logic

87f28c9

benwtrent requested a review from davidkyle January 30, 2020 16:41

davidkyle approved these changes Jan 31, 2020

View reviewed changes

benwtrent merged commit 108ebc1 into elastic:master Jan 31, 2020

benwtrent deleted the feature/ml-inference-fix-model-pagination branch January 31, 2020 11:52

benwtrent removed the v7.6.1 label Jan 31, 2020

benwtrent mentioned this pull request Jan 31, 2020

[7.x] [ML][Inference] Fix model pagination with models as resources (#51573) #51736

Merged

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021

	//NOTE: these test assume that the query pagination results are "buffered"
	// NOTE: these tests assume that the query pagination results are "buffered"

[ML][Inference] Fix model pagination with models as resources #51573

[ML][Inference] Fix model pagination with models as resources #51573

Uh oh!

Conversation

benwtrent commented Jan 28, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

elasticmachine commented Jan 28, 2020

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

davidkyle Jan 29, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

przemekwitek left a comment

Choose a reason for hiding this comment

Uh oh!

davidkyle left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

davidkyle left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

benwtrent commented Jan 28, 2020 •

edited

Loading

davidkyle Jan 29, 2020 •

edited

Loading