Skip to content

Conversation

@benwtrent
Copy link
Member

@benwtrent benwtrent commented Jan 28, 2020

This adds logic to handle paging problems when the ID pattern + tags reference models stored as resources.

Most of the complexity comes from the issue where a model stored as a resource could be at the start, or the end of a page or when we are on the last page.

closes #51543

@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core (:ml)

}

public void testExpandIdsPagination() {
//NOTE: these test assume that the query pagination results are "buffered"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
//NOTE: these test assume that the query pagination results are "buffered"
// NOTE: these tests assume that the query pagination results are "buffered"

public void expandIds(String idExpression,
boolean allowNoResources,
@Nullable PageParams pageParams,
PageParams pageParams,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, are pageParams guaranteed to be non-null?

Copy link
Member

@davidkyle davidkyle Jan 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the pageParams parameter really necessary? I'd argue that I would expect expandIds to return all the ids rather than a page as the id small compared to a full document.

The default value of PageParams.getSize() is 100. Without setting a page param object you may not realise that limitation is in place and that you are not getting all of your ids. Is there even a mechanism to say you need to get the second page now?

Compare to JobConfigProvider.expandJobIds which gives you AnomalyDetectorsIndex.CONFIG_INDEX_MAX_RESULTS_WINDOW = 10_000 ids.

It would also simplify this code to remove to the pageParams which is a very good reason to remove it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the pageParams parameter really necessary?

Yes, I think it is. The APIs making calls against this function require paging params (default is from: 0, size: 100). I think this is a sane way of handling larger numbers of items.

Compare to JobConfigProvider.expandJobIds which gives you AnomalyDetectorsIndex.CONFIG_INDEX_MAX_RESULTS_WINDOW = 10_000 ids.

I think that this is a bug and shows that we don't support good pagination at all with anomaly detection jobs.

Is there even a mechanism to say you need to get the second page now?

Yes, the total count of all the ids found with id pattern is returned.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, are pageParams guaranteed to be non-null?

It is not possible to set them as null via the API since you can only set from and size individually.

If it is null from a client, they just don't set those parameters (and thus get the default value).

If an internal caller is making these calls, they should not explicitly set pageParams to null and if they do, they will receive an error pretty quickly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern was that with default paging we open ourselves up to bugs such as elastic/kibana#38559.

I assumed getTrainedModel would take paging parameters in which case paging in expandIds would be redundant but I see from the way the code is used that is not the case and GetTrainedModelsStatsAction explicitly depends on paging ids.

AD jobs are not paged because when they lived in clusterstate all where returned by the GET API. It would have been a breaking change to require paging on that API when we move to the .ml-config index and we could not make a breaking change in that release.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the apis GET _ml/trained_models and GET _ml/trained_models/_stats accept paging parameters and use those when expanding the IDs.

Since they are stored in indices, paging parameters are a must (even for expanding IDs) and might as well have it handled now instead of worrying about adding them later when we reach scaling issues.

if (bufferedFrom > 0) {
allFoundIds.remove(allFoundIds.first());
bufferedFrom--;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
}
} else {

while (allFoundIds.size() > sizeLimit) {
// If we are still over limit, and have buffered items, that means the first ids belong on the previous page
if (bufferedFrom > 0) {
allFoundIds.remove(allFoundIds.first());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if it makes sense to optimize this, but I could imagine having an iterator going through allFoundIds and calling .remove() on the iterator rather than removing the smallest item from the collection directly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, popping the iterator, for sure. I actually find it more readable to have a remote(..first()). These sets are usually "small" and right now the highest number that we will "buffer" is the total number of models stored as resources (which is 1).

client::search);
}

static Set<String> collectIds(PageParams pageParams, Set<String> foundFromResources, Set<String> foundFromDocs, long totalMatchedIds) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found this method somewhat hard to follow.
Should the logic be similar to merge sort? I.e. we have two sorted collections and take size elements (in total) from whichever collection has the lowest id at the front. from would be per-collection (so we'd have two froms).

LMK if you think it's possible to rewrite it this way.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be rewritten like a merge sort, but the issue still resides "From which side of the page do we remove items?" And we would not know the ends of the pages until the set is fully merged. So, I am not sure how it would be helpful.

Copy link
Contributor

@przemekwitek przemekwitek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@davidkyle davidkyle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's much tricker than I thought.

if (tags.isEmpty()) {
foundResourceIds.addAll(matchedResourceIds(tokens));
} else {
for(String resourceId : matchedResourceIds(tokens)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated nit: Can MODELS_STORED_AS_RESOURCE be a Collections.unmodifiableSet then line 520

return new HashSet<>(MODELS_STORED_AS_RESOURCE);

in matchedResourceIds wouldn't have to wrap a set in a set

requiredMatches.filterMatchedIds(foundResourceIds);
requiredMatches.filterMatchedIds(allFoundIds);
if (requiredMatches.hasUnmatchedIds()) {
idsListener.onFailure(ExceptionsHelper.missingTrainedModel(requiredMatches.unmatchedIdsString()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this error in a valid situation?

from = 0, size = 10, idExpression = 'bar*,foo'

I have the models bar-1, bar-2 ... bar-10 and foo.

The first 10 bar-n are returned by the search but foo is unmatched.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davidkyle i think this was a problem well before I added this logic. Data frame analytics suffers from this as well.

I think it is OK to error in this case.

I may be wrong though (about it being OK) :D.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Certainly it's not part of this change but let's follow up on it. I don't think we should be reporting en error when we don't know if it is an error or not, maybe we can relax the check

// We "buffer" the from and size to take into account models stored as resources.
// This is so we handle the edge cases when the model that is stored as a resource is at the start/end of
// a page.
.from(Math.max(0, pageParams.getFrom() - foundResourceIds.size()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the strategy is to get extra then figure out where the resource model ids would fit in that sorted list 👍

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davidkyle exactly. If we don't have the buffer, AND the resource model ID is at the start of the list, we don't know if it is ACTUALLY at the start of the page, or the END of the previous page.

Hence the need for a buffer.

int bufferedFrom = Math.min(foundFromResources.size(), from);

// If size = 10_000 but there aren't that many total IDs, reduce the size here to make following logic simpler
int sizeLimit = (int)Math.min(pageParams.getSize(), totalMatchedIds - from);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the number of search hits and and resource model ids are < pageParams.getSize() then that is the size limit. from isn't a factor here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think from is a factor here. totalMatchedIds is the total hits, which means the total number of ids that matched the query which could be WAY bigger than the size.

If there are a total of 20 IDs, and the user is doing from: 19, size:200, we should give them the rest of the IDs. Our size limit in this case shouldn't be 200, it should be 1

allFoundIds.remove(allFoundIds.last());
}
}
return allFoundIds;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My way of looking at it is this: we have a request for 10 ids and 2 models stored as resource. We query for 14 ids so we can figure out where the resource model ids would fit. We have a result like

AAOOOOOOOOOOBB

where the As and Bs are the padding.

If the model resource Ids come before the As then they are outside the ordering and all the Os are returned. Same if the model resource ids come after the Bs.

If they are inserted middle (ids M & N) we have an ordering like

AAOOOOOMOOONOOBB

we we take the first 10 after the As.

Same for
AAMOOOOOONOOOOBB
MNAAOOOOOOOOOOBB
AMAOOOOOONOOOOBB
AAOOOOOOOOOOBBMN

I think we have to track the insertion position of the model resource Ids. Collections.binarySearch() would give the insert position.

There is a further complication when the search does not return size hits and you have
AAOOOOOOOOOO
or
AAOOOO

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are no BB at the end. moving back from by 2 and increasing size by 2 simply adds two new items to the start because it moves the page view back but it will still include the end of the initially desired page.

If that makes sense.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me take your example and run with it. Assuming size is 10.
10 ids and 2 models
Since we decrease from by two, we need to increase the size by two to fill out the page. If we didn't we would end up with
AAOOOOOOOO (2 As, 8 Os)
But, we increase the size to capture those two Os and our search from the docs returns

AAOOOOOOOOOO

If the resource models are inserted middle (ids M & N) we have an ordering like:

AAOOOOOMOOONOO

since our size is 14 in this case, we trim the first two As and then the last two O

Returning to the user

OOOOOMOOON

Here are the results for the rest of your example insertion points

  1. MOOOOOONOO
  2. AAOOOOOOOO
  3. AOOOOOONOO
  4. OOOOOOOOOO

I think 1 and 4 are obvious OK situations.

2 and 3 may seem weird off hand, but they are OK as well, since M or N would have been at the END of the previous page and "pushed" these two buffered IDS onto this page.

I think we have to track the insertion position of the model resource Ids.

I don't think this is possible since we don't know all the page boundries (without the buffers) and knowing the insertion point does not add anything.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation. From those examples it seems once you have taken the 2 sorted sets and merged them together there are 2 conditions to consider.

  1. from == 0 in which case we remove from the back of the set until allFoundIds.size <= pageparams.size

In this case there are no As to remove so we trim the set to size from the back

  1. from > 0, remove the number of resource models from the front of the set - this is the As or another id that displaced the As. Then trim the set to size from the back.

With those observations the code could be simplified to

    static Set<String> collectIds(PageParams pageParams, Set<String> foundFromResources, Set<String> foundFromDocs) {
        // If there are no matching resource models, there was no buffering and the models from the docs
        // are paginated correctly.
        if (foundFromResources.isEmpty()) {
            return foundFromDocs;
        }

        TreeSet<String> allFoundIds = new TreeSet<>(foundFromDocs);
        allFoundIds.addAll(foundFromResources);

        if (pageParams.getFrom() > 0) {
            // not the first page so there will be extra results at the front to remove
            int numToTrimFromFront = Math.min(foundFromResources.size(), pageParams.getFrom());
            for (int i = 0; i < numToTrimFromFront; i++) {
                allFoundIds.remove(allFoundIds.first());
            }
        }

        // trim down to size removing from the rear
        while (allFoundIds.size() > pageParams.getSize()) {
            allFoundIds.remove(allFoundIds.last());
        }

        return allFoundIds;
    }

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davidkyle wrote some more test cases, and this solution works great. I modified my code :D

@benwtrent benwtrent requested a review from davidkyle January 30, 2020 16:41
Copy link
Member

@davidkyle davidkyle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Thanks for iterating on this

@benwtrent benwtrent merged commit 108ebc1 into elastic:master Jan 31, 2020
@benwtrent benwtrent deleted the feature/ml-inference-fix-model-pagination branch January 31, 2020 11:52
@benwtrent benwtrent removed the v7.6.1 label Jan 31, 2020
benwtrent added a commit to benwtrent/elasticsearch that referenced this pull request Jan 31, 2020
…c#51573)

This adds logic to handle paging problems when the ID pattern + tags reference models stored as resources. 

Most of the complexity comes from the issue where a model stored as a resource could be at the start, or the end of a page or when we are on the last page.
benwtrent added a commit that referenced this pull request Jan 31, 2020
#51736)

This adds logic to handle paging problems when the ID pattern + tags reference models stored as resources. 

Most of the complexity comes from the issue where a model stored as a resource could be at the start, or the end of a page or when we are on the last page.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[ML] Get trained models: pre-packed models are not affected by from and size

5 participants