reindex: automatically choose the number of slices #26030
Conversation
add unit tests with multiple sources
At this point everything passes
```diff
         } else {
-            slices = 1;
+            slices = Slices.DEFAULT;
         }
```
I'm not sure how backwards compatibility should work here. This change uses -1 as the value here for serializing the auto setting, which won't be a valid slices int value in earlier versions. Something like
```
readFrom {
    if version on or after 5.1.1 {
        slices = new Slices(in)
    } else {
        slices = default
    }
}

writeTo {
    if version on or after 5.1.1 and before 6.1.0 {
        if slices is auto {
            throw exception
        } else {
            slices.writeTo(out)
        }
    } else if version on or after 6.1.0 {
        slices.writeTo(out)
    } else {
        if slices > 1 or slices is auto, throw exception
    }
}
```
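A minimal, self-contained Java sketch of the write-side gate described above, with versions reduced to plain ints and the wire reduced to a list. The version id, the `AUTO_SLICES` sentinel, and the method names are all illustrative assumptions, not the PR's actual code (the real code works with `StreamInput`/`StreamOutput` and `Version` constants):

```java
import java.util.ArrayList;
import java.util.List;

public class SlicesBwcSketch {
    static final int V_6_1_0 = 6_01_00;  // illustrative stand-in for Version.V_6_1_0
    static final int AUTO_SLICES = 0;    // hypothetical sentinel for the "auto" setting

    // Write side: refuse to serialize "auto" toward a node too old to understand it.
    static void writeSlices(int remoteVersion, int slices, List<Integer> wire) {
        if (slices == AUTO_SLICES && remoteVersion < V_6_1_0) {
            throw new IllegalArgumentException("can't send auto slices to a node before 6.1.0");
        }
        wire.add(slices);
    }

    // Read side: old senders can never have written "auto", so a plain int read is enough.
    static int readSlices(List<Integer> wire) {
        return wire.remove(0);
    }

    public static void main(String[] args) {
        List<Integer> wire = new ArrayList<>();
        writeSlices(V_6_1_0, AUTO_SLICES, wire);
        System.out.println(readSlices(wire));
    }
}
```

The point of throwing on the write side rather than silently sending a default is that the error bubbles up to the user instead of changing request semantics behind their back.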
I've thrown IllegalArgumentException("Auto slices not supported on versions before 6.1.0") from these, which should bubble up to the user.
Also, I think you can drop the branch for 5.1.1 because 6.1.0 will only be able to communicate back as far as 5.6.0.
rjernst left a comment
This is not a full review, just a couple random things I saw while glancing through the change.
```java
/**
 * Represents the setting for the number of slices used by a BulkByScrollRequest. Valid values are positive integers and "auto". The
 * "auto" setting defers choosing the number of slices to the request handler.
 */
public final class Slices implements ToXContent, Writeable {
```
Why do we need an entirely new class just to wrap a single integer? The boolean methods here could be static methods taking the int?
My thinking here was that rather than duplicate this logic (is slices a number or "auto") everywhere that distinction is relevant, I'd encapsulate it into a class. That and I prefer to avoid magic numbers if possible because they're less obvious than using a separate type.
That said, I definitely understand why this may not be desirable
The boolean methods here could be static methods taking the int?
Do you mean something like, treat slices=0 as auto and just have this method somewhere
```java
public static boolean isAuto(int slices) {
    if (slices == 0) {
        return true;
    } else if (slices > 0) {
        return false;
    } else {
        throw new IllegalArgumentException();
    }
}
```
Something like that. Although I'm not sure if that is even really necessary. Just have a constant AUTO_SLICES = -1. This is similar to the NO_DOC constant in Lucene for docid. And then check for equality with the constant.
But since you want the number to be positive, I would also use the value 0, so there is no gap for validation.
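Putting the two suggestions together, a sketch of the constant-based approach (the class and method names here are assumptions for illustration; the error-message wording mirrors the validation style used elsewhere in this PR):

```java
public class SlicesConstantSketch {
    // 0 closes the validation gap: valid values are exactly 0 ("auto") or any positive int.
    public static final int AUTO_SLICES = 0;

    public static boolean isAuto(int slices) {
        if (slices < 0) {
            throw new IllegalArgumentException("[slices] must be at least 0 but was [" + slices + "]");
        }
        return slices == AUTO_SLICES;
    }

    public static void main(String[] args) {
        System.out.println(isAuto(AUTO_SLICES)); // true
        System.out.println(isAuto(5));           // false
    }
}
```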
```diff
             e);
         }
-        if (searchRequest.source().slice() != null && slices != 1) {
+        if (searchRequest.source().slice() != null && !slices.equals(Slices.DEFAULT)) {
```
We prefer to use == false for negation because it is easier to see visually. You will find this pattern in both Elasticsearch and Lucene.
Agree, that's much clearer visually
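For illustration, a toy example of the two spellings side by side (not the PR's code):

```java
public class NegationStyleDemo {
    public static void main(String[] args) {
        boolean isDefault = false;
        // The bang is a single easy-to-miss character at the front:
        if (!isDefault) {
            System.out.println("negation with !");
        }
        // Elasticsearch/Lucene convention: the negation is spelled out where the eye lands.
        if (isDefault == false) {
            System.out.println("negation with == false");
        }
    }
}
```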
nik9000 left a comment
I had a look and left a few comments but didn't read all the way through.
```diff
         }
-        if (searchRequest.source().slice() != null && slices != 1) {
+        if (searchRequest.source().slice() != null && !slices.equals(Slices.DEFAULT)) {
             e = addValidationError("can't specify both slice and workers", e);
```
I imagine workers isn't the right thing to say here. That is language left over from my first implementation. Can you fix it while you are making this change?
```diff
         StringBuilder builder = new StringBuilder();
-        builder.append("BulkIndexByScrollResponse[");
+        builder.append(getClass().getSimpleName());
+        builder.append("[");
```
I'd move this to the line above, but I like the thought behind the change.
```diff
     /**
      * Sets this task to be a parent task for {@code slices} sliced subtasks
      */
-    public abstract void rethrottle(float newRequestsPerSecond);
+    public void setParent(int slices) {
```
setChildSliceCount or initializeAsParent? Something like that.
setParent makes me think it is setting a reference to the parent task.
```diff
      * a parent task.
      */
-    public abstract BulkByScrollTask.Status getStatus();
+    public ParentBulkByScrollWorker getParentWorker() {
```
Just reading this from top to bottom, I wonder if this should be private or package private.
It looks like these methods have to be public to be accessible to the classes in the reindex module that use them. They get IllegalAccessError in integration even though they're in the same package. I'm not really sure how module loading works, but if it uses a different classloader than core I think it would disallow package-scope access in this case.
Sort of related: would it make sense to move some of the stuff covered in this PR that's in core to the reindex module?
I'm not really sure how module loading works, but if it uses a different classloader than core I think it would disallow package-scope access in this case.
Yeah. I think it is rude to have the same package in two modules....
Sort of related: would it make sense to move some of the stuff covered in this PR that's in core to the reindex module?
I wouldn't worry too much about it for now. I believe @rjernst is working on something that'll let us move everything over to reindex's module again, which'd be lovely.
```diff
      * @param requestsPerSecond How many search requests per second this task should make
      */
-    public abstract TaskInfo getInfoGivenSliceInfo(String localNodeId, List<TaskInfo> sliceInfo);
+    public void setChild(Integer sliceId, float requestsPerSecond) {
```
Same deal as the comment above: initializeAsChild or something.
```java
    /**
     * Returns the worker object that manages sending search requests. Throws IllegalStateException if this task is not set to be a
     * child task.
     */
    public ChildBulkByScrollWorker getChildWorker() {
```
Same deal, I wonder if this should be public.
```java
    @Override
    public void onCancelled() {
        if (isParent()) {
            // do nothing
```
Maybe explain that we do not have to do anything because the task cancellation mechanism automatically waits for all children to be canceled.
```java
        } else if (isChild()) {
            childWorker.handleCancel();
        } else {
            throw new IllegalStateException("This task's worker is not set");
```
I wonder if this is an exception a user can see. In that case we might want it to be something like "this request has yet to initialize enough to know how to be canceled."
```java
        this.count = count;
    }

    public Slices(StreamInput stream) throws IOException {
```
I like to put the reading code right under the writing code so they are easy to eyeball together.
* Move back to integer type for slices
* Better method names in BulkByScrollTask
* BWC only to 6.1.0 and later, tests included
nik9000 left a comment
I left a bunch of minor stuff and some stylistic stuff. I think it's quite close though.
```diff
-        if (searchRequest.source().slice() != null && slices != 1) {
-            e = addValidationError("can't specify both slice and workers", e);
+        if (searchRequest.source().slice() != null && slices != DEFAULT_SLICES) {
+            e = addValidationError("can't set a specific single slice for this request and multiple slices", e);
```
Hmmm.. I wonder if "can't specify both manual and automatic slicing on the same request" would be better. I just had a read of the reindex docs this morning and I believe searchRequest.source().slice() != null is what I called "manual slicing" and slices != DEFAULT_SLICES is what I called "automatic slicing".
Yeah, I think that makes sense. When I wrote that I was thinking about the child tasks having a slice builder set in automatic slicing, but this aligns more with how the API is described.
```java
     */
    public Self setSlices(int slices) {
        if (slices < 1) {
            throw new IllegalArgumentException("[slices] must be at least 1");
```
Might be useful to keep it but compare with 0 instead.
```diff
         out.writeVInt(maxRetries);
         out.writeFloat(requestsPerSecond);
-        if (out.getVersion().onOrAfter(Version.V_5_1_1)) {
+        if (out.getVersion().onOrAfter(Version.V_6_1_0)) {
```
I think it'd be clearer to combine the if statements like:
```java
if (slices == AUTO_SLICES && out.getVersion().before(Version.V_6_1_0)) {
    throw ...
} else {
    write ...
}
```
```java
    /**
     * Build the status for this task given a snapshot of the information of running slices.
     * Returns true if this task is a child task that performs search requests. False otherwise
```
Will this return true for single slice requests? They aren't really "children". I'd called them "working" requests which I think makes more sense but clashes with your "Worker" name. I'm not sure what to do about it though.
It'll return true for single sliced requests since they get set with a child worker, even if slices was auto. I agree the names aren't great and there's a lot of overlap, I'll see if I can find some better ones
```java
    /**
     * Sets this task to be a child task that performs search requests, when the request is not sliced.
     * @param requestsPerSecond How many search requests per second this task should make
     */
    public void setSliceChild(float requestsPerSecond) {
```
I'd probably skip making this method and call the other one with null instead.
```java
            Supplier<AbstractAsyncBulkByScrollAction<Request>> taskSupplier) {

        if (request.getSlices() == AbstractBulkByScrollRequest.AUTO_SLICES) {
            client.admin().cluster().prepareSearchShards(request.getSearchRequest().indices()).execute(ActionListener.wrap(
```
We tend not to use prepareXXX methods inside of Elasticsearch and reserve those for testing. We tend to think of the "Builders" as part of the transport client API rather than truly part of core. That isn't a hard and fast rule but I figure it is worth following here so, one day, we can remove the builders entirely. Like, years from now. Anyway, please use the request directly.
```java
            task.setSliceChildren(slices);
            sendSubRequests(client, action, node.getId(), task, request, listener);
        } else {
            Integer sliceId = request.getSearchRequest().source().slice() == null
```
I think this'd be easier to read as:
```java
SliceBuilder sliceBuilder = request.getSearchRequest().source().slice();
Integer sliceId = sliceBuilder == null ? null : sliceBuilder.getId();
```
```java
    private static int countSlicesBasedOnShards(ClusterSearchShardsResponse response) {
        Map<Index, Integer> countsByIndex = Arrays.stream(response.getGroups()).collect(Collectors.toMap(
            group -> group.getShardId().getIndex(),
            __ -> 1,
```
I think it'd be more normal for us to do group -> 1 without the _. We don't have the "_ means doesn't matter" norm yet.
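As a self-contained sketch of what that collector does, with shard groups reduced to plain index-name strings (the real code maps `ClusterSearchShardsGroup` to its `Index`; the names here are illustrative):

```java
import java.util.Arrays;
import java.util.Map;
import java.util.stream.Collectors;

public class CountsByIndexSketch {
    // Count shards per index: every group contributes 1, and the merge function
    // sums the values when two groups map to the same index key.
    static Map<String, Integer> countsByIndex(String[] shardIndexNames) {
        return Arrays.stream(shardIndexNames).collect(Collectors.toMap(
            name -> name,
            group -> 1,         // "group -> 1" rather than "__ -> 1", per the review comment
            (a, b) -> a + b));  // two shards of the same index merge into a running count
    }

    public static void main(String[] args) {
        System.out.println(countsByIndex(new String[] {"idx1", "idx1", "idx2"}));
    }
}
```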
```java
    }

    private static <Request extends AbstractBulkByScrollRequest<Request>> void sendSubRequests(
            Client client,
```
Same deal with the arguments.
```java
            ActionListener<BulkByScrollResponse> listener,
            Client client,
            DiscoveryNode node,
            Supplier<AbstractAsyncBulkByScrollAction<Request>> taskSupplier) {
```
I think taskSupplier is worth javadoc. Maybe even renaming/reworking to Runnable runUnsliced.
Or something. I'm not great with names.
* Better names for the slice task strategy classes
* More descriptive error messages and docs language
@nik9000 I changed Worker -> TaskState since that's mostly what it is, and Parent -> Leader and Child -> Worker to make the relationship more clear. I think these names are a little better than what I had originally.

For your comment about the supplier in the parallelization helper, can you elaborate on why a Runnable would be better? I just went with the stricter behavior because it seemed like starting an
Sure! Reading the supplier made me think you were getting something. And then when I realized what you were getting I thought "but doesn't he need to customize that for every request?" and then I realized why you don't, it is because you only call it at all when starting the "working" request. You skip it when you are parallelizing. So then I thought "why not just call it something that has to do with running the request?" And then I thought, "why is it a supplier when it really is about running stuff?" Basically I got confused about what it was for so I suggested renaming it. You don't have to make it into a
```java
     */
    public Self setSlices(int slices) {
        if (slices < 0) {
            throw new IllegalArgumentException("[slices] must be at least 0");
```
Just in case this is ever thrown it'd be nice to have the error message more like "slices must be at least 0 but was [$value]".
I see what you mean, I think it's more clear if it's a Runnable. It looks like the last CI build failed because the node was low on resources; I'll let this finish before merging in the interest of getting a green build.
In reindex APIs, when using the `slices` parameter to choose the number of slices, adds the option to specify `slices` as "auto" which will choose a reasonable number of slices. It uses the number of shards in the source index, up to a ceiling. If there is more than one source index, it uses the smallest number of shards among them. This gives users an easy way to use slicing in these APIs without having to make decisions about how to configure it, as it provides a good-enough configuration for them out of the box. This may become the default behavior for these APIs in the future.
* master: (30 commits)
  Rewrite range queries with open bounds to exists query (elastic#26160)
  Fix eclipse compilation problem (elastic#26170)
  Epoch millis and second formats parse float implicitly (Closes elastic#14641) (elastic#26119)
  fix SplitProcessor targetField test (elastic#26178)
  Fixed typo in README.textile (elastic#26168)
  Fix incorrect class name in deleteByQuery docs (elastic#26151)
  Move more token filters to analysis-common module
  reindex: automatically choose the number of slices (elastic#26030)
  Fix serialization of the `_all` field. (elastic#26143)
  percolator: Hint what clauses are important in a conjunction query based on fields
  Remove unused Netty-related settings (elastic#26161)
  Remove SimpleQueryStringIT#testPhraseQueryOnFieldWithNoPositions.
  Tests: reenable ShardReduceIT#testIpRange.
  Allow `ClusterState.Custom` to be created on initial cluster states (elastic#26144)
  Teach the build about betas and rcs (elastic#26066)
  Fix wrong header level
  inner hits: Unfiltered nested source should keep its full path
  Document how to import Lucene Snapshot libs when elasticsearch clients (elastic#26113)
  Use `global_ordinals_hash` execution mode when sorting by sub aggregations. (elastic#26014)
  Make the README use a single type in examples. (elastic#26098)
  ...
For #24547
Add an option to reindex, update by query, and delete by query to set `slices` to `auto` rather than a specific number. The number of slices it chooses will be the lowest number of shards among the source indices, up to a constant ceiling. I chose the ceiling arbitrarily as 20 here. Next I want to do some of the rally benchmarking mentioned in the original issue to find what value makes the most sense.
In the interest of maintaining the original behavior, when `slices` is set to `auto` and there is only one shard, it will handle the request as if it was not sliced (i.e. `slices` defaults to 1). This also adds unit tests for these APIs with multiple source indices.
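The heuristic described above can be sketched in a few lines of plain Java; the constant and method names are assumptions for illustration, not the PR's actual code:

```java
public class AutoSlicesSketch {
    static final int CEILING = 20; // the arbitrary cap chosen in this PR

    // Smallest shard count among the source indices, capped at the ceiling.
    static int autoSlices(int[] shardCountsPerIndex) {
        int min = Integer.MAX_VALUE;
        for (int shards : shardCountsPerIndex) {
            min = Math.min(min, shards);
        }
        return Math.min(min, CEILING);
    }

    public static void main(String[] args) {
        System.out.println(autoSlices(new int[] {5, 3, 8})); // 3
        System.out.println(autoSlices(new int[] {40, 25}));  // 20
        System.out.println(autoSlices(new int[] {1, 12}));   // 1: handled as an unsliced request
    }
}
```

Taking the minimum rather than the maximum keeps every slice backed by at least one shard in each source index.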
This includes the language from #25582 so I'll close that when this is merged