reindex: automatically choose the number of slices #26030
Conversation
add unit tests with multiple sources
At this point everything passes
```diff
         } else {
-            slices = 1;
+            slices = Slices.DEFAULT;
         }
```
I'm not sure how backwards compatibility should work here. This change uses -1 as the value here for serializing the auto setting, which won't be a valid slices int value in earlier versions. Something like
```
readFrom {
    if version on or after 5.1.1 {
        slices = new Slices(in)
    } else {
        slices = default
    }
}

writeTo {
    if version on or after 5.1.1 and before 6.1.0 {
        if slices is auto {
            throw exception
        } else {
            slices.writeTo(out)
        }
    } else if version on or after 6.1.0 {
        slices.writeTo(out)
    } else {
        if slices > 1 or slices is auto, throw exception
    }
}
```
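A minimal, self-contained Java sketch of the write-side gate described above, with versions reduced to plain ints and the wire reduced to a list. The version id, the `AUTO_SLICES` sentinel, and the method names are all illustrative assumptions, not the PR's actual code (the real code works with `StreamInput`/`StreamOutput` and `Version` constants):

```java
import java.util.ArrayList;
import java.util.List;

public class SlicesBwcSketch {
    static final int V_6_1_0 = 6_01_00;  // illustrative stand-in for Version.V_6_1_0
    static final int AUTO_SLICES = 0;    // hypothetical sentinel for the "auto" setting

    // Write side: refuse to serialize "auto" toward a node too old to understand it.
    static void writeSlices(int remoteVersion, int slices, List<Integer> wire) {
        if (slices == AUTO_SLICES && remoteVersion < V_6_1_0) {
            throw new IllegalArgumentException("can't send auto slices to a node before 6.1.0");
        }
        wire.add(slices);
    }

    // Read side: old senders can never have written "auto", so a plain int read is enough.
    static int readSlices(List<Integer> wire) {
        return wire.remove(0);
    }

    public static void main(String[] args) {
        List<Integer> wire = new ArrayList<>();
        writeSlices(V_6_1_0, AUTO_SLICES, wire);
        System.out.println(readSlices(wire));
    }
}
```

The point of throwing on the write side rather than silently sending a default is that the error bubbles up to the user instead of changing request semantics behind their back.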
I've thrown IllegalArgumentException("Auto slices not supported on versions before 6.1.0") from these, which should bubble up to the user.
Also, I think you can drop the branch for 5.1.1 because 6.1.0 will only be able to communicate back as far as 5.6.0.
rjernst left a comment
This is not a full review, just a couple random things I saw while glancing through the change.
```java
/**
 * Represents the setting for the number of slices used by a BulkByScrollRequest. Valid values are positive integers and "auto". The
 * "auto" setting defers choosing the number of slices to the request handler.
 */
public final class Slices implements ToXContent, Writeable {
```
Why do we need an entirely new class just to wrap a single integer? The boolean methods here could be static methods taking the int?
My thinking here was that rather than duplicate this logic (is slices a number or "auto") everywhere that distinction is relevant, I'd encapsulate it into a class. That and I prefer to avoid magic numbers if possible because they're less obvious than using a separate type.
That said, I definitely understand why this may not be desirable
The boolean methods here could be static methods taking the int?
Do you mean something like, treat slices=0 as auto and just have this method somewhere
```java
public static boolean isAuto(int slices) {
    if (slices == 0) {
        return true;
    } else if (slices > 0) {
        return false;
    } else {
        throw new IllegalArgumentException();
    }
}
```
Something like that. Although I'm not sure if that is even really necessary. Just have a constant AUTO_SLICES = -1. This is similar to the NO_DOC constant in Lucene for docid. And then check for equality with the constant.
But since you want the number to be positive, I would also use the value 0, so there is no gap for validation.
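Putting the two suggestions together, a sketch of the constant-based approach (the class and method names here are assumptions for illustration; the error-message wording mirrors the validation style used elsewhere in this PR):

```java
public class SlicesConstantSketch {
    // 0 closes the validation gap: valid values are exactly 0 ("auto") or any positive int.
    public static final int AUTO_SLICES = 0;

    public static boolean isAuto(int slices) {
        if (slices < 0) {
            throw new IllegalArgumentException("[slices] must be at least 0 but was [" + slices + "]");
        }
        return slices == AUTO_SLICES;
    }

    public static void main(String[] args) {
        System.out.println(isAuto(AUTO_SLICES)); // true
        System.out.println(isAuto(5));           // false
    }
}
```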
```diff
             e);
         }
-        if (searchRequest.source().slice() != null && slices != 1) {
+        if (searchRequest.source().slice() != null && !slices.equals(Slices.DEFAULT)) {
```
We prefer to use == false for negation because it is easier to see visually. You will find this pattern in both Elasticsearch and Lucene.
Agree, that's much clearer visually
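For illustration, a toy example of the two spellings side by side (not the PR's code):

```java
public class NegationStyleDemo {
    public static void main(String[] args) {
        boolean isDefault = false;
        // The bang is a single easy-to-miss character at the front:
        if (!isDefault) {
            System.out.println("negation with !");
        }
        // Elasticsearch/Lucene convention: the negation is spelled out where the eye lands.
        if (isDefault == false) {
            System.out.println("negation with == false");
        }
    }
}
```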
nik9000 left a comment
I had a look and left a few comments but didn't read all the way through.
```diff
         }
-        if (searchRequest.source().slice() != null && slices != 1) {
+        if (searchRequest.source().slice() != null && !slices.equals(Slices.DEFAULT)) {
             e = addValidationError("can't specify both slice and workers", e);
```
I imagine workers isn't the right thing to say here. That is language left over from my first implementation. Can you fix it while you are making this change?
```diff
         StringBuilder builder = new StringBuilder();
-        builder.append("BulkIndexByScrollResponse[");
+        builder.append(getClass().getSimpleName());
+        builder.append("[");
```
I'd move this to the line above, but I like the thought behind the change.
```diff
     /**
      * Sets this task to be a parent task for {@code slices} sliced subtasks
      */
-    public abstract void rethrottle(float newRequestsPerSecond);
+    public void setParent(int slices) {
```
setChildSliceCount or initializeAsParent? Something like that.
setParent makes me think it is setting a reference to the parent task.
```diff
      * a parent task.
      */
-    public abstract BulkByScrollTask.Status getStatus();
+    public ParentBulkByScrollWorker getParentWorker() {
```
Just reading this from top to bottom, I wonder if this should be private or package private.
It looks like these methods have to be public to be accessible to the classes in the reindex module that use them. They get IllegalAccessError in integration even though they're in the same package. I'm not really sure how module loading works, but if it uses a different classloader than core I think it would disallow package-scope access in this case.
Sort of related: would it make sense to move some of the stuff covered in this PR that's in core to the reindex module?
I'm not really sure how module loading works, but if it uses a different classloader than core I think it would disallow package-scope access in this case.
Yeah. I think it is rude to have the same package in two modules....
Sort of related: would it make sense to move some of the stuff covered in this PR that's in core to the reindex module?
I wouldn't worry too much about it for now. I believe @rjernst is working on something that'll let us move everything over to reindex's module again, which'd be lovely.
```diff
      * @param requestsPerSecond How many search requests per second this task should make
      */
-    public abstract TaskInfo getInfoGivenSliceInfo(String localNodeId, List<TaskInfo> sliceInfo);
+    public void setChild(Integer sliceId, float requestsPerSecond) {
```
Same deal as the comment above: initializeAsChild or something.
```java
    /**
     * Returns the worker object that manages sending search requests. Throws IllegalStateException if this task is not set to be a
     * child task.
     */
    public ChildBulkByScrollWorker getChildWorker() {
```
Same deal, I wonder if this should be public.
```java
    @Override
    public void onCancelled() {
        if (isParent()) {
            // do nothing
```
Maybe explain that we do not have to do anything because the task cancellation mechanism automatically waits for all children to be canceled.
```java
        } else if (isChild()) {
            childWorker.handleCancel();
        } else {
            throw new IllegalStateException("This task's worker is not set");
```
I wonder if this is an exception a user can see. In that case we might want it to be something like "this request has yet to initialize enough to know how to be canceled."
```java
        this.count = count;
    }

    public Slices(StreamInput stream) throws IOException {
```
I like to put the reading code right under the writing code so they are easy to eyeball together.
* Move back to integer type for slices
* Better method names in BulkByScrollTask
* BWC only to 6.1.0 and later, tests included
nik9000 left a comment
I left a bunch of minor stuff and some stylistic stuff. I think it's quite close though.
```diff
-        if (searchRequest.source().slice() != null && slices != 1) {
-            e = addValidationError("can't specify both slice and workers", e);
+        if (searchRequest.source().slice() != null && slices != DEFAULT_SLICES) {
+            e = addValidationError("can't set a specific single slice for this request and multiple slices", e);
```
Hmmm.. I wonder if "can't specify both manual and automatic slicing on the same request" would be better. I just had a read of the reindex docs this morning and I believe searchRequest.source().slice() != null is what I called "manual slicing" and slices != DEFAULT_SLICES is what I called "automatic slicing".
Yeah, I think that makes sense. When I wrote that I was thinking about the child tasks having a slice builder set in automatic slicing, but this aligns more with how the API is described.
```java
     */
    public Self setSlices(int slices) {
        if (slices < 1) {
            throw new IllegalArgumentException("[slices] must be at least 1");
```
Might be useful to keep it but compare with 0 instead.
```diff
         out.writeVInt(maxRetries);
         out.writeFloat(requestsPerSecond);
-        if (out.getVersion().onOrAfter(Version.V_5_1_1)) {
+        if (out.getVersion().onOrAfter(Version.V_6_1_0)) {
```
I think it'd be clearer to combine the if statements like:
```java
if (slices == AUTO_SLICES && out.getVersion().before(Version.V_6_1_0)) {
    throw ...
} else {
    write ...
}
```
```java
    /**
     * Build the status for this task given a snapshot of the information of running slices.
     * Returns true if this task is a child task that performs search requests. False otherwise
```
Will this return true for single slice requests? They aren't really "children". I'd called them "working" requests which I think makes more sense but clashes with your "Worker" name. I'm not sure what to do about it though.
It'll return true for single sliced requests since they get set with a child worker, even if slices was auto. I agree the names aren't great and there's a lot of overlap, I'll see if I can find some better ones
```java
    /**
     * Sets this task to be a child task that performs search requests, when the request is not sliced.
     * @param requestsPerSecond How many search requests per second this task should make
     */
    public void setSliceChild(float requestsPerSecond) {
```
I'd probably skip making this method and call the other one with null instead.
```java
            Supplier<AbstractAsyncBulkByScrollAction<Request>> taskSupplier) {

        if (request.getSlices() == AbstractBulkByScrollRequest.AUTO_SLICES) {
            client.admin().cluster().prepareSearchShards(request.getSearchRequest().indices()).execute(ActionListener.wrap(
```
We tend not to use prepareXXX methods inside of Elasticsearch and reserve those for testing. We tend to think of the "Builders" as part of the transport client API rather than truly part of core. That isn't a hard and fast rule but I figure it is worth following here so, one day, we can remove the builders entirely. Like, years from now. Anyway, please use the request directly.
```java
            task.setSliceChildren(slices);
            sendSubRequests(client, action, node.getId(), task, request, listener);
        } else {
            Integer sliceId = request.getSearchRequest().source().slice() == null
```
I think this'd be easier to read as:
```java
SliceBuilder sliceBuilder = request.getSearchRequest().source().slice();
Integer sliceId = sliceBuilder == null ? null : sliceBuilder.getId();
```
```java
    private static int countSlicesBasedOnShards(ClusterSearchShardsResponse response) {
        Map<Index, Integer> countsByIndex = Arrays.stream(response.getGroups()).collect(Collectors.toMap(
            group -> group.getShardId().getIndex(),
            __ -> 1,
```
I think it'd be more normal for us to do group -> 1 without the _. We don't have the "_ means doesn't matter" norm yet.
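As a self-contained sketch of what that collector does, with shard groups reduced to plain index-name strings (the real code maps `ClusterSearchShardsGroup` to its `Index`; the names here are illustrative):

```java
import java.util.Arrays;
import java.util.Map;
import java.util.stream.Collectors;

public class CountsByIndexSketch {
    // Count shards per index: every group contributes 1, and the merge function
    // sums the values when two groups map to the same index key.
    static Map<String, Integer> countsByIndex(String[] shardIndexNames) {
        return Arrays.stream(shardIndexNames).collect(Collectors.toMap(
            name -> name,
            group -> 1,         // "group -> 1" rather than "__ -> 1", per the review comment
            (a, b) -> a + b));  // two shards of the same index merge into a running count
    }

    public static void main(String[] args) {
        System.out.println(countsByIndex(new String[] {"idx1", "idx1", "idx2"}));
    }
}
```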
```java
    }

    private static <Request extends AbstractBulkByScrollRequest<Request>> void sendSubRequests(
            Client client,
```
Same deal with the arguments.
```java
            ActionListener<BulkByScrollResponse> listener,
            Client client,
            DiscoveryNode node,
            Supplier<AbstractAsyncBulkByScrollAction<Request>> taskSupplier) {
```
I think taskSupplier is worth javadoc. Maybe even renaming/reworking to Runnable runUnsliced.
Or something. I'm not great with names.
* Better names for the slice task strategy classes
* More descriptive error messages and docs language
@nik9000 I changed Worker -> TaskState since that's mostly what it is, and Parent -> Leader and Child -> Worker to make the relationship more clear. I think these names are a little better than what I had originally.

For your comment about the supplier in the parallelization helper, can you elaborate on why a Runnable would be better? I just went with the stricter behavior because it seemed like starting an
Sure! Reading the supplier made me think you were getting something. And then when I realized what you were getting I thought "but doesn't he need to customize that for every request?" and then I realized why you don't, it is because you only call it at all when starting the "working" request. You skip it when you are parallelizing. So then I thought "why not just call it something that has to do with running the request?" And then I thought, "why is it a supplier when it really is about running stuff?" Basically I got confused about what it was for so I suggested renaming it. You don't have to make it into a
```java
     */
    public Self setSlices(int slices) {
        if (slices < 0) {
            throw new IllegalArgumentException("[slices] must be at least 0");
```
Just in case this is ever thrown it'd be nice to have the error message more like "slices must be at least 0 but was [$value]".
I see what you mean, I think it's more clear if it's a Runnable. It looks like the last CI build failed because the node was low on resources; I'll let this finish before merging in the interest of getting a green build.
In reindex APIs, when using the `slices` parameter to choose the number of slices, adds the option to specify `slices` as "auto" which will choose a reasonable number of slices. It uses the number of shards in the source index, up to a ceiling. If there is more than one source index, it uses the smallest number of shards among them. This gives users an easy way to use slicing in these APIs without having to make decisions about how to configure it, as it provides a good-enough configuration for them out of the box. This may become the default behavior for these APIs in the future.
* master: (30 commits)
  Rewrite range queries with open bounds to exists query (elastic#26160)
  Fix eclipse compilation problem (elastic#26170)
  Epoch millis and second formats parse float implicitly (Closes elastic#14641) (elastic#26119)
  fix SplitProcessor targetField test (elastic#26178)
  Fixed typo in README.textile (elastic#26168)
  Fix incorrect class name in deleteByQuery docs (elastic#26151)
  Move more token filters to analysis-common module
  reindex: automatically choose the number of slices (elastic#26030)
  Fix serialization of the `_all` field. (elastic#26143)
  percolator: Hint what clauses are important in a conjunction query based on fields
  Remove unused Netty-related settings (elastic#26161)
  Remove SimpleQueryStringIT#testPhraseQueryOnFieldWithNoPositions.
  Tests: reenable ShardReduceIT#testIpRange.
  Allow `ClusterState.Custom` to be created on initial cluster states (elastic#26144)
  Teach the build about betas and rcs (elastic#26066)
  Fix wrong header level
  inner hits: Unfiltered nested source should keep its full path
  Document how to import Lucene Snapshot libs when elasticsearch clients (elastic#26113)
  Use `global_ordinals_hash` execution mode when sorting by sub aggregations. (elastic#26014)
  Make the README use a single type in examples. (elastic#26098)
  ...
For #24547
Add an option to reindex, update by query, and delete by query to set `slices` to `auto` rather than a specific number. The number of slices it chooses will be the lowest number of shards among the source indices, up to a constant ceiling. I chose the ceiling arbitrarily as 20 here. Next I want to do some of the rally benchmarking mentioned in the original issue to find what value makes the most sense.
In the interest of maintaining the original behavior, when `slices` is set to `auto` and there is only one shard, it will handle the request as if it was not sliced (i.e. `slices` defaults to 1). This also adds unit tests for these APIs with multiple source indices.
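The heuristic described above can be sketched in a few lines of plain Java; the constant and method names are assumptions for illustration, not the PR's actual code:

```java
public class AutoSlicesSketch {
    static final int CEILING = 20; // the arbitrary cap chosen in this PR

    // Smallest shard count among the source indices, capped at the ceiling.
    static int autoSlices(int[] shardCountsPerIndex) {
        int min = Integer.MAX_VALUE;
        for (int shards : shardCountsPerIndex) {
            min = Math.min(min, shards);
        }
        return Math.min(min, CEILING);
    }

    public static void main(String[] args) {
        System.out.println(autoSlices(new int[] {5, 3, 8})); // 3
        System.out.println(autoSlices(new int[] {40, 25}));  // 20
        System.out.println(autoSlices(new int[] {1, 12}));   // 1: handled as an unsliced request
    }
}
```

Taking the minimum rather than the maximum keeps every slice backed by at least one shard in each source index.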
This includes the language from #25582 so I'll close that when this is merged