Refactor InternalEngine's index/delete flow for better clarity #23711

bleskes · 2017-03-23T09:13:13Z

The InternalEngine Index/Delete methods (plus satellites like version loading from Lucene) have accumulated some cruft over the years, making it hard to clearly follow the code flows for various use cases (primary indexing/recovery/replicas etc). This PR refactors those methods for better readability. As a follow up we intend to take certain parts of these methods and extract them into helper methods to improve things even more.

To support the refactoring I have considerably beefed up the versioning tests.

This PR is a spin-off from #23543 , which made it clear this is needed.

bleskes · 2017-03-23T09:14:18Z

@jasontedor This is the same code you already reviewed but without the seq no logic. Since you already LGTMed it, I didn't ask for your review. Of course, feel free to review anyway if you want.

s1monw

I left some comments. I have to admit I am not sure it improves the readability of the engine. It rather feels like it makes it more complicated with more methods without clear naming.

s1monw · 2017-03-27T07:59:13Z

core/src/main/java/org/elasticsearch/common/lucene/uid/PerThreadIDAndVersionLookup.java

-
-        this.versions = versions;
-        this.termsEnum = termsEnum;
+        Terms terms = fields.terms(UidFieldMapper.NAME);


do you recall why this null check on fields was here? is there a chance that this is called on an empty reader?

That's a good question. The class was introduced that way with no explanation. I thought that it had to do with BWC where we made transitions in how we store UIDs. I wonder when we can end up with an empty reader. I'll discuss this with @jpountz

I double checked with Adrien and this is OK (and is something he wanted to do for a long time).

s1monw · 2017-03-27T07:59:45Z

core/src/main/java/org/elasticsearch/common/lucene/uid/PerThreadIDAndVersionLookup.java

    /** Return null if id is not found. */
-    public DocIdAndVersion lookup(BytesRef id, Bits liveDocs, LeafReaderContext context) throws IOException {
+    public DocIdAndVersion lookupVersion(BytesRef id, Bits liveDocs, LeafReaderContext context) throws IOException {
+        assert context.reader().getCoreCacheKey().equals(readerKey);


can we get messages for the asserts?
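For illustration, a minimal sketch of what an assert with a message could look like. The class and helper names (`AssertMessageSketch`, `sameReader`, `describe`) and the message wording are assumptions made up for this example, not the code that ended up in the PR.

```java
final class AssertMessageSketch {

    /** Hypothetical check: true when the reader key seen at lookup time matches the one captured earlier. */
    static boolean sameReader(Object expectedKey, Object actualKey) {
        return expectedKey.equals(actualKey);
    }

    /** Hypothetical message builder, kept separate so the text is easy to test. */
    static String describe(Object expectedKey, Object actualKey) {
        return "context's reader is not the one this lookup was created for. expected ["
            + expectedKey + "] but got [" + actualKey + "]";
    }

    static void lookup(Object expectedKey, Object actualKey) {
        // The expression after ':' is evaluated only when the assertion fails
        // and becomes the detail message of the AssertionError.
        assert sameReader(expectedKey, actualKey) : describe(expectedKey, actualKey);
    }
}
```

Because the message expression is only evaluated on failure, building a descriptive string costs nothing on the happy path.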

s1monw · 2017-03-27T08:02:27Z

core/src/main/java/org/elasticsearch/common/lucene/uid/PerThreadIDAndVersionLookup.java

    /** Reused for iteration (when the term exists) */
    private PostingsEnum docsEnum;

+    private final Object readerKey;


this reader key is only used for asserts. can we make sure it's null if asserts are not enabled?

s1monw · 2017-03-27T08:04:44Z

core/src/main/java/org/elasticsearch/index/engine/InternalEngine.java

+        assert incrementVersionLookup();
+        VersionValue versionValue = versionMap.getUnderLock(op.uid());
+        if (versionValue == null) {
+            assert incrementIndexVersionLookup();


this code is supposed to run only under assertions. I'll add comments.
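A small sketch of the idiom being discussed, assuming a hypothetical counter class: the increment method always returns `true`, so wrapping the call in `assert` means the bookkeeping runs only when the JVM is started with assertions enabled (`-ea`). The names below are illustrative and not taken from InternalEngine.

```java
import java.util.concurrent.atomic.AtomicLong;

final class AssertOnlyCounterSketch {
    private final AtomicLong lookups = new AtomicLong();

    /** Always returns true so the call can sit inside an assert statement. */
    boolean incrementLookup() {
        lookups.incrementAndGet();
        return true;
    }

    void resolveVersion() {
        // Test-only bookkeeping: with assertions disabled this call is never executed.
        assert incrementLookup();
        // ... the actual version lookup would happen here ...
    }

    long lookupCount() {
        return lookups.get();
    }
}
```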

s1monw · 2017-03-27T08:10:19Z

core/src/main/java/org/elasticsearch/index/engine/InternalEngine.java

-                        op.type(),
-                        op.id(),
-                        op.versionType().explainConflictForWrites(currentVersion, expectedVersion, deleted));
+    enum LuceneOpStatus {


the enum values make no sense in the context of the name. I wonder if we should call it OperationAge or something like this.

The name comes from the following line of thought - maybe it helps explain it (I'm fine with any other names) - operations on replicas (where this is relevant) are always stored in the translog. They are added to lucene only if the document version in lucene is older than the incoming one. The enum is supposed to reflect the status of the document in lucene. Does this help? any suggestion as to how to name it to reflect that it only relates to the lucene index?

s1monw · 2017-03-27T08:17:01Z

core/src/main/java/org/elasticsearch/index/engine/InternalEngine.java

-                        : Optional.empty();
-                } catch (IllegalArgumentException | VersionConflictEngineException ex) {
-                    resultOnVersionConflict = Optional.of(new IndexResult(ex, currentVersion, index.seqNo()));
+                } else if (canOptimizeAddDocument && mayHaveBeenIndexedBefore(index) == false) {


can we fold this into an else { and start again with if in there with a comment that we are now on a replica? it would read cleaner

s1monw · 2017-03-27T08:21:41Z

core/src/main/java/org/elasticsearch/index/engine/InternalEngine.java

+                    assert index.versionType().versionTypeForReplicationAndRecovery() == index.versionType() :
+                        "resolving out of order delivery based on versioning but version type isn't fit for it. got ["
+                            + index.versionType() + "]";
+                    final LuceneOpStatus luceneOpStatus = checkLuceneOpStatusBasedOnVersions(index);


the semantics of this method are weird. I pass in an operation and it returns OLDER if the given operation has a higher version? I would have expected the opposite. These semantics also make the rest of this method hard to read since it has to negate the return values in 2/3 of the cases. I think you should flip it.

How about renaming the method and enum to lucene doc status? I'll change that and see if it makes more sense for you.

s1monw · 2017-03-27T08:22:43Z

core/src/main/java/org/elasticsearch/index/engine/InternalEngine.java

+    private IndexResult indexIntoLucene(Index index, long seqNo, long newVersion, boolean markDocAsCreated,
+                                        boolean useLuceneUpdateDocument)
+        throws IOException {
+        assertSequenceNumberBeforeIndexing(index.origin(), seqNo);


only execute this if assertions are enabled?!

s1monw · 2017-03-27T08:23:25Z

core/src/main/java/org/elasticsearch/index/engine/InternalEngine.java

                 * non-document failure
                 */
-                return new IndexResult(ex, currentVersion, index.seqNo());
+                return new IndexResult(ex, Versions.MATCH_ANY, index.seqNo());


oh why did you change this to MATCH_ANY?... worth a comment?

I don't have that information anymore here, and in the case of failure, we don't use it anyway. I'll comment.

s1monw · 2017-03-27T08:23:52Z

core/src/main/java/org/elasticsearch/index/engine/InternalEngine.java


-    private boolean isForceUpdateDocument(Index index) {
-        boolean forceUpdateDocument;
+    private boolean mayHaveBeenIndexedBefore(Index index) {


maybe a javadoc for this method?

jasontedor · 2017-03-27T21:08:49Z

This is the same code you already reviewed but without the seq no logic. Since you already LGTMed it, I didn't ask for your review.

I didn't review it, I saw it while it was a work in progress. I have not LGTMed the previous PR, I would like to review this PR.

bleskes · 2017-03-28T09:45:37Z

I didn't review it, I saw it while it was a work in progress. I have not LGTMed the previous PR, I would like to review this PR.

@jasontedor I'm very sorry and have no idea what made me think you did. As said before, your review is more than welcome (and it seems needed, as it wasn't done before).

bleskes · 2017-03-29T12:22:11Z

@s1monw I pushed a commit that addresses most of your feedback.

re

I have to admit I am not sure it improves the readability of the engine. It rather feels like it makes it more complicated with more methods without clear naming.

Fair enough. This is subjective. I think things will be clearer with the follow up change we intended to make (move some of the long code into methods that return a struct). Maybe I should do this now within this change so we can see how it looks? @jasontedor indicated he would also prefer to see the end result rather than review this intermediate step (the diff is too big anyway).

s1monw · 2017-03-30T09:29:21Z

@jasontedor indicated he would also prefer to see the end result rather than review this intermediate step (the diff is too big anyway).

++

bleskes · 2017-03-30T15:50:18Z

@jasontedor @s1monw I pushed ahead and added helper methods. I think it looks much better but this is subjective. I will probably do another run and polish things more but I think it's ready for you. LMKWYT

s1monw

mostly nit picks, looks great

s1monw · 2017-03-31T07:36:50Z

core/src/main/java/org/elasticsearch/common/lucene/uid/PerThreadIDAndVersionLookup.java

+            throw new IllegalArgumentException("reader misses the [" + VersionFieldMapper.NAME +
+                "] field");
+        }
+        boolean assertionsOn = false;


maybe:

    Object readerKey = null;
    assert (readerKey = reader.getCoreCacheKey()) != null;
    this.readerKey = readerKey;

much better. Thanks.
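To make the suggestion concrete, here is a self-contained sketch of the assignment-inside-assert idiom. The class name and the plain `Object` passed to the constructor are stand-ins for the real reader and its core cache key; only the three-line pattern itself comes from the review comment.

```java
final class ReaderKeySketch {
    /** Only consulted from assert statements; stays null when assertions are disabled. */
    private final Object readerKey;

    ReaderKeySketch(Object coreCacheKey) {
        Object key = null;
        // The assignment is part of the assert expression, so it is skipped
        // entirely when the JVM runs without -ea.
        assert (key = coreCacheKey) != null;
        this.readerKey = key;
    }

    Object readerKey() {
        return readerKey;
    }

    /** Same trick, used here to detect whether assertions are on. */
    static boolean assertionsEnabled() {
        boolean on = false;
        assert on = true;
        return on;
    }
}
```

This avoids a separate `assertionsOn` flag: the field is populated as a side effect of the assert itself.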

s1monw · 2017-03-31T07:47:53Z

core/src/main/java/org/elasticsearch/index/engine/InternalEngine.java

        }
    }

+    private static final class IndexingPlan {


I am not sure about the name, maybe IndexingStrategy?
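For readers following along, a rough sketch of the kind of "plan" struct under discussion: an immutable value object built through named static factories, which the engine would compute first and then execute. Only the factory name `processButSkipLucene` and its three arguments appear in the diff; the field names, `processNormally`, and the class name are assumptions for illustration.

```java
final class IndexingPlanSketch {
    final boolean currentNotFoundOrDeleted; // assumed meaning of the boolean argument
    final boolean indexIntoLucene;          // false when the op only goes to the translog
    final long seqNoForIndexing;
    final long versionForIndexing;

    private IndexingPlanSketch(boolean currentNotFoundOrDeleted, boolean indexIntoLucene,
                               long seqNoForIndexing, long versionForIndexing) {
        this.currentNotFoundOrDeleted = currentNotFoundOrDeleted;
        this.indexIntoLucene = indexIntoLucene;
        this.seqNoForIndexing = seqNoForIndexing;
        this.versionForIndexing = versionForIndexing;
    }

    /** Hypothetical factory: the operation should be written to Lucene. */
    static IndexingPlanSketch processNormally(boolean currentNotFoundOrDeleted, long seqNo, long version) {
        return new IndexingPlanSketch(currentNotFoundOrDeleted, true, seqNo, version);
    }

    /** Factory named in the diff: record the operation but leave the Lucene index untouched. */
    static IndexingPlanSketch processButSkipLucene(boolean currentNotFoundOrDeleted, long seqNo, long version) {
        return new IndexingPlanSketch(currentNotFoundOrDeleted, false, seqNo, version);
    }
}
```

Separating "decide what to do" from "do it" is what lets the long index method be split into smaller steps, whichever name (plan or strategy) is chosen.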

s1monw · 2017-03-31T07:49:20Z

core/src/main/java/org/elasticsearch/index/engine/InternalEngine.java

+            final LuceneDocStatus luceneOpStatus = checkLuceneDocStatusBasedOnVersions(index);
+            if (luceneOpStatus == LuceneDocStatus.NEWER_OR_EQUAL) {
+                plan = IndexingPlan.processButSkipLucene(
+                    luceneOpStatus == LuceneDocStatus.NOT_FOUND, index.seqNo(), index.version());


luceneOpStatus == LuceneDocStatus.NOT_FOUND is false here?

yes! IntelliJ agrees too.

s1monw · 2017-03-31T07:50:25Z

core/src/main/java/org/elasticsearch/index/engine/InternalEngine.java

+            // unlike the primary, replicas don't really care to about creation status of documents
+            // this allows to ignore the case where a document was found in the live version maps in
+            // a delete state and return false for the created flag in favor of code simplicity
+            final LuceneDocStatus luceneOpStatus = checkLuceneDocStatusBasedOnVersions(index);


so if this returns LuceneDocStatus.NEWER_OR_EQUAL then the existing doc is newer or equal and not the given doc? that is very confusing, I mentioned this before I think. I'd never expect that. The opposite is intuitive?

Ok. I flipped it around.
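A sketch of what the flipped comparison might look like, so that the result describes the incoming operation relative to the document in Lucene. The enum and constant names, the `NOT_FOUND` sentinel, and the plain `long` comparison are all assumptions; in particular, comparing with `>` assumes monotonically increasing versions.

```java
final class OpVsDocSketch {
    /** Status of the incoming operation relative to what is already in Lucene (illustrative names). */
    enum OpVsLuceneDocStatus { OP_NEWER, OP_STALE_OR_EQUAL, LUCENE_DOC_NOT_FOUND }

    /** Hypothetical sentinel meaning "no document with this id in Lucene". */
    static final long NOT_FOUND = -1L;

    static OpVsLuceneDocStatus compare(long opVersion, long luceneDocVersion) {
        if (luceneDocVersion == NOT_FOUND) {
            return OpVsLuceneDocStatus.LUCENE_DOC_NOT_FOUND;
        }
        // Assumes versions only ever grow; the op wins only if it is strictly newer.
        return opVersion > luceneDocVersion
            ? OpVsLuceneDocStatus.OP_NEWER
            : OpVsLuceneDocStatus.OP_STALE_OR_EQUAL;
    }
}
```

With this orientation the caller reads naturally: an `OP_STALE_OR_EQUAL` result means the operation can skip Lucene, and no negation of the return value is needed.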

bleskes · 2017-03-31T14:31:17Z

@s1monw I addressed your latest feedback (thx). Can you take another look please?

s1monw

LGTM thanks for the iterations

jasontedor

LGTM.

bleskes · 2017-04-05T12:43:30Z

Thank you @s1monw @jasontedor. This was a tough one.

bleskes · 2017-04-05T12:43:46Z

PS. I will let this bake for a day or two before back porting.

The refactoring in elastic#23711 hardcoded version logic for replica to assume monotonic versions. Sadly that's wrong for `FORCE` and `VERSION_GTE`. Instead we should use the methods in VersionType to detect conflicts. Note - once replicas use sequence numbers for out of order delivery, this logic goes away.
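To illustrate why a hardcoded "newer version wins" check is not enough, here is a simplified stand-in for delegating the decision to the version type. The enum below is not the real `VersionType`; its three constants and the `isConflict` method are invented for this sketch and only mimic strict, greater-or-equal, and forced semantics.

```java
enum VersionTypeSketch {
    /** Incoming version must be strictly greater than the current one. */
    STRICT {
        @Override
        boolean isConflict(long currentVersion, long incomingVersion) {
            return currentVersion >= incomingVersion;
        }
    },
    /** Incoming version may equal the current one (the "GTE" case). */
    GTE {
        @Override
        boolean isConflict(long currentVersion, long incomingVersion) {
            return currentVersion > incomingVersion;
        }
    },
    /** Forced writes never conflict, whatever the versions are. */
    FORCE {
        @Override
        boolean isConflict(long currentVersion, long incomingVersion) {
            return false;
        }
    };

    abstract boolean isConflict(long currentVersion, long incomingVersion);
}
```

For an operation with version 3 arriving when version 3 is already indexed, `STRICT` reports a conflict while `GTE` and `FORCE` do not, which is exactly the case a monotonic comparison gets wrong.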

The InternalEngine Index/Delete methods (plus satellites like version loading from Lucene) have accumulated some cruft over the years, making it hard to clearly follow the code flows for various use cases (primary indexing/recovery/replicas etc). This PR refactors those methods for better readability. The methods are broken up into smaller sub methods, albeit at the price of less code reuse. To support the refactoring I have considerably beefed up the versioning tests.

bleskes added 19 commits March 7, 2017 15:20

initial version

f2211c1

fix testAppendWhileRecovering

baa7e51

lingering reference to SortedNumericDocValuesField

14b4b4b

currectly load sequence numbers if the index doesn't have the right d…

2b835f7

…oc values

Merge remote-tracking branch 'upstream/master' into seq_no_as_version

4a2f350

share more version look up code

6b9e3ca

remove no commit

cde619c

minor tweaks

d81bc8c

internal versioning test for primary. Fix found flag on version confl…

a27352d

…ict in deletes

fix delete error handling

7b58f1d

add primary external versioning test

b626e95

added testVersioningPromotedReplica

eca7872

add concurrency tests

0e4a2f6

clean duplicate tests

ca138f1

added testConcurrentGetAndSetOnPrimary

c0343ea

Merge remote-tracking branch 'upstream/master' into seq_no_as_version

a63c2c2

rollback seq no based out of order resolving and leave it at refactor…

a4f2335

…ing alone

fix line length

3328c94

Merge branch 'master' into engine_clearer_flow

baa5032

bleskes added :Engine >non-issue v5.4.0 v6.0.0-alpha1 labels Mar 23, 2017

bleskes requested a review from s1monw March 23, 2017 09:13

s1monw suggested changes Mar 27, 2017

jasontedor self-requested a review March 27, 2017 21:09

Merge remote-tracking branch 'upstream/master' into engine_clearer_flow

b640c4f

some more minor cleanup

7cc73a1

bleskes added 4 commits March 30, 2017 14:09

Merge remote-tracking branch 'upstream/master' into engine_clearer_flow

9eba7a0

IndexingPlan

7c0a050

DeletionPlan

99866b9

lint

3ba080b

bleskes requested a review from s1monw March 30, 2017 15:50

s1monw suggested changes Mar 31, 2017

bleskes added 3 commits March 31, 2017 13:57

Merge remote-tracking branch 'upstream/master' into engine_clearer_flow

f88cdb6

feedback

8cdac43

flip Lucene/Op comparision

ffc9a85

bleskes requested a review from s1monw March 31, 2017 14:31

s1monw approved these changes Apr 3, 2017

jasontedor approved these changes Apr 5, 2017

bleskes merged commit 75b4f40 into elastic:master Apr 5, 2017

bleskes deleted the engine_clearer_flow branch April 5, 2017 12:43

bleskes mentioned this pull request Apr 9, 2017

Engine: version logic on replicas should not be hard coded #23998

Merged
