Introduce ReplicaEngine model #29

DaveCTurner · 2018-03-22T14:35:22Z

This models how indexing and deletion operations are handled on the replica,
including the optimisations for append-only operations and the interaction with
Lucene commits and the version map.

NB this version is based on master and exhibits a bug. I'm opening this for comments on approach etc. and will follow up with changes as per PRs.

bleskes

Looks great. I left some minor comments/questions.

bleskes · 2018-03-23T09:14:41Z

ReplicaEngine/tla/ReplicaEngine.tla

+(* The set of individual requests that can occur on the document *)
+Request(request_count)
+    (* ADD: An optimised append-only write can only occur as the first operation
+    on the document ID in seqno order. Any subsequent attempts to ADD the


nit - a subsequent op can be a delete or update.

bleskes · 2018-03-23T09:19:09Z

ReplicaEngine/tla/ReplicaEngine.tla

+          , content \in DocContent
+          }
+    (* UPDATE: A write that does not involve an internally-generated document ID.
+       RETRY_ADD: A retry of a write that does involve an internally-generated


left over comment

bleskes · 2018-03-23T09:19:35Z

ReplicaEngine/tla/ReplicaEngine.tla

+          }
+    (* DELETE *)
+    \cup  { [type |-> DELETE, seqno |-> seqno]
+          : seqno \in 1..request_count


can this appear in seq# 1 ?

Probably not. But I think I've found a modelling bug involving 3 DELETEs and then an UPDATE (2 DELETEs was not enough) so requiring a preceding UPDATE would make a much bigger model.

Ok, @ywelsch gives a scenario where an UPDATE and a DELETE are replicated by the primary which then crashes, and the new primary only received the DELETE so replaces the UPDATE with a no-op.

bleskes · 2018-03-23T11:46:28Z

ReplicaEngine/tla/ReplicaEngine.tla

+    if req.type = DELETE
+    then
+        (* planDeletionAsNonPrimary *)
+


we need a versionMap_needsSafeAccess = TRUE here. Not that important from a correctness perspective as it can happen at any time anyway, but it will be consistent with the code and the other branches here.

I left that out because it's not in the code.

see https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/index/engine/InternalEngine.java#L1040

Ohhhh. Outside the lock, so I missed it, d'oh.

Fixed in 0ceddd1

ywelsch

I've left some comments, but overall looks good.

ywelsch · 2018-03-23T15:04:21Z

ReplicaEngine/tla/ReplicaEngine.tla

+                           /\ expected_doc /= NULL => lucene.document.content = expected_doc
+
+=============================================================================
+\* Modification History


Can you remove this line and the ones below.
The TLA toolbox will then not automatically add this poor man's version control anymore to the file.

ywelsch · 2018-03-23T15:41:26Z

ReplicaEngine/tla/ReplicaEngine.tla

+          , content \in DocContent
+          }
+    (* UPDATE: A write that does not involve an internally-generated document ID. *)
+    \cup  { [type |-> UPDATE, seqno |-> seqno, content |-> content]


maybe simpler:

\cup [type : {RETRY_ADD}, seqno : 1..request_count, content : DocContent, autoIdTimeStamp : {DocAutoIdTimestamp}]
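
The suggestion leans on the TLA+ set-of-records constructor `[f : S, ...]`, which denotes every record whose fields are drawn from the given sets. A minimal sketch of the equivalence follows, assuming `RETRY_ADD`, `DocContent`, `DocAutoIdTimestamp` and `request_count` are constants as the diff suggests; the module name and the two operator names are made up for illustration and are not part of the model.

```tla
---- MODULE RequestSetSketch ----
EXTENDS Naturals   \* provides the .. interval operator

\* Assumed to mirror the constants used by the real model.
CONSTANTS RETRY_ADD, DocContent, DocAutoIdTimestamp, request_count

\* Longer form: build each record explicitly in a set comprehension.
ComprehensionForm ==
    { [ type            |-> RETRY_ADD
      , seqno           |-> seqno
      , content         |-> content
      , autoIdTimeStamp |-> DocAutoIdTimestamp ]
    : seqno \in 1..request_count, content \in DocContent }

\* Shorter form: the set of all records with fields drawn from these sets.
\* Fields with a single possible value use a singleton set.
RecordSetForm ==
    [ type            : {RETRY_ADD}
    , seqno           : 1..request_count
    , content         : DocContent
    , autoIdTimeStamp : {DocAutoIdTimestamp} ]

\* TLC evaluates this once the constants are instantiated.
ASSUME ComprehensionForm = RecordSetForm
====
```

The two forms describe the same set, so the choice is about readability only.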

ywelsch · 2018-03-23T15:45:41Z

ReplicaEngine/tla/ReplicaEngine.tla

+variables
+    lucene =
+        [ document |-> NULL
+        , buffer   |-> <<>>


I would just have two variables here, one for the doc, and one for the buffer.

ywelsch · 2018-03-23T16:47:45Z

ReplicaEngine/tla/ReplicaEngine.tla

+        LuceneUpdateVersionMap: (* TODO needs an invariant saying that the VM is >= Lucene and also contains the buffered ops *)
+
+        versionMap_isUnsafe := FALSE;
+        versionMap_needsSafeAccess := FALSE;


what about previousMapsNeededSafeAccess? I wonder if this addresses the oddity we have seen after you added the fixes to the model.

It makes it way less likely, but I don't think it eliminates it. You just need more concurrent activity and more refreshes in order to clear it. I think it's worth modelling more correctly anyway.

> I wonder if this addresses the oddity we have seen after you added the fixes to the model.

The oddity that the model found is in fact another desync bug that's reproducible in Elasticsearch itself.

ywelsch · 2018-03-23T16:58:53Z

ReplicaEngine/tla/ReplicaEngine.tla

+            then
+                (* Perform a Lucene refresh *)
+                AwaitRefreshOnDelete: \* Label here to allow for other concurrent activity
+                await lucene.buffer = <<>>;


the code has an enforceSafeAccess here.

Good catch. Added.

ywelsch · 2018-03-23T17:09:29Z

ReplicaEngine/tla/ReplicaEngine.tla

+                then
+                    (* Perform a Lucene refresh *)
+                    AwaitRefreshOnIndex: \* Label here to allow for other concurrent activity
+                    await lucene.buffer = <<>>;


same comment as above for delete, do we need to do enforceSafeAccess here?

Yep, good catch, added.

DaveCTurner · 2018-03-26T07:25:12Z

@bleskes said:

In terms of invariants in the model, I think we should add the following:

Assert that at any given moment the version map is a good representation of Lucene, when combined with the indexing buffer. This is not so straightforward - we need to make sure that if you ask the version map when it thinks it's safe, its answer is always consistent with what Lucene would say if you hit refresh.

The enforceSafeAccess flag is an optimization to make sure we don't hit refresh storms. The correctness should never depend on it. I wonder if we should adapt the model to flip it at will (rather than connect it to two refresh cycles).

Until this model is in a state where it passes the model checker I'd prefer not to add artificial invariants or transitions. Both of those ideas would have reduced the number of steps it took to find a failure, but would have made it much harder to answer the question of whether it was a genuine bug (i.e. actually could lead to a desync) or not.

DaveCTurner · 2018-03-26T08:17:47Z

I pushed fixes that model elastic/elasticsearch#28787 and elastic/elasticsearch#28790 as well as @bleskes's proposal for the third issue.

I also dropped the label from the middle of the refresh loop as it was yielding spurious failures because of not modelling both the current and old maps.

This model now passes the model checker - takes about a minute to go through ~6M states on my laptop.

DaveCTurner · 2018-03-27T12:04:55Z

Incidentally, trying to reproduce the third desync bug in the test harness fails because when assertions are enabled we always call removeTombstonesUnderLock.

ywelsch

> I also dropped the label from the middle of the refresh loop as it was yielding spurious failures because of not modelling both the current and old maps.

Can you elaborate a bit more on this?

ywelsch · 2018-03-27T12:48:19Z

ReplicaEngine/tla/ReplicaEngine.tla

+process ConsumerProcess = "Consumer"
+variables
+    maxUnsafeAutoIdTimestamp \in {0, DocAutoIdTimestamp - 1, DocAutoIdTimestamp, DocAutoIdTimestamp + 1},
+    maxSeqNoOfNonAppendOnlyOperations \in {0, 2, request_count + 1},


why only 0, 2, request_count + 1? How about starting with 0, and then (similarly to what you have done for modeling other concurrent requests for different documents) model a separate process that increases this.

I was battling state space explosion. However, this doesn't seem to be the source of it, so I pushed 36454d7.

ywelsch · 2018-03-27T12:49:36Z

ReplicaEngine/tla/ReplicaEngine.tla

+            useLuceneUpdateDocument  := FALSE;
+            indexIntoLucene          := TRUE;
+        else
+            if FALSE = (req.type \in {ADD, RETRY_ADD})


I think that req.type \notin {ADD, RETRY_ADD} would read nicer.

Me too. I pushed a643c14

DaveCTurner · 2018-03-27T13:48:26Z

I missed something: we're not currently dealing with duplicated messages. I pushed 8cd6ce9. It's now much slower.

Introduce ReplicaEngine model

28b2442

This models how indexing and deletion operations are handled on the replica, including the optimisations for append-only operations and the interaction with Lucene commits and the version map.

DaveCTurner requested review from bleskes and ywelsch March 22, 2018 14:35

DaveCTurner added 4 commits March 23, 2018 08:35

Fix modelling of deletions

5f21a25

Tombstones in the version map are not cleaned up on a refresh, unlike update entries: once they have been refreshed they continue to exist until the GC time elapses and they are cleaned up. This commit includes this in the model.
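
As a rough illustration of the distinction this commit message draws, the sketch below treats a single document's slot in the version map as either empty, an update entry, or a tombstone. Everything in it is hypothetical: the `NULL` constant, the record shape with `kind` and `refreshed` fields, and the operator names are assumptions for the example, not the definitions used in ReplicaEngine.tla. It also assumes that GC only removes a tombstone that has already been refreshed.

```tla
---- MODULE TombstoneSketch ----
CONSTANT NULL   \* hypothetical marker for "no entry", e.g. a model value

\* Hypothetical entry shapes:
\*   [kind |-> "update",    refreshed |-> FALSE]
\*   [kind |-> "tombstone", refreshed |-> FALSE]

\* A refresh drops update entries but only marks tombstones as refreshed.
AfterRefresh(entry) ==
    IF entry = NULL THEN NULL
    ELSE IF entry.kind = "tombstone"
         THEN [entry EXCEPT !.refreshed = TRUE]
         ELSE NULL

\* Once the GC time has elapsed, refreshed tombstones are removed;
\* anything else is left alone.
AfterGC(entry) ==
    IF entry /= NULL /\ entry.kind = "tombstone" /\ entry.refreshed
    THEN NULL
    ELSE entry

ASSUME AfterRefresh([kind |-> "update", refreshed |-> FALSE]) = NULL
ASSUME AfterRefresh([kind |-> "tombstone", refreshed |-> FALSE])
         = [kind |-> "tombstone", refreshed |-> TRUE]
ASSUME AfterGC(AfterRefresh([kind |-> "tombstone", refreshed |-> FALSE])) = NULL
====
```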

Introduce label in LuceneLoop

6fac39b

Refreshing Lucene and updating the version map are not performed atomically, so this models that the updater can see an intermediate state.

Fix ApplyBufferedOperations

857d9b3

It was processing the buffer in reverse order.
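
To see why the order matters, here is a small sketch under stated assumptions: the buffer is a sequence with the oldest operation first, each operation is a record of one of two hypothetical shapes, and `NULL` stands for "no document". None of these names are taken from the model itself.

```tla
---- MODULE BufferOrderSketch ----
EXTENDS Sequences

CONSTANT NULL   \* hypothetical marker for "no document", e.g. a model value

\* Hypothetical buffered operation shapes:
\*   [type |-> "index", content |-> c]   or   [type |-> "delete"]
\* Each operation overwrites the document, so doc is unused here.
ApplyOne(doc, op) == IF op.type = "delete" THEN NULL ELSE op.content

\* Oldest operation first: the newest operation determines the result.
RECURSIVE ApplyInOrder(_, _)
ApplyInOrder(doc, buffer) ==
    IF buffer = <<>>
    THEN doc
    ELSE ApplyInOrder(ApplyOne(doc, Head(buffer)), Tail(buffer))

\* Newest operation first: the oldest operation wrongly wins.
RECURSIVE ApplyReversed(_, _)
ApplyReversed(doc, buffer) ==
    IF buffer = <<>>
    THEN doc
    ELSE ApplyOne(ApplyReversed(doc, Tail(buffer)), Head(buffer))

\* An index followed by a delete should leave no document.
Example == << [type |-> "index", content |-> "v1"], [type |-> "delete"] >>

ASSUME ApplyInOrder(NULL, Example) = NULL
ASSUME ApplyReversed(NULL, Example) = "v1"
====
```

With the reversed traversal the deleted document reappears, which is the kind of divergence between Lucene and the version map that the invariants are meant to catch.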

Fix invariant

cb9da3f

The lucene doc is NULL if deleted

bleskes approved these changes Mar 23, 2018

DaveCTurner added 5 commits March 23, 2018 13:16

Boaz fixes

567dc45

Set versionMap_needsSafeAccess on deletion

0ceddd1

Do not create a DELETE as the first operation on the doc

019272c

Revert "Do not create a DELETE as the first operation on the doc"

0b46bb7

This reverts commit 019272c.

Rename model to 'model' and remove generated files

c843c79

DaveCTurner force-pushed the 2018-03-22-version-map-revisited branch from cfb55bc to c843c79 Compare March 23, 2018 15:35

ywelsch reviewed Mar 23, 2018

DaveCTurner added 6 commits March 23, 2018 18:18

Remove bogus expression

add11da

Lucene can refresh even if the buffer is empty (because other docs)

43478b9

Need safe access after a forced refresh

88ef32b

No need for modification history

a1c7b7f

Separate variables for Lucene

2306b48

Shorter description of permitted requests

0e31664

DaveCTurner added 4 commits March 26, 2018 08:57

Track maxSeqNoOfNonAppendOnlyOperations

f2fcecb

Models the fix implemented in elastic/elasticsearch#28787

Preserve GC deletes according to local checkpoint

33cbeda

Models the fix implemented in elastic/elasticsearch#28790

Add proposed fix for desync bug #3

893a0ed

Remove label within Lucene process

bd97a0a

As currently modelled, this seems to find a spurious failure, but needs quite a bit more work to model the inner structure of the version map in order to check that.

ywelsch reviewed Mar 27, 2018

TIL \notIn

a643c14

DaveCTurner added 3 commits March 27, 2018 14:11

Model nondeterministically increasing maxSeqNoOfNonAppendOnlyOperations

36454d7

Flip needsSafeAccess both ways

d1426a6

Account for duplicated messages

8cd6ce9

(Oops, bit of an oversight that)

DaveCTurner added 3 commits March 27, 2018 18:31

Limit number of duplicated messages processed

74c017e

Need to check the local checkpoint _after_ getVersionFromMap too

8982236

Added ref to PR #29276

8ac6716

DaveCTurner mentioned this pull request Mar 28, 2018

Add model for interaction between Lucene & Version Map on replicas #28

Closed

DaveCTurner merged commit 4ac5682 into elastic:master Mar 28, 2018

dnhatn mentioned this pull request Jul 13, 2018

Possible to index duplicate documents with same id and routing id. elastic/elasticsearch#31976

Closed
