-
Notifications
You must be signed in to change notification settings - Fork 25
Introduce ReplicaEngine model #29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce ReplicaEngine model #29
Conversation
This models how indexing and deletion operations are handled on the replica, including the optimisations for append-only operations and the interaction with Lucene commits and the version map.
Tombstones in the version map are not cleaned up on a refresh, unlike update entries: once they have been refreshed they continue to exist until the GC time elapses and they are cleaned up. This commit includes this in the model.
Refreshing Lucene and updating the version map are not performed atomically, so this models that the updater can see an intermediate state.
It was processing the buffer in reverse order.
The lucene doc is NULL if deleted
bleskes
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great. I left some minor comments/questions.
| (* The set of individual requests that can occur on the document *) | ||
| Request(request_count) | ||
| (* ADD: An optimised append-only write can only occur as the first operation | ||
| on the document ID in seqno order. Any subsequent attempts to ADD the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit - we a subsequent op can be a delete or update.
ReplicaEngine/tla/ReplicaEngine.tla
Outdated
| , content \in DocContent | ||
| } | ||
| (* UPDATE: A write that does not involve an internally-generated document ID. | ||
| RETRY_ADD: A retry of a write that does involve an internally-generated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left over comment
ReplicaEngine/tla/ReplicaEngine.tla
Outdated
| } | ||
| (* DELETE *) | ||
| \cup { [type |-> DELETE, seqno |-> seqno] | ||
| : seqno \in 1..request_count |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this appear in seq# 1 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably not. But I think I've found a modelling bug involving 3 DELETEs and then an UPDATE (2 DELETEs was not enough) so requiring a preceding UPDATE would make a much bigger model.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, @ywelsch gives a scenario where an UPDATE and a DELETE are replicated by the primary which then crashes, and the new primary only received the DELETE so replaces the UPDATE with a no-op.
| if req.type = DELETE | ||
| then | ||
| (* planDeletionAsNonPrimary *) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need a versionMap_needsSafeAccess = TRUE here. Not that important from a correctness as it can happen at any time anyway, but it will be consistent with code and other branches here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left that out because it's not in the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ohhhh. Outside the lock, so I missed it, d'oh.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 0ceddd1
cfb55bc to
c843c79
Compare
ywelsch
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've left some comments, but overall looks good.
ReplicaEngine/tla/ReplicaEngine.tla
Outdated
| /\ expected_doc /= NULL => lucene.document.content = expected_doc | ||
|
|
||
| ============================================================================= | ||
| \* Modification History |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you remove this line and the ones below.
The TLA toolbox will then not automatically add this poor man's version control anymore to the file.
ReplicaEngine/tla/ReplicaEngine.tla
Outdated
| , content \in DocContent | ||
| } | ||
| (* UPDATE: A write that does not involve an internally-generated document ID. *) | ||
| \cup { [type |-> UPDATE, seqno |-> seqno, content |-> content] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe simpler:
\cup [type : {RETRY_ADD}, seqno : 1..request_count, content : DocContent, autoIdTimeStamp : {DocAutoIdTimestamp}]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice.
ReplicaEngine/tla/ReplicaEngine.tla
Outdated
| variables | ||
| lucene = | ||
| [ document |-> NULL | ||
| , buffer |-> <<>> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would just have two variables here, one for the doc, and one for the buffer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, done.
| LuceneUpdateVersionMap: (* TODO needs an invariant saying that the VM is >= Lucene and also contains the buffered ops *) | ||
|
|
||
| versionMap_isUnsafe := FALSE; | ||
| versionMap_needsSafeAccess := FALSE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about previousMapsNeededSafeAccess? I wonder if this addresses the oddity we have seen after you added the fixes to the model.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes it way less likely, but I don't think it eliminates it. You just need more concurrent activity and more refreshes in order to clear it. I think it's worth modelling more correctly anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this addresses the oddity we have seen after you added the fixes to the model.
The oddity that the model found is in fact another desync bug that's reproducible in Elasticsearch itself.
ReplicaEngine/tla/ReplicaEngine.tla
Outdated
| then | ||
| (* Perform a Lucene refresh *) | ||
| AwaitRefreshOnDelete: \* Label here to allow for other concurrent activity | ||
| await lucene.buffer = <<>>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the code has an enforceSafeAccess here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Added.
ReplicaEngine/tla/ReplicaEngine.tla
Outdated
| then | ||
| (* Perform a Lucene refresh *) | ||
| AwaitRefreshOnIndex: \* Label here to allow for other concurrent activity | ||
| await lucene.buffer = <<>>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment as above for delete, do we need to do enforceSafeAccess here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, good catch, added.
|
@bleskes said:
Until this model is in a state where it passes the model checker I'd prefer not to add artificial invariants or transitions. Both of those ideas would have reduced the number of steps it took to find a failure, but would have made it much harder to answer the question of whether it was a genuine bug (i.e. actually could lead to a desync) or not. |
Models the fix implemented in elastic/elasticsearch#28787
Models the fix implemented in elastic/elasticsearch#28790
As currently modelled, this seems to find a spurious failure, but needs quite a bit more work to model the inner structure of the version map in order to check that.
|
I pushed fixes that model elastic/elasticsearch#28787 and elastic/elasticsearch#28790 as well as @bleskes proposal for the third issue. I also dropped the label from the middle of the refresh loop as it was yielding spurious failures because of not modelling both the current and old maps. This model now passes the model checker - takes about a minute to go through ~6M states on my laptop. |
|
Incidentally, trying to reproduce the third desync bug in the test harness fails because when assertions are enabled we always call |
ywelsch
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also dropped the label from the middle of the refresh loop as it was yielding spurious failures because of not modelling both the current and old maps.
Can you elaborate a bit more on this?
ReplicaEngine/tla/ReplicaEngine.tla
Outdated
| process ConsumerProcess = "Consumer" | ||
| variables | ||
| maxUnsafeAutoIdTimestamp \in {0, DocAutoIdTimestamp - 1, DocAutoIdTimestamp, DocAutoIdTimestamp + 1}, | ||
| maxSeqNoOfNonAppendOnlyOperations \in {0, 2, request_count + 1}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why only 0, 2, request_count + 1? How about starting with 0, and then (similarly as you have done for modeling other concurrent requests for different documents) model a separate process that increases this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was battling state space explosion. However, this doesn't seem to be the source of it, so I pushed 36454d7.
ReplicaEngine/tla/ReplicaEngine.tla
Outdated
| useLuceneUpdateDocument := FALSE; | ||
| indexIntoLucene := TRUE; | ||
| else | ||
| if FALSE = (req.type \in {ADD, RETRY_ADD}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that req.type \notin {ADD, RETRY_ADD} would read nicer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Me too. I pushed a643c14
|
I missed something: we're not currently dealing with duplicated messages. I pushed 8cd6ce9. It's now much slower. |
This models how indexing and deletion operations are handled on the replica,
including the optimisations for append-only operations and the interaction with
Lucene commits and the version map.
NB this version is based on
masterand exhibits a bug. I'm opening this for comments on approach etc and will followup with changes as per PRs.