Conversation

@DaveCTurner (Contributor)

This models how indexing and deletion operations are handled on the replica,
including the optimisations for append-only operations and the interaction with
Lucene commits and the version map.

NB this version is based on master and exhibits a bug. I'm opening this for comments on the approach etc. and will follow up with changes as per PRs.

@DaveCTurner requested review from bleskes and ywelsch on March 22, 2018 14:35
Tombstones in the version map are not cleaned up on a refresh, unlike update
entries: once they have been refreshed they continue to exist until the GC time
elapses and they are cleaned up. This commit includes this in the model.
Refreshing Lucene and updating the version map are not performed atomically, so
this models that the updater can see an intermediate state.
It was processing the buffer in reverse order.
The lucene doc is NULL if deleted
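A minimal illustrative sketch of the tombstone behaviour described in the commit messages above (not taken from the model; the field names type and gcDeadline and the constant DELETED are hypothetical): a refresh drops update entries from the version map but keeps tombstones, and a tombstone is only pruned once its GC time has elapsed.

(* Which entries survive a refresh: only tombstones do *)
KeepAfterRefresh(entry) == entry.type = DELETED

(* Which entries survive tombstone pruning: everything except tombstones whose GC deadline has passed *)
KeepAfterPrune(entry, now) == entry.type /= DELETED \/ entry.gcDeadline > now
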
@bleskes left a comment

Looks great. I left some minor comments/questions.

(* The set of individual requests that can occur on the document *)
Request(request_count)
(* ADD: An optimised append-only write can only occur as the first operation
on the document ID in seqno order. Any subsequent attempts to ADD the
@bleskes

nit - a subsequent op can also be a delete or an update.

, content \in DocContent
}
(* UPDATE: A write that does not involve an internally-generated document ID.
RETRY_ADD: A retry of a write that does involve an internally-generated
@bleskes

leftover comment

}
(* DELETE *)
\cup { [type |-> DELETE, seqno |-> seqno]
: seqno \in 1..request_count
@bleskes

can this appear at seq# 1?

@DaveCTurner (Contributor, Author)

Probably not. But I think I've found a modelling bug involving 3 DELETEs and then an UPDATE (2 DELETEs was not enough) so requiring a preceding UPDATE would make a much bigger model.

@DaveCTurner (Contributor, Author)

Ok, @ywelsch gave a scenario where an UPDATE and a DELETE are replicated by the primary, which then crashes; the new primary only received the DELETE, so it replaces the UPDATE with a no-op.

if req.type = DELETE
then
(* planDeletionAsNonPrimary *)

@bleskes

we need a versionMap_needsSafeAccess = TRUE here. Not that important from a correctness perspective, as it can happen at any time anyway, but it would be consistent with the code and with the other branches here.
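
A rough sketch of the suggested change (reusing the versionMap_needsSafeAccess variable that appears elsewhere in the model; this is not the model's actual text):

if req.type = DELETE
then
    (* planDeletionAsNonPrimary *)
    versionMap_needsSafeAccess := TRUE;  \* suggested addition
    (* ... rest of the delete handling unchanged ... *)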

@DaveCTurner (Contributor, Author)

I left that out because it's not in the code.


@DaveCTurner (Contributor, Author)

Ohhhh. Outside the lock, so I missed it, d'oh.

@DaveCTurner (Contributor, Author)

Fixed in 0ceddd1

@DaveCTurner force-pushed the 2018-03-22-version-map-revisited branch from cfb55bc to c843c79 on March 23, 2018 15:35
@ywelsch (Contributor) left a comment

I've left some comments, but overall looks good.

/\ expected_doc /= NULL => lucene.document.content = expected_doc

=============================================================================
\* Modification History
@ywelsch (Contributor)

Can you remove this line and the ones below? The TLA Toolbox will then no longer automatically add this poor man's version control to the file.

, content \in DocContent
}
(* UPDATE: A write that does not involve an internally-generated document ID. *)
\cup { [type |-> UPDATE, seqno |-> seqno, content |-> content]
@ywelsch (Contributor)

maybe simpler:

\cup  [type : {RETRY_ADD}, seqno : 1..request_count, content : DocContent, autoIdTimeStamp : {DocAutoIdTimestamp}]

@DaveCTurner (Contributor, Author)

Nice.
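
For reference, the two notations denote the same set; e.g. for UPDATE, the set comprehension currently in the model and the record-set constructor suggested above are equivalent:

{ [type |-> UPDATE, seqno |-> seqno, content |-> content]
    : seqno \in 1..request_count, content \in DocContent }
(* is the same set as *)
[ type : {UPDATE}, seqno : 1..request_count, content : DocContent ]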

variables
lucene =
[ document |-> NULL
, buffer |-> <<>>
@ywelsch (Contributor)

I would just have two variables here, one for the doc, and one for the buffer.

@DaveCTurner (Contributor, Author)

Ok, done.

LuceneUpdateVersionMap: (* TODO needs an invariant saying that the VM is >= Lucene and also contains the buffered ops *)

versionMap_isUnsafe := FALSE;
versionMap_needsSafeAccess := FALSE;
@ywelsch (Contributor)

what about previousMapsNeededSafeAccess? I wonder if this addresses the oddity we have seen after you added the fixes to the model.

@DaveCTurner (Contributor, Author)

It makes it way less likely, but I don't think it eliminates it. You just need more concurrent activity and more refreshes in order to clear it. I think it's worth modelling more correctly anyway.
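
A rough sketch of what the fuller modelling mentioned here could look like (only previousMapsNeededSafeAccess and the flags already shown are assumed; this is not the model's actual text): on a refresh, the current map's safe-access flag would be carried over to the old maps rather than simply being cleared.

LuceneUpdateVersionMap:
    previousMapsNeededSafeAccess := previousMapsNeededSafeAccess \/ versionMap_needsSafeAccess;
    versionMap_isUnsafe := FALSE;
    versionMap_needsSafeAccess := FALSE;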

@DaveCTurner (Contributor, Author)

I wonder if this addresses the oddity we have seen after you added the fixes to the model.

The oddity that the model found is in fact another desync bug that's reproducible in Elasticsearch itself.

then
(* Perform a Lucene refresh *)
AwaitRefreshOnDelete: \* Label here to allow for other concurrent activity
await lucene.buffer = <<>>;
@ywelsch (Contributor)

the code has an enforceSafeAccess here.

@DaveCTurner (Contributor, Author)

Good catch. Added.
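
A sketch of roughly what the addition looks like (not the exact model text): the delete path marks the version map as needing safe access before waiting for the refresh to drain the buffer.

then
    versionMap_needsSafeAccess := TRUE;  \* enforceSafeAccess, as in the code
    (* Perform a Lucene refresh *)
    AwaitRefreshOnDelete: \* Label here to allow for other concurrent activity
        await lucene.buffer = <<>>;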

then
(* Perform a Lucene refresh *)
AwaitRefreshOnIndex: \* Label here to allow for other concurrent activity
await lucene.buffer = <<>>;
@ywelsch (Contributor)

same comment as above for delete, do we need to do enforceSafeAccess here?

@DaveCTurner (Contributor, Author)

Yep, good catch, added.

@DaveCTurner (Contributor, Author)

@bleskes said:

In terms of invariants in the model, I think we should add the following:

  1. Assert that at any given moment the version map is a good representation of Lucene when combined with the indexing buffer. This is not so straightforward - we need to make sure that if you ask the version map when it thinks it's safe, its answer is always consistent with what Lucene would say if you hit refresh.

  2. The enforceSafeAccess flag is an optimization to make sure we don't hit refresh storms. The correctness should never depend on it. I wonder if we should adapt the model to flip it at will (rather than connect it to two refresh cycles).

Until this model is in a state where it passes the model checker I'd prefer not to add artificial invariants or transitions. Both of those ideas would have reduced the number of steps it took to find a failure, but would have made it much harder to answer the question of whether it was a genuine bug (i.e. actually could lead to a desync) or not.
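
For illustration only, a rough sketch of an invariant along the lines of point 1 (the operators VersionMapDoc and DocAfterRefresh are hypothetical, not from the model): whenever the version map considers itself safe to read, the document it reports must agree with what Lucene would return after a refresh that drains the buffer.

VersionMapConsistentWhenSafe ==
    ~versionMap_isUnsafe =>
        VersionMapDoc = DocAfterRefresh(lucene.document, lucene.buffer)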

As currently modelled, this seems to find a spurious failure, but needs quite a
bit more work to model the inner structure of the version map in order to check
that.
@DaveCTurner (Contributor, Author)

I pushed fixes that model elastic/elasticsearch#28787 and elastic/elasticsearch#28790 as well as @bleskes' proposal for the third issue.

I also dropped the label from the middle of the refresh loop as it was yielding spurious failures because of not modelling both the current and old maps.

This model now passes the model checker - takes about a minute to go through ~6M states on my laptop.

@DaveCTurner (Contributor, Author)

Incidentally, trying to reproduce the third desync bug in the test harness fails because when assertions are enabled we always call removeTombstonesUnderLock.

@ywelsch (Contributor) left a comment

I also dropped the label from the middle of the refresh loop as it was yielding spurious failures because of not modelling both the current and old maps.

Can you elaborate a bit more on this?

process ConsumerProcess = "Consumer"
variables
maxUnsafeAutoIdTimestamp \in {0, DocAutoIdTimestamp - 1, DocAutoIdTimestamp, DocAutoIdTimestamp + 1},
maxSeqNoOfNonAppendOnlyOperations \in {0, 2, request_count + 1},
@ywelsch (Contributor)

why only 0, 2, request_count + 1? How about starting with 0 and then (similar to what you have done for modeling other concurrent requests for different documents) having a separate process that increases this?

@DaveCTurner (Contributor, Author)

I was battling state space explosion. However, this doesn't seem to be the source of it, so I pushed 36454d7.
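
A minimal PlusCal sketch of the suggestion (the process and label names are hypothetical, and maxSeqNoOfNonAppendOnlyOperations would need to become a global variable so that another process can advance it): start it at 0 and let a separate process bump it concurrently, bounded so the state space stays finite.

process BumpNonAppendOnlySeqNo = "BumpNonAppendOnlySeqNo"
begin
    Bump:
        while maxSeqNoOfNonAppendOnlyOperations < request_count + 1 do
            maxSeqNoOfNonAppendOnlyOperations := maxSeqNoOfNonAppendOnlyOperations + 1;
        end while;
end process;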

useLuceneUpdateDocument := FALSE;
indexIntoLucene := TRUE;
else
if FALSE = (req.type \in {ADD, RETRY_ADD})
@ywelsch (Contributor)

I think that req.type \notin {ADD, RETRY_ADD} would read nicer.

@DaveCTurner (Contributor, Author)

Me too. I pushed a643c14

@DaveCTurner (Contributor, Author)

I missed something: we're not currently dealing with duplicated messages. I pushed 8cd6ce9. It's now much slower.
