Skip to content

Conversation

@bleskes
Copy link
Contributor

@bleskes bleskes commented Mar 11, 2017

Internal indexing requests in Elasticsearch may be processed out of order and repeatedly. This is important during recovery and due to concurrency in replicating requests between primary and replicas. As such, a replica/recovering shard needs to be able to identify that an incoming request contains information that is old and thus need not be processed. The current logic is based on external version. This is sadly not sufficient. This PR moves the logic to rely on sequences numbers and primary terms which give the semantics we need.

The change also refactors InternalEngine.index and InternalEngine.delete. The current implementations tries to share as much code as possible between the different execution paths (primary vs replica, versioned vs unversioned etc.) but the end result is hard to read and is complex to reason about. The PR proposes a slightly more verbose version but where the code flows are clearer (IMO) and rely on immutable variables which makes easier to reason about guarantees.

The PR also beefed up all the versioning tests in InternalEngineTests as the current tests are not sufficient (and didn't expose some subtle but minor existing bugs).

Relates to #10708

Copy link
Contributor

@s1monw s1monw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a first pass and added a higher level recommendation before I go and review it deeper

Fields fields = reader.fields();
if (fields != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

empty readers are not possible anymore?

/** Utility class to resolve the Lucene doc ID, version, seqNo and primaryTerms for a given uid. */
public class VersionsAndSeqNoResolver {

static final ConcurrentMap<Object, CloseableThreadLocal<PerThreadIDAndVersionSeqNoLookup>> lookupStates =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now that this class has multiple purposes I think we should rethink the way it works. Today we have to resolve the document multiple times, first to lookup the version then to lookup the sequence ID. I think it's dangerous to do it this way since we are:

  1. we might use a different searcher / reader (it should work but I don't like it from an API perspective
  2. we duplicate code where it's not necessary
  3. we lookup the same ID in 2*N Segments which is costly

now that said I wonder if we can refactor stuff to acquire a searcher globally to gain a per-thread lookup state that is closeable (we can use try / with in delete and index and we use that state to find the doc ID and it's subreader (this might be simply PerThreadIDAndVersionSeqNoLookup) That way we hold on to the reader for the time being but have fast access and clear access patterns on the different values (version, seq ID). It will likely change the engine a bit but I think the semantics will be clear then.

@bleskes
Copy link
Contributor Author

bleskes commented Apr 12, 2017

superseded by #24060

@bleskes bleskes closed this Apr 12, 2017
@bleskes bleskes deleted the seq_no_as_version branch April 12, 2017 06:49
@clintongormley clintongormley added :Distributed Indexing/Distributed A catch all label for anything in the Distributed Indexing Area. Please avoid if you can. :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. and removed :Engine labels Feb 13, 2018
@clintongormley clintongormley added :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. and removed :Distributed Indexing/Distributed A catch all label for anything in the Distributed Indexing Area. Please avoid if you can. :Sequence IDs labels Feb 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. >enhancement

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants