
Conversation

@bleskes
Contributor

@bleskes bleskes commented Apr 18, 2017

The `maxUnsafeAutoIdTimestamp` timestamp is a safety marker guaranteeing that no retried indexing operation with a higher auto-generated id timestamp was processed by the engine. This allows us to safely process such documents without checking whether they were seen before.

Currently this property is maintained in memory and is handed off from the primary to any replica during the recovery process.

This PR takes a more natural approach and stores it in the Lucene commit, using the same semantics (no retry op with a higher timestamp is part of this commit). This means that the knowledge is transferred during the file copy, and it also means that we don't need to worry about crazy situations where an original append-only request arrives at the engine after a retry was processed and the engine was restarted.

Once this is in 5.x, I will submit a follow-up PR to remove this part of the recovery logic.
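To make the approach concrete, here is a minimal sketch of writing and reading such a marker through Lucene's commit user data; the key name, class, and helper methods are made up for illustration and are not the actual engine code.

import org.apache.lucene.index.IndexWriter;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

class CommitMarkerSketch {
    // Hypothetical key under which the marker is stored in the commit user data.
    static final String MAX_UNSAFE_AUTO_ID_TIMESTAMP_KEY = "max_unsafe_auto_id_timestamp";

    // On flush: record the in-memory marker as commit user data so it travels
    // with the Lucene files during a file-based recovery.
    static void commitWithMarker(IndexWriter writer, long maxUnsafeAutoIdTimestamp) throws IOException {
        Map<String, String> commitData = new HashMap<>();
        commitData.put(MAX_UNSAFE_AUTO_ID_TIMESTAMP_KEY, Long.toString(maxUnsafeAutoIdTimestamp));
        writer.setLiveCommitData(commitData.entrySet());
        writer.commit();
    }

    // On engine start: recover the marker from the last commit, treating a
    // missing entry (e.g. a commit made before this change) as "unset".
    static long readMarker(IndexWriter writer) {
        long marker = Long.MIN_VALUE;
        Iterable<Map.Entry<String, String>> commitData = writer.getLiveCommitData();
        if (commitData != null) {
            for (Map.Entry<String, String> entry : commitData) {
                if (MAX_UNSAFE_AUTO_ID_TIMESTAMP_KEY.equals(entry.getKey())) {
                    marker = Long.parseLong(entry.getValue());
                }
            }
        }
        return marker;
    }
}

Since the write happens as part of a flush, any recovery that copies the Lucene files picks up the marker along with the rest of the commit.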

@bleskes bleskes added :Engine :Distributed Indexing/Recovery Anything around constructing a new shard, either from a local or a remote source. >enhancement v5.5.0 v6.0.0-alpha1 labels Apr 18, 2017
@bleskes bleskes requested a review from s1monw April 18, 2017 07:43
@s1monw
Contributor

s1monw commented Apr 18, 2017

Once this is in 5.x, I will submit a follow up PR to remove this part of the recovery logic.

Question: this won't be in all 5.x indices, since they might do a full cluster restart? So we can't remove it, since we might need it during the recovery process? The other option is to detect this on open, but then we need to do this for all 5.x indices, I think.

Contributor

@s1monw s1monw left a comment

I think it's a good change... left a suggestion

}

private void updateMaxUnsafeAutoIdTimestampFromWriter(IndexWriter writer) {
long commitMaxUnsafeAutoIdTimestamp = Long.MIN_VALUE;
Contributor

what's wrong with writer.getLiveCommitData().getOrDefault(MAX_UNSAFE_AUTO_ID_TIMESTAMP_COMMIT_ID, Long.MIN_VALUE);?

Contributor Author

it's an iterable:

public final synchronized Iterable<Map.Entry<String,String>> getLiveCommitData() {

I can throw it into a HashMap, if you prefer?
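For illustration, the HashMap variant suggested here might look roughly like this (a sketch only, reusing writer and MAX_UNSAFE_AUTO_ID_TIMESTAMP_COMMIT_ID from the diff above; not the merged code):

// Copy the live commit data into a map so getOrDefault can be used;
// functionally equivalent to iterating the Iterable directly.
Map<String, String> commitData = new HashMap<>();
for (Map.Entry<String, String> entry : writer.getLiveCommitData()) {
    commitData.put(entry.getKey(), entry.getValue());
}
long commitMaxUnsafeAutoIdTimestamp = Long.parseLong(
    commitData.getOrDefault(MAX_UNSAFE_AUTO_ID_TIMESTAMP_COMMIT_ID, Long.toString(Long.MIN_VALUE)));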

Member

I wondered if there was something better than iterating too but there's not since IndexWriter#getLiveCommitData only returns an Iterable.

Contributor

oh yeah oh well :) fair enough...

@bleskes
Contributor Author

bleskes commented Apr 18, 2017

question, this won't be in all 5.x indices since they might do a full cluster restart?

Agree it won't be there. However, on full cluster restart the primary is already opened with the maxUnsafeAutoIdTimestamp set to -1 and it flushes on start. This means that the future replicas will receive it when they do a file sync, so I think we can still remove this from the peer recovery code?

Member

@jasontedor jasontedor left a comment

LGTM.

@s1monw
Contributor

s1monw commented Apr 18, 2017

Agree it won't be there. However, on full cluster restart the primary is already opened with the maxUnsafeAutoIdTimestamp set to -1 and it flushes on start. This means that the future replicas will receive it when they do a file sync, so I think we can still remove this from the peer recovery code?

fair enough - lets put a comment in the code pls

@bleskes bleskes merged commit edff30f into elastic:master Apr 18, 2017
@bleskes bleskes deleted the commit_max_autogentimestamp branch April 18, 2017 18:11
@bleskes
Contributor Author

bleskes commented Apr 18, 2017

Thx @s1monw & @jasontedor

bleskes added a commit that referenced this pull request Apr 18, 2017
The `maxUnsafeAutoIdTimestamp` timestamp is a safety marker guaranteeing that no retried indexing operation with a higher auto-generated id timestamp was processed by the engine. This allows us to safely process documents without checking if they were seen before.

Currently this property is maintained in memory and is handed off from the primary to any replica during the recovery process.

This commit takes a more natural approach and stores it in the Lucene commit, using the same semantics (no retry op with a higher timestamp is part of this commit). This means that the knowledge is transferred during the file copy and also means that we don't need to worry about crazy situations where an original append-only request arrives at the engine after a retry was processed *and* the engine was restarted.
bleskes added a commit that referenced this pull request Apr 21, 2017
With #24149, it is now stored in the Lucene commit and is implicitly transferred in the file phase of the recovery.
asettouf pushed a commit to asettouf/elasticsearch that referenced this pull request Apr 23, 2017
With elastic#24149, it is now stored in the Lucene commit and is implicitly transferred in the file phase of the recovery.
bleskes added a commit that referenced this pull request Sep 22, 2017
…ID if index was created before 5.5.0

It was added in #24149, which was merged into 5.5.0.
bleskes added a commit that referenced this pull request Sep 22, 2017
…ID if index was created before 5.5.0

It was added in #24149, which was merged into 5.5.0.
@clintongormley clintongormley added :Distributed Indexing/Distributed A catch all label for anything in the Distributed Indexing Area. Please avoid if you can. :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. and removed :Engine :Distributed Indexing/Distributed A catch all label for anything in the Distributed Indexing Area. Please avoid if you can. labels Feb 13, 2018