Enable a long translog retention policy by default #25294

bleskes · 2017-06-19T14:46:54Z

#25147 added the translog deletion policy but didn't enable it by default. This PR enables a default retention of 512MB (the same as the current maximum translog size) and an age of 12 hours (i.e., after 12 hours all translog files will be deleted). This increases the chance of an ops-based recovery, even if the primary flushed or the replica was offline for a few hours.
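A minimal sketch of how a combined size/age retention check could decide which generations to keep. It assumes each generation knows its size and last-modified time, and that a generation is kept only while it fits within both budgets. `RetentionSketch`, `Generation` and `minGenToRetain` are hypothetical names; the real `TranslogDeletionPolicy` may compute this differently.

```java
import java.util.List;

// Hypothetical sketch: which translog generations would a size- and age-based retention
// policy keep? Names and fields are illustrative, not Elasticsearch's API.
final class RetentionSketch {

    /** One translog generation: its number, size on disk and last-modified time. */
    record Generation(long gen, long sizeInBytes, long lastModifiedMillis) {}

    static final long RETENTION_SIZE_BYTES = 512L * 1024 * 1024;   // 512MB
    static final long RETENTION_AGE_MILLIS = 12L * 60 * 60 * 1000; // 12 hours

    /**
     * Returns the lowest generation to keep. {@code gens} is ordered oldest first and its
     * last element is the generation currently being written, which is always kept.
     * An older generation is kept only while the cumulative size stays within the size
     * budget AND the generation is younger than the age budget.
     */
    static long minGenToRetain(List<Generation> gens, long nowMillis) {
        final Generation current = gens.get(gens.size() - 1);
        long keepFrom = current.gen();
        long totalSize = current.sizeInBytes();
        for (int i = gens.size() - 2; i >= 0; i--) {
            final Generation g = gens.get(i);
            totalSize += g.sizeInBytes();
            final boolean tooLarge = totalSize > RETENTION_SIZE_BYTES;
            final boolean tooOld = nowMillis - g.lastModifiedMillis() > RETENTION_AGE_MILLIS;
            if (tooLarge || tooOld) {
                break; // this generation and everything older may be deleted
            }
            keepFrom = g.gen();
        }
        return keepFrom;
    }
}
```

Walking from newest to oldest means the first generation that breaks either budget marks the cut-off; everything older is also over budget.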

In order to see which parts of the translog are committed into Lucene, the translog stats are extended to include information about uncommitted operations.
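A sketch of how total and uncommitted stats could be split, assuming the last Lucene commit records the lowest translog generation still needed to replay uncommitted operations. `UncommittedStatsSketch`, `Gen`, `Stats` and `minUncommittedGen` are hypothetical; the sketch uses `long` counters throughout.

```java
import java.util.List;

// Hypothetical sketch: split translog stats into totals and the uncommitted part, assuming
// every generation at or above `minUncommittedGen` holds ops not yet committed to Lucene.
final class UncommittedStatsSketch {

    record Gen(long gen, long sizeInBytes, long operations) {}

    record Stats(long sizeInBytes, long operations,
                 long uncommittedSizeInBytes, long uncommittedOperations) {}

    static Stats compute(List<Gen> gens, long minUncommittedGen) {
        long size = 0, ops = 0, uncommittedSize = 0, uncommittedOps = 0;
        for (Gen g : gens) {
            size += g.sizeInBytes();
            ops += g.operations();
            if (g.gen() >= minUncommittedGen) {
                // only generations at or above the commit marker count as uncommitted
                uncommittedSize += g.sizeInBytes();
                uncommittedOps += g.operations();
            }
        }
        return new Stats(size, ops, uncommittedSize, uncommittedOps);
    }
}
```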

Views now include all translog ops and guarantee, as before, that those will not go away. Snapshotting a view allows filtering out generations that are not relevant for a specific sequence number.
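One way views could guarantee that their ops "will not go away" is reference counting per generation: each open view pins the generation it started at, and nothing at or above the lowest pinned generation may be deleted. The sketch below assumes that model; `ViewPinningSketch` and its methods are hypothetical and not the actual `TranslogDeletionPolicy` code.

```java
import java.util.TreeMap;

// Hypothetical sketch: open views pin translog generations through per-generation ref counts.
final class ViewPinningSketch {
    private final TreeMap<Long, Integer> refCounts = new TreeMap<>();

    synchronized void acquireView(long viewGen) {
        refCounts.merge(viewGen, 1, Integer::sum);
    }

    synchronized void releaseView(long viewGen) {
        if (!refCounts.containsKey(viewGen)) {
            throw new IllegalStateException("no open view for generation " + viewGen);
        }
        // decrement, removing the entry when the last view on this generation is released
        refCounts.computeIfPresent(viewGen, (gen, count) -> count == 1 ? null : count - 1);
    }

    /** Lowest generation required by any open view, or Long.MAX_VALUE if no view is open. */
    synchronized long minGenRequiredByViews() {
        return refCounts.isEmpty() ? Long.MAX_VALUE : refCounts.firstKey();
    }
}
```

A `TreeMap` keeps generations sorted, so the lowest pinned generation is simply `firstKey()`.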

I still have to write some docs and add a migration note, but I think we can start reviewing.

Relates to #10708

s1monw

left a bunch of comments

s1monw · 2017-06-20T14:42:52Z

core/src/main/java/org/elasticsearch/index/IndexService.java

            if (refreshTask.getInterval().equals(indexSettings.getRefreshInterval()) == false) {
                rescheduleRefreshTasks();
            }
+            if (trimTranslogTask.getInterval().equals(indexSettings.getTranslogRetentionCheckInterval()) == false) {


Can you elaborate why you go down this route instead of using IndexShard#afterWriteOperation(), and maybe also doing it in IndexShard#flush? I'd rather remove tasks than add them. It's just much easier to reason about things if they happen due to a user interaction from the outside.

Can you elaborate why you go down this route instead of using IndexShard#afterWriteOperation()

This is not instead of it but in addition to it. The problem is that if people have a 12h retention policy, we need to start cleaning up files 12 hours later, even when no writes arrive anymore to trigger a check (for example, if they index at full speed and then roll over to a new index).

I'd love to rather remove tasks than adding them?!

I'm totally with you. I spent some time thinking about ways to avoid the regular check, but everything I came up with ended up more complex. I considered scheduling things on inactivity etc., but I always came back to this being the clearest/easiest option. I'd be more than happy to hear about alternatives.
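A minimal sketch of the kind of periodic check discussed here, using only `java.util.concurrent`. `PeriodicTrimSketch` is hypothetical, and `trimAction` stands in for whatever trims unreferenced translog files; the interval is left to the caller.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: re-evaluate translog retention periodically, even when no writes arrive.
final class PeriodicTrimSketch implements AutoCloseable {
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final ScheduledFuture<?> task;

    PeriodicTrimSketch(Runnable trimAction, long interval, TimeUnit unit) {
        // fixed delay: the next check starts `interval` after the previous one finished
        this.task = scheduler.scheduleWithFixedDelay(() -> {
            try {
                trimAction.run();
            } catch (RuntimeException e) {
                // an exception escaping a scheduled task suppresses all later runs, so log it here
                System.err.println("translog trim failed: " + e);
            }
        }, interval, interval, unit);
    }

    @Override
    public void close() {
        task.cancel(false);
        scheduler.shutdown();
    }
}
```

The catch block matters: per the `ScheduledExecutorService` contract, a task that throws is never run again, which would silently disable the cleanup.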

s1monw · 2017-06-20T14:45:52Z

core/src/main/java/org/elasticsearch/index/shard/IndexShard.java

-        engine.getTranslog().rollGeneration();
+        final Translog translog = engine.getTranslog();
+        translog.rollGeneration();
+        translog.trimUnreferencedReaders();


Hmm, you run translog.trimUnreferencedReaders() under the engine lock here, but you run it without a lock here, so I wonder when we have to run it under a lock and whether we can assert on that?

That's a great observation. Strictly speaking, we don't need to coordinate through the write lock here. In the place you linked to, I was mostly interested in the error handling and in failing the engine if anything goes wrong (as we close the translog on error as well). I was worried about poisonous situations where we keep trying to get stuff from a closed translog. I will work on removing the read lock in the engine and fold this functionality (trim and roll) into the engine so we can be consistent. Agreed?

I think the read lock is healthy here. I wonder if we should rather fold this part into the engine, not call it from the outside, and add a boolean rollGeneration parameter to the trim method?
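A sketch of the suggestion above: a single engine method that optionally rolls the generation and then trims, under the engine's read lock, with one place for error handling. `EngineTrimSketch`, its nested `Translog` interface and `failEngine` are hypothetical stand-ins, not the real `InternalEngine` or `Translog` classes.

```java
import java.io.IOException;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: fold "roll generation + trim" into one engine method.
final class EngineTrimSketch {
    interface Translog {
        void rollGeneration() throws IOException;
        void trimUnreferencedReaders() throws IOException;
    }

    private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
    private final Translog translog;
    private volatile Exception failure; // set once the engine has been failed

    EngineTrimSketch(Translog translog) {
        this.translog = translog;
    }

    void trimTranslog(boolean rollGeneration) {
        // read lock: may run alongside other read-locked operations, but not while the write lock is held
        rwLock.readLock().lock();
        try {
            if (rollGeneration) {
                translog.rollGeneration();
            }
            translog.trimUnreferencedReaders();
        } catch (Exception e) {
            failEngine(e); // callers no longer reach into the translog and handle errors themselves
            throw new IllegalStateException("failed to trim translog", e);
        } finally {
            rwLock.readLock().unlock();
        }
    }

    private void failEngine(Exception e) {
        failure = e;
    }

    boolean isFailed() {
        return failure != null;
    }
}
```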

s1monw · 2017-06-20T14:46:33Z

core/src/main/java/org/elasticsearch/index/translog/Translog.java

    }

+    private Stream<? extends BaseTranslogReader> readersAboveMinSeqNo(long minSeqNo) {
+        assert readLock.isHeldByCurrentThread() || writeLock.isHeldByCurrentThread();


Please have one assert per check; it's easier to see what failed.

I'm confused - this is an or check. I will add a message.

Never mind, a message is good.
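A sketch of a single OR-assertion whose message says which locks were expected, so a failure is still easy to diagnose. It uses `java.util.concurrent.locks.ReentrantReadWriteLock` (whose read lock has no `isHeldByCurrentThread()`, hence `getReadHoldCount()`), not Elasticsearch's `ReleasableLock`; `LockAssertSketch` is hypothetical.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: one assert covering "read OR write lock held", with a descriptive message.
final class LockAssertSketch {
    final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();

    boolean holdsReadOrWriteLock() {
        return rwLock.getReadHoldCount() > 0 || rwLock.isWriteLockedByCurrentThread();
    }

    /** Only checked when the JVM runs with assertions enabled (-ea). */
    void requireLockHeld() {
        assert holdsReadOrWriteLock()
            : "expected the read or write lock to be held by " + Thread.currentThread().getName();
    }
}
```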

s1monw · 2017-06-20T14:47:38Z

core/src/main/java/org/elasticsearch/index/translog/Translog.java

                // we're shutdown potentially on some tragic event, don't delete anything
                return;
            }
+            if (readers.isEmpty()) {


Maybe instead of adding this intermediate return, we can invert the condition and only do the work if it's not empty?

Yeah, I thought about that and can do it. The reason I went with this is that to add something like shouldTrimTranslog, where I could put this shortcut, I would also have to ask the retention policy about the current trimming situation, add tests, etc. Since this method is not called on every write but rather on every roll/flush and on a background check every 10m, I felt this was simpler. Let me know what you think.

Methods should have a single return statement IMO; everything else is more complex in 99% of cases.
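A sketch of the single-return shape suggested here: instead of an early `return` when there is nothing to trim, the condition is inverted and the work is guarded. `SingleReturnTrimSketch` and its list of reader generations are hypothetical; the `== false` style follows the snippets in this PR.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: guard the work with an inverted condition and keep a single return.
final class SingleReturnTrimSketch {
    private final List<Long> readerGens = new ArrayList<>(); // open reader generations, oldest first
    private boolean closed;

    SingleReturnTrimSketch(List<Long> gens) {
        readerGens.addAll(gens);
    }

    /** Drops readers below minGenToKeep and returns how many were removed. */
    synchronized int trimUnreferencedReaders(long minGenToKeep) {
        int removed = 0;
        if (closed == false && readerGens.isEmpty() == false) {
            // only do the work when the translog is open and there is something to trim
            while (readerGens.isEmpty() == false && readerGens.get(0) < minGenToKeep) {
                readerGens.remove(0);
                removed++;
            }
        }
        return removed;
    }

    synchronized void close() {
        closed = true;
    }
}
```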

s1monw · 2017-06-20T14:49:07Z

core/src/main/java/org/elasticsearch/index/translog/TranslogDeletionPolicy.java

    }
+
+    synchronized long getViewCount(long viewGen) {
+        return translogRefCounts.getOrDefault(viewGen, Counter.newCounter(false)).get();


You are creating a new counter anyway, even if one is already there. I think we can save that object and use computeIfAbsent?

++. will change
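A sketch of the allocation point raised in this thread. `getOrDefault` evaluates its default argument before the lookup, so a counter is allocated on every call. `computeIfAbsent` avoids the throwaway object but inserts an entry as a side effect of a read. A plain `get` plus null check does neither. `AtomicLong` stands in for Lucene's `Counter`, and `ViewCountSketch` is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: three ways to read a per-generation view count.
final class ViewCountSketch {
    final Map<Long, AtomicLong> refCounts = new HashMap<>();

    // allocates a fresh counter on every call, even when the key is present
    synchronized long viewCountGetOrDefault(long gen) {
        return refCounts.getOrDefault(gen, new AtomicLong()).get();
    }

    // no throwaway object, but a missing key gets inserted into the map
    synchronized long viewCountComputeIfAbsent(long gen) {
        return refCounts.computeIfAbsent(gen, g -> new AtomicLong()).get();
    }

    // plain lookup: no allocation and no mutation of the map
    synchronized long viewCount(long gen) {
        final AtomicLong counter = refCounts.get(gen);
        return counter == null ? 0 : counter.get();
    }
}
```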

s1monw · 2017-06-20T14:50:41Z

core/src/main/java/org/elasticsearch/indices/recovery/RecoverySourceHandler.java


            if (isSequenceNumberBasedRecoveryPossible) {
                logger.trace("performing sequence numbers based recovery. starting at [{}]", request.startingSeqNo());
+                startingSeqNo = request.startingSeqNo();


The changes in here are unrelated to the trimming code. I wonder if we can do this in 2 steps and add the integration in a follow-up; that way reviews are much simpler and more effective?

Yes, this became less important than it used to be after some iterations. That said, the change is mostly related to reporting, and to the fact that acquiring a snapshot now takes a seq# in its constructor (to avoid scanning a large translog). I can try to roll that part back, but I was afraid it would change the system too much (we would stream more ops from the translog). I'm not sure it's worth it, tbh.

+727 −336. I think we should try harder to make changes more contained. Really!

bleskes · 2017-06-20T19:46:21Z

@s1monw thanks. I responded to some of your comments. Will apply code changes tomorrow.

jasontedor

I left a few comments.

jasontedor · 2017-06-21T12:38:57Z

core/src/main/java/org/elasticsearch/index/engine/InternalEngine.java


+    @Override
+    public void trimTranslog() throws EngineException {
+        try (ReleasableLock lock = readLock.acquire()) {


Would you mind renaming this to ignored? It keeps the IDE from complaining, and it makes it clear that the resource is intentionally unused through the scope of the try-with-resources block.
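A sketch of the `ignored` naming convention. `ReleasableLockSketch` is a minimal stand-in modeled loosely on Elasticsearch's `ReleasableLock` (where `acquire()` locks and returns the same object and `close()` unlocks); it is not the real class.

```java
import java.util.concurrent.locks.Lock;

// Hypothetical stand-in: a lock usable in try-with-resources, to show the `ignored` convention.
final class ReleasableLockSketch implements AutoCloseable {
    private final Lock lock;

    ReleasableLockSketch(Lock lock) {
        this.lock = lock;
    }

    ReleasableLockSketch acquire() {
        lock.lock();
        return this;
    }

    @Override
    public void close() { // declares no checked exception, so callers need no catch block
        lock.unlock();
    }

    static int guardedIncrement(ReleasableLockSketch readLock, int[] counter) {
        // `ignored` signals the resource is held only for this scope and never referenced
        try (ReleasableLockSketch ignored = readLock.acquire()) {
            return ++counter[0];
        }
    }
}
```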

jasontedor · 2017-06-21T12:44:36Z

core/src/main/java/org/elasticsearch/index/shard/IndexShard.java

-        engine.getTranslog().rollGeneration();
+        final Translog translog = engine.getTranslog();
+        translog.rollGeneration();
+        translog.trimUnreferencedReaders();


I too find it weird that we have IndexShard#trimTranslog that goes through the engine (with error handling) but ultimately amounts to trimming unreferenced readers, and we have this method that does the same without going through the engine instead poking all the way through to the translog.

jasontedor · 2017-06-21T12:46:09Z

core/src/main/java/org/elasticsearch/index/translog/Translog.java

+            .filter(reader -> {
+                final long maxSeqNo = reader.getCheckpoint().maxSeqNo;
+                return maxSeqNo == SequenceNumbersService.UNASSIGNED_SEQ_NO ||
+                    maxSeqNo >= minSeqNo;


Nit: I think this fits on the preceding line without wrapping.
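A sketch of the filter in this snippet as a standalone method: a generation is kept if its highest sequence number is unknown or may still contain ops at or above `minSeqNo`. `SeqNoFilterSketch` and `Reader` are hypothetical, and the `-2` value for `UNASSIGNED_SEQ_NO` is an assumption made for the sketch.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: select translog generations relevant to a snapshot starting at minSeqNo.
final class SeqNoFilterSketch {
    static final long UNASSIGNED_SEQ_NO = -2; // assumption: sentinel for "no seq# recorded"

    record Reader(long generation, long maxSeqNo) {}

    static List<Long> generationsAboveMinSeqNo(List<Reader> readers, long minSeqNo) {
        return readers.stream()
            // unknown maxSeqNo: cannot rule the generation out, so keep it
            .filter(r -> r.maxSeqNo() == UNASSIGNED_SEQ_NO || r.maxSeqNo() >= minSeqNo)
            .map(Reader::generation)
            .collect(Collectors.toList());
    }
}
```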

jasontedor · 2017-06-21T13:00:22Z

core/src/main/java/org/elasticsearch/index/translog/TranslogStats.java

    private long translogSizeInBytes;
    private int numberOfOperations;
+    private long uncommittedSizeInBytes;
+    private int  uncommittedOperations;


Since these are summed in node stats, I wonder if this should be a long (to reduce the chance of overflow)?

I agree, but I think we need to do this consistently and I didn't want to touch the other numberOfOperations field (and the bwc code that entails) in the same PR. That's why I did it like this.

To be clear, you're saying that you'll address this in a follow-up?

~~can~~ will do.
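A sketch of the overflow concern: summing per-shard `int` counters across many shards or nodes can exceed `Integer.MAX_VALUE` and silently wrap, while a `long` accumulator has ample headroom. `StatsOverflowSketch` is hypothetical.

```java
// Hypothetical sketch: int vs. long accumulation of per-shard operation counts.
final class StatsOverflowSketch {
    static int sumAsInt(int[] perShardOps) {
        int total = 0;
        for (int ops : perShardOps) {
            total += ops; // silently wraps around past Integer.MAX_VALUE
        }
        return total;
    }

    static long sumAsLong(int[] perShardOps) {
        long total = 0;
        for (int ops : perShardOps) {
            total += ops; // each value is widened to long before adding, so no wraparound
        }
        return total;
    }
}
```

For example, `{Integer.MAX_VALUE, 1}` sums to `Integer.MIN_VALUE` as an `int`, but to `2147483648` as a `long`.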

bleskes · 2017-06-21T14:44:26Z

@s1monw I took out all the scheduled trimming (we can decide what to do about it later) and did the engine changes.

@jasontedor I addressed your feedback.

Can you please look again?

jasontedor

LGTM.

s1monw

LGTM 2

bleskes · 2017-06-22T15:08:28Z

Thx @jasontedor @s1monw for the iterations.

* master: (56 commits) Initialize max unsafe auto ID timestamp on shrink Enable a long translog retention policy by default (elastic#25294) Remove `index.mapping.single_type=false` from core/tests (elastic#25331) test: single type defaults to true since alpha1 and not alpha3 Get short path name for native controllers Live primary-replica resync (no rollback) (elastic#24841) Upgrade to lucene-7.0.0-snapshot-ad2cb77. (elastic#25349) percolator: Deprecate `document_type` parameter. [DOCS] Fixed typo. [rest-api-spec/indices.refresh] Remove old params Remove redundant and broken MD5 checksum from repository-s3 (elastic#25270) Initialize sequence numbers on a shrunken index Port most snapshot/restore static bwc tests to qa:full-cluster-restart (elastic#25296) Javadoc: ThreadPool doesn't reject while shutdown (elastic#23678) test: verify `size_to_upgrade_in_bytes` in assertBusy(...) Docs: Removed duplicated line in mapping docs Add backward compatibility indices for 5.4.2 Update MockTransportService to the age of Transport.Connection (elastic#25320) Add version v5.4.2 after release IndexMetaData: Add internal format index setting (elastic#25292) ...

We currently check whether translog files can be trimmed whenever we create a new translog generation or close a view. However, #25294 added a long translog retention period (12h, max 512MB by default), which means translog files may need to be cleaned up long after there is no indexing activity left to trigger flushes or the creation of new translog files. We therefore need a scheduled background check to clean up those files once they are no longer needed. Relates to #10708

bleskes added 13 commits June 18, 2017 17:36

initial change + translog tests pass

6d1a93b

add stats to distinguish between committed and uncommitted

2428bb1

lint

cb17e83

back to int

e1a544b

fix tests

00f8ae3

fix testReuseInFileBasedPeerRecovery

889492a

fix rest test

d2f70d4

Merge remote-tracking branch 'upstream/master' into translog_retention_part2

7962d6b

another fix to 20_translog.yml

68322d5

fix testRecoveryAfterPrimaryPromotion

8170c2a

control retention in testStats

be59059

Merge remote-tracking branch 'upstream/master' into translog_retention_part2

6527bf8

add time based trimming

286d4a3

bleskes added :Sequence IDs >enhancement v6.0.0 labels Jun 19, 2017

bleskes requested a review from jasontedor June 19, 2017 14:47

bleskes added 3 commits June 19, 2017 16:59

Merge remote-tracking branch 'upstream/master' into translog_retention_part2

3411e25

Merge remote-tracking branch 'upstream/master' into translog_retention_part2

7602e53

Conflicts: core/src/main/java/org/elasticsearch/index/shard/IndexShard.java

some docs and a migration note

be0192e

s1monw suggested changes Jun 20, 2017

bleskes added 7 commits June 20, 2017 22:25

Merge remote-tracking branch 'upstream/master' into translog_retention_part2

925b29d

Merge remote-tracking branch 'upstream/master' into translog_retention_part2

202e6f8

Merge remote-tracking branch 'upstream/master' into translog_retention_part2

30004a1

Revert "add time based trimming"

36e0a4e

This reverts commit 286d4a3.

inline rolling and trimming into engine

df0fb63

feedback

1042768

javadocs

f77ac5b

jasontedor reviewed Jun 21, 2017

bleskes added 3 commits June 21, 2017 15:15

roll back computeIfAbsent

9c56729

a different approach

375376b

jason's feedback

18b7fee

jasontedor approved these changes Jun 21, 2017

s1monw approved these changes Jun 22, 2017

bleskes added 3 commits June 22, 2017 14:02

Merge remote-tracking branch 'upstream/master' into translog_retention_part2

976df1c

fix compilation

d9ca199

fix testSyncerSendsOffCorrectDocuments

a924d30

bleskes merged commit d963882 into elastic:master Jun 22, 2017

bleskes mentioned this pull request Jul 10, 2017

Add a scheduled translog retention check #25622

Merged

clintongormley added v6.0.0-beta1 and removed v6.0.0 labels Jul 25, 2017