Skip to content

Conversation

@jasontedor
Copy link
Member

@jasontedor jasontedor commented Nov 21, 2016

When a fatal error tragically closes an index writer, such an error
never makes its way to the uncaught exception handler. This prevents the
node from being torn down if an out of memory error or other fatal error
is thrown in the Lucene layer. This commit ensures that such events
bubble their way up to the uncaught exception handler.

Relates #19272

@s1monw
Copy link
Contributor

s1monw commented Nov 21, 2016

I tried to write a test for this here:

diff --git a/core/src/test/java/org/elasticsearch/index/engine/InternalEngineTests.java b/core/src/test/java/org/elasticsearch/index/engine/InternalEngineTests.java
index 19b50c3..af3a8b3 100644
--- a/core/src/test/java/org/elasticsearch/index/engine/InternalEngineTests.java
+++ b/core/src/test/java/org/elasticsearch/index/engine/InternalEngineTests.java
@@ -26,6 +26,8 @@ import org.apache.logging.log4j.Logger;
 import org.apache.logging.log4j.core.LogEvent;
 import org.apache.logging.log4j.core.appender.AbstractAppender;
 import org.apache.logging.log4j.core.filter.RegexFilter;
+import org.apache.lucene.analysis.Analyzer;
+import org.apache.lucene.analysis.Tokenizer;
 import org.apache.lucene.analysis.standard.StandardAnalyzer;
 import org.apache.lucene.codecs.Codec;
 import org.apache.lucene.document.Field;
@@ -227,10 +229,13 @@ public class InternalEngineTests extends ESTestCase {
             engine.config().setEnableGcDeletes(false);
         }
     }
-
     public EngineConfig copy(EngineConfig config, EngineConfig.OpenMode openMode) {
+        return copy(config, openMode, config.getAnalyzer());
+    }
+
+    public EngineConfig copy(EngineConfig config, EngineConfig.OpenMode openMode, Analyzer analyzer) {
         return new EngineConfig(openMode, config.getShardId(), config.getThreadPool(), config.getIndexSettings(), config.getWarmer(),
-            config.getStore(), config.getDeletionPolicy(), config.getMergePolicy(), config.getAnalyzer(), config.getSimilarity(),
+            config.getStore(), config.getDeletionPolicy(), config.getMergePolicy(), analyzer, config.getSimilarity(),
             new CodecService(null, logger), config.getEventListener(), config.getTranslogRecoveryPerformer(), config.getQueryCache(),
             config.getQueryCachingPolicy(), config.getTranslogConfig(), config.getFlushMergesAfter(), config.getRefreshListeners(),
             config.getMaxUnsafeAutoIdTimestamp());
@@ -2849,4 +2854,33 @@ public class InternalEngineTests extends ESTestCase {
             assertTrue(internalEngine.failedEngine.get() instanceof MockDirectoryWrapper.FakeIOException);
         }
     }
+
+    public void testTragicEventErrorBubblesUp() throws IOException {
+        engine.close();
+        AtomicBoolean failOOM = new AtomicBoolean(true);
+        engine = new InternalEngine(copy(engine.config(), EngineConfig.OpenMode.OPEN_INDEX_AND_TRANSLOG, new Analyzer() {
+
+            @Override
+            protected TokenStreamComponents createComponents(String fieldName) {
+                return new TokenStreamComponents(new Tokenizer() {
+                    @Override
+                    public boolean incrementToken() throws IOException {
+                        if (failOOM.get()) {
+                            throw new OutOfMemoryError("boom");
+                        } else {
+                            throw new AssertionError("should not get to this point");
+                        }
+                    }
+                });
+            }
+        }));
+        Document document = testDocument();
+        document.add(new TextField("value", "test", Field.Store.YES));
+        ParsedDocument doc = testParsedDocument("1", "1", "test", null, -1, -1, document, B_1, null);
+        Engine.Index first = new Engine.Index(newUid("1"), doc);
+        expectThrows(OutOfMemoryError.class, () ->  engine.index(first));
+        failOOM.set(false);
+        expectThrows(OutOfMemoryError.class, () -> engine.index(first));
+        assertNull(engine.failedEngine.get()); // this is null here since we don't try to fail the engine if tragic even is an error.. should we?
+    }
 }

feel free to use it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we still try to fail the engine here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so, the node is going to tear itself down anyway.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd feel better if we do since if there is a bug catching that error somehow we didn't fail the shard?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh I see that means we gotta pass Throwable to failEngine hmm not sure here... ok I guess I am ok with not failing

@jasontedor jasontedor force-pushed the die-with-dignity-index-writer branch from 96c4116 to b07878f Compare November 22, 2016 03:06
@jasontedor
Copy link
Member Author

jasontedor commented Nov 22, 2016

Very nice @s1monw, I pushed 309d97a.

@s1monw
Copy link
Contributor

s1monw commented Nov 22, 2016

LGTM

When a fatal error tragically closes an index writer, such an error
never makes its way to the uncaught exception handler. This prevents the
node from being torn down if an out of memory error or other fatal error
is thrown in the Lucene layer. This commit ensures that such events
bubble their way up to the uncaught exception handler.
This commit adds a unit test that when an index writer is closed
tragically, if it is due to a fatal error then that error bubbles up.
@jasontedor jasontedor force-pushed the die-with-dignity-index-writer branch from b07878f to 309d97a Compare November 22, 2016 13:15
This commit removes an extra space in
InternalEngineTests#testTragicEventErrorBubblesUp.
@jasontedor jasontedor merged commit 775638c into elastic:master Nov 22, 2016
jasontedor added a commit that referenced this pull request Nov 22, 2016
When a fatal error tragically closes an index writer, such an error
never makes its way to the uncaught exception handler. This prevents the
node from being torn down if an out of memory error or other fatal error
is thrown in the Lucene layer. This commit ensures that such events
bubble their way up to the uncaught exception handler.

Relates #21721
jasontedor added a commit that referenced this pull request Nov 22, 2016
When a fatal error tragically closes an index writer, such an error
never makes its way to the uncaught exception handler. This prevents the
node from being torn down if an out of memory error or other fatal error
is thrown in the Lucene layer. This commit ensures that such events
bubble their way up to the uncaught exception handler.

Relates #21721
@jasontedor jasontedor deleted the die-with-dignity-index-writer branch November 22, 2016 17:18
@jasontedor
Copy link
Member Author

Thanks for all your help on this one @s1monw. ❤️

@s1monw
Copy link
Contributor

s1monw commented Nov 22, 2016

YW @jasontedor

@clintongormley clintongormley added :Distributed Indexing/Distributed A catch all label for anything in the Distributed Indexing Area. Please avoid if you can. :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. and removed :Engine :Distributed Indexing/Distributed A catch all label for anything in the Distributed Indexing Area. Please avoid if you can. labels Feb 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

blocker >bug critical :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. v5.0.2 v5.1.1 v6.0.0-alpha1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants