-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Die with dignity on the Lucene layer #21721
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Die with dignity on the Lucene layer #21721
Conversation
|
I tried to write a test for this here: diff --git a/core/src/test/java/org/elasticsearch/index/engine/InternalEngineTests.java b/core/src/test/java/org/elasticsearch/index/engine/InternalEngineTests.java
index 19b50c3..af3a8b3 100644
--- a/core/src/test/java/org/elasticsearch/index/engine/InternalEngineTests.java
+++ b/core/src/test/java/org/elasticsearch/index/engine/InternalEngineTests.java
@@ -26,6 +26,8 @@ import org.apache.logging.log4j.Logger;
import org.apache.logging.log4j.core.LogEvent;
import org.apache.logging.log4j.core.appender.AbstractAppender;
import org.apache.logging.log4j.core.filter.RegexFilter;
+import org.apache.lucene.analysis.Analyzer;
+import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.codecs.Codec;
import org.apache.lucene.document.Field;
@@ -227,10 +229,13 @@ public class InternalEngineTests extends ESTestCase {
engine.config().setEnableGcDeletes(false);
}
}
-
public EngineConfig copy(EngineConfig config, EngineConfig.OpenMode openMode) {
+ return copy(config, openMode, config.getAnalyzer());
+ }
+
+ public EngineConfig copy(EngineConfig config, EngineConfig.OpenMode openMode, Analyzer analyzer) {
return new EngineConfig(openMode, config.getShardId(), config.getThreadPool(), config.getIndexSettings(), config.getWarmer(),
- config.getStore(), config.getDeletionPolicy(), config.getMergePolicy(), config.getAnalyzer(), config.getSimilarity(),
+ config.getStore(), config.getDeletionPolicy(), config.getMergePolicy(), analyzer, config.getSimilarity(),
new CodecService(null, logger), config.getEventListener(), config.getTranslogRecoveryPerformer(), config.getQueryCache(),
config.getQueryCachingPolicy(), config.getTranslogConfig(), config.getFlushMergesAfter(), config.getRefreshListeners(),
config.getMaxUnsafeAutoIdTimestamp());
@@ -2849,4 +2854,33 @@ public class InternalEngineTests extends ESTestCase {
assertTrue(internalEngine.failedEngine.get() instanceof MockDirectoryWrapper.FakeIOException);
}
}
+
+ public void testTragicEventErrorBubblesUp() throws IOException {
+ engine.close();
+ AtomicBoolean failOOM = new AtomicBoolean(true);
+ engine = new InternalEngine(copy(engine.config(), EngineConfig.OpenMode.OPEN_INDEX_AND_TRANSLOG, new Analyzer() {
+
+ @Override
+ protected TokenStreamComponents createComponents(String fieldName) {
+ return new TokenStreamComponents(new Tokenizer() {
+ @Override
+ public boolean incrementToken() throws IOException {
+ if (failOOM.get()) {
+ throw new OutOfMemoryError("boom");
+ } else {
+ throw new AssertionError("should not get to this point");
+ }
+ }
+ });
+ }
+ }));
+ Document document = testDocument();
+ document.add(new TextField("value", "test", Field.Store.YES));
+ ParsedDocument doc = testParsedDocument("1", "1", "test", null, -1, -1, document, B_1, null);
+ Engine.Index first = new Engine.Index(newUid("1"), doc);
+ expectThrows(OutOfMemoryError.class, () -> engine.index(first));
+ failOOM.set(false);
+ expectThrows(OutOfMemoryError.class, () -> engine.index(first));
+ assertNull(engine.failedEngine.get()); // this is null here since we don't try to fail the engine if tragic even is an error.. should we?
+ }
}feel free to use it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we still try to fail the engine here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so, the node is going to tear itself down anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd feel better if we do since if there is a bug catching that error somehow we didn't fail the shard?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh I see that means we gotta pass Throwable to failEngine hmm not sure here... ok I guess I am ok with not failing
96c4116 to
b07878f
Compare
|
LGTM |
When a fatal error tragically closes an index writer, such an error never makes its way to the uncaught exception handler. This prevents the node from being torn down if an out of memory error or other fatal error is thrown in the Lucene layer. This commit ensures that such events bubble their way up to the uncaught exception handler.
This commit adds a unit test that when an index writer is closed tragically, if it is due to a fatal error then that error bubbles up.
b07878f to
309d97a
Compare
This commit removes an extra space in InternalEngineTests#testTragicEventErrorBubblesUp.
When a fatal error tragically closes an index writer, such an error never makes its way to the uncaught exception handler. This prevents the node from being torn down if an out of memory error or other fatal error is thrown in the Lucene layer. This commit ensures that such events bubble their way up to the uncaught exception handler. Relates #21721
When a fatal error tragically closes an index writer, such an error never makes its way to the uncaught exception handler. This prevents the node from being torn down if an out of memory error or other fatal error is thrown in the Lucene layer. This commit ensures that such events bubble their way up to the uncaught exception handler. Relates #21721
|
Thanks for all your help on this one @s1monw. ❤️ |
|
YW @jasontedor |
When a fatal error tragically closes an index writer, such an error
never makes its way to the uncaught exception handler. This prevents the
node from being torn down if an out of memory error or other fatal error
is thrown in the Lucene layer. This commit ensures that such events
bubble their way up to the uncaught exception handler.
Relates #19272