Simplify write failure handling #19105

areek · 2016-06-27T21:06:31Z

Currently, any write (e.g. index, delete) operation failure can be categorized as:

- request failure (e.g. analysis, parsing error, version conflict)
- transient operation failure (e.g. due to shard initializing, relocation)
- environment failure (e.g. out of disk, corruption, Lucene tragic event)

The main motivation of the PR is to handle these failure types appropriately for a
write request. Each failure type needs to be handled differently:

- request failure (being request specific) should be replicated and then failed
- transient failure should be retried (eventually succeeding)
- environment failure (persistent primary shard failure) should fail the request immediately

Currently, transient operation failures are retried in the replication action, but no distinction
is made between request and environment failures: both fail the write request immediately.

In this PR, we distinguish between request and environment failures for a write operation.
In case of environment failures, the exception is bubbled up, failing the request. In case
of request failures, the exception is captured and replication continues (we skip performing the
operation on replicas when such a failure occurs on the primary). Transient operation failures are
bubbled up to be retried by the replication operation, as before.
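
To make the three cases concrete, here is a minimal Java sketch of the handling described above. It is an illustration only: `WriteFailureSketch`, `ShardNotReadyException`, `classify` and `executeOnPrimary` are hypothetical names that do not exist in Elasticsearch, and classifying purely by exception type is a simplifying assumption (the PR makes this decision inside the engine).

```java
import java.io.UncheckedIOException;
import java.util.function.Supplier;

// Hypothetical sketch: none of these names exist in Elasticsearch.
final class WriteFailureSketch {

    enum FailureKind { REQUEST, TRANSIENT, ENVIRONMENT }

    /** Stand-in for a retryable condition, e.g. a shard that is still initializing. */
    static final class ShardNotReadyException extends RuntimeException {
        ShardNotReadyException(String message) { super(message); }
    }

    /** Outcome of a primary write: exactly one of response and failure is non-null. */
    static final class Result {
        final String response;
        final Exception failure;

        Result(String response, Exception failure) {
            this.response = response;
            this.failure = failure;
        }
    }

    /** Toy classification by exception type (an assumption made for illustration). */
    static FailureKind classify(RuntimeException e) {
        if (e instanceof ShardNotReadyException) {
            return FailureKind.TRANSIENT;
        }
        if (e instanceof UncheckedIOException) {
            return FailureKind.ENVIRONMENT;
        }
        return FailureKind.REQUEST;
    }

    static Result executeOnPrimary(Supplier<String> operation) {
        try {
            return new Result(operation.get(), null);
        } catch (RuntimeException e) {
            if (classify(e) == FailureKind.REQUEST) {
                // Request failure: captured as a valid result so the caller can keep going.
                return new Result(null, e);
            }
            // Transient failures bubble up to be retried by the caller;
            // environment failures bubble up and fail the request.
            throw e;
        }
    }
}
```

The point of the sketch is only the shape of the control flow: a request failure becomes a value, while the other two kinds stay exceptions.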

Note: #20109 simplifies bulk execution code, which should clean up error handling for shard bulk requests.

areek · 2016-06-29T21:28:11Z

After discussions with @bleskes, I changed the scope of this PR. Now, the PR focuses on making primary write operation failures a valid write result, so the replication operation can handle them explicitly. We can add operation failure replication, as needed, in the feature/seq_no branch.

bleskes · 2016-06-30T15:39:34Z

core/src/main/java/org/elasticsearch/action/bulk/TransportShardBulkAction.java

nit: maybe just put this in a variable?

bleskes · 2016-07-05T07:20:38Z

core/src/main/java/org/elasticsearch/action/index/TransportIndexAction.java

why do we throw anything here? is this when the parsing doesn't match the mapping?

we throw an error while parsing the index request source here; currently we bubble up the exception, as this is not an engine-level exception (though it is operation specific).

areek · 2016-10-27T04:28:10Z

Thanks @bleskes for the feedback. I updated the PR, addressing all your comments, including adding tests for failure handling in TransportWriteAction and InternalEngine

s1monw

I just looked at the exception handling for now and left a single comment

s1monw · 2016-10-27T11:27:17Z

core/src/main/java/org/elasticsearch/index/engine/InternalEngine.java

+        if (isDocumentFailure) {
+            return failure;
+        } else {
+            ElasticsearchException exception = new ElasticsearchException(failure);


sorry but this is an absolute no-go. We worked so hard to get rid of this in so many places. We should just rethrow the original exception instead of hiding it in an ElasticsearchException. Yet, I think we can use a little hack to make it work in this case:

@SuppressWarnings("unchecked") static <T extends Throwable> void rethrow(Throwable t) throws T { throw (T) t; }

should allow you to just rethrow the failure without declaring it. I think in this case it's OK

Thanks @s1monw for proposing a solution to this. Now we rethrow the original exception if it failed the engine, instead of wrapping it in an ElasticsearchException.
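
For readers unfamiliar with the trick, the following self-contained sketch shows how the `rethrow` helper quoted above lets a checked exception escape a method that does not declare it. `RethrowSketch` and `failWithoutDeclaring` are hypothetical names used only for this illustration; the helper itself is the one proposed in the comment.

```java
// Hypothetical wrapper class for illustration only.
final class RethrowSketch {

    // The cast to T is unchecked and erased at runtime, so no ClassCastException
    // occurs: the original throwable is thrown unchanged.
    @SuppressWarnings("unchecked")
    static <T extends Throwable> void rethrow(Throwable t) throws T {
        throw (T) t;
    }

    // Note the missing `throws` clause: T is bound to RuntimeException at the
    // call site, so the compiler does not require a declaration, yet whatever
    // is passed in (including a checked exception) is what actually propagates.
    static void failWithoutDeclaring(Throwable original) {
        RethrowSketch.<RuntimeException>rethrow(original);
    }
}
```

A caller that catches `Throwable` receives the very same instance that was passed in, with its original type and stack trace, which is the property the review asks for.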

bleskes · 2016-10-31T10:09:00Z

core/src/main/java/org/elasticsearch/action/support/replication/TransportReplicationAction.java

+         * expects <code>finalResponseIfSuccessful</code> or <code>finalFailure</code> to be not-null
+         */
+        public PrimaryResult(ReplicaRequest replicaRequest, Response finalResponseIfSuccessful, Exception finalFailure) {
+            assert finalFailure != null ^ finalResponseIfSuccessful != null : "either a response or a failure has to be not null";


can we add some info to the string message to let us know whether both were null or both were set?

added finalFailure and finalResponseIfSuccessful as part of the assert message
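
As a rough illustration of what such an assert message could look like, here is a small sketch. `PrimaryResultSketch`, `isValid` and `invariantMessage` are hypothetical names, the response is simplified to a `String`, and the exact wording of the message is an assumption rather than what was committed.

```java
// Hypothetical, simplified stand-in for the PrimaryResult constructor check.
final class PrimaryResultSketch {
    final String finalResponseIfSuccessful;
    final Exception finalFailure;

    PrimaryResultSketch(String finalResponseIfSuccessful, Exception finalFailure) {
        // Only evaluated when assertions are enabled (java -ea).
        assert isValid(finalResponseIfSuccessful, finalFailure)
            : invariantMessage(finalResponseIfSuccessful, finalFailure);
        this.finalResponseIfSuccessful = finalResponseIfSuccessful;
        this.finalFailure = finalFailure;
    }

    /** True when exactly one of the two values is non-null. */
    static boolean isValid(String response, Exception failure) {
        return failure != null ^ response != null;
    }

    /** Includes both values so the reader can tell "both null" from "both set". */
    static String invariantMessage(String response, Exception failure) {
        return "either a response or a failure has to be not null, found ["
            + failure + "] failure and [" + response + "] response";
    }
}
```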

bleskes · 2016-10-31T10:11:26Z

core/src/main/java/org/elasticsearch/action/bulk/TransportShardBulkAction.java

            // execute item request
            final Engine.Result operationResult;
            final DocWriteResponse response;
+            BulkItemRequest replicaRequest = request.items()[requestIndex];


wondering whether we should make it final, so each path will have to set it. wdyt?

this makes sense, made replicaRequest final and explicitly set it for all the cases

bleskes · 2016-10-31T10:16:02Z

core/src/main/java/org/elasticsearch/action/bulk/TransportShardBulkAction.java

-                }
-                // set update response
-                item.setPrimaryResponse(new BulkItemResponse(item.id(), opType, response));
+            request.items()[requestIndex] = replicaRequest;


hmm... strictly speaking we should restore this on retry, like we do with preVersions & preVersionTypes. This is an existing bug. can you open an issue about it so we can think about a proper solution in a follow up? This one is hairy enough :)

I opened #21221 for this, I agree we should restore the original request, but it is non-trivial because we update the request in place.
In reality though, it should not be a problem, because upon retrying the shard bulk request, the shard will just execute the translated request instead of translating the update request again.

not completely true as without the indexing into the replicas, we don't know if the way we resolved the update is valid.

when could a resolved update request that successfully executed on the primary not be valid for a replica?

bleskes · 2016-10-31T10:23:22Z

core/src/main/java/org/elasticsearch/action/bulk/TransportShardBulkAction.java

                // then just use the response we got from the successful execution
-                if (item.getPrimaryResponse() == null || isConflictException(failure) == false) {
-                    item.setPrimaryResponse(new BulkItemResponse(item.id(), docWriteRequest.opType(),
+                if (replicaRequest.getPrimaryResponse() == null || isConflictException(failure) == false) {


This is a mess - there is no guarantee from the code that the replica request wasn't just replaced when we did request.items()[requestIndex] = replicaRequest; . In practice I think it's OK, but I don't like it. Also, now that we have a proper primary relocation (where we don't shut down the shards halfway through operations), I don't think it's needed. Can you open another follow up issue so we can evaluate carefully?

I will open a followup issue, for some reason github is erroring out on creating issues now :0

opened #21230 for this

bleskes · 2016-10-31T10:27:25Z

core/src/main/java/org/elasticsearch/index/engine/Engine.java

+        void setTranslogLocation(Translog.Location translogLocation) {
            if (freeze == false) {
-                this.location = location;
+                this.translogLocation = translogLocation;


do we also want to assert failure is null?

added an assert for a null failure

bleskes · 2016-10-31T10:30:34Z

core/src/main/java/org/elasticsearch/ElasticsearchException.java

                org.elasticsearch.action.RoutingMissingException::new, 79),
-        OPERATION_FAILED_ENGINE_EXCEPTION(OperationFailedEngineException.class,
-                OperationFailedEngineException::new, 80),
+        // 80 used to be for IndexFailedEngineException, removed in 6.0


we need a bwc here no? or is it somewhere else?

@bleskes we need bwc here if we care about rolling restarts from 5.x. I am not sure how to add bwc for this (i.e. might be missing something). We could simply add the exceptions back (but not use them anywhere) so they can be serialized/deserialized when a 6.0 node is acting as a coordinating node?

Another question: The index/delete operation failures were communicated as exceptions, so do we even need a bwc for these failures for serialization/deserialization or can we just rely on generic exception serialization/deserialization like we currently do for persistent engine failures during index/delete operations?

bleskes · 2016-10-31T10:40:57Z

core/src/main/java/org/elasticsearch/indices/IndexingMemoryController.java

-        statusChecker.bytesWritten(result.getSizeInBytes());
+    private void recordOperationBytes(Engine.Operation operation, Engine.Result result) {
+        final int sizeInBytes;
+        if (result.getTranslogLocation() != null) {


I talked this one over with @mikemccand and since this whole thing is a heuristic to approximate the memory usage of the IW, we think it's OK to always use the operation.estimatedSizeInBytes() . No need to look at the translog location.

I changed it to just use operation.estimatedSizeInBytes(). Just a note: before the change, we used to account for translogLocation.size when the translog location was set.

++ sorry for not mentioning that it was already the case before - I know you just ported what was there 🎉

bleskes · 2016-10-31T10:43:16Z

core/src/test/java/org/elasticsearch/action/support/replication/TransportWriteActionTests.java

+                TestAction.WriteReplicaResult::respond);
+    }
+
+    private <Result, Response> void handleDocumentFailure(TestAction testAction,


nit: can you unpack it? it saves 10 lines of code at the expense of making it much harder to understand what it does - you end up going back and forth - I think it's not worth it?

bleskes · 2016-10-31T10:47:14Z

core/src/main/java/org/elasticsearch/index/shard/IndexingOperationListener.java

+    default void postIndex(Engine.Index index, Engine.IndexResult result) {}

    /**
     * Called after the indexing operation occurred with exception.


can we document the difference with the other postIndex and how failures are treated?

added java docs for this

bleskes · 2016-10-31T10:51:01Z

core/src/main/java/org/elasticsearch/action/support/replication/TransportWriteAction.java

+                                  @Nullable Location location, @Nullable Exception operationFailure,
+                                  IndexShard primary) {
+            super(request, finalResponse, operationFailure);
+            assert operationFailure != null ^ finalResponse != null;


but still if the location is != null, there must be no failure?

bleskes

I left a few minor comments and some requests for new issues. I like how this looks.

It would be great to get @s1monw's LGTM.

Also, @jpountz can you sanity check https://github.com/elastic/elasticsearch/pull/19105/files#diff-7cdd93f7b049567dc8e2ffc37300852eR169 ? It would be great to have one exception type

areek · 2016-11-01T04:16:07Z

@bleskes thanks again for the review, I addressed all the comments and had
one question regarding bwc for the removed exceptions in #19105 (comment)

bleskes · 2016-11-01T09:29:21Z

@areek because the github ui sucks, I'm responding here so it will be easy to see:

> if we care about rolling restarts from 5.x

We care! and yeah, I think we can just keep them in there and doc that they can be removed in 7.0 (assertion?)

> The index/delete operation failures were communicated as exceptions, so do we even need a bwc for these failures for serialization/deserialization or can we just rely on generic exception serialization/deserialization like we currently do for persistent engine failures during index/delete operations?

I don't think we can rely on generic exceptions for incoming responses from old nodes. That will ask for a specific exception ID and we won't have it -> boom.
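
The failure mode described here can be sketched with a toy ID-based registry. This is not Elasticsearch's serialization code: `ExceptionRegistrySketch`, `register` and `read` are hypothetical names, and the readers are reduced to functions from a message to an exception. Only the IDs 79 and 80 are taken from the diff above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of looking up an exception reader by its wire ID.
final class ExceptionRegistrySketch {
    private final Map<Integer, Function<String, RuntimeException>> readersById = new HashMap<>();

    void register(int id, Function<String, RuntimeException> reader) {
        if (readersById.putIfAbsent(id, reader) != null) {
            throw new IllegalArgumentException("duplicate exception id [" + id + "]");
        }
    }

    /** Rebuilds an exception from the ID an older node put on the wire. */
    RuntimeException read(int id, String message) {
        Function<String, RuntimeException> reader = readersById.get(id);
        if (reader == null) {
            // This is the "boom": an old node sent an ID the new node no longer knows.
            throw new IllegalStateException("unknown exception id [" + id + "]");
        }
        return reader.apply(message);
    }
}
```

Keeping the removed exception types registered under their old IDs, even if nothing on the new node throws them, keeps `read` working for responses coming from 5.x nodes during a rolling restart.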

bleskes · 2016-11-01T09:31:09Z

core/src/main/java/org/elasticsearch/action/support/replication/TransportWriteAction.java

                                  IndexShard primary) {
            super(request, finalResponse, operationFailure);
+            if (location != null) {
+                assert operationFailure == null : "expected no failures when translog location is not null";


nit: assertion location == null || operationFailure == null ?

bleskes · 2016-11-01T09:33:06Z

core/src/main/java/org/elasticsearch/index/shard/IndexingOperationListener.java

    /**
-     * Called after the indexing operation occurred.
+     * Called after the indexing operation occurred. Implementations should
+     * check {@link Engine.IndexResult#hasFailure()} for operation failures


I don't think we need to say this? how about "note that this method is also called when indexing a document didn't succeed because of document related failures. See {@link #postIndex(..)} for engine level failures."

s1monw

I left some minor comments. I did review the engine changes and glanced at the replication action stuff. I think it LGTM except for the one or two comments I gave. The one with the maybeFail is important

s1monw · 2016-11-01T10:49:16Z

core/src/main/java/org/elasticsearch/index/engine/DeleteFailedEngineException.java

-
-import java.io.IOException;
-
-public class DeleteFailedEngineException extends EngineException {


s1monw · 2016-11-01T10:49:25Z

core/src/main/java/org/elasticsearch/index/IndexingSlowLog.java

-    }
-
-
-    private void postIndexing(ParsedDocument doc, long tookInNanos) {


s1monw · 2016-11-01T10:49:49Z

core/src/main/java/org/elasticsearch/index/engine/Engine.java

+    public abstract IndexResult index(Index operation);

-    public abstract void delete(Delete delete) throws EngineException;
+    public abstract DeleteResult delete(Delete delete);


a javadoc comment would be great here while we are at it

s1monw · 2016-11-01T10:50:42Z

core/src/main/java/org/elasticsearch/index/engine/Engine.java

+        private final Exception failure;
+        private Translog.Location translogLocation;
+        private long took;
+        private boolean freeze;


maybe use private final SetOnce<Boolean> frozen here? it will barf if you freeze multiple times
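
The idea behind this suggestion, failing loudly on a second freeze instead of silently accepting it, can be sketched without any library. The class below is a hypothetical stand-in built on `AtomicBoolean`; it does not use the `SetOnce` class mentioned in the comment, and `FreezableResultSketch` is not a name from the PR. As in the diff, setters are ignored once the result is frozen.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a result object that may be frozen exactly once.
final class FreezableResultSketch {
    private final AtomicBoolean frozen = new AtomicBoolean(false);
    private long took;

    /** Mirrors the diff: the value is only updated while the result is not frozen. */
    void setTook(long took) {
        if (frozen.get() == false) {
            this.took = took;
        }
    }

    /** Freezing twice is treated as a bug and fails immediately. */
    void freeze() {
        if (frozen.compareAndSet(false, true) == false) {
            throw new IllegalStateException("result is already frozen");
        }
    }

    long getTook() {
        return took;
    }
}
```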

s1monw · 2016-11-01T10:52:02Z

core/src/main/java/org/elasticsearch/index/engine/IndexFailedEngineException.java

-import java.io.IOException;
-import java.util.Objects;
-
-public class IndexFailedEngineException extends EngineException {


s1monw · 2016-11-01T10:54:10Z

core/src/main/java/org/elasticsearch/index/engine/InternalEngine.java

+            // and set the error in operation.setFailure. In case of environment related errors, the failure
+            // is bubbled up
+            isDocumentFailure = !((failure instanceof IllegalStateException || failure instanceof IOException)
+                    && maybeFailEngine(operation.operationType().getLowercase(), failure));


can we please move the maybeFailEngine(operation.operationType().getLowercase(), failure) call one line above and assign it to a variable. I really wanna make sure no short circuit logic here prevents it from being called
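
The concern is easy to reproduce in isolation. In the sketch below, `ShortCircuitSketch` and its simplified `maybeFailEngine(Exception)` are hypothetical stand-ins (the real method takes a source string as well and has engine side effects); the counter exists only to show when the call is skipped.

```java
import java.io.IOException;

// Hypothetical sketch: shows how && can skip a call that has side effects.
final class ShortCircuitSketch {
    int maybeFailEngineCalls = 0;

    /** Stand-in with a visible side effect; treats IOException as engine-fatal. */
    boolean maybeFailEngine(Exception failure) {
        maybeFailEngineCalls++;
        return failure instanceof IOException;
    }

    /** Shape of the original diff: the call sits on the right of &&. */
    boolean isDocumentFailureInline(Exception failure) {
        return !((failure instanceof IllegalStateException || failure instanceof IOException)
            && maybeFailEngine(failure));
    }

    /** Shape requested in review: the call always runs, then the result is combined. */
    boolean isDocumentFailureHoisted(Exception failure) {
        final boolean engineFailed = maybeFailEngine(failure);
        return !((failure instanceof IllegalStateException || failure instanceof IOException)
            && engineFailed);
    }
}
```

Both variants return the same boolean, but for a failure that is neither an `IllegalStateException` nor an `IOException` only the hoisted variant actually invokes `maybeFailEngine`.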

s1monw · 2016-11-01T10:55:02Z

core/src/main/java/org/elasticsearch/index/engine/InternalEngine.java

+
+    @SuppressWarnings("unchecked")
+    static <T extends Throwable> void rethrow(Throwable t) throws T {
+        throw (T) t;


maybe document that this is a hack to rethrow the original

s1monw · 2016-11-01T10:58:32Z

core/src/main/java/org/elasticsearch/index/shard/InternalIndexingStats.java

+        if (result.hasFailure() == false) {
+            if (!index.origin().isRecovery()) {
+                long took = result.getTook();
+                totalStats.indexMetric.inc(took);


should we have a write failures statistic too? @bleskes

This is a bespoke backport of elastic#20109 for 5.x: Currently, bulk item requests can be any ActionRequest, this PR restricts bulk item requests to DocumentRequest. This simplifies handling failures during bulk requests. Additionally, a new enum is added to DocumentRequest to represent the intended operation to be performed by a document request (create, index, update and delete), which was previously represented with a mix of strings and index request operation type. Now, index request operation type reuses the new enum to specify whether the request should create or index a document. Restricting bulk requests to DocumentRequest further simplifies execution of shard-level bulk operations to use the same failure handling for index, delete and update operations. This PR also fixes a bug which executed delete operations twice for replica copies while executing bulk requests. Relates to elastic#19105 and elastic#20109

* Simplify write failure handling (backport of #19105)
* incorporate feedback

areek added review resiliency WIP v5.0.0-alpha5 labels Jun 27, 2016

areek force-pushed the enhancement/replicate_primary_write_failures branch 3 times, most recently from 3b56c4d to 280d18a Compare June 29, 2016 21:21

areek changed the title ~~Replicate primary write operation failures~~ Make primary write operation failure a valid result Jun 29, 2016

areek force-pushed the enhancement/replicate_primary_write_failures branch from 280d18a to 5fbd801 Compare June 29, 2016 21:31

bleskes reviewed Jun 30, 2016

areek force-pushed the enhancement/replicate_primary_write_failures branch from 5fbd801 to 938e043 Compare July 4, 2016 22:22

bleskes reviewed Jul 5, 2016

s1monw reviewed Oct 27, 2016

Rethrow original exception when it fails the engine during write operations

2f883fc

bleskes reviewed Oct 31, 2016

bleskes approved these changes Oct 31, 2016

Merge branch 'master' into enhancement/replicate_primary_write_failures

eafd3df

areek mentioned this pull request Oct 31, 2016

Restore original update bulk item request on retry #21221

Closed

incorporate feedback

02ecff1

areek force-pushed the enhancement/replicate_primary_write_failures branch from ae38c7e to 02ecff1 Compare November 1, 2016 03:50

areek mentioned this pull request Nov 1, 2016

Evaluate version conflict in bulk request during primary relocation #21230

Closed

bleskes reviewed Nov 1, 2016

s1monw reviewed Nov 1, 2016

areek added 3 commits November 1, 2016 13:37

Merge branch 'master' into enhancement/replicate_primary_write_failures

603d506

documentation and minor fixes for engine level index/delete operations

cf3e2d1

add back index and delete engine failure exceptions as deprecated for bwc with 5.x

ee0b273

areek merged commit 03abf4a into elastic:master Nov 1, 2016

bleskes mentioned this pull request Jan 15, 2017

Replace EngineClosedException with AlreadyClosedExcpetion #22631

Merged

dakrone mentioned this pull request Jan 19, 2017

Simplify bulk request execution #22697

Merged

areek mentioned this pull request Jan 24, 2017

Simplify write failure handling (backport of #19105) #22778

Merged

clintongormley added the >enhancement label May 5, 2017

