Skip to content

Conversation

@Tim-Brooks
Copy link
Contributor

@Tim-Brooks Tim-Brooks commented Nov 20, 2017

This is related to #27422. Right now when we send a write to the netty
transport, we attach a listener to the future. When you submit a write
on the netty event loop and the event loop is shutdown, the onFailure
method is called. Unfortunately, netty then tries to notify the listener
which cannot be done without dispatching to the event loop. In this
case, the dispatch fails and netty logs and error and does not tell us.

This commit checks that netty is still not shutdown after sending a
message. If netty is shutdown, we complete the listener.

This is related to elastic#27422. Right now when we do a write in the netty
transport, we attach a listener to the future. When you submit a write
on the netty event loop and the event loop is shutdown, the onFailure
method is called. Unfortunately, netty then tries to notify the listener
which cannot be done without dispatching to the event loop. In this
case, the dispatch fails and netty logs and error and does not tell us.

This commit checks that netty is still running before sending a message.
This will not 100% fix this issue, but it should reduce it.
@Tim-Brooks Tim-Brooks added :Distributed Coordination/Network Http and internode communication implementations :Delivery/Build Build or test infrastructure review v6.1.0 v7.0.0 labels Nov 20, 2017
@Tim-Brooks
Copy link
Contributor Author

I also opened an issue with netty for this.

Copy link
Contributor

@s1monw s1monw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM in general & good catch. I left one request

});
channel.writeAndFlush(Netty4Utils.toByteBuf(reference), writePromise);

if (channel.eventLoop().isShutdown()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we protect this from double invocation? I think we should just make sure we only invoke once. can you wrap it in here and ensure that? maybe just use NotifyOnceListener

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well - the only place we call this is from internalSendMessage.

    private void internalSendMessage(TcpChannel channel, BytesReference message, SendMetricListener listener) {
        try {
            channel.sendMessage(message, listener);
        } catch (Exception ex) {
            // call listener to ensure that any resources are released
            listener.onFailure(ex);
            onException(channel, ex);
        }
    }

And SendMetricListener is a NotifyOnceListener. So incidentally - yes it is always a notify once listener. Do you want me to codify that by changing the signature of TcpChannel#sendMessage to:

/**
     * Sends a tcp message to the channel. The listener will be executed once the send process has been
     * completed.
     *
     * @param reference to send to channel
     * @param listener to execute upon send completion
     */
    void sendMessage(BytesReference reference, NotifyOnceListener<Void> listener);

???

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's find in this case! LGTM

@Tim-Brooks Tim-Brooks merged commit 4e04f95 into elastic:master Nov 20, 2017
Tim-Brooks added a commit that referenced this pull request Nov 20, 2017
This is related to #27422. Right now when we send a write to the netty
transport, we attach a listener to the future. When you submit a write
on the netty event loop and the event loop is shutdown, the onFailure
method is called. Unfortunately, netty then tries to notify the listener
which cannot be done without dispatching to the event loop. In this
case, the dispatch fails and netty logs and error and does not tell us.

This commit checks that netty is still not shutdown after sending a
message. If netty is shutdown, we complete the listener.
Copy link
Member

@jasontedor jasontedor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for opening the issue against Netty too.

jasontedor added a commit that referenced this pull request Nov 21, 2017
* master:
  Fix resync request serialization
  Fix issue where pages aren't released (#27459)
  Add YAML REST tests for filters bucket agg (#27128)
  Remove tcp profile from low level nio channel (#27441)
jasontedor added a commit that referenced this pull request Nov 21, 2017
* 6.x:
  Fix resync request serialization
  [Test] Fix multi_match query builder expectation
  Fix issue where pages aren't released (#27459)
  Remove tcp profile from low level nio channel (#27441)
jasontedor added a commit to olcbean/elasticsearch that referenced this pull request Nov 21, 2017
* master: (41 commits)
  [Test] Fix AggregationsTests#testFromXContentWithRandomFields
  [DOC] Fix mathematical representation on interval (range) (elastic#27450)
  Update version check for CCS optional remote clusters
  Bump BWC version to 6.1.0 for elastic#27469
  Adapt rest test BWC version after backport
  Fix dynamic mapping update generation. (elastic#27467)
  Use the primary_term field to identify parent documents (elastic#27469)
  Move composite aggregation to core (elastic#27474)
  Fix test BWC version after backport
  Protect shard splitting from illegal target shards (elastic#27468)
  Cross Cluster Search: make remote clusters optional (elastic#27182)
  [Docs] Fix broken bulleted lists (elastic#27470)
  Move resync request serialization assertion
  Fix resync request serialization
  Fix issue where pages aren't released (elastic#27459)
  Add YAML REST tests for filters bucket agg (elastic#27128)
  Remove tcp profile from low level nio channel (elastic#27441)
  [TEST] Fix `GeoShapeQueryTests#testPointsOnly` failure
  Transition transport apis to use void listeners (elastic#27440)
  AwaitsFix GeoShapeQueryTests#testPointsOnly elastic#27454
  ...
@Tim-Brooks Tim-Brooks deleted the fix_shutdown_exception branch December 10, 2018 16:19
@tomcallahan tomcallahan added >non-issue and removed :Delivery/Build Build or test infrastructure labels Dec 18, 2018
@jimczi jimczi added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed Coordination/Network Http and internode communication implementations >non-issue v6.1.0 v7.0.0-beta1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants