Conversation

@obourgain
Contributor

There seems to be a connection leak in the TransportClient when using removeTransportAddress()

Stating the problem

When adding an address with TransportClient.addTransportAddresses(), the TransportClientNodesService adds a DiscoveryNode to listedNodes and ends up calling nodeSampler.sample(); in my case the sampler is a SimpleNodeSampler.
For a new address (not already connected), the sampling performs a 'light connection' with TransportService.connectToNodeLight(), which delegates to a Transport instance. In my case the Transport implementation is NettyTransport, which opens a Channel to the target node.

So at this point, the newly created DiscoveryNode is referenced in the listedNodes list of the TransportClientNodesService, and a Channel is cached in the NettyTransport's connectedNodes map.
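For reference, here is a minimal sketch of the client-side calls involved, assuming the 2.4-era TransportClient API; the address below is illustrative:

```java
import java.net.InetAddress;

import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.transport.InetSocketTransportAddress;

// Minimal sketch (2.4-era API); the address is illustrative.
TransportClient client = TransportClient.builder().build();
InetSocketTransportAddress address =
        new InetSocketTransportAddress(InetAddress.getByName("10.0.0.1"), 9300);

// TransportClientNodesService adds a DiscoveryNode to listedNodes, and the
// SimpleNodeSampler opens a light connection (a Channel in NettyTransport).
client.addTransportAddress(address);

// The DiscoveryNode leaves listedNodes, but the light connection's Channel
// stays cached in NettyTransport's connectedNodes map -> leaked connection.
client.removeTransportAddress(address);
```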

When removing an address from the TransportClient, the TransportClientNodesService is doing the following:

  1. removing the address from the listedNodes
  2. removing the address from the nodes and disconnecting
  3. calling the NodeSampler to refresh

What I believe is missing is closing the connection when the node is removed from listedNodes, because an open Channel remains cached in NettyTransport.
This PR fixes that by asking the Transport to disconnect from the listed node upon removal, which closes the light connection.
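To make the intent concrete, here is a rough sketch of what the change amounts to; it is not the exact diff of this PR, and the helper method and variable names are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;

import org.elasticsearch.cluster.node.DiscoveryNode;
import org.elasticsearch.common.transport.TransportAddress;
import org.elasticsearch.transport.TransportService;

// Hypothetical helper illustrating the idea: when dropping a node from listedNodes,
// also ask the TransportService to disconnect so the cached light connection is closed.
static List<DiscoveryNode> removeListedNode(List<DiscoveryNode> listedNodes,
                                            TransportAddress addressToRemove,
                                            TransportService transportService) {
    List<DiscoveryNode> remaining = new ArrayList<>();
    for (DiscoveryNode node : listedNodes) {
        if (node.getAddress().equals(addressToRemove)) {
            // closes the Channel cached in NettyTransport's connectedNodes map
            transportService.disconnectFromNode(node);
        } else {
            remaining.add(node);
        }
    }
    return remaining;
}
```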

How to reproduce

I am using an Elastic Cloud cluster, periodically resolving its DNS name and adding/removing addresses on the TransportClient accordingly.
For each address removed, a connection is leaked; this is visible with tools like lsof or with a debugger.
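A rough sketch of that reconciliation step, with a hypothetical host name and helper method, looks like this (run periodically, e.g. from a scheduled task):

```java
import java.net.InetAddress;
import java.util.HashSet;
import java.util.Set;

import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.common.transport.TransportAddress;

// Hypothetical reconciliation step: align the client's address list with the current DNS answer.
static void syncAddresses(TransportClient client, String clusterHost, int port) throws Exception {
    Set<TransportAddress> wanted = new HashSet<>();
    for (InetAddress ip : InetAddress.getAllByName(clusterHost)) {
        wanted.add(new InetSocketTransportAddress(ip, port));
    }
    Set<TransportAddress> current = new HashSet<>(client.transportAddresses());

    for (TransportAddress address : wanted) {
        if (!current.contains(address)) {
            client.addTransportAddress(address);
        }
    }
    for (TransportAddress address : current) {
        if (!wanted.contains(address)) {
            // without the fix, the light connection opened for this address stays cached
            // in NettyTransport -> one leaked socket per removed address
            client.removeTransportAddress(address);
        }
    }
}
```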

Closing note

I am using Elasticsearch 2.4, and while I know that upgrading to 5 is the way forward, I don't have time to do it right now. I would therefore appreciate a backport of this fix to the 2.4 branch, to avoid having to build and deploy a fork of the client.

@elasticmachine
Collaborator

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

@bleskes bleskes self-assigned this Sep 6, 2017
@rjernst
Member

rjernst commented Sep 7, 2017

@elasticmachine test this please

@obourgain
Contributor Author

After further investigation on the 5.x branch, the issue appears to have already been fixed there.
Looking at the code, it was fixed in #22828, so my PR is not relevant for master. Still, I would appreciate it being merged into 2.4, to avoid having to maintain a custom fork until I can migrate to 5.x.

@javanna
Member

javanna commented Nov 6, 2017

Sorry, but we don't plan on backporting #22828 to 2.4. Thanks a lot for your great analysis and for sending your PR. Closing, as this has already been fixed from 5.4 onwards.

@javanna javanna closed this Nov 6, 2017