Close the light connection when removing an address from TransportClient #26505
There seems to be a connection leak in the TransportClient when using removeTransportAddress().
Stating the problem
When adding an address with TransportClient.addTransportAddresses(), the TransportClientNodesService adds a DiscoveryNode to listedNodes and ends up calling nodeSampler.sample(). In my case I have the SimpleNodeSampler.
For the new address (not already connected), the sampling does a 'light connection' with TransportService.connectToNodeLight(), which delegates to an instance of Transport. In my case, the Transport implementation is a NettyTransport, which opens a Channel to the target node.
So here we have the newly created DiscoveryNode referenced in the listedNodes List of the TransportClientNodesService, and a Channel cached in the NettyTransport's connectedNodes map.
When removing an address from the TransportClient, the TransportClientNodesService removes the corresponding node from listedNodes.
What I believe is missing is closing the connection when removing the node from listedNodes, because we still have an open Channel cached in NettyTransport.
This PR fixes this by asking the Transport to disconnect from the listed node upon removal to close the light connection.
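To illustrate the leak and the proposed fix, here is a minimal, self-contained model of the behaviour described above. It is not the Elasticsearch source: FakeTransport, FakeNodesService, the disconnectOnRemove flag, and leakedAfterChurn are hypothetical names made up for this sketch, and addresses are plain strings instead of DiscoveryNode instances.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for the classes named in this description; NOT the Elasticsearch source.
final class LightConnectionLeakSketch {

    /** Hypothetical transport that caches one open "channel" per connected address. */
    static final class FakeTransport {
        private final Map<String, Object> connectedNodes = new HashMap<>();

        void connectToNodeLight(String address) {
            // Opens a channel only if none is cached for this address.
            connectedNodes.computeIfAbsent(address, a -> new Object());
        }

        void disconnectFromNode(String address) {
            // Closes the channel and drops it from the cache.
            connectedNodes.remove(address);
        }

        int openConnections() {
            return connectedNodes.size();
        }
    }

    /** Hypothetical nodes service that keeps the list of listed node addresses. */
    static final class FakeNodesService {
        private final FakeTransport transport;
        private final boolean disconnectOnRemove; // true models the behaviour proposed in this PR
        private List<String> listedNodes = new ArrayList<>();

        FakeNodesService(FakeTransport transport, boolean disconnectOnRemove) {
            this.transport = transport;
            this.disconnectOnRemove = disconnectOnRemove;
        }

        void addTransportAddress(String address) {
            if (!listedNodes.contains(address)) {
                List<String> updated = new ArrayList<>(listedNodes);
                updated.add(address);
                listedNodes = updated;
            }
            // Models what sampling does for a node that is not yet connected.
            transport.connectToNodeLight(address);
        }

        void removeTransportAddress(String address) {
            List<String> updated = new ArrayList<>();
            for (String node : listedNodes) {
                if (!node.equals(address)) {
                    updated.add(node);
                } else if (disconnectOnRemove) {
                    // The proposed fix: close the light connection on removal.
                    transport.disconnectFromNode(node);
                }
            }
            listedNodes = updated;
        }
    }

    /** Adds then removes n distinct addresses and returns how many connections stay open. */
    static int leakedAfterChurn(int n, boolean disconnectOnRemove) {
        FakeTransport transport = new FakeTransport();
        FakeNodesService service = new FakeNodesService(transport, disconnectOnRemove);
        for (int i = 0; i < n; i++) {
            String address = "10.0.0." + i + ":9300";
            service.addTransportAddress(address);
            service.removeTransportAddress(address);
        }
        return transport.openConnections();
    }
}
```

In this model, leakedAfterChurn(3, false) leaves three connections open, one per removed address, while leakedAfterChurn(3, true) leaves none.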
How to reproduce
I am using an Elastic Cloud cluster, periodically resolving its DNS name and adding/removing addresses from the TransportClient accordingly.
For each address removed, a connection is leaked, which is visible with tools like lsof or with a debugger.
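The add/remove step of that periodic refresh can be sketched as a set difference between the addresses the client currently lists and the freshly resolved ones. This is only an illustration under assumptions: AddressReconcileSketch, toAdd, and toRemove are hypothetical names, and addresses are modelled as plain host:port strings rather than TransportAddress objects.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not part of Elasticsearch: decides which addresses to add or remove.
final class AddressReconcileSketch {

    /** Addresses present in the latest DNS resolution but not yet listed by the client. */
    static Set<String> toAdd(Set<String> current, Set<String> resolved) {
        Set<String> result = new LinkedHashSet<>(resolved);
        result.removeAll(current);
        return result;
    }

    /** Addresses the client still lists but that no longer appear in the DNS resolution. */
    static Set<String> toRemove(Set<String> current, Set<String> resolved) {
        Set<String> result = new LinkedHashSet<>(current);
        result.removeAll(resolved);
        return result;
    }
}
```

Every address returned by toRemove corresponds to one removal from the TransportClient, and so to one leaked connection without the fix.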
Closing note
I am using Elasticsearch 2.4, and while I know that upgrading to 5 is the way forward, I don't have time to do it right now, so I would appreciate a backport of this fix to the 2.4 branch to avoid having to build and deploy a fork of the client.