@DaveCTurner
Contributor

Nodes may accept transport connections at addresses other than their publish
address, which allows a good deal of flexibility when configuring discovery.
However, it is not unusual for users to misconfigure nodes so that they pick a
publish address which is inaccessible to other nodes. We see this happen a lot
when the nodes are on different networks separated by a proxy, or when the
nodes are running in Docker with the wrong kind of network config.

In this case we offer no useful feedback to the user unless they enable
TRACE-level logs. It's particularly tricky to diagnose because if we test
connectivity between the nodes (using their discovery addresses) then all will
appear well.

This commit adds a WARN-level log when this kind of misconfiguration is
detected: the probe connection succeeds (indicating that we really are talking
to a healthy Elasticsearch node) but the follow-up connection attempt fails.
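
The detection condition can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this commit: the class `PublishAddressCheck`, the method `check`, the `probe` and `canConnect` callbacks, and the wording of the message are all hypothetical stand-ins for the real transport machinery.

```java
import java.util.Optional;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical sketch: none of these names come from Elasticsearch itself.
public class PublishAddressCheck {

    /**
     * @param discoveryAddress address at which the remote node was discovered
     * @param probe            stand-in for the probe connection and handshake: returns the
     *                         publish address the remote node reports, or empty if the probe failed
     * @param canConnect       stand-in for the follow-up connection attempt
     * @return a warning message if the probe succeeded but the follow-up connection failed
     */
    public static Optional<String> check(String discoveryAddress,
                                         Function<String, Optional<String>> probe,
                                         Predicate<String> canConnect) {
        Optional<String> publishAddress = probe.apply(discoveryAddress);
        if (!publishAddress.isPresent()) {
            // The probe failed, so we do not know that a healthy node is listening here.
            return Optional.empty();
        }
        if (canConnect.test(publishAddress.get())) {
            // Both steps succeeded: the publish address is reachable.
            return Optional.empty();
        }
        // The probe succeeded but the follow-up failed: a likely publish address misconfiguration.
        return Optional.of(String.format(
            "handshake with [%s] succeeded, but the follow-up connection to its publish address [%s] failed",
            discoveryAddress, publishAddress.get()));
    }

    public static void main(String[] args) {
        // Simulated Docker-style misconfiguration: the node is reachable at 10.0.0.5:9300
        // but publishes an address on a network that other nodes cannot reach.
        Optional<String> warning = check(
            "10.0.0.5:9300",
            address -> Optional.of("172.17.0.2:9300"),
            address -> false);
        warning.ifPresent(message -> System.out.println("WARN " + message));
    }
}
```

The sketch stays silent when the probe itself fails, since in that case there is no evidence that the remote end is a healthy node.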

It also tidies up some loose ends in `HandshakingTransportAddressConnector`,
removing some TODOs that need not be completed, and registering its
accidentally-unregistered timeout settings.

Backport of #51304

@DaveCTurner added the `:Distributed Coordination/Network`, `backport`, and `v7.7.0` labels on Jan 23, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (:Distributed/Network)

@DaveCTurner
Contributor Author

@elasticmachine please run elasticsearch-ci/2

@DaveCTurner merged commit 0152c40 into elastic:7.x on Jan 23, 2020
@DaveCTurner deleted the 2020-01-23-HandshakingTransportAddressConnector-fixes-7x branch on January 23, 2020 17:28
Labels

`backport`, `:Distributed Coordination/Network`, `>enhancement`, `v7.7.0`

2 participants