Log when probe succeeds but full connection fails #51357

DaveCTurner · 2020-01-23T15:57:15Z

It is permitted for nodes to accept transport connections at addresses other
than their publish address, which allows a good deal of flexibility when
configuring discovery. However, it is not unusual for users to misconfigure
nodes to pick a publish address which is inaccessible to other nodes. We see
this happen a lot if the nodes are on different networks separated by a proxy,
or if the nodes are running in Docker with the wrong kind of network config.

In this case we offer no useful feedback to the user unless they enable
TRACE-level logs. It's particularly tricky to diagnose because if we test
connectivity between the nodes (using their discovery addresses) then all will
appear well.

This commit adds a WARN-level log message when this kind of misconfiguration
is detected: the probe connection succeeded (indicating that we are really
talking to a healthy Elasticsearch node) but the follow-up connection attempt
failed.
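
The detection logic described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in `HandshakingTransportAddressConnector`: the class name, the `connect` helper, the `probe` and `canConnect` callbacks, the in-memory `LOG` list, and the log wording are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;
import java.util.function.Predicate;

public class ProbeThenConnectSketch {

    /** Collected log lines, standing in for a real logger (hypothetical). */
    static final List<String> LOG = new ArrayList<>();

    /**
     * Hypothetical two-step connector.
     *
     * @param discoveryAddress address at which the remote node was discovered
     * @param probe            returns the publish address reported by the remote
     *                         node, or empty if the probe connection fails
     * @param canConnect       whether a full connection to a publish address succeeds
     * @return true only if both the probe and the full connection succeed
     */
    static boolean connect(String discoveryAddress,
                           Function<String, Optional<String>> probe,
                           Predicate<String> canConnect) {
        Optional<String> publishAddress = probe.apply(discoveryAddress);
        if (!publishAddress.isPresent()) {
            // The probe itself failed, so we know nothing about the remote
            // node; this is not the misconfiguration case and stays quiet.
            LOG.add("DEBUG probe of [" + discoveryAddress + "] failed");
            return false;
        }
        if (canConnect.test(publishAddress.get())) {
            return true;
        }
        // The probe reached a healthy node, but its publish address is not
        // reachable from here: surface this at WARN instead of hiding it.
        LOG.add("WARN probe of [" + discoveryAddress + "] succeeded but the full "
            + "connection to publish address [" + publishAddress.get() + "] failed");
        return false;
    }

    public static void main(String[] args) {
        // A node discovered at 10.0.0.5:9300 that publishes an address which
        // is only reachable inside its own container network.
        connect("10.0.0.5:9300",
            address -> Optional.of("172.17.0.2:9300"),
            address -> false);
        LOG.forEach(System.out::println);
    }
}
```

The point of the sketch is the asymmetry between the two failure paths: a failed probe says nothing about the remote node, whereas a successful probe followed by a failed connection strongly suggests an unreachable publish address, which is why only the latter is logged at WARN.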

It also tidies up some loose ends in HandshakingTransportAddressConnector,
removing some TODOs that need not be completed, and registering its
accidentally-unregistered timeout settings.

Backport of #51304


elasticmachine · 2020-01-23T15:57:18Z

Pinging @elastic/es-distributed (:Distributed/Network)

DaveCTurner · 2020-01-23T16:42:20Z

@elasticmachine please run elasticsearch-ci/2

DaveCTurner added the :Distributed Coordination/Network, backport, and v7.7.0 labels Jan 23, 2020

DaveCTurner added the >enhancement label Jan 23, 2020

DaveCTurner merged commit 0152c40 into elastic:7.x Jan 23, 2020

DaveCTurner deleted the 2020-01-23-HandshakingTransportAddressConnector-fixes-7x branch January 23, 2020 17:28
