fix: do not let `_resolve/cluster` hang if remote is unresponsive #119516

pawankartik-elastic · 2025-01-03T15:29:51Z

Previously, `_resolve/cluster` waited for a response from a remote as part of the connection strategy. If the remote was unresponsive, the API waited until `netty` terminated the connection with a handshake exception. The threshold for terminating the connection is `10s`, so the API took `10s` to determine that the remote was unresponsive. After an internal discussion, this is now replaced with a fail-fast strategy: a response is sent back to the user immediately rather than after the connection is terminated.
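To make the difference concrete, here is a minimal Python sketch of the two strategies. It is not the Elasticsearch code: `Remote`, `resolve_waiting`, `resolve_fail_fast`, and `timed` are hypothetical names, the response is reduced to a single `connected` flag, and the `10s` threshold is shortened to a fraction of a second so the sketch runs quickly.

```python
import time
from dataclasses import dataclass

# Stands in for the 10s handshake threshold; shortened for the sketch.
HANDSHAKE_TIMEOUT_S = 0.2


@dataclass
class Remote:
    """Hypothetical stand-in for a remote cluster connection."""
    alias: str
    connected: bool

    def handshake(self, timeout_s: float) -> bool:
        # A connected remote answers at once; an unresponsive one makes
        # the caller wait out the full timeout before the attempt fails.
        if self.connected:
            return True
        time.sleep(timeout_s)
        raise TimeoutError(f"handshake with {self.alias} timed out")


def resolve_waiting(remote: Remote) -> dict:
    """Old behaviour (sketch): attempt the connection and wait for the timeout."""
    try:
        remote.handshake(HANDSHAKE_TIMEOUT_S)
        return {"connected": True}
    except TimeoutError:
        return {"connected": False}


def resolve_fail_fast(remote: Remote) -> dict:
    """New behaviour (sketch): report the current state without waiting."""
    return {"connected": remote.connected}


def timed(fn, remote: Remote):
    """Return the strategy's result together with the elapsed seconds."""
    start = time.monotonic()
    result = fn(remote)
    return result, time.monotonic() - start
```

Both strategies report the unresponsive remote as not connected; under these assumptions, only the waiting variant pays the timeout before answering.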

elasticsearchmachine · 2025-01-03T15:30:16Z

Pinging @elastic/es-search-foundations (Team:Search Foundations)

elasticsearchmachine · 2025-01-03T15:30:18Z

Hi @pawankartik-elastic, I've created a changelog YAML for you.

quux00

LGTM. Nice work researching this to have it culminate in being a one-liner!

…astic#119516)

* fix: do not let `_resolve/cluster` hang if remote is unresponsive

  Previously, `_resolve/cluster` would wait for a response from a remote as part of the connection strategy. If the remote were to be unresponsive, this API would wait until `netty` would terminate the connection with a handshake exception. The threshold for terminating the connection is `10s`. This means that the API would wait for `10s` before determining that the remote is unresponsive. This strategy is now replaced with a fail fast where a response is sent back to the user immediately rather than waiting for a connection termination.

* Update docs/changelog/119516.yaml

elasticsearchmachine · 2025-01-03T16:40:30Z

💚 Backport successful

| Status | Branch | Result |
|--------|--------|--------|
| ✅ | 8.16 | |
| ✅ | 8.17 | |
| ✅ | 8.x | |

pawankartik-elastic added the >bug, auto-backport, Team:Search Foundations, v8.16.0, :Search Foundations/Search, v9.0.0, v8.17.0, and v8.18.0 labels Jan 3, 2025

pawankartik-elastic requested a review from quux00 January 3, 2025 15:29

Update docs/changelog/119516.yaml

7897d4b

quux00 approved these changes Jan 3, 2025

pawankartik-elastic merged commit d2d0636 into elastic:main Jan 3, 2025
16 checks passed

pawankartik-elastic deleted the pkar/resolve-cluster-hang-fix branch January 3, 2025 16:38

This was referenced Jan 3, 2025

[8.16] fix: do not let _resolve/cluster hang if remote is unresponsive (#119516) #119526

Merged

[8.17] fix: do not let _resolve/cluster hang if remote is unresponsive (#119516) #119527

Merged

pawankartik-elastic mentioned this pull request Jan 3, 2025

[8.x] fix: do not let _resolve/cluster hang if remote is unresponsive (#119516) #119528

Merged

DaveCTurner mentioned this pull request Jan 3, 2025

Clarify the behavior of remote/info and resolve/cluster for connected status of remotes #118993

Merged

pawankartik-elastic mentioned this pull request Mar 6, 2025

Revert fail-fast disconnect strategy for _resolve/cluster #124241

Merged
