Skip to content

Conversation

@DaveCTurner
Copy link
Contributor

Specifying ?master_timeout=-1 on an API which performs a cluster state
update means that the cluster state update task will never time out
while waiting in the pending tasks queue. However this parameter is also
re-used in a few places where a timeout of -1 means something else,
typically to timeout immediately. This commit fixes those places so that
?master_timeout=-1 consistently means to wait forever.

Specifying `?master_timeout=-1` on an API which performs a cluster state
update means that the cluster state update task will never time out
while waiting in the pending tasks queue. However this parameter is also
re-used in a few places where a timeout of `-1` means something else,
typically to timeout immediately. This commit fixes those places so that
`?master_timeout=-1` consistently means to wait forever.
@DaveCTurner DaveCTurner added >bug :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. v8.14.0 labels Apr 3, 2024
@github-actions
Copy link
Contributor

github-actions bot commented Apr 3, 2024

Documentation preview:

@elasticsearchmachine elasticsearchmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Apr 3, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@elasticsearchmachine
Copy link
Collaborator

Hi @DaveCTurner, I've created a changelog YAML for you.

TransportNodesSnapshotsStatus.TYPE,
new TransportNodesSnapshotsStatus.Request(nodesIds.toArray(Strings.EMPTY_ARRAY)).snapshots(snapshots)
.timeout(request.masterNodeTimeout()),
.timeout(request.masterNodeTimeout().millis() < 0 ? null : request.masterNodeTimeout()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we compare with -1 explicitly here and everywhere down below?
Since user can specify any negative number in theory. Or doc should say that any negative number works as infinite timeout (which sounds confusing)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only accept -1 when parsing a time value:

if (value < -1) {
// -1 is magic, but reject any other negative values
throw new IllegalArgumentException(
"failed to parse setting ["
+ settingName
+ "] with value ["
+ initialInput
+ "] as a time value: negative durations are not supported"
);
}

However the convention elsewhere in the code seems to be to compare .millis() < 0. I think we could make this neater for sure, but that's a job for another day. The docs definitely don't need to mention any lenience in this area.

@DaveCTurner DaveCTurner requested a review from volodk85 April 10, 2024 05:58
Copy link
Contributor

@volodk85 volodk85 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@DaveCTurner DaveCTurner merged commit ccbb5ba into elastic:main Apr 10, 2024
@DaveCTurner DaveCTurner deleted the 2024/04/03/master-node-timeout branch April 10, 2024 17:32
craigtaverner pushed a commit to craigtaverner/elasticsearch that referenced this pull request Apr 11, 2024
Specifying `?master_timeout=-1` on an API which performs a cluster state
update means that the cluster state update task will never time out
while waiting in the pending tasks queue. However this parameter is also
re-used in a few places where a timeout of `-1` means something else,
typically to timeout immediately. This commit fixes those places so that
`?master_timeout=-1` consistently means to wait forever.
arteam added a commit that referenced this pull request May 2, 2024
This test doesn't fail anymore, I've run it 1000 times locally. This test
got introduced in #107050, and I believe the test got fixed in #107675.
Unfortunately, the got muted before #107675 got merged, so I can't confirm
that #107675 fixed the test on CI.
arteam added a commit that referenced this pull request May 3, 2024
This test doesn't fail anymore, I've run it 1000 times locally.

This test got introduced in #107050, and I believe the test got fixed in #107675. Unfortunately, the got muted before #107675 got merged, so I can't confirm that PR actually fixed the test on CI.
@DaveCTurner DaveCTurner restored the 2024/04/03/master-node-timeout branch June 17, 2024 06:17
@DaveCTurner DaveCTurner deleted the 2024/04/03/master-node-timeout branch July 30, 2024 07:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>bug :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v8.14.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants