MasterService does not complete all tasks on shutdown

Today when the `MasterService` shuts down, it fails waiting tasks but does not necessarily fail the ongoing batch of tasks. For instance, we just drop the batch on the floor here:

https://github.com/elastic/elasticsearch/blob/356e1090e3688a07ce2d52789590741c67c12424/server/src/main/java/org/elasticsearch/cluster/service/MasterService.java#L209-L213

and we swallow rejections here:

https://github.com/elastic/elasticsearch/blob/356e1090e3688a07ce2d52789590741c67c12424/server/src/main/java/org/elasticsearch/cluster/service/MasterService.java#L398-L405
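
To make the symptom concrete, here is a minimal sketch of what swallowing a rejection means for the caller. It uses the plain JDK `ExecutorService` and `RejectedExecutionException` rather than Elasticsearch's own executor and `EsRejectedExecutionException`, and the names `RejectionSketch` and `submitAfterShutdown` are hypothetical. It is not the `MasterService` code, only an illustration under those assumptions.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration, not Elasticsearch code.
final class RejectionSketch {

    /**
     * Submits work to an executor that has already been shut down and returns
     * what a waiting caller would observe.
     */
    static String submitAfterShutdown(boolean failOnRejection) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        executor.shutdown(); // simulate a shutdown racing with in-flight work

        AtomicReference<String> outcome = new AtomicReference<>("never completed");
        try {
            // The default rejection policy throws once the executor is shut down.
            executor.execute(() -> outcome.set("completed"));
        } catch (RejectedExecutionException e) {
            if (failOnRejection) {
                // Report the rejection to the caller.
                outcome.set("failed: rejected");
            }
            // Otherwise the rejection is swallowed and the caller hears nothing.
        }
        return outcome.get();
    }

    public static void main(String[] args) {
        System.out.println(submitAfterShutdown(false)); // never completed
        System.out.println(submitAfterShutdown(true));  // failed: rejected
    }
}
```

With the rejection swallowed, the caller is left waiting for a completion that never arrives.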

This behaviour has existed for a long time (i.e., it was not introduced by recent changes in the area such as #92021 and #94325), but I still think we should improve it. Note, however, that simply failing the ongoing tasks on rejection does not work: today, with acked tasks, we call (at most) one of `onAllNodesAcked()`, `onAckFailure()`, `onAckTimeout()`, or `ClusterStateTaskListener#onFailure()`, and implementations rely on this fact, but we may experience a rejection exception _after_ acking has completed. I think that means we have to delay the acking until the end of the publication, because the alternative would be to suppress `onFailure()` calls for acked tasks, which seems like a confusing API choice that will lead to bugs.

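	// MasterService.java#L209-L213: per the description above, the ongoing batch is dropped here rather than failed.
	if (lifecycle.started() == false) {
	    logger.debug("processing [{}]: ignoring, master service not started", summary);
	    listener.onResponse(null);
	    return;
	}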

	// MasterService.java#L398-L405: per the description above, a rejection from a shut-down executor is swallowed here.
	assert publicationMayFail() || (exception instanceof EsRejectedExecutionException esre && esre.isExecutorShutdown())
	    : exception;
	clusterStateUpdateStatsTracker.onPublicationFailure(
	    threadPool.rawRelativeTimeInMillis(),
	    clusterStatePublicationEvent,
	    0L
	);
	handleException(summary, publicationStartTime, newClusterState, exception);
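
The proposal to delay acking until the end of the publication could look roughly like the sketch below. It is a simplified model, not the Elasticsearch implementation: `AckedListener`, `DeferredAckListener`, and `DeferredAckDemo` are hypothetical names, and the listener interface only mirrors the four callbacks named in the issue. The wrapper records the ack outcome when it arrives but delivers it only once the publication has succeeded, so a late rejection can still be reported through `onFailure()` without a second callback firing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.RejectedExecutionException;

// Hypothetical listener mirroring the four callbacks named in the issue.
interface AckedListener {
    void onAllNodesAcked();
    void onAckFailure(Exception e);
    void onAckTimeout();
    void onFailure(Exception e);
}

// Hypothetical wrapper: holds back the ack outcome until the publication ends.
final class DeferredAckListener {
    private final AckedListener delegate;
    private Runnable pendingAck;          // recorded ack outcome, not yet delivered
    private boolean publicationSucceeded; // publication finished without failure
    private boolean completed;            // exactly one callback has been delivered

    DeferredAckListener(AckedListener delegate) {
        this.delegate = delegate;
    }

    synchronized void recordAllNodesAcked() { record(delegate::onAllNodesAcked); }
    synchronized void recordAckFailure(Exception e) { record(() -> delegate.onAckFailure(e)); }
    synchronized void recordAckTimeout() { record(delegate::onAckTimeout); }

    synchronized void onPublicationSuccess() {
        publicationSucceeded = true;
        deliverIfReady();
    }

    synchronized void onPublicationFailure(Exception e) {
        if (completed) {
            return;
        }
        completed = true;
        delegate.onFailure(e); // any recorded ack outcome is discarded
    }

    private void record(Runnable outcome) {
        if (pendingAck == null) {
            pendingAck = outcome; // keep only the first ack outcome
        }
        deliverIfReady();
    }

    private void deliverIfReady() {
        if (completed == false && publicationSucceeded && pendingAck != null) {
            completed = true;
            pendingAck.run();
        }
    }
}

final class DeferredAckDemo {
    /** Acks complete first; the publication then either succeeds or is rejected. */
    static List<String> run(boolean rejectAfterAck) {
        List<String> calls = new ArrayList<>();
        DeferredAckListener listener = new DeferredAckListener(new AckedListener() {
            @Override public void onAllNodesAcked() { calls.add("onAllNodesAcked"); }
            @Override public void onAckFailure(Exception e) { calls.add("onAckFailure"); }
            @Override public void onAckTimeout() { calls.add("onAckTimeout"); }
            @Override public void onFailure(Exception e) { calls.add("onFailure"); }
        });
        listener.recordAllNodesAcked();
        if (rejectAfterAck) {
            listener.onPublicationFailure(new RejectedExecutionException("executor shut down"));
        } else {
            listener.onPublicationSuccess();
        }
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(run(true));  // [onFailure]
        System.out.println(run(false)); // [onAllNodesAcked]
    }
}
```

In both runs exactly one callback fires, which preserves the contract that implementations rely on. The cost is that `onAllNodesAcked()` is observed later than it is today.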

MasterService does not complete all tasks on shutdown #94930
