Skip to content

Conversation

@dnhatn
Copy link
Member

@dnhatn dnhatn commented Feb 8, 2018

Currently the master node logs a warning message whenever it receives a failed shard request. However, this can be noisy because

  • Multiple failed shard requests can be issued for a single shard
  • Failed shard requests can be still issued for an already failed shard

This commit moves the log-warn to AllocationService in which the failing shard action actually happens. This is another prerequisite step in order to not ignore the shard not-available exceptions in the replication.

Relates #28534

if (failedShardEntry.markAsStale()) {
allocation.removeAllocationId(failedShard);
}
logger.warn(new ParameterizedMessage("failing shard [{}]", failedShardEntry), failedShardEntry.getFailure());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also add logging information for the "mark as stale" part here as well as in removeStaleIdsWithoutRoutings?

@dnhatn
Copy link
Member Author

dnhatn commented Feb 8, 2018

@ywelsch, I've added a logging for stale shards. The failedShardEntry already includes the mark as stale information. Please have another look. Thank you!

indexMetaDataBuilder.putInSyncAllocationIds(shardNumber, remainingInSyncAllocations);
}
// Only log the stale shards which have been actually removed.
if (oldInSyncAllocations.size() != remainingInSyncAllocations.size()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is not necessary. idsToRemove already contains the info that's needed (it does not contain non-existing ids, you can even add an assertion for that if you like).
Also I would change the message to "{} marking unavailable shards as stale: {}"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed b0cd954

Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dnhatn
Copy link
Member Author

dnhatn commented Feb 8, 2018

Thanks @ywelsch for reviewing.

@dnhatn dnhatn changed the title Only log warning of failing shard when actually failing it Only log warning when actually failing shards Feb 8, 2018
@dnhatn dnhatn merged commit 5b8870f into elastic:master Feb 8, 2018
@dnhatn dnhatn deleted the log-failed-shard branch February 8, 2018 20:37
dnhatn added a commit that referenced this pull request Feb 8, 2018
Currently the master node logs a warning message whenever it receives a 
failed shard request. However, this can be noisy because

- Multiple failed shard requests can be issued for a single shard
- Failed shard requests can be still issued for an already failed shard

This commit moves the log-warn to AllocationService in which the failing 
shard action actually happens. This is another prerequisite step in 
order to not ignore the shard not-available exceptions in the
replication.

Relates #28534
@lcawl lcawl added :Search/Search Search-related issues that do not fall into other categories and removed :Allocation labels Feb 13, 2018
@clintongormley clintongormley added :Distributed Indexing/Distributed A catch all label for anything in the Distributed Indexing Area. Please avoid if you can. and removed :Search/Search Search-related issues that do not fall into other categories labels Feb 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed Indexing/Distributed A catch all label for anything in the Distributed Indexing Area. Please avoid if you can. >enhancement v6.3.0 v7.0.0-beta1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants