Skip to content

Conversation

@idegtiarenko
Copy link
Contributor

New allocation strategy is not going to retry failed shards. Update the
test to not rely on that behavior

Rel: #86429

New allocation strategy is not going to retry failed shards. Update the
test to not rely on that behaviour
@idegtiarenko idegtiarenko added >test Issues or PRs that are addressing/adding tests :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v8.4.0 labels Jul 6, 2022
@idegtiarenko idegtiarenko requested a review from DaveCTurner July 6, 2022 14:20
@idegtiarenko
Copy link
Contributor Author

Just confirmed this also passes on the feature branch

@idegtiarenko idegtiarenko marked this pull request as ready for review July 6, 2022 14:34
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@idegtiarenko
Copy link
Contributor Author

@elasticmachine please run elasticsearch-ci/part-2

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, I left a couple of comments.

.put("index.routing.allocation.include._name", primariesNode.getName() + "," + unluckyNode.getName())
)
.get();
ensureYellow("test");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm this would pass even if the index reaches green health. I think we want to wait for no initialising shards and then genuinely assert that none of the shards are on the unlucky node.

Copy link
Contributor Author

@idegtiarenko idegtiarenko Jul 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this call is waiting for no initializing shards internally. Added more detailed assertion below.
UPD: actually it is not, fixing...

@idegtiarenko idegtiarenko requested a review from DaveCTurner July 7, 2022 15:15
Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@idegtiarenko idegtiarenko merged commit f99ee51 into elastic:master Jul 11, 2022
@idegtiarenko idegtiarenko deleted the fix_corruption_on_network_layer branch July 11, 2022 06:07
weizijun added a commit to weizijun/elasticsearch that referenced this pull request Jul 12, 2022
* upstream/master:
  Pass IndexMetadata to AllocationDecider.can_remain (elastic#88453)
  [TSDB] Cache rollup bucket timestamp to reduce rounding cost (elastic#88420)
  Correct some typos/mistakes in comments/docs (elastic#88446)
  Make ClusterInfo use immutable maps in all cases (elastic#88447)
  Reduce map lookups (elastic#88418)
  Don't index geo_shape field in AbstractBuilderTestCase (elastic#88437)
  Remove usages of TestGeoShapeFieldMapperPlugin from enrich module (elastic#88440)
  Fix test memory leak (elastic#88362)
  Improve error when sorting on incompatible types (elastic#88399)
  Remove usages of BucketCollector#getLeafCollector(LeafReaderContext) (elastic#88414)
  Mute ReactiveStorageIT::testScaleWhileShrinking (elastic#88431)
  Clarify snapshot docs on archive indices (elastic#88417)
  [Stack Monitoring] Switch cgroup memory fields to keyword (elastic#88260)
  Fix RealmIdentifier XContent parser (elastic#88410)
  Make LoggedExec gradle task configuration cache compatible (elastic#87621)
  Update CorruptedFileIT so that it passes with new allocation strategy (elastic#88314)
  Update RareClusterStateIT to work with the new shards allocator (elastic#87922)
  Ensure CreateApiKey always creates a new document (elastic#88413)

# Conflicts:
#	x-pack/plugin/rollup/src/main/java/org/elasticsearch/xpack/rollup/v2/RollupShardIndexer.java
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Jul 20, 2022
`POST _cluster/reroute?retry_failed` doesn't reset the failure counter
on any `INITIALIZING` shards, and waiting for no `INITIALIZING` shards
isn't quite enough to ensure that we've finished all possible retries
because there could instead be an ongoing async fetch.

This commit fixes this using a `ClusterStateObserver` to observe the
retry counter instead of using the cluster health action.

Relates elastic#88314
Closes elastic#88615
DaveCTurner added a commit that referenced this pull request Jul 20, 2022
`POST _cluster/reroute?retry_failed` doesn't reset the failure counter
on any `INITIALIZING` shards, and waiting for no `INITIALIZING` shards
isn't quite enough to ensure that we've finished all possible retries
because there could instead be an ongoing async fetch.

This commit fixes this using a `ClusterStateObserver` to observe the
retry counter instead of using the cluster health action.

Relates #88314
Closes #88615
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. >test Issues or PRs that are addressing/adding tests v8.4.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants