Conversation

@hanbj (Contributor) commented May 31, 2019

The previous implementation was well thought out, but a rebalance triggered for one index causes all indices to be rebalanced together.

@matriv added the :Distributed Coordination/Allocation label May 31, 2019
@elasticmachine (Collaborator)

Pinging @elastic/es-distributed

@ywelsch self-requested a review June 4, 2019 14:59
@ywelsch (Contributor) left a comment

Thank you for your interest in contributing to ES. I don't understand what this change is trying to achieve, unfortunately. Can you provide some more explanations? In particular, why is it removing the code that allows index-level settings from overriding the cluster-level ones?

@hanbj (Contributor, Author) commented Jun 6, 2019

If an index is allowed to rebalance, then the check "if (allocation.deciders().canRebalance(allocation).type() != Type.YES)" evaluates to false and the code continues to execute downward. In the balanceByWeights() method, all indices are then traversed and rebalanced according to weight, so setting cluster.routing.rebalance.enable: "none" has no effect.
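
As a rough illustration of that flow, here is a minimal, self-contained sketch; DecisionType and the boolean parameter are stand-ins rather than the actual Elasticsearch classes, and only the canRebalance gate and the balanceByWeights() step mirror the names quoted above.

// Minimal sketch of the described flow: one cluster-wide gate, then all indices are balanced.
enum DecisionType { YES, THROTTLE, NO }

class RebalanceFlowSketch {
    // stand-in for allocation.deciders().canRebalance(allocation).type()
    static DecisionType canRebalanceClusterWide(boolean anyIndexAllowsRebalance) {
        return anyIndexAllowsRebalance ? DecisionType.YES : DecisionType.NO;
    }

    static void rebalance(boolean anyIndexAllowsRebalance) {
        // if the single cluster-wide gate returns YES, execution continues downward ...
        if (canRebalanceClusterWide(anyIndexAllowsRebalance) != DecisionType.YES) {
            return;
        }
        // ... and balanceByWeights() then traverses every index, not only the one
        // whose settings allowed rebalancing
        System.out.println("balanceByWeights(): rebalancing all indices by weight");
    }

    public static void main(String[] args) {
        rebalance(true);
    }
}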

@ywelsch (Contributor) left a comment

I'm not sure I understand what the purpose of this PR is, especially as it's disabling existing ES functionality. Is it to address a scaling issue? You mention a cluster with 300,000 shards on another PR, so I assume this is related. Why is this cluster making use of index-level rebalancing (i.e. the index.routing.rebalance.enable setting) if that is turning out to be a scaling bottleneck?
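
For reference, the cluster-wide and per-index forms of this setting look roughly as follows (illustrative sketch only; the values are arbitrary examples).

// Illustrative only: the two rebalance.enable setting keys being discussed.
import org.elasticsearch.common.settings.Settings;

class RebalanceSettingsExample {
    public static void main(String[] args) {
        // cluster-level default: disable rebalancing everywhere
        Settings clusterSettings = Settings.builder()
                .put("cluster.routing.rebalance.enable", "none")
                .build();
        // index-level override for a single index: rebalancing stays allowed there
        Settings indexSettings = Settings.builder()
                .put("index.routing.rebalance.enable", "all")
                .build();
        System.out.println(clusterSettings + " / " + indexSettings);
    }
}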

return allocation.decision(Decision.NO, NAME, "none rebalance are not allowed");
case PRIMARIES:
if (allocation.routingNodes().hasInactivePrimaries()) {
return allocation.decision(Decision.NO, NAME,

Contributor

why does this globally disable rebalancing when there are some inactive primaries of an unrelated index?
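
For illustration, a minimal, self-contained sketch of the concern; the Shard record and helper below are hypothetical stand-ins for the RoutingNodes bookkeeping.

// Sketch: the inactive-primaries check is cluster-wide, so a single recovering primary
// of an unrelated index flips the result for everything.
import java.util.List;

class InactivePrimariesSketch {
    record Shard(String index, boolean primary, boolean active) {}

    // stand-in for allocation.routingNodes().hasInactivePrimaries()
    static boolean hasInactivePrimaries(List<Shard> shards) {
        return shards.stream().anyMatch(s -> s.primary() && !s.active());
    }

    public static void main(String[] args) {
        List<Shard> shards = List.of(
                new Shard("logs-2019", true, true),
                new Shard("metrics", true, false)); // unrelated index, still recovering
        // under the proposed PRIMARIES branch this single inactive primary would turn
        // the decision into Decision.NO for the whole cluster
        System.out.println(hasInactivePrimaries(shards)); // prints: true
    }
}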

return allocation.decision(Decision.YES, NAME, "all primary shards is active and rebalance are allowed");
case REPLICAS:
if (allocation.routingNodes().hasInactiveShards()) {
return allocation.decision(Decision.NO, NAME,

Contributor

why does this globally disable rebalancing when there are inactive shards of some unrelated index?

case ALL:
return allocation.decision(Decision.YES, NAME, "all rebalance are allowed");
case NONE:
return allocation.decision(Decision.NO, NAME, "none rebalance are not allowed");

Contributor

As said earlier, this changes the behavior of ES so that the index-level property is no longer taken into account when the cluster-level property is set.
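
For comparison, a rough sketch of the index-over-cluster fallback that the change removes; this is hypothetical stand-in code, as the real decider reads INDEX_ROUTING_REBALANCE_ENABLE_SETTING and CLUSTER_ROUTING_REBALANCE_ENABLE_SETTING through the Settings infrastructure rather than a plain map.

// Sketch: the index-level value, when present, wins over the cluster-level one.
import java.util.Map;
import java.util.Optional;

class EnableOverrideSketch {
    enum Rebalance { ALL, PRIMARIES, REPLICAS, NONE }

    static Rebalance effectiveRebalance(Map<String, Rebalance> perIndexSetting,
                                        String index, Rebalance clusterLevel) {
        // index.routing.rebalance.enable, if set, overrides cluster.routing.rebalance.enable
        return Optional.ofNullable(perIndexSetting.get(index)).orElse(clusterLevel);
    }

    public static void main(String[] args) {
        Map<String, Rebalance> perIndex = Map.of("hot-index", Rebalance.NONE); // hypothetical index name
        System.out.println(effectiveRebalance(perIndex, "hot-index", Rebalance.ALL));   // NONE
        System.out.println(effectiveRebalance(perIndex, "other-index", Rebalance.ALL)); // ALL
    }
}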

@ywelsch (Contributor) commented Jul 1, 2019

I don't see a way to take this PR forward (as it is breaking existing functionality), which is why I'm closing this. @hanbj do let me know about the problem this is trying to address and in particular whether it is something that would be covered by other work (e.g. #42738)

@ywelsch closed this Jul 1, 2019
@hanbj deleted the enabled branch May 29, 2020 06:55