We have two kinds of nodes: those with SSDs (used for indexing and searching recent data) and those with large spinning disks (used for archiving old indices).
I'd like to set up a mechanism to move old indices from SSDs to spinning disks.
The first solution uses the reroute command in the cluster API. However, it feels unnatural since you have to do it shard by shard and choose the target node yourself.
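For reference, here is a rough sketch of what the per-shard approach involves: one `move` command per shard, each naming a source and a target node, sent as the body of `POST /_cluster/reroute`. The helper name `build_move_commands`, the node names, and the shard count of 5 are made up for illustration.

```python
import json

def build_move_commands(index, shards, from_node, to_node):
    """Build a _cluster/reroute body that moves each listed shard explicitly."""
    return {
        "commands": [
            {"move": {"index": index, "shard": shard,
                      "from_node": from_node, "to_node": to_node}}
            for shard in shards
        ]
    }

# Hypothetical node names; a real script would have to pick a target per shard.
body = build_move_commands("2014-10-16.01", range(5), "ssd-node-1", "iodisk-node-1")
print(json.dumps(body, indent=2))
```

Every shard needs its own entry and an explicit target, which is the bookkeeping I would like the allocator to do for me.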
What I want to achieve is the following:
- keep recent indices (the one currently being written) on SSDs. They have 2 copies.
- at some point (disk usage on SSDs is above 65%), one copy is moved to the larger boxes (1 copy stays on SSD to help search, 1 copy goes on a large box)
- when disk space is scarce on SSD boxes (90%), we simply drop the copy on SSD. Since we don't care that much about old data, having only one copy is not an issue.
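The policy above can be sketched as a small decision function. The function name `desired_placement` and the exact boundary handling (strictly above 65%, at or above 90%) are assumptions; only the two thresholds come from the description.

```python
def desired_placement(ssd_disk_used_pct, move_at=65.0, drop_at=90.0):
    """Return (copies_on_ssd, copies_on_iodisk) wanted for an old index."""
    if ssd_disk_used_pct >= drop_at:
        # SSD space is scarce: keep only the copy on the large boxes.
        return (0, 1)
    if ssd_disk_used_pct > move_at:
        # Keep one copy on SSD for search, move the other to a large box.
        return (1, 1)
    # Plenty of room: both copies stay on SSD.
    return (2, 0)

for pct in (50, 70, 95):
    print(pct, desired_placement(pct))
```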
I have tried to implement this with shard allocation awareness and allocation filtering, but it does not seem to work as expected.
Nodes have a flavor attribute depending on their hardware (ssd or iodisk).
The cluster uses shard allocation awareness based on the flavor attribute (`cluster.routing.allocation.awareness.attributes: flavor`).
- My index template has `routing.allocation.require: ssd` to require that all copies go on SSDs first.
- At some point, I drop the requirement (effectively `routing.allocation.require: *`). I expect flavor awareness to move one copy to the large (iodisk) boxes.
- At a later point, I'll set `number_of_replicas` to 0 and change `routing.allocation.require` to `iodisk` to drop the shard copy on SSDs.
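To make the three steps concrete, here is a sketch of the index settings each one would apply, for example as the body of `PUT /<index>/_settings`. The stage names and the helper `stage_settings` are made up. The full key `index.routing.allocation.require.flavor` is how I read the shorthand above, and clearing the filter with an empty string is an assumption that may depend on the Elasticsearch version.

```python
import json

ATTR = "flavor"  # node attribute used for both awareness and filtering
REQUIRE_KEY = "index.routing.allocation.require." + ATTR

def stage_settings(stage):
    """Return the index settings for one stage of the index's life."""
    if stage == "hot":
        # Both copies (primary + 1 replica) must live on SSD nodes.
        return {REQUIRE_KEY: "ssd", "index.number_of_replicas": 1}
    if stage == "spread":
        # Assumption: an empty value removes the requirement.
        return {REQUIRE_KEY: "", "index.number_of_replicas": 1}
    if stage == "archive":
        # Single copy, pinned to the large spinning-disk nodes.
        return {REQUIRE_KEY: "iodisk", "index.number_of_replicas": 0}
    raise ValueError("unknown stage: %r" % stage)

for stage in ("hot", "spread", "archive"):
    print(stage, json.dumps(stage_settings(stage)))
```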
Sadly, allocation filtering and shard awareness do not seem to cooperate well:
when a new index is created, one copy goes to SSDs and the other is not allocated anywhere (the index stays in yellow state).
Using `curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"logger.cluster.routing.allocation":"trace"}}'`,
I observed what happens when a new index is created.
[2014-10-16 06:53:19,462][TRACE][cluster.routing.allocation.decider] [bungeearchive01-par.storage.criteo.preprod] Can not allocate [[2014-10-16.01][3], node[null], [R], s[UNASSIGNED]] on node [qK34VLdhTferCQs2oNJOyg] due to [SameShardAllocationDecider]
[2014-10-16 06:53:19,463][TRACE][cluster.routing.allocation.decider] [bungeearchive01-par.storage.criteo.preprod] Can not allocate [[2014-10-16.01][3], node[null], [R], s[UNASSIGNED]] on node [gE7OTgevSUuoj44RozxK0Q] due to [AwarenessAllocationDecider]
[2014-10-16 06:53:19,463][TRACE][cluster.routing.allocation.decider] [bungeearchive01-par.storage.criteo.preprod] Can not allocate [[2014-10-16.01][3], node[null], [R], s[UNASSIGNED]] on node [Y2k9qXfsTx6X2iQTxg9RBQ] due to [AwarenessAllocationDecider]
[2014-10-16 06:53:19,463][TRACE][cluster.routing.allocation.decider] [bungeearchive01-par.storage.criteo.preprod] Can not allocate [[2014-10-16.01][3], node[null], [R], s[UNASSIGNED]] on node [FwWc2XPPRWuje2KH6AlDEQ] due to [FilterAllocationDecider]
[2014-10-16 06:53:19,492][TRACE][cluster.routing.allocation.allocator] [bungeearchive01-par.storage.criteo.preprod] No Node found to assign shard [[2014-10-16.01][3], node[null], [R], s[UNASSIGNED]]
This transcript shows that:
- the primary copy of shard 3 is on node qK34VLdhTferCQs2oNJOyg (flavor: ssd), which prevents its replica from being placed there
- it cannot be placed on gE7OTgevSUuoj44RozxK0Q (ssd as well) because awareness tries to maximize dispersion across flavors
- it cannot be placed on Y2k9qXfsTx6X2iQTxg9RBQ for the same reason
- it cannot be placed on FwWc2XPPRWuje2KH6AlDEQ (flavor: iodisk) because of the filter
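The interaction can be reproduced with a toy model of the three deciders. This is my own simplification, not Elasticsearch's code: the function `explain_replica` is hypothetical, and the awareness rule is approximated as "no flavor may hold more than ceil(copies / number of flavors) copies".

```python
import math

# Node IDs and flavors as read from the trace above.
NODES = {
    "qK34VLdhTferCQs2oNJOyg": "ssd",
    "gE7OTgevSUuoj44RozxK0Q": "ssd",
    "Y2k9qXfsTx6X2iQTxg9RBQ": "ssd",
    "FwWc2XPPRWuje2KH6AlDEQ": "iodisk",
}

def explain_replica(nodes, primary_node, required_flavor, total_copies=2):
    """Say, per node, which (simplified) decider rejects the replica, or YES."""
    max_per_flavor = math.ceil(total_copies / len(set(nodes.values())))
    primary_flavor = nodes[primary_node]
    verdicts = {}
    for node, flavor in nodes.items():
        already_on_flavor = 1 if flavor == primary_flavor else 0
        if node == primary_node:
            verdicts[node] = "SameShardAllocationDecider"
        elif required_flavor is not None and flavor != required_flavor:
            verdicts[node] = "FilterAllocationDecider"
        elif already_on_flavor + 1 > max_per_flavor:
            verdicts[node] = "AwarenessAllocationDecider"
        else:
            verdicts[node] = "YES"
    return verdicts

with_filter = explain_replica(NODES, "qK34VLdhTferCQs2oNJOyg", "ssd")
without_filter = explain_replica(NODES, "qK34VLdhTferCQs2oNJOyg", None)
for node in NODES:
    print(node, with_filter[node], "|", without_filter[node])
```

In this model, requiring `ssd` while awareness wants one copy per flavor leaves no node that every decider accepts, whereas dropping the requirement leaves the iodisk node eligible.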
Questions:
- am I doing it wrong?
- should I stick with a set of reroute commands?
- are awareness and filtering supposed to cooperate?