Description
Elasticsearch version:
Reproducible on:
- 5.1.1
- 2.3.3
and I suspect all versions in between.
Plugins installed: []
x-pack (this issue should be reproducible without this plugin installed.)
JVM version:
1.8.0_111
OS version:
Ubuntu - Linux peter-Inspiron-7520 4.4.0-53-generic #74-Ubuntu SMP Fri Dec 2 15:59:10 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
Also tested on OSX Sierra
Description of the problem including expected versus actual behavior:
When allocating shards away from more than one node using cluster.routing.allocation.exclude._ip, a space after the comma in the comma-delimited list causes only the first IP to be used for the exclusion. All subsequent IPs in the list are ignored, even though the full list is shown in the cluster settings.
My test setup was two machines clustered over a network: one running two instances of Elasticsearch on different ports, and the other running a single instance.
The machine with two instances used the same IP address for both, on different ports. My test was to exclude shards from being allocated to the machine running the single instance.
To do this, I excluded both an unused IP (not used by any node in the cluster) and the IP of the single-instance host, listing the unused IP first.
With a space after the comma, only the first IP in the list was used for the exclusion, even though the cluster settings showed the entire string when retrieved again. So Elasticsearch stores the whole string but only acts on the first IP in the list.
Without any spaces, all the IPs in the list were successfully excluded.
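The behavior described above is what you would expect if the setting value were split on commas without trimming each item. The following Python sketch only illustrates that hypothesis; it is not Elasticsearch's actual code, and `naive_split`, `trimmed_split` and `is_excluded` are hypothetical names made up for this example.

```python
def naive_split(value):
    # Split on commas only; any whitespace around an item is kept.
    return value.split(",")


def trimmed_split(value):
    # Split on commas, strip whitespace from each item, drop empty items.
    return [item.strip() for item in value.split(",") if item.strip()]


def is_excluded(node_ip, patterns):
    # Exact string comparison, standing in for the real filter matching
    # (an assumption made purely for illustration).
    return node_ip in patterns


setting = "192.168.178.35, 192.168.178.21"

print(naive_split(setting))
print(is_excluded("192.168.178.21", naive_split(setting)))
print(is_excluded("192.168.178.21", trimmed_split(setting)))
```

With the naive split, the second item keeps its leading space, so it never matches the node's IP, while the first item still matches normally. That would be consistent with only the first IP being honored.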
I suggest we either:
- have Elasticsearch strip out the whitespace after commas
- or specify a strict format that does not allow spaces after the commas, and reject any API call that sends an "invalid" string containing such spaces, with an error message telling the user why it was rejected
- and, with either of the above options, validate that the entries in the list are valid IPv4 or IPv6 addresses and throw an error if they are not (if we don't do this already)
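As a rough sketch of what the first and third suggestions could look like together, here is a small Python function that trims each entry and validates it with the standard `ipaddress` module. The function name `normalize_ip_list` and the error message are hypothetical and say nothing about how Elasticsearch would implement this. Strict validation is also an assumption: if the setting is meant to accept wildcard patterns, a check like this would reject them.

```python
import ipaddress


def normalize_ip_list(value):
    """Return the list re-joined with commas, trimmed and validated."""
    cleaned = []
    for raw in value.split(","):
        item = raw.strip()
        if not item:
            # Tolerate empty entries, e.g. an empty string or a trailing comma.
            continue
        try:
            # Accepts both IPv4 and IPv6 literals; raises ValueError otherwise.
            ipaddress.ip_address(item)
        except ValueError:
            raise ValueError(f"invalid IP address in exclusion list: {item!r}")
        cleaned.append(item)
    return ",".join(cleaned)


print(normalize_ip_list("192.168.178.35, 192.168.178.21"))
```

An invalid entry such as "192.168.178" raises a `ValueError` that names the offending item, instead of being silently ignored.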
The current user experience with this setting is poor and needs to be fixed so that Elasticsearch does not "fail" silently and behaves predictably.
The documentation should also be updated to describe how Elasticsearch handles this setting once a fix is chosen.
Steps to reproduce:
# exclude list using fake first ip and real second ip
#
# first with space after comma, nothing happens since only the first (unused) ip is recognised in the cluster allocation exclusion
PUT _cluster/settings
{
  "transient" : {
    "cluster.routing.allocation.exclude._ip" : "192.168.178.35, 192.168.178.21"
  }
}

# second with the space after the comma removed, which ensures all ips are recognized in the allocation exclusion
PUT _cluster/settings
{
  "transient" : {
    "cluster.routing.allocation.exclude._ip" : "192.168.178.35,192.168.178.21"
  }
}

# reset to no ips to reallocate shards evenly again
PUT _cluster/settings
{
  "transient" : {
    "cluster.routing.allocation.exclude._ip" : ""
  }
}
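Until the behavior is fixed, one possible client-side workaround is to build the setting value programmatically, so that no whitespace ends up in it. The Python sketch below only constructs the JSON body for the PUT _cluster/settings requests above and does not send anything. The helper name `exclude_ip_body` is hypothetical.

```python
import json

SETTING = "cluster.routing.allocation.exclude._ip"


def exclude_ip_body(ips, scope="transient"):
    # Strip each entry and join with a bare comma, so the value never
    # contains whitespace around the separators.
    value = ",".join(ip.strip() for ip in ips)
    return json.dumps({scope: {SETTING: value}})


print(exclude_ip_body(["192.168.178.35", " 192.168.178.21"]))
```

Passing an empty list produces an empty string for the setting, which corresponds to the reset request above.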
Provide logs (if relevant):
Logs only show:
[2016-12-21T16:51:37,778][INFO ][o.e.c.s.ClusterSettings ] [mac] updating [cluster.routing.allocation.exclude.] from [{}] to [{"_ip":"192.168.178.35, 192.168.178.21"}]
[2016-12-21T16:51:58,775][INFO ][o.e.c.s.ClusterSettings ] [mac] updating [cluster.routing.allocation.exclude.] from [{"_ip":"192.168.178.35, 192.168.178.21"}] to [{"_ip":"192.168.178.35,192.168.178.21"}]
[2016-12-21T16:53:16,101][INFO ][o.e.c.s.ClusterSettings ] [mac] updating [cluster.routing.allocation.exclude.] from [{"_ip":"192.168.178.35,192.168.178.21"}] to [{"_ip":""}]