Closed
Labels
:Distributed Coordination/Discovery-Plugins (Anything related to our integration plugins with EC2, GCP and Azure), >enhancement, help wanted, adoptme
Description
If you start an Elasticsearch node while DNS is failing, it never recovers: it keeps logging exceptions even after the DNS problem is fixed. The reason is the following code in the UnicastZenPing constructor:
    for (String host : hosts) {
        try {
            TransportAddress[] addresses = transportService.addressesFromString(host);
            // we only limit to 1 addresses, makes no sense to ping 100 ports
            for (int i = 0; (i < addresses.length && i < LIMIT_PORTS_COUNT); i++) {
                configuredTargetNodes.add(new DiscoveryNode(UNICAST_NODE_PREFIX + unicastNodeIdGenerator.incrementAndGet() + "#", addresses[i], version.minimumCompatibilityVersion()));
            }
        } catch (Exception e) {
            throw new ElasticsearchIllegalArgumentException("Failed to resolve address for [" + host + "]", e);
        }
    }
    this.configuredTargetNodes = configuredTargetNodes.toArray(new DiscoveryNode[configuredTargetNodes.size()]);
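transportService.addressesFromString(host) constructs an InetSocketAddress, whose constructor tries to resolve the given hostname exactly once. When that lookup fails, the object is created in the unresolved state and stays that way for its whole lifetime, so InetSocketAddress.isUnresolved() returns true forever. Because the address is created only once, in the constructor, the lookup is never retried. Netty uses this flag to check whether connecting to the endpoint makes sense at all.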
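The sticky behaviour can be shown with the plain JDK, without Elasticsearch. The sketch below is only an illustration: the class name StaleAddressDemo and the helper reResolve are made up for this example, and InetSocketAddress.createUnresolved stands in for a lookup that failed at startup. An IP literal is used as the host so the example runs without any DNS server.

```java
import java.net.InetSocketAddress;

// Hypothetical demo class, not part of Elasticsearch.
public class StaleAddressDemo {

    // Build a new InetSocketAddress from the host string and port of an old one.
    // The constructor performs a new lookup; the old object is never updated.
    static InetSocketAddress reResolve(InetSocketAddress old) {
        return new InetSocketAddress(old.getHostString(), old.getPort());
    }

    public static void main(String[] args) {
        // Stand-in for an address whose lookup failed when the node started.
        InetSocketAddress stale = InetSocketAddress.createUnresolved("127.0.0.1", 9300);

        System.out.println(stale.isUnresolved());   // true
        System.out.println(stale.getAddress());     // null, there is no InetAddress to connect to

        // The stale object stays unresolved no matter how long we wait.
        // Only a newly constructed object triggers another lookup.
        InetSocketAddress fresh = reResolve(stale);
        System.out.println(fresh.isUnresolved());   // false
    }
}
```

The point is that the unresolved state belongs to the object, so holding on to the array built in the constructor also holds on to the failed lookup.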
How to reproduce locally
To reproduce, use the config below and disable networking on your system. (With networking enabled it works, because localhost.spinscale.de resolves to 127.0.0.1.)
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["localhost.spinscale.de:9300" ]
Fix proposal
- First, remove the exception output: catch UnresolvedAddressException in UnicastZenPing.sendPings() and log a single line that describes the problem and includes the hostname.
- Make sure the InetSocketAddress and the result of its isUnresolved() method are not cached. Not sure what the best approach is here: either create the InetSocketAddress object before each connect attempt, or maybe there are some configurable properties around this.
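A minimal sketch of what both points of the proposal could look like, using only the JDK. PingTargetResolver and resolveForPing are hypothetical names for this example and do not exist in Elasticsearch. The sketch keeps the configured host and port as plain values, builds a new InetSocketAddress for every attempt, and prints one line instead of a stack trace when the lookup fails.

```java
import java.net.InetSocketAddress;
import java.util.Optional;

// Hypothetical helper, not part of Elasticsearch.
public class PingTargetResolver {

    // Resolve the configured host again for this ping round.
    // Returns an empty Optional (after printing one line) if the lookup fails.
    static Optional<InetSocketAddress> resolveForPing(String host, int port) {
        // A new object means a new lookup; nothing is reused from earlier rounds.
        InetSocketAddress address = new InetSocketAddress(host, port);
        if (address.isUnresolved()) {
            // Single line naming the host, in place of an exception per ping.
            System.err.println("failed to resolve address for [" + host + ":" + port + "], skipping this ping round");
            return Optional.empty();
        }
        return Optional.of(address);
    }

    public static void main(String[] args) {
        // An IP literal always resolves, so this runs without DNS.
        System.out.println(resolveForPing("127.0.0.1", 9300).isPresent());
    }
}
```

On the "configurable properties" question: the JVM keeps its own cache of lookup results, controlled by the security properties networkaddress.cache.ttl and networkaddress.cache.negative.ttl. Those only govern how long the JVM remembers a lookup result. They do not change an InetSocketAddress that has already been constructed, so the address object would still need to be recreated.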