Skip to content

Conversation

@jasontedor
Copy link
Member

Today we eagerly resolve unicast hosts. This means that if DNS changes,
we will never find the host at the new address. Moreover, a single host
failng to resolve causes startup to abort. This commit introduces lazy
resolution of unicast hosts. If a DNS entry changes, there is an
opportunity for the host to be discovered. Note that under the Java
security manager, there is a default positive cache of infinity for
resolved hosts; this means that if a user does want to operate in an
environment where DNS can change, they must adjust
networkaddress.cache.ttl in their security policy. And if a host fails
to resolve, we warn log the hostname but continue pinging other
configured hosts.

When doing DNS resolutions for unicast hostnames, we wait until the DNS
lookups timeout. This appears to be forty-five seconds on modern JVMs,
and it is not configurable. If we do these serially, the cluster can be
blocked during ping for a lengthy period of time. This commit introduces
doing the DNS lookups in parallel on the generic thread pool, and adds a
user-configurable timeout for these lookups.

Closes #14441, closes #16412

@jasontedor jasontedor force-pushed the lazy-resolve branch 4 times, most recently from 88368c0 to 4ab68db Compare November 17, 2016 17:41
Today we eagerly resolve unicast hosts. This means that if DNS changes,
we will never find the host at the new address. Moreover, a single host
failng to resolve causes startup to abort. This commit introduces lazy
resolution of unicast hosts. If a DNS entry changes, there is an
opportunity for the host to be discovered. Note that under the Java
security manager, there is a default positive cache of infinity for
resolved hosts; this means that if a user does want to operate in an
environment where DNS can change, they must adjust
networkaddress.cache.ttl in their security policy. And if a host fails
to resolve, we warn log the hostname but continue pinging other
configured hosts.
@jasontedor jasontedor force-pushed the lazy-resolve branch 2 times, most recently from e3dc063 to acdc4bc Compare November 17, 2016 19:22
@jasontedor
Copy link
Member Author

@bleskes @s1monw I think this is ready for review; would you take a look?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might decide this is okay, but one question is whether these lookups should be throttled at all as right now it is just spamming the generic thread pool.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder whether we should use unicastConnectExecutor for this and keep it contained (and throttled).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I went with this approach is since this method is also used by the file-based unicast hosts provider so it would need its own executor and we do not have a close method for plugins to allow this executor be shutdown.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually plugins can implement Closeable and they will be closed when the node shuts down.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I was looking for it on the Plugins base class, but I see we do close a plugin if it implements java.io.Closeable. I think this should be made more clear and opened #21669.

When doing DNS resolutions for unicast hostnames, we wait until the DNS
lookups timeout. This appears to be forty-five seconds on modern JVMs,
and it is not configurable. If we do these serially, the cluster can be
blocked during ping for a lengthy period of time. This commit introduces
doing the DNS lookups in parallel on the generic thread pool, and adds a
user-configurable timeout for these lookups.
This commit properly sets up test resources for UnicastZenPingTests.
Copy link
Contributor

@bleskes bleskes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at the code and left some minor comments. I have some concerns with the approach because it now delays ping to the extent of the host resolve timeouts, regardless of the incoming ping timeout. I wonder if we should use something like SingleObjectCache to do this refreshes in the background.

private final Map<Integer, SendPingsHandler> receivedResponses = newConcurrentMap();

// a list of temporal responses a node will return for a request (holds requests from other configuredTargetNodes)
// a list of temporal responses a node will return for a request (holds requests from other configuredHosts)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it's just other nodes - we are pinged by any node in the previous cluster.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed 4868321.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder whether we should use unicastConnectExecutor for this and keep it contained (and throttled).

logger.warn(message, e);
}
} else {
logger.warn("timed out resolving host [{}]", hostname);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we add the timeout value here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed f6203d7.

This was just a silly oops, this commit fixes the swallowing of an
interrupted exception in the file-based unicast hosts provider.
This commit adds the resolve timeout to the warn log message displayed
when we timeout resolving unicast hosts.
This commit fixes an incorrect comment in UnicastZenPing; the comment
incorrectly specified which responses is held by the temporal responses.
@jasontedor
Copy link
Member Author

I looked at the code and left some minor comments. I have some concerns with the approach because it now delays ping to the extent of the host resolve timeouts, regardless of the incoming ping timeout.

The timeout can be set low (and maybe the default timeout should be set lower). The DNS entries are cached indefinitely (by the JVM) unless someone changes the security settings, so this is only an issue for users that have changing DNS entries and have changed the DNS cache TTL.

I wonder if we should use something like SingleObjectCache to do this refreshes in the background.

I'm not sure, if someone loses their cluster because DNS changes now they have to wait for the cache TTL to expire and for a background refresh to complete before their cluster can reform.

* master: (42 commits)
  Add support for merging custom meta data in tribe node (elastic#21552)
  [DOCS] Show EC2's auto attribute (elastic#21474)
  Add information about the removal of store throttling to the migration guide.
  Add a recommendation against large documents to the docs. (elastic#21652)
  Add indices options tests to search api REST tests (elastic#21701)
  Fixing indentation in geospatial querying example. (elastic#21682)
  Fix typo in filters aggregation docs (elastic#21690)
  Add BWC layer for Exceptions (elastic#21694)
  Add checkstyle rule to forbid empty javadoc comments (elastic#20881)
  Docs: Added offline install link for discovery-file plugin
  remove pointless catch exception in TransportSearchAction (elastic#21689)
  Rename ClusterState#lookupPrototypeSafe to `lookupPrototype` and remove previous "unsafe" unused variant (elastic#21686)
  Use a buffer to do character to byte conversion in StreamOutput#writeString (elastic#21680)
  Fix integer overflows when dealing with templates. (elastic#21628)
  Fix highlighting on a stored keyword field (elastic#21645)
  Set execute permissions for native plugin programs (elastic#21657)
  adjust visibility of DiscoveryNodes.Delta constructor
  Remove unused DiscoveryNodes.Delta constructor
  Remove unused DiscoveryNode#removeDeadMembers public method
  Remove minNodeVersion and corresponding public `getSmallestVersion` getter method from DiscoveryNodes
  ...
This commit refactors the resolution of unicast hosts from hostnames to
IP addresses to use an executor service instead of the generic thread
pool.
@jasontedor
Copy link
Member Author

@bleskes I pushed a28dc6e and have now responded to all your feedback, including our discussion in another channel where we have agreed to keep the current approach.

Copy link
Contributor

@bleskes bleskes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I left q regarding a potential simplification. I also really like the test change.

For posterity - the reason we opted to go with resolve-on-ping rather and than resolve-in-thebackground is that in FileDiscoveryPlugin we also do dns resolve on ping. Changing that will need a big change but without it we felt there is no need to go on a heroic effort to not delays pings on a slow dns. We may revisit in the future.

for (TransportAddress address : addresses) {
discoveryNodes.add(new DiscoveryNode(idGenerator.get(), address, emptyMap(), emptySet(),
Version.CURRENT.minimumCompatibilityVersion()));
public static List<DiscoveryNode> resolveDiscoveryNodes(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can get rid of the ResolvedHostname abstraction - what am I missing?

diff --git a/core/src/main/java/org/elasticsearch/discovery/zen/UnicastZenPing.java b/core/src/main/java/org/elasticsearch/discovery/zen/UnicastZenPing.java
index 61bf1cc..3d3495e 100644
--- a/core/src/main/java/org/elasticsearch/discovery/zen/UnicastZenPing.java
+++ b/core/src/main/java/org/elasticsearch/discovery/zen/UnicastZenPing.java
@@ -242,40 +242,36 @@ public class UnicastZenPing extends AbstractComponent implements ZenPing {
             throw new IllegalArgumentException("resolve timeout must be non-negative but was [" + resolveTimeout + "]");
         }
         // create tasks to submit to the executor service; we will wait up to resolveTimeout for these tasks to complete
-        final List<Callable<ResolvedHostname>> callables =
+        final List<Callable<TransportAddress[]>> callables =
             hosts.stream().map(hn -> lookup(hn, transportService, limitPortCounts)).collect(Collectors.toList());
-        final List<Future<ResolvedHostname>> futures =
+        final List<Future<TransportAddress[]>> futures =
             executorService.invokeAll(callables, resolveTimeout.nanos(), TimeUnit.NANOSECONDS);
         final List<DiscoveryNode> discoveryNodes = new ArrayList<>();
         // ExecutorService#invokeAll guarantees that the futures are returned in the iteration order of the tasks so we can associate the
         // hostname with the corresponding task by iterating together
         final Iterator<String> it = hosts.iterator();
-        for (final Future<ResolvedHostname> future : futures) {
+        for (final Future<TransportAddress[]> future : futures) {
             final String hostname = it.next();
-            if (!future.isCancelled()) {
+            if (future.isCancelled()) {
+                logger.warn("timed out after [{}] resolving host [{}]", resolveTimeout, hostname);
+            } else {
+                assert future.isDone(); // guaranteed by the invokeAll
                 try {
-                    final ResolvedHostname resolvedHostname = future.get();
-                    if (resolvedHostname.isSuccess()) {
-                        logger.trace("resolved host [{}] to {}", hostname, resolvedHostname.addresses());
-                        for (final TransportAddress address : resolvedHostname.addresses()) {
-                            discoveryNodes.add(
-                                new DiscoveryNode(
-                                    idGenerator.get(),
-                                    address,
-                                    emptyMap(),
-                                    emptySet(),
-                                    Version.CURRENT.minimumCompatibilityVersion()));
-                        }
-                    } else {
-                        final String message = "failed to resolve host [" + hostname + "]";
-                        logger.warn(message, resolvedHostname.failure());
+                    final TransportAddress[] addresses = future.get();
+                    logger.trace("resolved host [{}] to {}", hostname, addresses);
+                    for (final TransportAddress address : addresses) {
+                        discoveryNodes.add(
+                            new DiscoveryNode(
+                                idGenerator.get(),
+                                address,
+                                emptyMap(),
+                                emptySet(),
+                                Version.CURRENT.minimumCompatibilityVersion()));
                     }
                 } catch (final ExecutionException e) {
                     final String message = "failed to resolve host [" + hostname + "]";
                     logger.warn(message, e);
                 }
-            } else {
-                logger.warn("timed out after [{}] resolving host [{}]", resolveTimeout, hostname);
             }
         }
         return discoveryNodes;
@@ -289,17 +285,11 @@ public class UnicastZenPing extends AbstractComponent implements ZenPing {
      * @param limitPortCounts  the port count limit
      * @return a callable that can be used to submit to an executor service
      */
-    private static Callable<ResolvedHostname> lookup(
+    private static Callable<TransportAddress[]> lookup(
         final String host,
         final TransportService transportService,
         final int limitPortCounts) {
-        return () -> {
-            try {
-                return ResolvedHostname.success(transportService.addressesFromString(host, limitPortCounts));
-            } catch (final UnknownHostException e) {
-                return ResolvedHostname.failure(e);
-            }
-        };
+        return () -> transportService.addressesFromString(host, limitPortCounts);
     }
 
     @Override

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I pushed e2fc5a2.


@Override
public TransportAddress[] addressesFromString(String address, int perAddressLimit) throws Exception {
public TransportAddress[] addressesFromString(String address, int perAddressLimit) throws UnknownHostException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

.put("cluster.name", "mismatch")
.put(TransportSettings.PORT.getKey(), startPort + "-" + endPort).build();
// use ephemeral ports
final Settings settings = Settings.builder().put("cluster.name", "test").put(TransportSettings.PORT.getKey(), 0).build();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

}

@Override
public void ping(final PingListener listener, final TimeValue duration) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add some docs about what falls under duration?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed 156a49b.

This commit simplifies the handling of hostname lookups in
UnicastZenPing, removing an unnecessary abstraction and, subsequently,
an unnecessary method.
This commit sets the deafult resolve timeout on DNS lookups in
UnicastZenPing to five seconds.
This commit adds Javadocs for UnicastZenPing#ping, especially clarifying
the purpose of the duration parameter.
@jasontedor jasontedor merged commit 9dc6503 into elastic:master Nov 22, 2016
jasontedor added a commit that referenced this pull request Nov 22, 2016
Today we eagerly resolve unicast hosts. This means that if DNS changes,
we will never find the host at the new address. Moreover, a single host
failng to resolve causes startup to abort. This commit introduces lazy
resolution of unicast hosts. If a DNS entry changes, there is an
opportunity for the host to be discovered. Note that under the Java
security manager, there is a default positive cache of infinity for
resolved hosts; this means that if a user does want to operate in an
environment where DNS can change, they must adjust
networkaddress.cache.ttl in their security policy. And if a host fails
to resolve, we warn log the hostname but continue pinging other
configured hosts.

When doing DNS resolutions for unicast hostnames, we wait until the DNS
lookups timeout. This appears to be forty-five seconds on modern JVMs,
and it is not configurable. If we do these serially, the cluster can be
blocked during ping for a lengthy period of time. This commit introduces
doing the DNS lookups in parallel, and adds a user-configurable timeout
for these lookups.

Relates #21630
@jasontedor
Copy link
Member Author

Thanks @bleskes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Don't cache resolved hostnames forever Unicast hostname failing to resolve blocks cluster from starting

5 participants