client and data nodes cannot discover master if master changes its IP - kubernetes #39822

@manojtr

Elasticsearch version: 2.4.6

Plugins installed: marvel-agent delete-by-query shield license cloud-aws

JVM version: 1.8.0 OpenJDK

OS version: CentOS 7

Description of the problem including expected versus actual behavior:

I am running an ES cluster (1 client pod, 2 data pods, and 1 master pod) in Kubernetes. Discovery is set to a Kubernetes service, and everything works fine in a normal setup.

However, when the master pod dies, k8s creates a new pod with a new IP, and the client and data nodes fail to connect to the master because they somehow do not get the new IP from the discovery service. If I run nslookup on the discovery service, it resolves to the new IP, but the client and data nodes do not see the IP change. The client and data nodes then produce the error below:

	Caused by: ClusterBlockException[blocked by: [SERVICE_UNAVAILABLE/2/no master];]
		at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedException(ClusterBlocks.java:158)
		at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedRaiseException(ClusterBlocks.java:144)
		at org.elasticsearch.action.bulk.TransportBulkAction.executeBulk(TransportBulkAction.java:204)
		at org.elasticsearch.action.bulk.TransportBulkAction.doExecute(TransportBulkAction.java:151)
		at org.elasticsearch.action.bulk.TransportBulkAction.doExecute(TransportBulkAction.java:71)
		at org.elasticsearch.action.support.TransportAction.doExecute(TransportAction.java:149)
		at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:172)
		at org.elasticsearch.shield.action.ShieldActionFilter.apply(ShieldActionFilter.java:137)
		at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:170)
		at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:144)
		at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:85)
		at org.elasticsearch.client.node.NodeClient.doExecute(NodeClient.java:58)
		at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:359)
		at org.elasticsearch.client.FilterClient.doExecute(FilterClient.java:52)
		at org.elasticsearch.marvel.shield.SecuredClient.doExecute(SecuredClient.java:45)
		at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:359)
		at org.elasticsearch.action.ActionRequestBuilder.execute(ActionRequestBuilder.java:86)
		at org.elasticsearch.action.ActionRequestBuilder.execute(ActionRequestBuilder.java:56)
		at org.elasticsearch.action.ActionRequestBuilder.get(ActionRequestBuilder.java:64)
		at org.elasticsearch.marvel.agent.exporter.local.LocalBulk.flush(LocalBulk.java:116)
		at org.elasticsearch.marvel.agent.exporter.ExportBulk$Compound.flush(ExportBulk.java:101)
		... 3 more

Steps to reproduce:

  1. Create a simple ES cluster with 1 client, 1 data, and 1 master node in k8s.
  2. Kill the master pod.
  3. The master pod comes back up with a new IP and waits for the other nodes to join.
  4. Both the client and data nodes are stuck, reporting that there is no master to connect to. It looks like they do not respect the DNS discovery name and seem to cache the IP.
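
To double-check the DNS side of step 4 from inside a client or data pod, a small polling script along these lines might help. It is only a sketch, assuming Python 3 is available in the pod or a debug container. The service name `elasticsearch-discovery` is a placeholder rather than the name used in this cluster, and the helper names are made up for illustration. It reports whenever the address set returned by the OS resolver changes, which can then be compared against the master address the nodes keep trying.

```python
import socket
import time


def resolve_ips(hostname):
    """Return the set of IPs the OS resolver currently reports for hostname."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}


def diff_addresses(previous, current):
    """Return (added, removed) between two sets of IP addresses."""
    return current - previous, previous - current


def watch(hostname, interval_seconds=5, rounds=12):
    """Poll DNS and print a line each time the resolved address set changes."""
    previous = set()
    for _ in range(rounds):
        try:
            current = resolve_ips(hostname)
        except socket.gaierror as exc:
            print(f"lookup failed: {exc}")
            current = set()
        added, removed = diff_addresses(previous, current)
        if added or removed:
            stamp = time.strftime("%H:%M:%S")
            print(f"{stamp} added={sorted(added)} removed={sorted(removed)}")
        previous = current
        time.sleep(interval_seconds)


# Example call (placeholder service name, adjust to the real discovery service):
# watch("elasticsearch-discovery")
```

If the script shows the new IP shortly after the master pod is recreated while the nodes still report no master, that would support the suspicion that the address is being cached somewhere inside the Elasticsearch process rather than by the pod's resolver.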

Labels: :Distributed Coordination/Cluster Coordination (Cluster formation and cluster state publication, including cluster membership and fault detection.)
