Description
Elasticsearch version: 2.4.6
Plugins installed: marvel-agent, delete-by-query, shield, license, cloud-aws
JVM version: 1.8.0 OpenJDK
OS version: CentOS 7
Description of the problem including expected versus actual behavior:
I am running an ES cluster (1 client pod, 2 data pods, and 1 master pod) in Kubernetes. Discovery is pointed at a Kubernetes service, and everything works fine in a normal setup.
But when the master pod dies and k8s creates a new pod (with a new IP), the client and data nodes fail to connect to the master because they somehow do not pick up the new IP from the discovery service. If I run nslookup on the discovery service, it resolves to the new IP, but the client and data nodes do not see the IP change. Both the client node and the data node log the error below:
Caused by: ClusterBlockException[blocked by: [SERVICE_UNAVAILABLE/2/no master];]
at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedException(ClusterBlocks.java:158)
at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedRaiseException(ClusterBlocks.java:144)
at org.elasticsearch.action.bulk.TransportBulkAction.executeBulk(TransportBulkAction.java:204)
at org.elasticsearch.action.bulk.TransportBulkAction.doExecute(TransportBulkAction.java:151)
at org.elasticsearch.action.bulk.TransportBulkAction.doExecute(TransportBulkAction.java:71)
at org.elasticsearch.action.support.TransportAction.doExecute(TransportAction.java:149)
at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:172)
at org.elasticsearch.shield.action.ShieldActionFilter.apply(ShieldActionFilter.java:137)
at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:170)
at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:144)
at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:85)
at org.elasticsearch.client.node.NodeClient.doExecute(NodeClient.java:58)
at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:359)
at org.elasticsearch.client.FilterClient.doExecute(FilterClient.java:52)
at org.elasticsearch.marvel.shield.SecuredClient.doExecute(SecuredClient.java:45)
at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:359)
at org.elasticsearch.action.ActionRequestBuilder.execute(ActionRequestBuilder.java:86)
at org.elasticsearch.action.ActionRequestBuilder.execute(ActionRequestBuilder.java:56)
at org.elasticsearch.action.ActionRequestBuilder.get(ActionRequestBuilder.java:64)
at org.elasticsearch.marvel.agent.exporter.local.LocalBulk.flush(LocalBulk.java:116)
at org.elasticsearch.marvel.agent.exporter.ExportBulk$Compound.flush(ExportBulk.java:101)
... 3 more
Steps to reproduce:
- Create a simple ES cluster with 1 client, 1 data and 1 master in k8s.
- Kill the master pod
- The master pod comes up with a new IP and waits for the other nodes to join.
- Both the client and data nodes are stuck, reporting that there is no master to connect to. It looks like they do not respect the DNS discovery name and seem to cache the IP.
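
To help confirm the stale-IP theory, a check along these lines could be run from inside a client or data pod after the master has been replaced. It is only a rough sketch: the service name `elasticsearch-discovery` and the IP `10.0.0.12` are hypothetical placeholders, not values from this cluster, and the "cached" address has to be copied by hand from the node's logs.

```python
import socket


def resolve(host):
    """Return the sorted, de-duplicated IPs the OS resolver currently gives for host."""
    infos = socket.getaddrinfo(host, None)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def stale_addresses(cached, resolved):
    """Return the addresses a node still holds that DNS no longer returns."""
    return sorted(set(cached) - set(resolved))


if __name__ == "__main__":
    # Hypothetical values: replace with the real discovery service name and
    # the master IP that the stuck client or data node reports in its logs.
    service = "elasticsearch-discovery"
    cached = ["10.0.0.12"]

    current = resolve(service)
    print("DNS now returns:", current)
    print("Stale on node:", stale_addresses(cached, current))
```

If the second line prints the old master IP while the first shows only the new one, DNS itself is fine and the node is holding on to an address it resolved earlier, which matches the nslookup observation above.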