Elasticsearch 5.1.1: pings to the master node always time out, cluster fails to form #22189

@sinsonglew

Elasticsearch version: 5.1.1

Plugins installed: [] only default

JVM version: Java(TM) SE Runtime Environment (build 1.8.0_111-b14)

OS version: 2.6.32-504.12.2.02.el6.x86_64 #1 SMP Tue May 12 11:44:09 CST 2015 x86_64 x86_64 x86_64 GNU/Linux

Description of the problem including expected versus actual behavior:
Actual: with more than 3 nodes, the cluster constantly adds and removes nodes and becomes unavailable.
As the attached log below shows, pings between nodes frequently time out.
Expected: a stable cluster with all nodes started and joined.

Provide logs (if relevant):

[2016-12-15T19:39:17,648][INFO ][o.e.c.s.ClusterService   ] [node244] removed {{node237}{FmyPJDzMR2KRSl10jPGT0Q}{hrOcPGYBTOSEQU5J7hOwVA}{aa.bb.cc.dd}{aa.bb.cc.dd:9371},}, reason: master_failed ({node237}{FmyPJDzMR2KRSl10jPGT0Q}{hrOcPGYBTOSEQU5J7hOwVA}{aa.bb.cc.dd}{aa.bb.cc.dd:9371})
[2016-12-15T19:39:42,556][INFO ][o.e.d.z.ZenDiscovery     ] [node244] failed to send join request to master [{node237}{FmyPJDzMR2KRSl10jPGT0Q}{hrOcPGYBTOSEQU5J7hOwVA}{aa.bb.cc.dd}{aa.bb.cc.dd:9371}], reason [RemoteTransportException[[node237][aa.bb.cc.dd:9371][internal:discovery/zen/join]]; nested: IllegalStateException[failure when sending a validation request to node]; nested: NodeDisconnectedException[[node244][ee.ff.gg.hh:9371][internal:discovery/zen/join/validate] disconnected]; ]
[2016-12-15T19:40:02,565][INFO ][o.e.d.z.ZenDiscovery     ] [node244] failed to send join request to master [{node237}{FmyPJDzMR2KRSl10jPGT0Q}{hrOcPGYBTOSEQU5J7hOwVA}{aa.bb.cc.dd}{aa.bb.cc.dd:9371}], reason [RemoteTransportException[[node237][aa.bb.cc.dd:9371][internal:discovery/zen/join]]; nested: ConnectTransportException[[node244][ee.ff.gg.hh:9371] connect_timeout[30s]]; nested: IOException[Connection timed out: ee.ff.gg.hh/ee.ff.gg.hh:9371]; ]
[2016-12-15T19:40:25,433][INFO ][o.e.d.z.ZenDiscovery     ] [node244] failed to send join request to master [{node237}{FmyPJDzMR2KRSl10jPGT0Q}{hrOcPGYBTOSEQU5J7hOwVA}{aa.bb.cc.dd}{aa.bb.cc.dd:9371}], reason [RemoteTransportException[[node237][aa.bb.cc.dd:9371][internal:discovery/zen/join]]; nested: IllegalStateException[failure when sending a validation request to node]; nested: NodeDisconnectedException[[node244][ee.ff.gg.hh:9371][internal:discovery/zen/join/validate] disconnected]; ]
[2016-12-15T19:40:42,442][INFO ][o.e.d.z.ZenDiscovery     ] [node244] failed to send join request to master [{node237}{FmyPJDzMR2KRSl10jPGT0Q}{hrOcPGYBTOSEQU5J7hOwVA}{aa.bb.cc.dd}{aa.bb.cc.dd:9371}], reason [NodeDisconnectedException[[node237][aa.bb.cc.dd:9371][internal:discovery/zen/join] disconnected]]
[2016-12-15T19:41:03,531][INFO ][o.e.c.s.ClusterService   ] [node244] detected_master {node237}{FmyPJDzMR2KRSl10jPGT0Q}{hrOcPGYBTOSEQU5J7hOwVA}{aa.bb.cc.dd}{aa.bb.cc.dd:9371}, added {{node241}{gx3rLPdIQvSm_u6mLoIbtQ}{80MXRZOjSN6RAQ7gZGeUcQ}{xx.x.xx.x}{x.x.x.x:9371},{node237}{FmyPJDzMR2KRSl10jPGT0Q}{hrOcPGYBTOSEQU5J7hOwVA}{aa.bb.cc.dd}{aa.bb.cc.dd:9371},}, reason: zen-disco-receive(from master [master {node237}{FmyPJDzMR2KRSl10jPGT0Q}{hrOcPGYBTOSEQU5J7hOwVA}{aa.bb.cc.dd}{aa.bb.cc.dd:9371} committed version [115]])
[2016-12-15T19:41:09,541][INFO ][o.e.d.z.ZenDiscovery     ] [node244] master_left [{node237}{FmyPJDzMR2KRSl10jPGT0Q}{hrOcPGYBTOSEQU5J7hOwVA}{aa.bb.cc.dd}{aa.bb.cc.dd:9371}], reason [failed to ping, tried [3] times, each with  maximum [30s] timeout]
[2016-12-15T19:41:09,541][WARN ][o.e.d.z.ZenDiscovery     ] [node244] master left (reason = failed to ping, tried [3] times, each with  maximum [30s] timeout), current nodes: nodes: 
   {node243}{tEv4PrXSTdyecz2nClog4Q}{KQsJYnauSe2W0hxWPte4Iw}{xx.xx.xx.xx}{xx.xx.xx.xx:9371}
   {node239}{oWLoFJf-Rs-3q-kK1Y_2BQ}{dekHpx5zQASxKgIO21n1Yg}{xx.xx.xx.xx}{xx.xx.xx.xx:9371}
   {node244}{stop1h4jQc-zCvM2hBS56Q}{iV8QrIRiQ5y3cvW_en0sMA}{ee.ff.gg.hh}{ee.ff.gg.hh:9371}, local
   {node241}{gx3rLPdIQvSm_u6mLoIbtQ}{80MXRZOjSN6RAQ7gZGeUcQ}{xx.xx.xx.xx}{xx.xx.xx.xx:9371}
   {node242}{k5vCKHcjTSKTK-uhK_MtLw}{DMGU0D-8T-Si-ZpfqmiDTQ}{xx.xx.xx.xx}{xx.xx.xx.xx:9371}

[2016-12-15T19:41:09,542][INFO ][o.e.c.s.ClusterService   ] [node244] removed {{node237}{FmyPJDzMR2KRSl10jPGT0Q}{hrOcPGYBTOSEQU5J7hOwVA}{aa.bb.cc.dd}{aa.bb.cc.dd:9371},}, reason: master_failed ({node237}{FmyPJDzMR2KRSl10jPGT0Q}{hrOcPGYBTOSEQU5J7hOwVA}{aa.bb.cc.dd}{aa.bb.cc.dd:9371})
[2016-12-15T19:41:28,555][INFO ][o.e.c.s.ClusterService   ] [node244] master {new {node237}{FmyPJDzMR2KRSl10jPGT0Q}{hrOcPGYBTOSEQU5J7hOwVA}{aa.bb.cc.dd}{aa.bb.cc.dd:9371}}, removed {{node241}{gx3rLPdIQvSm_u6mLoIbtQ}{80MXRZOjSN6RAQ7gZGeUcQ}{xx.xx.xx.xx}{x.x.x.x:9371},}, added {{node237}{FmyPJDzMR2KRSl10jPGT0Q}{hrOcPGYBTOSEQU5J7hOwVA}{aa.bb.cc.dd}{aa.bb.cc.dd:9371},}, reason: zen-disco-receive(from master [master {node237}{FmyPJDzMR2KRSl10jPGT0Q}{hrOcPGYBTOSEQU5J7hOwVA}{aa.bb.cc.dd}{aa.bb.cc.dd:9371} committed version [120]])
[2016-12-15T19:41:34,564][INFO ][o.e.d.z.ZenDiscovery     ] [node244] master_left [{node237}{FmyPJDzMR2KRSl10jPGT0Q}{hrOcPGYBTOSEQU5J7hOwVA}{aa.bb.cc.dd}{aa.bb.cc.dd:9371}], reason [failed to ping, tried [3] times, each with  maximum [30s] timeout]
[2016-12-15T19:41:34,564][WARN ][o.e.d.z.ZenDiscovery     ] [node244] master left (reason = failed to ping, tried [3] times, each with  maximum [30s] timeout), current nodes: nodes: 
   {node243}{tEv4PrXSTdyecz2nClog4Q}{KQsJYnauSe2W0hxWPte4Iw}{xx.xx.xx.xx}{xx.xx.xx.xx:9371}
   {node239}{oWLoFJf-Rs-3q-kK1Y_2BQ}{dekHpx5zQASxKgIO21n1Yg}{xx.xx.xx.xx}{xx.xx.xx.xx:9371}
   {node244}{stop1h4jQc-zCvM2hBS56Q}{iV8QrIRiQ5y3cvW_en0sMA}{ee.ff.gg.hh}{ee.ff.gg.hh:9371}, local
   {node242}{k5vCKHcjTSKTK-uhK_MtLw}{DMGU0D-8T-Si-ZpfqmiDTQ}{xx.xx.xx.xx}{xx.xx.xx.xx:9371}

[2016-12-15T19:41:34,565][INFO ][o.e.c.s.ClusterService   ] [node244] removed {{node237}{FmyPJDzMR2KRSl10jPGT0Q}{hrOcPGYBTOSEQU5J7hOwVA}{aa.bb.cc.dd}{aa.bb.cc.dd:9371},}, reason: master_failed ({node237}{FmyPJDzMR2KRSl10jPGT0Q}{hrOcPGYBTOSEQU5J7hOwVA}{aa.bb.cc.dd}{aa.bb.cc.dd:9371})
[2016-12-15T19:41:55,575][INFO ][o.e.d.z.ZenDiscovery     ] [node244] failed to send join request to master [{node237}{FmyPJDzMR2KRSl10jPGT0Q}{hrOcPGYBTOSEQU5J7hOwVA}{aa.bb.cc.dd}{aa.bb.cc.dd:9371}], reason [RemoteTransportException[[node237][aa.bb.cc.dd:9371][internal:discovery/zen/join]]; nested: ConnectTransportException[[node244][ee.ff.gg.hh:9371] connect_timeout[30s]]; nested: IOException[Connection timed out: ee.ff.gg.hh/ee.ff.gg.hh:9371]; ]
[2016-12-15T19:42:15,586][INFO ][o.e.d.z.ZenDiscovery     ] [node244] failed to send join request to master [{node237}{FmyPJDzMR2KRSl10jPGT0Q}{hrOcPGYBTOSEQU5J7hOwVA}{aa.bb.cc.dd}{aa.bb.cc.dd:9371}], reason [RemoteTransportException[[node237][aa.bb.cc.dd:9371][internal:discovery/zen/join]]; nested: ConnectTransportException[[node244][ee.ff.gg.hh:9371] connect_timeout[30s]]; nested: IOException[Connection timed out: ee.ff.gg.hh/ee.ff.gg.hh:9371]; ]
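
The ConnectTransportException entries above show node237 timing out when it connects back to node244 on port 9371, so one thing that may be worth ruling out is plain TCP reachability on the transport port in both directions. The following is only a minimal sketch for that check, not a diagnosis of this issue: `can_connect` and `probe_all` are hypothetical helper names, the 5-second timeout is an arbitrary choice, and the host list has to be filled in with the real node addresses, which are masked in the log.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def probe_all(hosts, port, timeout=5.0):
    """Probe every host on the same port and return a {host: reachable} mapping."""
    return {host: can_connect(host, port, timeout) for host in hosts}


# Example (run on each node in turn, with the real addresses substituted):
#   results = probe_all(["<node237 address>", "<node244 address>"], 9371)
#   for host, ok in results.items():
#       print(host, "reachable" if ok else "NOT reachable")
```

Running it from every node against every other node would show whether any pair is reachable in one direction only. A successful TCP handshake does not prove that the Elasticsearch transport layer itself is healthy, so a passing result narrows the search rather than ending it.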
