Closed
Labels
:Distributed Coordination/Network (Http and internode communication implementations)
:Distributed Indexing/Distributed (A catch all label for anything in the Distributed Indexing Area. Please avoid if you can.)
feedback_needed
Description
Elasticsearch version: 5.1.1
Plugins installed: [] only default
JVM version: Java(TM) SE Runtime Environment (build 1.8.0_111-b14)
OS version: 2.6.32-504.12.2.02.el6.x86_64 #1 SMP Tue May 12 11:44:09 CST 2015 x86_64 x86_64 x86_64 GNU/Linux
Description of the problem including expected versus actual behavior:
With more than 3 nodes, the cluster constantly adds and removes nodes, which leaves it unavailable.
The attached log below shows that pings between nodes frequently time out.
Expected: a cluster in which all nodes have started and joined.
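Since the symptom is pings timing out between nodes, one thing worth ruling out is plain TCP reachability on the transport port (9371 in the logs in this report). The following is only a minimal sketch of such a check, not part of Elasticsearch: the function name `can_connect` and the 5-second default timeout are assumptions made for illustration. It should be run from each node against every other node, because the logs suggest the failure may be one-directional.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    This only proves that the port accepts connections; it does not
    speak the Elasticsearch transport protocol.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Hypothetical host list; replace with the real node addresses.
    for host in ["127.0.0.1"]:
        print(host, 9371, "reachable" if can_connect(host, 9371) else "NOT reachable")
```

If the check succeeds in one direction but fails in the other, a firewall rule or routing problem between those two hosts would be a plausible cause to investigate.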
Provide logs (if relevant):
[2016-12-15T19:39:17,648][INFO ][o.e.c.s.ClusterService ] [node244] removed {{node237}{FmyPJDzMR2KRSl10jPGT0Q}{hrOcPGYBTOSEQU5J7hOwVA}{aa.bb.cc.dd}{aa.bb.cc.dd:9371},}, reason: master_failed ({node237}{FmyPJDzMR2KRSl10jPGT0Q}{hrOcPGYBTOSEQU5J7hOwVA}{aa.bb.cc.dd}{aa.bb.cc.dd:9371})
[2016-12-15T19:39:42,556][INFO ][o.e.d.z.ZenDiscovery ] [node244] failed to send join request to master [{node237}{FmyPJDzMR2KRSl10jPGT0Q}{hrOcPGYBTOSEQU5J7hOwVA}{aa.bb.cc.dd}{aa.bb.cc.dd:9371}], reason [RemoteTransportException[[node237][aa.bb.cc.dd:9371][internal:discovery/zen/join]]; nested: IllegalStateException[failure when sending a validation request to node]; nested: NodeDisconnectedException[[node244][ee.ff.gg.hh:9371][internal:discovery/zen/join/validate] disconnected]; ]
[2016-12-15T19:40:02,565][INFO ][o.e.d.z.ZenDiscovery ] [node244] failed to send join request to master [{node237}{FmyPJDzMR2KRSl10jPGT0Q}{hrOcPGYBTOSEQU5J7hOwVA}{aa.bb.cc.dd}{aa.bb.cc.dd:9371}], reason [RemoteTransportException[[node237][aa.bb.cc.dd:9371][internal:discovery/zen/join]]; nested: ConnectTransportException[[node244][ee.ff.gg.hh:9371] connect_timeout[30s]]; nested: IOException[Connection timed out: ee.ff.gg.hh/ee.ff.gg.hh:9371]; ]
[2016-12-15T19:40:25,433][INFO ][o.e.d.z.ZenDiscovery ] [node244] failed to send join request to master [{node237}{FmyPJDzMR2KRSl10jPGT0Q}{hrOcPGYBTOSEQU5J7hOwVA}{aa.bb.cc.dd}{aa.bb.cc.dd:9371}], reason [RemoteTransportException[[node237][aa.bb.cc.dd:9371][internal:discovery/zen/join]]; nested: IllegalStateException[failure when sending a validation request to node]; nested: NodeDisconnectedException[[node244][ee.ff.gg.hh:9371][internal:discovery/zen/join/validate] disconnected]; ]
[2016-12-15T19:40:42,442][INFO ][o.e.d.z.ZenDiscovery ] [node244] failed to send join request to master [{node237}{FmyPJDzMR2KRSl10jPGT0Q}{hrOcPGYBTOSEQU5J7hOwVA}{aa.bb.cc.dd}{aa.bb.cc.dd:9371}], reason [NodeDisconnectedException[[node237][aa.bb.cc.dd:9371][internal:discovery/zen/join] disconnected]]
[2016-12-15T19:41:03,531][INFO ][o.e.c.s.ClusterService ] [node244] detected_master {node237}{FmyPJDzMR2KRSl10jPGT0Q}{hrOcPGYBTOSEQU5J7hOwVA}{aa.bb.cc.dd}{aa.bb.cc.dd:9371}, added {{node241}{gx3rLPdIQvSm_u6mLoIbtQ}{80MXRZOjSN6RAQ7gZGeUcQ}{xx.x.xx.x}{x.x.x.x:9371},{node237}{FmyPJDzMR2KRSl10jPGT0Q}{hrOcPGYBTOSEQU5J7hOwVA}{aa.bb.cc.dd}{aa.bb.cc.dd:9371},}, reason: zen-disco-receive(from master [master {node237}{FmyPJDzMR2KRSl10jPGT0Q}{hrOcPGYBTOSEQU5J7hOwVA}{aa.bb.cc.dd}{aa.bb.cc.dd:9371} committed version [115]])
[2016-12-15T19:41:09,541][INFO ][o.e.d.z.ZenDiscovery ] [node244] master_left [{node237}{FmyPJDzMR2KRSl10jPGT0Q}{hrOcPGYBTOSEQU5J7hOwVA}{aa.bb.cc.dd}{aa.bb.cc.dd:9371}], reason [failed to ping, tried [3] times, each with maximum [30s] timeout]
[2016-12-15T19:41:09,541][WARN ][o.e.d.z.ZenDiscovery ] [node244] master left (reason = failed to ping, tried [3] times, each with maximum [30s] timeout), current nodes: nodes:
{node243}{tEv4PrXSTdyecz2nClog4Q}{KQsJYnauSe2W0hxWPte4Iw}{xx.xx.xx.xx}{xx.xx.xx.xx:9371}
{node239}{oWLoFJf-Rs-3q-kK1Y_2BQ}{dekHpx5zQASxKgIO21n1Yg}{xx.xx.xx.xx}{xx.xx.xx.xx:9371}
{node244}{stop1h4jQc-zCvM2hBS56Q}{iV8QrIRiQ5y3cvW_en0sMA}{ee.ff.gg.hh}{ee.ff.gg.hh:9371}, local
{node241}{gx3rLPdIQvSm_u6mLoIbtQ}{80MXRZOjSN6RAQ7gZGeUcQ}{xx.xx.xx.xx}{xx.xx.xx.xx:9371}
{node242}{k5vCKHcjTSKTK-uhK_MtLw}{DMGU0D-8T-Si-ZpfqmiDTQ}{xx.xx.xx.xx}{xx.xx.xx.xx:9371}
[2016-12-15T19:41:09,542][INFO ][o.e.c.s.ClusterService ] [node244] removed {{node237}{FmyPJDzMR2KRSl10jPGT0Q}{hrOcPGYBTOSEQU5J7hOwVA}{aa.bb.cc.dd}{aa.bb.cc.dd:9371},}, reason: master_failed ({node237}{FmyPJDzMR2KRSl10jPGT0Q}{hrOcPGYBTOSEQU5J7hOwVA}{aa.bb.cc.dd}{aa.bb.cc.dd:9371})
[2016-12-15T19:41:28,555][INFO ][o.e.c.s.ClusterService ] [node244] master {new {node237}{FmyPJDzMR2KRSl10jPGT0Q}{hrOcPGYBTOSEQU5J7hOwVA}{aa.bb.cc.dd}{aa.bb.cc.dd:9371}}, removed {{node241}{gx3rLPdIQvSm_u6mLoIbtQ}{80MXRZOjSN6RAQ7gZGeUcQ}{xx.xx.xx.xx}{x.x.x.x:9371},}, added {{node237}{FmyPJDzMR2KRSl10jPGT0Q}{hrOcPGYBTOSEQU5J7hOwVA}{aa.bb.cc.dd}{aa.bb.cc.dd:9371},}, reason: zen-disco-receive(from master [master {node237}{FmyPJDzMR2KRSl10jPGT0Q}{hrOcPGYBTOSEQU5J7hOwVA}{aa.bb.cc.dd}{aa.bb.cc.dd:9371} committed version [120]])
[2016-12-15T19:41:34,564][INFO ][o.e.d.z.ZenDiscovery ] [node244] master_left [{node237}{FmyPJDzMR2KRSl10jPGT0Q}{hrOcPGYBTOSEQU5J7hOwVA}{aa.bb.cc.dd}{aa.bb.cc.dd:9371}], reason [failed to ping, tried [3] times, each with maximum [30s] timeout]
[2016-12-15T19:41:34,564][WARN ][o.e.d.z.ZenDiscovery ] [node244] master left (reason = failed to ping, tried [3] times, each with maximum [30s] timeout), current nodes: nodes:
{node243}{tEv4PrXSTdyecz2nClog4Q}{KQsJYnauSe2W0hxWPte4Iw}{xx.xx.xx.xx}{xx.xx.xx.xx:9371}
{node239}{oWLoFJf-Rs-3q-kK1Y_2BQ}{dekHpx5zQASxKgIO21n1Yg}{xx.xx.xx.xx}{xx.xx.xx.xx:9371}
{node244}{stop1h4jQc-zCvM2hBS56Q}{iV8QrIRiQ5y3cvW_en0sMA}{ee.ff.gg.hh}{ee.ff.gg.hh:9371}, local
{node242}{k5vCKHcjTSKTK-uhK_MtLw}{DMGU0D-8T-Si-ZpfqmiDTQ}{xx.xx.xx.xx}{xx.xx.xx.xx:9371}
[2016-12-15T19:41:34,565][INFO ][o.e.c.s.ClusterService ] [node244] removed {{node237}{FmyPJDzMR2KRSl10jPGT0Q}{hrOcPGYBTOSEQU5J7hOwVA}{aa.bb.cc.dd}{aa.bb.cc.dd:9371},}, reason: master_failed ({node237}{FmyPJDzMR2KRSl10jPGT0Q}{hrOcPGYBTOSEQU5J7hOwVA}{aa.bb.cc.dd}{aa.bb.cc.dd:9371})
[2016-12-15T19:41:55,575][INFO ][o.e.d.z.ZenDiscovery ] [node244] failed to send join request to master [{node237}{FmyPJDzMR2KRSl10jPGT0Q}{hrOcPGYBTOSEQU5J7hOwVA}{aa.bb.cc.dd}{aa.bb.cc.dd:9371}], reason [RemoteTransportException[[node237][aa.bb.cc.dd:9371][internal:discovery/zen/join]]; nested: ConnectTransportException[[node244][ee.ff.gg.hh:9371] connect_timeout[30s]]; nested: IOException[Connection timed out: ee.ff.gg.hh/ee.ff.gg.hh:9371]; ]
[2016-12-15T19:42:15,586][INFO ][o.e.d.z.ZenDiscovery ] [node244] failed to send join request to master [{node237}{FmyPJDzMR2KRSl10jPGT0Q}{hrOcPGYBTOSEQU5J7hOwVA}{aa.bb.cc.dd}{aa.bb.cc.dd:9371}], reason [RemoteTransportException[[node237][aa.bb.cc.dd:9371][internal:discovery/zen/join]]; nested: ConnectTransportException[[node244][ee.ff.gg.hh:9371] connect_timeout[30s]]; nested: IOException[Connection timed out: ee.ff.gg.hh/ee.ff.gg.hh:9371]; ]
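To see how often each kind of event occurs, log lines like the ones above can be tallied with a short script. This is a rough sketch under assumptions: the regular expression is fitted only to the line layout shown in this report (`[timestamp][LEVEL][logger] [node] message`), and the category names such as `join_failed` are made up for illustration. Continuation lines (the node lists after "current nodes") do not match the pattern and are skipped.

```python
import re
from collections import Counter

# Layout assumed from the log lines in this report:
# [timestamp][LEVEL ][logger ] [node] message
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[(?P<level>\w+)\s*\]\[(?P<logger>[^\]]+?)\s*\]"
    r"\s+\[(?P<node>[^\]]+)\]\s+(?P<msg>.*)$"
)

# Message prefix -> illustrative category name (names are hypothetical).
PREFIXES = [
    ("failed to send join request", "join_failed"),
    ("master_left", "master_left"),
    ("master left", "master_left_warning"),
    ("removed", "node_removed"),
    ("detected_master", "master_detected"),
]


def classify(msg: str) -> str:
    """Map a log message to a category by its leading text."""
    for prefix, name in PREFIXES:
        if msg.startswith(prefix):
            return name
    return "other"


def summarize(lines) -> Counter:
    """Count events per category, ignoring lines that are not log records."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:
            counts[classify(match.group("msg"))] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python tally.py < elasticsearch.log
    for category, count in summarize(sys.stdin).most_common():
        print(f"{category}: {count}")
```

A high count of join failures relative to successful master detections would be consistent with the add/remove cycle described in this report.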