NetworkDisruptionIT testJobRelocation failing  #35052

@mayya-sharipova

This test fails from time to time:

REPRODUCE WITH: ./gradlew :x-pack:plugin:ml:internalClusterTest \
  -Dtests.seed=8D4D9C772ADB301E \
  -Dtests.class=org.elasticsearch.xpack.ml.integration.NetworkDisruptionIT \
  -Dtests.method="testJobRelocation" \
  -Dtests.security.manager=true \
  -Dtests.locale=pt-PT \
  -Dtests.timezone=US/Michigan \
  -Dcompiler.java=10 \
  -Druntime.java=8

Log: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.5+multijob-unix-compatibility/os=debian/8/console
The failure is not reproducible for me.

It looks like the cluster fails to form properly: there is a split brain, with two master nodes elected (node_t3 and node_t0), and `discovery.zen.minimum_master_nodes` is set too low to enforce a quorum:

1> [2018-10-29T13:32:47,544][INFO ][o.e.c.s.MasterService    ] [node_t3] zen-disco-elected-as-master ([2] nodes joined)[{node_t4}{vdyH-Xp0S_eoZNx9KcMaAw}{XaKVGoOGSCympiARN1x8mw}{127.0.0.1}{127.0.0.1:42010}{ml.machine_memory=63464030208, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}, {node_t2}{dh3pD3ONQ0yjTKGqY2lj1g}{HCp8GrhtQVmHoybwpNwO9w}{127.0.0.1}{127.0.0.1:39450}{ml.machine_memory=63464030208, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}], reason: new_master {node_t3}{UoshLV9fRWezwvy7lDcAPw}{GX8SBaUzQESrj_ewmZflRg}{127.0.0.1}{127.0.0.1:52930}{ml.machine_memory=63464030208, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}, added {{node_t4}{vdyH-Xp0S_eoZNx9KcMaAw}{XaKVGoOGSCympiARN1x8mw}{127.0.0.1}{127.0.0.1:42010}{ml.machine_memory=63464030208, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true},{node_t2}{dh3pD3ONQ0yjTKGqY2lj1g}{HCp8GrhtQVmHoybwpNwO9w}{127.0.0.1}{127.0.0.1:39450}{ml.machine_memory=63464030208, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true},}
  1> [2018-10-29T13:32:47,547][INFO ][o.e.c.s.MasterService    ] [node_t0] zen-disco-node-join[{node_t1}{mRSV0xF-SC2Qy2NZz0J_5g}{1KEoP2T5Tze2Vtl3wU7WKg}{127.0.0.1}{127.0.0.1:59887}{ml.machine_memory=63464030208, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}], reason: added {{node_t1}{mRSV0xF-SC2Qy2NZz0J_5g}{1KEoP2T5Tze2Vtl3wU7WKg}{127.0.0.1}{127.0.0.1:59887}{ml.machine_memory=63464030208, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true},}
  1> [2018-10-29T13:32:47,556][INFO ][o.e.c.s.ClusterApplierService] [node_t2] detected_master {node_t3}{UoshLV9fRWezwvy7lDcAPw}{GX8SBaUzQESrj_ewmZflRg}{127.0.0.1}{127.0.0.1:52930}{ml.machine_memory=63464030208, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}, added {{node_t4}{vdyH-Xp0S_eoZNx9KcMaAw}{XaKVGoOGSCympiARN1x8mw}{127.0.0.1}{127.0.0.1:42010}{ml.machine_memory=63464030208, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true},{node_t3}{UoshLV9fRWezwvy7lDcAPw}{GX8SBaUzQESrj_ewmZflRg}{127.0.0.1}{127.0.0.1:52930}{ml.machine_memory=63464030208, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true},}, reason: apply cluster state (from master [master {node_t3}{UoshLV9fRWezwvy7lDcAPw}{GX8SBaUzQESrj_ewmZflRg}{127.0.0.1}{127.0.0.1:52930}{ml.machine_memory=63464030208, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true} committed version [1]])
  1> [2018-10-29T13:32:47,556][INFO ][o.e.c.s.ClusterApplierService] [node_t4] detected_master {node_t3}{UoshLV9fRWezwvy7lDcAPw}{GX8SBaUzQESrj_ewmZflRg}{127.0.0.1}{127.0.0.1:52930}{ml.machine_memory=63464030208, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}, added {{node_t2}{dh3pD3ONQ0yjTKGqY2lj1g}{HCp8GrhtQVmHoybwpNwO9w}{127.0.0.1}{127.0.0.1:39450}{ml.machine_memory=63464030208, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true},{node_t3}{UoshLV9fRWezwvy7lDcAPw}{GX8SBaUzQESrj_ewmZflRg}{127.0.0.1}{127.0.0.1:52930}{ml.machine_memory=63464030208, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true},}, reason: apply cluster state (from master [master {node_t3}{UoshLV9fRWezwvy7lDcAPw}{GX8SBaUzQESrj_ewmZflRg}{127.0.0.1}{127.0.0.1:52930}{ml.machine_memory=63464030208, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true} committed version [1]])
  1> [2018-10-29T13:32:47,557][INFO ][o.e.c.s.ClusterApplierService] [node_t1] detected_master {node_t0}{RGD13uxVTcWUm62e6oUCmQ}{TG2rTEBARZyN4IEJq5KFuA}{127.0.0.1}{127.0.0.1:34213}{ml.machine_memory=63464030208, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}, added {{node_t0}{RGD13uxVTcWUm62e6oUCmQ}{TG2rTEBARZyN4IEJq5KFuA}{127.0.0.1}{127.0.0.1:34213}{ml.machine_memory=63464030208, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true},}, reason: apply cluster state (from master [master {node_t0}{RGD13uxVTcWUm62e6oUCmQ}{TG2rTEBARZyN4IEJq5KFuA}{127.0.0.1}{127.0.0.1:34213}{ml.machine_memory=63464030208, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true} committed version [4]])
  1> [2018-10-29T13:32:47,566][INFO ][o.e.n.Node               ] [[test_TEST-CHILD_VM=[2]-CLUSTER_SEED=[4793419988162627123]-HASH=[6DBB78660FF]-cluster[T#4]]] started
  1> [2018-10-29T13:32:47,566][INFO ][o.e.n.Node               ] [[test_TEST-CHILD_VM=[2]-CLUSTER_SEED=[4793419988162627123]-HASH=[6DBB78660FF]-cluster[T#2]]] started
  1> [2018-10-29T13:32:47,568][INFO ][o.e.c.s.ClusterApplierService] [node_t3] new_master {node_t3}{UoshLV9fRWezwvy7lDcAPw}{GX8SBaUzQESrj_ewmZflRg}{127.0.0.1}{127.0.0.1:52930}{ml.machine_memory=63464030208, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}, added {{node_t4}{vdyH-Xp0S_eoZNx9KcMaAw}{XaKVGoOGSCympiARN1x8mw}{127.0.0.1}{127.0.0.1:42010}{ml.machine_memory=63464030208, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true},{node_t2}{dh3pD3ONQ0yjTKGqY2lj1g}{HCp8GrhtQVmHoybwpNwO9w}{127.0.0.1}{127.0.0.1:39450}{ml.machine_memory=63464030208, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true},}, reason: apply cluster state (from master [master {node_t3}{UoshLV9fRWezwvy7lDcAPw}{GX8SBaUzQESrj_ewmZflRg}{127.0.0.1}{127.0.0.1:52930}{ml.machine_memory=63464030208, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true} committed version [1] source [zen-disco-elected-as-master ([2] nodes joined)[{node_t4}{vdyH-Xp0S_eoZNx9KcMaAw}{XaKVGoOGSCympiARN1x8mw}{127.0.0.1}{127.0.0.1:42010}{ml.machine_memory=63464030208, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}, {node_t2}{dh3pD3ONQ0yjTKGqY2lj1g}{HCp8GrhtQVmHoybwpNwO9w}{127.0.0.1}{127.0.0.1:39450}{ml.machine_memory=63464030208, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}]]])
  1> [2018-10-29T13:32:47,570][INFO ][o.e.n.Node               ] [[test_TEST-CHILD_VM=[2]-CLUSTER_SEED=[4793419988162627123]-HASH=[6DBB78660FF]-cluster[T#3]]] started
  1> [2018-10-29T13:32:47,602][INFO ][o.e.l.LicenseService     ] [node_t1] license [a5fb9648-fdb2-46ae-8185-9ee424906bde] mode [trial] - valid
  1> [2018-10-29T13:32:47,603][INFO ][o.e.n.Node               ] [[test_TEST-CHILD_VM=[2]-CLUSTER_SEED=[4793419988162627123]-HASH=[6DBB78660FF]-cluster[T#1]]] started
  1> [2018-10-29T13:32:47,603][INFO ][o.e.c.s.ClusterApplierService] [node_t0] added {{node_t1}{mRSV0xF-SC2Qy2NZz0J_5g}{1KEoP2T5Tze2Vtl3wU7WKg}{127.0.0.1}{127.0.0.1:59887}{ml.machine_memory=63464030208, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true},}, reason: apply cluster state (from master [master {node_t0}{RGD13uxVTcWUm62e6oUCmQ}{TG2rTEBARZyN4IEJq5KFuA}{127.0.0.1}{127.0.0.1:34213}{ml.machine_memory=63464030208, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true} committed version [4] source [zen-disco-node-join[{node_t1}{mRSV0xF-SC2Qy2NZz0J_5g}{1KEoP2T5Tze2Vtl3wU7WKg}{127.0.0.1}{127.0.0.1:59887}{ml.machine_memory=63464030208, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}]]])
  1> [2018-10-29T13:32:47,605][WARN ][o.e.d.z.ElectMasterService] [node_t0] value for setting "discovery.zen.minimum_master_nodes" is too low. This can result in data loss! Please set it to at least a quorum of master-eligible nodes (current value: [1], total number of master-eligible nodes used for publishing in this round: [2])
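
For reference, the quorum that the WARN line asks for is a majority of the master-eligible nodes, i.e. `(N / 2) + 1` with integer division. Below is a minimal sketch of that arithmetic, not code from the test or from Elasticsearch. The helper names `quorum` and `can_split_brain` are hypothetical, and it assumes all five test nodes (node_t0 to node_t4) are master-eligible, which the log does not state explicitly.

```python
def quorum(master_eligible_nodes: int) -> int:
    """Smallest majority of master-eligible nodes: floor(n / 2) + 1."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


def can_split_brain(master_eligible_nodes: int, minimum_master_nodes: int) -> bool:
    """True if two disjoint groups of nodes could each reach minimum_master_nodes."""
    return 2 * minimum_master_nodes <= master_eligible_nodes


# Assumption: 5 master-eligible nodes, as in the log excerpt above.
print(quorum(5))              # 3
print(can_split_brain(5, 1))  # True: current value [1] lets two masters coexist
print(can_split_brain(5, 3))  # False: a quorum of 3 rules out two masters
```

Under that assumption, a value of 1 would allow the {node_t3, node_t4, node_t2} and {node_t0, node_t1} groups seen in the log to each elect their own master, whereas a value of 3 would not.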
