Mismatch in security settings causes mostly silent cluster failure #42153

@tvernum

Description

By design, nodes with security disabled cannot join a cluster that has security enabled.

In 7.0 and later (Zen2), that failure is logged only as a cluster formation problem, with no indication of why it happened.

For example, if node01 is:

node.name: node01
xpack.security.enabled: true
xpack.license.self_generated.type: trial
node.master: true
cluster.initial_master_nodes:
    - node01

And node02 is:

node.name: node02
node.master: false
cluster.initial_master_nodes:
    - node01
discovery.seed_hosts: localhost:9300

When node02 attempts to join the cluster, it sees that a trial license is in place and automatically disables security (because xpack.security.enabled is unset). It then stops sending security headers, which causes it to drop out of the cluster.
But all we get in the logs are messages like:

[INFO ][o.e.c.s.ClusterApplierService] [node02] master node changed {previous [{node01}{G5pGcCfsQU6oq86_7IkkMg}{e37wwb0GRMegY7fT5ghpng}{127.0.0.1}{127.0.0.1:9300}{ml.machine_memory=17179869184, ml.max_open_jobs=20, xpack.installed=true}], current []}, term: 2, version: 12, reason: becoming candidate: onLeaderFailure
[INFO ][o.e.c.s.MasterService    ] [node01] node-join[{node02}{M4M5AkWHSkygr69XDQUftQ}{mgTAFQmVQ2-dmus3FmpuZA}{127.0.0.1}{127.0.0.1:9301}{ml.machine_memory=17179869184, ml.max_open_jobs=20, xpack.installed=true} join existing leader], term: 2, version: 10, reason: added {{node02}{M4M5AkWHSkygr69XDQUftQ}{mgTAFQmVQ2-dmus3FmpuZA}{127.0.0.1}{127.0.0.1:9301}{ml.machine_memory=17179869184, ml.max_open_jobs=20, xpack.installed=true},}
[INFO ][o.e.c.s.ClusterApplierService] [node01] added {{node02}{M4M5AkWHSkygr69XDQUftQ}{mgTAFQmVQ2-dmus3FmpuZA}{127.0.0.1}{127.0.0.1:9301}{ml.machine_memory=17179869184, ml.max_open_jobs=20, xpack.installed=true},}, term: 2, version: 10, reason: Publication{term=2, version=10}
[INFO ][o.e.c.s.MasterService    ] [node01] node-left[{node02}{M4M5AkWHSkygr69XDQUftQ}{mgTAFQmVQ2-dmus3FmpuZA}{127.0.0.1}{127.0.0.1:9301}{ml.machine_memory=17179869184, ml.max_open_jobs=20, xpack.installed=true} disconnected], term: 2, version: 11, reason: removed {{node02}{M4M5AkWHSkygr69XDQUftQ}{mgTAFQmVQ2-dmus3FmpuZA}{127.0.0.1}{127.0.0.1:9301}{ml.machine_memory=17179869184, ml.max_open_jobs=20, xpack.installed=true},}
[INFO ][o.e.c.s.ClusterApplierService] [node01] removed {{node02}{M4M5AkWHSkygr69XDQUftQ}{mgTAFQmVQ2-dmus3FmpuZA}{127.0.0.1}{127.0.0.1:9301}{ml.machine_memory=17179869184, ml.max_open_jobs=20, xpack.installed=true},}, term: 2, version: 11, reason: Publication{term=2, version=11}

But there is no log message that gives an explicit error related to "security".
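
For anyone hitting this in the meantime, a possible workaround is sketched below. It assumes that setting xpack.security.enabled explicitly on node02 stops the trial license from auto-disabling security there; this has not been verified as part of this issue, and any further security settings a real deployment needs (such as transport TLS) are left out.

```yaml
# node02 elasticsearch.yml (sketch): same as the node02 config above,
# plus an explicit security setting that matches node01.
node.name: node02
node.master: false
# Assumption: with this set explicitly, security is no longer auto-disabled
# when node02 sees the trial license.
xpack.security.enabled: true
cluster.initial_master_nodes:
    - node01
discovery.seed_hosts: localhost:9300
```

This only avoids the mismatch. It does not change the fact that a mismatch, when it happens, is not reported in the logs.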

Metadata

    Labels

    :Distributed Coordination/Cluster Coordination (Cluster formation and cluster state publication, including cluster membership and fault detection.)
    :Security/Security (Security issues without another label)
