ra:transfer_leadership reduces quorum count by 1 during a leadership transfer #251

@adrianroe

Description

ra:transfer_leadership appears to reduce the quorum count by 1 during a leadership transfer. This leads to cluster failure if you ever call ra:transfer_leadership on a cluster that is one server away from being inquorate. The easiest way to reproduce the behaviour is to run a 2-node cluster and attempt to transfer the leadership, but the same behaviour occurs whenever the cluster is on the cusp of being inquorate (2 servers up out of 3, 3 out of 4, 3 out of 5...).
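
To make the arithmetic behind those examples concrete, here is a minimal sketch of the majority calculation. It assumes the usual Raft majority rule (more than half of the members) and models the reported behaviour as "one fewer server counted while the transfer is in progress". The module and function names are hypothetical and this is not Ra's internal code.

```erlang
%% Hypothetical illustration only; not taken from the Ra source.
-module(quorum_sketch).
-export([quorum/1, has_quorum/2, survives_transfer/2]).

%% Smallest number of servers that forms a majority of ClusterSize.
quorum(ClusterSize) ->
    ClusterSize div 2 + 1.

%% True if Up running servers are enough for a majority.
has_quorum(ClusterSize, Up) ->
    Up >= quorum(ClusterSize).

%% Assumption from this report: during a transfer one running server
%% is effectively not counted, so only Up - 1 servers contribute.
survives_transfer(ClusterSize, Up) ->
    has_quorum(ClusterSize, Up - 1).
```

Under that assumption, every case listed above (2 of 2, 2 of 3, 3 of 4, 3 of 5) has quorum before the transfer and loses it during the transfer, while a fully running 3-node cluster does not.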

Repro: note the timeouts when calling ra:members after the call to ra:transfer_leadership, and that once the cluster becomes operational again ra1 is still the leader...

Eshell V12.1.3  (abort with ^G)
(runner@macbookM1)1> ErlangNodes = ['ra1@macbookM1', 'ra2@macbookM1'].
[ra1@macbookM1,ra2@macbookM1]
(runner@macbookM1)2> [io:format("Attempting to communicate with node ~s, response: ~s~n", [N, net_adm:ping(N)]) || N <- ErlangNodes].
Attempting to communicate with node ra1@macbookM1, response: pong
Attempting to communicate with node ra2@macbookM1, response: pong
[ok,ok]
(runner@macbookM1)3> [rpc:call(N, ra, start, []) || N <- ErlangNodes].
[ok,ok]
(runner@macbookM1)4> ServerIds = [{quick_start, N} || N <- ErlangNodes].
[{quick_start,ra1@macbookM1},{quick_start,ra2@macbookM1}]
(runner@macbookM1)5> ClusterName = quick_start.
quick_start
(runner@macbookM1)6> Machine = {simple, fun erlang:'+'/2, 0}.
{simple,fun erlang:'+'/2,0}
(runner@macbookM1)7> {ok, ServersStarted, _ServersNotStarted} = ra:start_cluster(default, ClusterName, Machine, ServerIds).
{ok,[{quick_start,ra2@macbookM1},
     {quick_start,ra1@macbookM1}],
    []}
(runner@macbookM1)8> {ok, StateMachineResult, LeaderId} = ra:process_command(hd(ServersStarted), 5).
{ok,5,{quick_start,ra1@macbookM1}}
(runner@macbookM1)9> {ok, 12, LeaderId1} = ra:process_command(LeaderId, 7).
{ok,12,{quick_start,ra1@macbookM1}}
(runner@macbookM1)10> ra:members({quick_start,ra1@macbookM1}).
{ok,[{quick_start,ra1@macbookM1},
     {quick_start,ra2@macbookM1}],
    {quick_start,ra1@macbookM1}}
(runner@macbookM1)11> ra:transfer_leadership({quick_start,ra1@macbookM1},  {quick_start,ra2@macbookM1}).
ok
(runner@macbookM1)12> ra:members({quick_start,ra1@macbookM1}).
{timeout,{quick_start,ra1@macbookM1}}
(runner@macbookM1)13> ra:members({quick_start,ra1@macbookM1}).
{timeout,{quick_start,ra1@macbookM1}}
(runner@macbookM1)14> ra:members({quick_start,ra1@macbookM1}).
{timeout,{quick_start,ra1@macbookM1}}
(runner@macbookM1)15> ra:members({quick_start,ra1@macbookM1}).
{ok,[{quick_start,ra1@macbookM1},
     {quick_start,ra2@macbookM1}],
    {quick_start,ra1@macbookM1}}
(runner@macbookM1)16>
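
For anyone repeating the repro, a small polling helper may make the unavailable window easier to observe than retyping ra:members by hand. This is only a sketch: the module name, function names and return shapes are hypothetical, and the only Ra call it relies on is ra:members/1 with the `{ok, Members, Leader}` result shown in the transcript above.

```erlang
%% Hypothetical helper for observing the outage; not part of Ra.
-module(ra_probe_sketch).
-export([members_probe/1, wait_until_ok/3]).

%% Wraps ra:members/1 in a zero-arity fun so the loop below can be
%% exercised with a stub instead of a live cluster.
members_probe(ServerId) ->
    fun() -> ra:members(ServerId) end.

%% Calls Probe up to MaxAttempts times, sleeping SleepMs between
%% failed attempts. Returns the attempt number that first succeeded.
wait_until_ok(Probe, MaxAttempts, SleepMs) ->
    loop(Probe, MaxAttempts, SleepMs, 1).

loop(_Probe, MaxAttempts, _SleepMs, Attempt) when Attempt > MaxAttempts ->
    {error, {gave_up_after, MaxAttempts}};
loop(Probe, MaxAttempts, SleepMs, Attempt) ->
    case Probe() of
        {ok, Members, Leader} ->
            {ok, Attempt, Members, Leader};
        _TimeoutOrError ->
            %% e.g. {timeout, ServerId} as seen in the transcript
            timer:sleep(SleepMs),
            loop(Probe, MaxAttempts, SleepMs, Attempt + 1)
    end.
```

A possible call right after the transfer would be `ra_probe_sketch:wait_until_ok(ra_probe_sketch:members_probe({quick_start, 'ra1@macbookM1'}), 10, 1000).`, which reports how many attempts passed before the cluster answered again.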
