@dumbbell
Collaborator

Why

The tests are configured to use the same cluster of RabbitMQ nodes, but they recreate the cluster for every test.

This leads to the nodes being abruptly unclustered and clustered again, because rabbit_ct_broker_helpers assumes that nodes are unclustered when it clusters them. One symptom is that the coordinator becomes unavailable after the first test:

[warning] <0.969.0> Coordinator timeout on server 'rmq-ct-cluster-2-21054@localhost' when processing command new_stream
[debug] <0.692.0> RabbitMQ metadata store: follower leader cast - redirecting to {rabbitmq_metadata,'rmq-ct-cluster-2-21054@localhost'}
[debug] <0.969.0> rabbit_stream_reader terminating in state 'open' with reason '{case_clause,{protocol_error,internal_error,[67,97,110,110|...],[[113,117|...],'rmq-ct-cluster-1-21000@localhost',{...}]}}'
[error] <0.969.0> ** State machine <0.969.0> terminating
[error] <0.969.0> ** Last event = {info,{tcp,#Port<0.99>,
[error] <0.969.0>                            <<0,0,0,19,0,13,0,1,0,0,0,1,0,5,100,117,109,109,
[error] <0.969.0>                              121,0,0,0,0>>}}
[error] <0.969.0> ** When server state  = {open,
[error] <0.969.0>                          {statem_data,ranch_tcp,
[error] <0.969.0>                           {stream_connection,
[error] <0.969.0>                            <<"127.0.0.1:39542 -> 127.0.0.1:21015">>,
[error] <0.969.0>                            {0,0,0,0,0,65535,32512,1},
[error] <0.969.0>                            {0,0,0,0,0,65535,32512,1},
[error] <0.969.0>                            21015,39542,
[error] <0.969.0>                            {<<"PLAIN">>,rabbit_auth_mechanism_plain},
[error] <0.969.0>                            done,1761145223725,<0.968.0>,#Port<0.99>,#{},#{},
[error] <0.969.0>                            #{},#{},#Ref<0.2113571418.668598274.3149>,
[error] <0.969.0>                            {user,<<"guest">>,
[error] <0.969.0>                             [administrator],
[error] <0.969.0>                             [{rabbit_auth_backend_internal,
[error] <0.969.0>                               #Fun<rabbit_auth_backend_internal.3.18474459>}]},
[error] <0.969.0>                            <<"/">>,opened,1048576,0,
[error] <0.969.0>                            {none,none},
[error] <0.969.0>                            #{},#{},
[error] <0.969.0>                            {state,none,5000,undefined},
[error] <0.969.0>                            false,#Ref<0.2113571418.668598274.3150>,tcp,
[error] <0.969.0>                            undefined,0,#{},2,60000,undefined,0,undefined},
[error] <0.969.0>                           {stream_connection_state,
[error] <0.969.0>                            {rabbit_stream_core,{cfg},[],undefined,{[],[]}},
[error] <0.969.0>                            false,#{}},
[error] <0.969.0>                           {configuration,50000,12500,1048576,60,10000}}}
[error] <0.969.0> ** Reason for termination = error:{case_clause,
[error] <0.969.0>                                    {protocol_error,internal_error,
[error] <0.969.0>                                     "Cannot declare ~ts on node '~ts': ~255p",
[error] <0.969.0>                                     ["queue 'dummy' in vhost '/'",
[error] <0.969.0>                                      'rmq-ct-cluster-1-21000@localhost',
[error] <0.969.0>                                      {error,coordinator_unavailable}]}}
[error] <0.969.0> ** Callback modules = [rabbit_stream_reader]
[error] <0.969.0> ** Callback mode = [state_functions,state_enter]
[error] <0.969.0> ** Stacktrace =
[error] <0.969.0> **  [{rabbit_stream_manager,do_create_stream,4,
[error] <0.969.0>                             [{file,"rabbit_stream_manager.erl"},{line,537}]},
[error] <0.969.0>      {rabbit_stream_reader,handle_frame_post_auth,4,
[error] <0.969.0>                            [{file,"rabbit_stream_reader.erl"},{line,2158}]},
[error] <0.969.0>      {lists,foldl,3,[{file,"lists.erl"},{line,2466}]},
[error] <0.969.0>      {rabbit_stream_reader,open,3,
[error] <0.969.0>                            [{file,"rabbit_stream_reader.erl"},{line,689}]},
[error] <0.969.0>      {gen_statem,loop_state_callback,11,[{file,"gen_statem.erl"},{line,3748}]},
[error] <0.969.0>      {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,333}]}]
[error] <0.969.0>
[error] <0.969.0>   crasher:
[error] <0.969.0>     initial call: rabbit_stream_reader:init/1
[error] <0.969.0>     pid: <0.969.0>
[error] <0.969.0>     registered_name: []
[error] <0.969.0>     exception error: no case clause matching
[error] <0.969.0>                      {protocol_error,internal_error,
[error] <0.969.0>                                      "Cannot declare ~ts on node '~ts': ~255p",
[error] <0.969.0>                                      ["queue 'dummy' in vhost '/'",
[error] <0.969.0>                                       'rmq-ct-cluster-1-21000@localhost',
[error] <0.969.0>                                       {error,coordinator_unavailable}]}
[error] <0.969.0>       in function  rabbit_stream_manager:do_create_stream/4 (rabbit_stream_manager.erl:537)
[error] <0.969.0>       in call from rabbit_stream_reader:handle_frame_post_auth/4 (rabbit_stream_reader.erl:2158)
[error] <0.969.0>       in call from lists:foldl/3 (lists.erl:2466)
[error] <0.969.0>       in call from rabbit_stream_reader:open/3 (rabbit_stream_reader.erl:689)
[error] <0.969.0>       in call from gen_statem:loop_state_callback/11 (gen_statem.erl:3748)
[error] <0.969.0>     ancestors: [<0.967.0>,<0.874.0>,<0.873.0>,<0.872.0>,<0.870.0>,<0.869.0>,
[error] <0.969.0>                   rabbit_stream_sup,<0.866.0>]
[error] <0.969.0>     message_queue_len: 0
[error] <0.969.0>     messages: []
[error] <0.969.0>     links: [<0.967.0>]
[error] <0.969.0>     dictionary: [{'$logger_metadata$',#{domain => [rabbitmq,connection]}}]
[error] <0.969.0>     trap_exit: true
[error] <0.969.0>     status: running
[error] <0.969.0>     heap_size: 46422
[error] <0.969.0>     stack_size: 29
[error] <0.969.0>     reductions: 5301495

How

Let's use a different cluster for each testcase. This was probably the initial intention anyway.

@dumbbell dumbbell self-assigned this Oct 22, 2025
@acogoluegnes acogoluegnes marked this pull request as ready for review October 23, 2025 06:09
@acogoluegnes acogoluegnes marked this pull request as draft October 23, 2025 06:09