Follower aggregator panics in e2e tests in run-only mode #2770

@jpraynaud

Why

When running the e2e test in run-only mode, the follower aggregators fail after a few minutes, all with the same error:

Error: task 27 panicked with message "FOREIGN KEY constraint failed (code 19)"

Stack backtrace:
   0: anyhow::error::<impl core::convert::From<E> for anyhow::Error>::from
   1: mithril_aggregator::commands::serve_command::ServeCommand::execute::{{closure}}
   2: tokio::runtime::park::CachedParkThread::block_on
   3: tokio::runtime::context::runtime::enter_runtime
   4: tokio::runtime::runtime::Runtime::block_on
   5: mithril_aggregator::main
   6: std::sys::backtrace::__rust_begin_short_backtrace
   7: std::rt::lang_start::{{closure}}
   8: std::rt::lang_start_internal
   9: main
  10: __libc_start_call_main
             at ./csu/../sysdeps/nptl/libc_start_call_main.h:58:16
  11: __libc_start_main_impl
             at ./csu/../csu/libc-start.c:360:3
  12: _start

The e2e test is started with the following parameters:

--skip-signature-delayer --number-of-aggregators=3 --number-of-signers=4 --use-relays --relay-signer-registration-mode=passthrough --relay-signature-registration-mode=p2p --aggregate-signature-type=Concatenation  --use-dmq --dmq-node-flavor=haskell --run-only --cardano-slot-length 0.25 --cardano-epoch-length 30

Hypothesis 1

The problem occurs when using the DMQ node (and thus the signature processor).
The scenario could be the following:

  • An individual signature is being created for an open message that gets deleted before the signature is inserted
  • The insertion then fails with the foreign key error
  • This error triggers a panic in the select! call in the run method of the SequentialSignatureProcessor
  • A possible fix: add an else branch to the select! call
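
The scenario above can be sketched with the standard library only. This is a minimal simulation, not the aggregator's code: `Store`, `StoreError`, `insert_signature`, `delete_open_message`, and `process_all` are hypothetical names, and the in-memory map only stands in for the SQLite tables and their foreign key. It shows the suspected race (the open message disappears before its signature is inserted) and one way a processing loop could log and skip the failed insertion instead of letting it escalate to a panic.

```rust
use std::collections::{HashMap, VecDeque};
use std::fmt;

/// Error returned by the simulated store (stands in for SQLite error code 19).
#[derive(Debug, PartialEq)]
enum StoreError {
    ForeignKeyViolation { open_message_id: u32 },
}

impl fmt::Display for StoreError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            StoreError::ForeignKeyViolation { open_message_id } => write!(
                f,
                "FOREIGN KEY constraint failed (open message {open_message_id} not found)"
            ),
        }
    }
}

/// Hypothetical in-memory store: each signature row references an open message row.
#[derive(Default)]
struct Store {
    open_messages: HashMap<u32, String>,
    signatures: Vec<(u32, String)>,
}

impl Store {
    /// Removes an open message, as a concurrent cleanup might do.
    fn delete_open_message(&mut self, open_message_id: u32) {
        self.open_messages.remove(&open_message_id);
    }

    /// Inserts a signature, enforcing the parent-row check a foreign key would perform.
    fn insert_signature(&mut self, open_message_id: u32, signature: &str) -> Result<(), StoreError> {
        if !self.open_messages.contains_key(&open_message_id) {
            return Err(StoreError::ForeignKeyViolation { open_message_id });
        }
        self.signatures.push((open_message_id, signature.to_string()));
        Ok(())
    }
}

/// Drains the queue, logging and skipping signatures whose open message is gone.
/// Returns the pair (stored, skipped).
fn process_all(store: &mut Store, queue: &mut VecDeque<(u32, String)>) -> (usize, usize) {
    let (mut stored, mut skipped) = (0, 0);
    while let Some((open_message_id, signature)) = queue.pop_front() {
        match store.insert_signature(open_message_id, &signature) {
            Ok(()) => stored += 1,
            Err(err) => {
                // Handle the error here rather than unwrapping it into a panic.
                eprintln!("skipping signature: {err}");
                skipped += 1;
            }
        }
    }
    (stored, skipped)
}
```

With one open message that is deleted before a queued signature is processed, `process_all` reports that signature as skipped and keeps running. Whether skipping is the right behavior for the real signature processor is part of what the investigation needs to determine.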

For reference, the tokio documentation of select! describes when the macro panics:

/// The `select!` macro panics if all branches are disabled **and** there is no
/// provided `else` branch. A branch is disabled when the provided `if`
/// precondition returns `false` **or** when the pattern does not match the
/// result of `<async expression>`.

Hypothesis 2

The database cursor can panic, and this could be the source of the problem.

What

Investigate the source of the problem and create a fix.

How

  • Investigate the source of the error
  • Create a fix

Labels

bug ⚠️Something isn't working