
type-context.h should return error instead of asserting on simultaneous IPC calls #206

@ryanofsky

Description


Currently, when a client makes simultaneous IPC calls on a single server thread (by assigning the same Context.thread value to both requests), the assert(!m_fn) in Waiter::post triggers and the server crashes.

Since this error is easy to trigger from Python and Rust clients that do not use libmultiprocess, it would be better for the server to refuse such requests with a clear error message instead of crashing. This should only require a small change to the Waiter class and the code calling it in type-context.h. Context: bitcoin/bitcoin#33201 (comment) and https://gnusha.org/bitcoin-core-dev/2025-09-05.log:

04:50 < sipa> ryanofsky: any idea about this error in https://github.com/bitcoin/bitcoin/actions/runs/17491932432/job/49683489249?pr=33201 :
04:51 < sipa>  node0 stderr bitcoin-node: ipc/libmultiprocess/include/mp/proxy-io.h:279: void mp::Waiter::post(Fn &&) [Fn = (lambda at ./ipc/libmultiprocess/include/mp/type-context.h:67:19)]: Assertion `!m_fn' failed. 
04:55 < kevkevin> looks like the test got hung up on this line in the ipc test: template4 = File "/home/admin/actions-runner/_work/_temp/build/test/functional/interface_ipc.py", line 156, in async_routine: await template2.result.waitNext(ctx, waitoptions)
04:56 < sipa> yeah, bitcoin-node crashed during that invocation
05:01 < sipa> i can reproduce it locally, though it's pretty rare (once in 60 runs, so far)
05:04 < kevkevin> hmm interesting, I'll try to reproduce it locally myself
05:05 < ryanofsky> that is interesting, i've never seen that assert hit before. i think it could happen if two ipc clients try to execute a different IPC calls on the same thread at the same time
05:06 < sipa> huh!
05:06 < ryanofsky> libmultiprocess assumes a 1:1 threading model where a there is one server thread for every client thread, emulating a traditional c++ call stack
05:06 < ryanofsky> allowing recursive mutexes, synchronous callbacks, etc
05:06 < sipa> should the python side do anything specific to arrange that?
05:07 < sipa> i would assume not, because there is only one thread on the python side that does all IPC calls
05:07 < ryanofsky> it should just make two IPC calls at the same time, and they should be calls that do not return instantly like waitNext
05:08 < ryanofsky> yeah, it is unexpected that current python test would cause that
05:10 < ryanofsky> maybe there is a race condition where c++ code sends a response but the server thread doesn't return to being idle, and python code sends the next request quickly enough to hit the assert
05:11 < sipa> oh, i may have messed something up
05:11 < sipa> waitnext = template2.result.waitNext(ctx, waitoptions)
05:11 < sipa> template4 = await template2.result.waitNext(ctx, waitoptions)
05:11 < sipa> might create two asynchronous calls simultaneously?
05:12 < ryanofsky> it seems like that would just chain the calls so the server runs them back to back, and make the bug more likely to happen
05:13 < ryanofsky> but if you added an await on the first line it might prevent this
05:13 < sipa> the first one is just unnecessary - i inlined it into the second one, but forgot to remove the original
05:14 < sipa> you say "make the bug more likely", so you think that the bug isn't in this code itself, but elsewhere?
05:14 < ryanofsky> actually i misread. yeah i assumed you were using waitnext variable second line and the calls would be chained. but actually this does look like simultaneous calls
05:14 < sipa> alright, let me run it 1000 times to see if it reappears, but if not, i'm going to assume that was the culprit
05:14 < ryanofsky> no i think this code does explain the bug
05:15 < sipa> thanks!
05:15 < ryanofsky> i just first misread it as chained calls that would be pipelined when actually those are parallel calls
05:15 < sipa> right
05:16 < ryanofsky> thanks for the test, and finding this. server could definitely do better here, it could just refuse the request instead of asserting
05:18 < sipa> or at least a more helpful error message
05:18 < sipa> i could imagine third parties trying to use the interface might hit something like this too
05:20 < ryanofsky> yeah the error message is the most important thing in practice
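The trigger pattern from the log above can be modeled with plain asyncio. This is a sketch under one assumption: as with pycapnp, invoking an IPC method sends the request immediately and returns an awaitable, so a call that is created but not awaited is still in flight on the server. The names wait_next, in_flight, and max_in_flight are illustrative, not part of any real API:

```python
import asyncio

in_flight = 0
max_in_flight = 0

async def wait_next():
    """Model of an IPC call that does not return instantly (like waitNext)."""
    global in_flight, max_in_flight
    in_flight += 1
    max_in_flight = max(max_in_flight, in_flight)
    await asyncio.sleep(0.01)  # server-side work
    in_flight -= 1

async def buggy():
    # Mirrors the test bug: the first call is sent (scheduled) but never
    # awaited before the second is issued, so both requests are
    # outstanding on the same server thread at once.
    first = asyncio.ensure_future(wait_next())
    await wait_next()
    await first

asyncio.run(buggy())
print(max_in_flight)  # 2: two simultaneous calls -> server assert fires
```

Chaining the calls instead (awaiting the first before issuing the second) keeps at most one call outstanding, which is why adding the missing await in the test avoided the crash.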
