-
Notifications
You must be signed in to change notification settings - Fork 35
Open
Description
Currently, when a client makes simultaneous IPC calls on a single server thread (assigning the same Context.thread value in the request), assert(!m_fn)
triggers in Waiter::post
.
Since this error is pretty easy to trigger in python and rust cilents not using libmultiprocess, it would be better for the server to refuse these requests and provide a clear error message instead of crashing. This should just require a small change to the Waiter
class and the code calling it in type-context.h
. Context: bitcoin/bitcoin#33201 (comment) and https://gnusha.org/bitcoin-core-dev/2025-09-05.log:
04:50 < sipa> ryanofsky: any idea about this error in https://github.com/bitcoin/bitcoin/actions/runs/17491932432/job/49683489249?pr=33201 :
04:51 < sipa> node0 stderr bitcoin-node: ipc/libmultiprocess/include/mp/proxy-io.h:279: void mp::Waiter::post(Fn &&) [Fn = (lambda at ./ipc/libmultiprocess/include/mp/type-context.h:67:19)]: Assertion `!m_fn' failed.
04:55 < kevkevin> looks like the test got hung up on this line in the ipc test: template4 = File "/home/admin/actions-runner/_work/_temp/build/test/functional/interface_ipc.py", line 156, in async_routine: await template2.result.waitNext(ctx, waitoptions)
04:56 < sipa> yeah, bitcoin-node crashed during that invocation
05:01 < sipa> i can reproduce it locally, though it's pretty rare (once in 60 runs, so far)
05:04 < kevkevin> hmm interesting, I'll try to reproduce it locally myself
05:05 < ryanofsky> that is interesting, i've never seen that assert hit before. i think it could happen if two ipc clients try to execute a different IPC calls on the same thread at the same time
05:06 < sipa> huh!
05:06 < ryanofsky> libmultiprocess assumes a 1:1 threading model where a there is one server thread for every client thread, emulating a traditional c++ call stack
05:06 < ryanofsky> allowing recursive mutexes, synchronous callbacks, etc
05:06 < sipa> should the python side do anything specific to arrange that?
05:07 < sipa> i would assume not, because there is only one thread on the python side that does all IPC calls
05:07 < ryanofsky> it should just make two IPC calls at the same time, and they should be calls that do not return instantly like waitNext
05:08 < ryanofsky> yeah, it is unexpected that current python test would cause that
05:08 < dodo> c
05:09 -!- hacker4web3bitco [~hacker4we@user/hacker4web3bitco] has joined #bitcoin-core-dev
05:10 < ryanofsky> maybe there is a race condition where c++ code sends a response but the server thread doesn't return to being idle, and python code sends the next request quickly enough to hit the assert
05:10 -!- bitcoinlover [~hacker4we@user/hacker4web3bitco] has quit [Ping timeout: 248 seconds]
05:11 < sipa> oh, i may have messed something up
05:11 < sipa> waitnext = template2.result.waitNext(ctx, waitoptions)
05:11 < sipa> template4 = await template2.result.waitNext(ctx, waitoptions)
05:11 < sipa> might create two asynchronous calls simultaneously?
05:12 < ryanofsky> it seems like that would just chain the calls so the server runs them back to back, and make the bug more likely to happen
05:13 < ryanofsky> but if you added an await on the first line it might prevent this
05:13 < sipa> the first one is just unnecessary - i inlined it into the second one, but forgot to remove the original
05:14 < sipa> you say "make the bug more likely", so you think that the bug isn't in this code itself, but elsewhere?
05:14 < ryanofsky> actually i misread. yeah i assumed you were using waitnext variable second line and the calls would be chained. but actually this does look like simultaneous calls
05:14 < sipa> alright, let me run it 1000 times to see if it reappears, but if not, i'm going to assume that was the culprit
05:14 < ryanofsky> no i think this code does explain the bug
05:15 < sipa> thanks!
05:15 < ryanofsky> i just first misread it as chained calls that would be pipelined when actually those are parallel calls
05:15 < sipa> right
05:16 < ryanofsky> thanks for the test, and finding this. server could definitely do better here, it could just refuse the request instead of asserting
05:18 < sipa> or at least a more helpful error message
05:18 < sipa> i could imagine third parties trying to use the interface might hit something like this too
05:20 < ryanofsky> yeah the error message is the most important thing in practice
sipa, TheCharlatan and ismaelsadeeq
Metadata
Metadata
Assignees
Labels
No labels