Skip to content

Conversation

@johndydo
Copy link
Contributor

Sending multiple messages in rapid succession (or concurrently) causes deadlock.

While building an Acceptor implementation, I experienced the symptoms of a deadlock when the server happened to receive two requests in quick succession. The messages would be processed on the same goroutine as the FromApp callback and a response sent back using SendToTarget. The session thread got stuck in the second call to SendToTarget.

To debug the issue, I created an Initiator (which runs in a separate process on the same machine) that generates many requests:

for {
	time.Sleep(time.Second * 10)

	for i := 0; i < 10; i++ {
		go func() {
			log.Println("before send")
			must(quickfix.SendToTarget(NewRequestMessage(), sessionId))
			log.Println("after send")
		}()
	}
}

Surprisingly the Initiator would also deadlock quickly and stop printing "after send".

The deadlock occurs between the session thread and the EventTimer thread (for the stateTimer heartbeat timer).

Here is a diagram of the two threads:

     Session          EventTimer stateTimer
     ------          ----------------------
t=0  select          select
t=1	             case Timer fires
	             EventTimer.f()
	             blocking send to unbuffered channel sessionEvent
	             --- waiting for Session to synchronously receive from sessionEvent ---
	             --- never gets to select case which would read reset channel ---
t=2  case messageEvent
     SendAppMessages() with at least 2 messages
t=3  send msg #1 to socket
t=4  stateTimer.Reset(s.HeartBtInt)
     buffered send to Timer's reset channel (fills the buffer, but does not block)
t=5  send msg #2 to socket
t=6  stateTimer.Reset(s.HeartBtInt)
     buffered send to Timer's reset channel (blocks since the channel buffer is full)
     --- send is waiting for Timer's reset channel to not be full ---
     --- never gets to the select case which reads sessionEvent ---

In summary, the session thread's select happens to enter the case for messageEvent, but there
is a race with the timer thread which enters the case for executing the timer action.

The timer's action is to send to an unbuffered channel that can only get read when the session thread is in a different select case than it is currently in.

If there happens to be more than 1 message to send, then the session thread will call EventTimer.Reset more than once, which will guarantee that the reset channel will become full and now these threads are deadlocked.

The deadlock won't happen if EventTimer.Reset simply resets the timer in a nonblocking fashion.

Past commit b75fbea#diff-8dc0bfb75b5413b6f4ed2d880e648ad9 moved from using time.After to the current implementation.

This PR changes the EventTimer to instead use time.Timer directly, without the need for a reset channel, as any thread can call Reset on it to a future time, without the possibility of getting blocked.

EventTimer.Reset implemented using a channel would deadlock if
thread A calls Reset more than once while the EventTimer is invoking
its action and the action blocks on thread A (i.e. sending to an
unbuffered channel that thread A must receive from)
@johndydo johndydo mentioned this pull request Mar 22, 2018
@cbusbey
Copy link
Contributor

cbusbey commented Mar 23, 2018

👍 👍 👍

Thanks for digging into this, @johndydo !

@cbusbey cbusbey merged commit 4d7842f into quickfixgo:master Mar 23, 2018
@cbusbey cbusbey added the Bug Behavior not matching expected label Mar 23, 2018
level2player pushed a commit to longbridge/quickfix that referenced this pull request Jul 19, 2022
Change EventTimer to reset its Timer without requiring a reset channel
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Bug Behavior not matching expected

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants