
Infinite loop due to thread safety of MpscUnboundedArrayQueue.poll #947

@xiazuojie

Description

As its name (Multi Producer Single Consumer) indicates, MpscUnboundedArrayQueue is not thread-safe when polling: poll assumes it exclusively owns the consumer index. If multiple threads poll concurrently, two of them can read the same index; the first takes the element and nulls the slot, and the second, seeing the producer index still ahead of its stale consumer index, assumes a producer is in the middle of publishing into that slot and spins forever here:

do
{
    e = lvElement(buffer, offset);
}
while (e == null);
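
Here is a minimal Java sketch of the contract violation, using the same shaded queue class (the class and thread names in the sketch are illustrative; the Kotlin test under Steps to Reproduce below exercises the same race with more threads):

import io.rsocket.internal.jctools.queues.MpscUnboundedArrayQueue;

public class MpscPollRaceDemo {
    public static void main(String[] args) {
        MpscUnboundedArrayQueue<String> queue = new MpscUnboundedArrayQueue<>(256);

        // One producer: offer() is safe from any number of threads.
        Thread producer = new Thread(() -> {
            while (true) {
                queue.offer("element");
                try {
                    Thread.sleep(10);
                } catch (InterruptedException e) {
                    return;
                }
            }
        }, "producer");
        producer.setDaemon(true);
        producer.start();

        // Two consumers: one more than the contract allows. Sooner or later
        // both read the same consumer index; the loser then busy-waits on a
        // slot that was already consumed and nulled, and never returns from
        // poll(). A thread dump will show it RUNNABLE inside poll.
        for (int i = 0; i < 2; i++) {
            new Thread(() -> {
                while (true) {
                    queue.poll();
                }
            }, "consumer-" + i).start();
        }
    }
}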

UnboundedProcessor internally uses MpscUnboundedArrayQueue. When a connection needs to be terminated (because keep-alive acks stopped arriving, or for other reasons), more than one thread ends up polling from its MpscUnboundedArrayQueue.

As the thread dumps below show:

  • "reactor-tcp-nio-7" is the "normal" working thread polling from the queue.
  • "parallel-7" is the thread responsible for terminating the connection after missed keep-alive acks. To release the remaining elements' memory, it polls everything from the queue via UnboundedProcessor.clear():
public void clear() {
  while (!queue.isEmpty()) {
    T t = queue.poll();
    if (t != null) {
      release(t);
    }
  }
}
"reactor-tcp-nio-7@16412" daemon prio=5 tid=0x250 nid=NA runnable
  java.lang.Thread.State: RUNNABLE
      at io.rsocket.internal.jctools.queues.BaseMpscLinkedArrayQueue.poll(BaseMpscLinkedArrayQueue.java:256)
      at io.rsocket.internal.UnboundedProcessor.poll(UnboundedProcessor.java:330)
      at reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.poll(FluxMapFuseable.java:174)
      at reactor.netty.channel.MonoSendMany$SendManyInner.run(MonoSendMany.java:264)
      at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
      at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
      at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
      at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
      at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
      at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
      at java.lang.Thread.run(Thread.java:745)

"parallel-7@16290" daemon prio=5 tid=0x258 nid=NA runnable
  java.lang.Thread.State: RUNNABLE
      at io.rsocket.internal.jctools.queues.BaseMpscLinkedArrayQueue.poll(BaseMpscLinkedArrayQueue.java:256)
      at io.rsocket.internal.UnboundedProcessor.clear(UnboundedProcessor.java:346)
      at io.rsocket.internal.UnboundedProcessor.cancel(UnboundedProcessor.java:317)
      at io.rsocket.internal.UnboundedProcessor.dispose(UnboundedProcessor.java:364)
      at io.rsocket.RSocketRequester.terminate(RSocketRequester.java:594)
      at io.rsocket.RSocketRequester.tryTerminate(RSocketRequester.java:559)
      at io.rsocket.RSocketRequester.tryTerminateOnKeepAlive(RSocketRequester.java:541)
      at io.rsocket.RSocketRequester$$Lambda$904.1238285436.accept(Unknown Source:-1)
      at io.rsocket.keepalive.KeepAliveSupport.tryTimeout(KeepAliveSupport.java:110)
      at io.rsocket.keepalive.KeepAliveSupport$ClientKeepAliveSupport.onIntervalTick(KeepAliveSupport.java:146)
      at io.rsocket.keepalive.KeepAliveSupport.lambda$start$0(KeepAliveSupport.java:54)
      at io.rsocket.keepalive.KeepAliveSupport$$Lambda$906.2130835366.accept(Unknown Source:-1)
      at reactor.core.publisher.LambdaSubscriber.onNext(LambdaSubscriber.java:160)
      at reactor.core.publisher.FluxInterval$IntervalRunnable.run(FluxInterval.java:123)
      at reactor.core.scheduler.PeriodicWorkerTask.call(PeriodicWorkerTask.java:59)
      at reactor.core.scheduler.PeriodicWorkerTask.run(PeriodicWorkerTask.java:73)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)

Expected Behavior

Thread-safe polling: terminating a connection must not leave any thread spinning in an infinite loop.

Actual Behavior

Threads sometimes end up in an infinite loop when a connection is terminated. This has happened three times in production for me.

Steps to Reproduce

The test below demonstrates the infinite loop when multiple threads poll from MpscUnboundedArrayQueue concurrently:

import io.rsocket.internal.jctools.queues.MpscUnboundedArrayQueue
import org.junit.Test
import reactor.util.concurrent.Queues

class MpscUnboundedArrayQueueTest {
    @Test
    fun test() {
        val a = MpscUnboundedArrayQueue<String>(Queues.SMALL_BUFFER_SIZE)

        // Four producers: offer() is multi-producer safe, so this side is fine.
        for (x in 0 until 4) {
            val t = Thread(
                Runnable {
                    while (true) {
                        a.offer(java.time.LocalTime.now().toString())
                        Thread.sleep(10)
                    }
                }, "p$x"
            )
            t.isDaemon = true
            t.start()
        }
        // Four consumers: the queue permits only one. One by one, consumers
        // get stuck in poll()'s busy-wait loop and stop printing.
        for (x in 0 until 4) {
            Thread(
                Runnable {
                    while (true) {
                        println(java.time.LocalTime.now().toString() + ": " + Thread.currentThread().toString() + ": " + a.poll())
                    }
                }, "c$x"
            ).start()
        }
        // Keep the test alive so the hang can be observed (e.g. via thread dump).
        while (true) {
            Thread.sleep(10000)
        }
    }
}

Possible Solution

Replace MpscUnboundedArrayQueue with an MPMC (Multi Producer Multi Consumer) queue, whose poll is safe to call from multiple threads. A sketch of this direction follows.
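
A minimal sketch of that direction, not the project's actual fix: ConcurrentLinkedQueue is simply the nearest unbounded MPMC queue in the JDK, and the class, field, and release() names below are hypothetical stand-ins mirroring the UnboundedProcessor snippet above.

import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

final class MpmcBackedProcessor<T> {
    // ConcurrentLinkedQueue is unbounded and safe for any number of
    // producers and consumers, so poll() from two threads cannot livelock.
    private final Queue<T> queue = new ConcurrentLinkedQueue<>();

    public boolean offer(T t) {
        return queue.offer(t);
    }

    public T poll() {
        return queue.poll();
    }

    public void clear() {
        // Poll until null: a null return means the queue is drained, so the
        // isEmpty()/poll() pair from the original clear() is not needed.
        T t;
        while ((t = queue.poll()) != null) {
            release(t);
        }
    }

    private void release(T t) {
        // Hypothetical: release the element's buffers, as UnboundedProcessor does.
    }
}

Alternatively, the MPSC queue could be kept and all draining, including the clear-on-terminate path, serialized onto the single consumer; either way, two threads must never race inside poll.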

Your Environment

  • RSocket version(s) used: 1.0.0-RC6
  • Other relevant library versions (e.g. netty, ...):
  • Platform (e.g. JVM version (java -version) or Node version (node --version)): Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
  • OS and version (e.g. uname -a): Darwin MacBook-Pro.local 19.6.0 Darwin Kernel Version 19.6.0: Mon Aug 31 22:12:52 PDT 2020; root:xnu-6153.141.2~1/RELEASE_X86_64 x86_64

Metadata
Labels: bug, superseded (Issue is superseded by another)
