Skip to content

[CI] IllegalReferenceCountException in FullClusterRestartIT #30288

@DaveCTurner

Description

@DaveCTurner

A failure in the BWC tests of 6.3 vs 5.0.2:

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.3+bwc-tests/19/console

FAILURE  253s | FullClusterRestartIT.testSecurityNativeRealm <<< FAILURES!
   > Throwable #1: java.lang.AssertionError: there are still running tasks:
   > {time_in_queue=28ms, time_in_queue_millis=28, source=_add_listener_, executing=true, priority=HIGH, insert_order=765}
   > {time_in_queue=18ms, time_in_queue_millis=18, source=cluster_reroute(async_shard_fetch), executing=false, priority=HIGH, insert_order=767}
   > {time_in_queue=28ms, time_in_queue_millis=28, source=delayed_allocation_reroute, executing=false, priority=NORMAL, insert_order=766}
   >  at __randomizedtesting.SeedInfo.seed([3CA270480BA6CAFB:B75A8C1B31106AE3]:0)
   >  at org.elasticsearch.test.rest.ESRestTestCase.lambda$waitForClusterStateUpdatesToFinish$0(ESRestTestCase.java:347)
   >  at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:754)
   >  at org.elasticsearch.test.rest.ESRestTestCase.waitForClusterStateUpdatesToFinish(ESRestTestCase.java:338)
   >  at org.elasticsearch.test.rest.ESRestTestCase.cleanUpCluster(ESRestTestCase.java:151)
   >  at java.lang.Thread.run(Thread.java:748)
   >  Suppressed: java.lang.AssertionError: there are still running tasks:
   > {time_in_queue=40ms, time_in_queue_millis=40, source=cluster_reroute(async_shard_fetch), executing=true, priority=HIGH, insert_order=637}
...
Tests with failures:
  - org.elasticsearch.xpack.restart.FullClusterRestartIT.testSecurityNativeRealm
  - org.elasticsearch.xpack.restart.FullClusterRestartIT.testWatcher

However, other tests in this fixture also seemed very slow:

HEARTBEAT J0 PID([email protected]): 2018-05-01T05:33:54, stalled for  252s at: FullClusterRestartIT.testSecurityNativeRealm
HEARTBEAT J0 PID([email protected]): 2018-05-01T05:37:46, stalled for  231s at: FullClusterRestartIT.testWatcher
HEARTBEAT J0 PID([email protected]): 2018-05-01T05:41:02, stalled for  190s at: FullClusterRestartIT.testSingleDoc
HEARTBEAT J0 PID([email protected]): 2018-05-01T05:44:40, stalled for  211s at: FullClusterRestartIT.testSqlFailsOnIndexWithTwoTypes

Digging into the logs a bit, I find this which doesn't look right:

[2018-05-01T05:29:46,598][DEBUG][o.e.c.s.ClusterService   ] [node-0] processing [zen-disco-receive(from master [master {node-1}{-Wjuwb5sSUiVnV5ldBENqA}{qs8hEUgARRul6e9hCQ6Vog}{127.0.0.1}{127.0.0.1:53625}{testattr=test} committed version [119]])]:
[2018-05-01T05:29:46,627][WARN ][o.e.x.s.t.n.SecurityNetty4Transport] [node-0] exception caught on transport layer [[id: 0xd86dc7ea, L:/127.0.0.1:39843 - R:/127.0.0.1:60615]], closing connection
io.netty.handler.codec.DecoderException: io.netty.util.IllegalReferenceCountException: refCnt: 0, decrement: 1
        at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:442) ~[netty-codec-4.1.5.Final.jar:4.1.5.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:248) ~[netty-codec-4.1.5.Final.jar:4.1.5.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:372) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:358) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:350) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1334) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:372) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:358) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:926) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:129) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:610) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:513) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:467) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:437) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:873) [netty-common-4.1.5.Final.jar:4.1.5.Final]
        at java.base/java.lang.Thread.run(Thread.java:844) [?:?]
Caused by: io.netty.util.IllegalReferenceCountException: refCnt: 0, decrement: 1
        at io.netty.buffer.AbstractReferenceCountedByteBuf.release(AbstractReferenceCountedByteBuf.java:111) ~[?:?]
        at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1068) ~[?:?]
        at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:900) ~[?:?]
        at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:411) ~[?:?]
        ... 15 more
[2018-05-01T05:29:46,638][DEBUG][i.n.h.s.SslHandler       ] [id: 0x2280b8e2, L:/127.0.0.1:39843 - R:/127.0.0.1:60618] Swallowing a harmless 'connection reset by peer / broken pipe' error that occurred while writing close_notify in response to the
java.io.IOException: Connection reset by peer
        at java.base/sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[?:?]
        at java.base/sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[?:?]
        at java.base/sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:283) ~[?:?]
        at java.base/sun.nio.ch.IOUtil.read(IOUtil.java:250) ~[?:?]
        at java.base/sun.nio.ch.IOUtil.read(IOUtil.java:226) ~[?:?]
        at java.base/sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:382) ~[?:?]
        at io.netty.buffer.PooledHeapByteBuf.setBytes(PooledHeapByteBuf.java:261) ~[netty-buffer-4.1.5.Final.jar:4.1.5.Final]
        at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1100) ~[netty-buffer-4.1.5.Final.jar:4.1.5.Final]
        at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:367) ~[netty-transport-4.1.5.Final.jar:4.1.5.Final]
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:118) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:610) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:513) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:467) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:437) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:873) [netty-common-4.1.5.Final.jar:4.1.5.Final]
        at java.base/java.lang.Thread.run(Thread.java:844) [?:?]

This is on the 5.0.2 node (running on JVM 10):

[2018-05-01T05:28:21,514][INFO ][o.e.n.Node               ] [node-0] version[5.0.2], pid[7422], build[f6b4951/2016-11-24T10:07:18.101Z], OS[Linux/3.13.0-137-generic/amd64], JVM["Oracle Corporation"/Java HotSpot(TM) 64-Bit Server VM/10/10+46]

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions