Skip to content

OutOfMemoryError not killing the node #28967

@nachogiljaldo

Description

@nachogiljaldo

Elasticsearch version (bin/elasticsearch --version): 6.1.3

Plugins installed: [ingest-geoip,ingest-user-agent,repository-s3,x-pack]

JVM version (java -version):

java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)

OS version (uname -a if on a Unix-like system): Linux bd9a03495c76 4.4.0-66-generic #87~14.04.1-Ubuntu SMP Fri Mar 3 17:32:36 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Description of the problem including expected versus actual behavior:
Node suffers an OOM error and the following message is logged:
[2018-03-09T04:18:01,808][ERROR][org.elasticsearch.bootstrap.ElasticsearchUncaughtExceptionHandler] fatal error in thread [elasticsearch[instance-0000000008][bulk][T#739]], exiting java.lang.OutOfMemoryError: Java heap space
However, the instance does not die and it keeps on throwing OOMs from time remaining totally unusable.

Steps to reproduce:

Provide logs (if relevant):
Lots of similar logs are there in the last 24h:

[2018-03-09T11:21:55,312][ERROR][org.elasticsearch.index.engine.Engine] tragic event in index writer java.lang.OutOfMemoryError: Java heap space

This is the full stacktrace of one of the errors:

This is the first OOM that was thrown and did not result on the instance dying:

[2018-03-08T12:31:30,448][ERROR][org.elasticsearch.bootstrap.ElasticsearchUncaughtExceptionHandler] fatal error in thread [Thread-35], exiting
java.lang.OutOfMemoryError: Java heap space
	at io.netty.buffer.UnpooledHeapByteBuf.allocateArray(UnpooledHeapByteBuf.java:85) ~[?:?]
	at io.netty.buffer.UnpooledByteBufAllocator.allocateArray(UnpooledByteBufAllocator.java:147) ~[?:?]
	at io.netty.buffer.UnpooledHeapByteBuf.<init>(UnpooledHeapByteBuf.java:58) ~[?:?]
	at io.netty.buffer.UnpooledByteBufAllocator.<init>(UnpooledByteBufAllocator.java:142) ~[?:?]
	at io.netty.buffer.UnpooledByteBufAllocator.newHeapBuffer(UnpooledByteBufAllocator.java:64) ~[?:?]
	at io.netty.buffer.AbstractByteBufAllocator.heapBuffer(AbstractByteBufAllocator.java:162) ~[?:?]
	at io.netty.buffer.AbstractByteBufAllocator.heapBuffer(AbstractByteBufAllocator.java:153) ~[?:?]
	at io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:135) ~[?:?]
	at io.netty.channel.DefaultMaxMessagesRecvByteBufAllocator.allocate(DefaultMaxMessagesRecvByteBufAllocator.java:80) ~[?:?]
	at io.netty.channel.nio.AbstractNioByteChannel.read(AbstractNioByteChannel.java:122) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:644) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:544) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:498) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:458) ~[?:?]
	at io.netty.util.concurrent.SingleThreadEventExecutor.run(SingleThreadEventExecutor.java:858) ~[?:?]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_144]

Other, later, error I've seen that does not result on the instance dying:

[2018-03-09T06:34:19,412][WARN ][org.elasticsearch.cluster.action.shard.ShardStateAction] [filebeat-2018.03.06][3] received shard failed for shard id [[filebeat-2018.03.06][3]], allocation id [2-mEAryVT_2ElzJAYhZWcQ], primary term [0], message [shard failure, reason [lucene commit failed]], failure [AlreadyClosedException[this IndexWriter is closed]; nested: OutOfMemoryError[Java heap space]; ]
org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
	at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:896) ~[lucene-core-7.1.0.jar:7.1.0 84c90ad2c0218156c840e19a64d72b8a38550659 - ubuntu - 2017-10-13 16:12:42]
	at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:910) ~[lucene-core-7.1.0.jar:7.1.0 84c90ad2c0218156c840e19a64d72b8a38550659 - ubuntu - 2017-10-13 16:12:42]
	at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3377) ~[lucene-core-7.1.0.jar:7.1.0 84c90ad2c0218156c840e19a64d72b8a38550659 - ubuntu - 2017-10-13 16:12:42]
	at org.elasticsearch.index.engine.InternalEngine.commitIndexWriter(InternalEngine.java:2113) ~[elasticsearch-6.1.3.jar:6.1.3]
	at org.elasticsearch.index.engine.InternalEngine.commitIndexWriter(InternalEngine.java:2106) ~[elasticsearch-6.1.3.jar:6.1.3]
	at org.elasticsearch.index.engine.InternalEngine.flush(InternalEngine.java:1493) ~[elasticsearch-6.1.3.jar:6.1.3]
	at org.elasticsearch.index.shard.IndexShard.flush(IndexShard.java:1024) ~[elasticsearch-6.1.3.jar:6.1.3]
	at org.elasticsearch.indices.flush.SyncedFlushService.performPreSyncedFlush(SyncedFlushService.java:414) ~[elasticsearch-6.1.3.jar:6.1.3]
	at org.elasticsearch.indices.flush.SyncedFlushService.access(SyncedFlushService.java:70) ~[elasticsearch-6.1.3.jar:6.1.3]
	at org.elasticsearch.indices.flush.SyncedFlushService.messageReceived(SyncedFlushService.java:696) ~[elasticsearch-6.1.3.jar:6.1.3]
	at org.elasticsearch.indices.flush.SyncedFlushService.messageReceived(SyncedFlushService.java:692) ~[elasticsearch-6.1.3.jar:6.1.3]
	at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) ~[elasticsearch-6.1.3.jar:6.1.3]
	at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor.doRun(SecurityServerTransportInterceptor.java:258) ~[?:?]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.1.3.jar:6.1.3]
	at org.elasticsearch.common.util.concurrent.EsExecutors.execute(EsExecutors.java:135) ~[elasticsearch-6.1.3.jar:6.1.3]
	at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor.lambda-zsh(SecurityServerTransportInterceptor.java:307) ~[?:?]
	at org.elasticsearch.action.ActionListener.onResponse(ActionListener.java:60) ~[elasticsearch-6.1.3.jar:6.1.3]
	at org.elasticsearch.xpack.security.transport.ServerTransportFilter.lambda(ServerTransportFilter.java:165) ~[?:?]
	at org.elasticsearch.xpack.security.authz.AuthorizationUtils.maybeRun(AuthorizationUtils.java:182) ~[?:?]
	at org.elasticsearch.xpack.security.authz.AuthorizationUtils.setRunAsRoles(AuthorizationUtils.java:176) ~[?:?]
	at org.elasticsearch.xpack.security.authz.AuthorizationUtils.authorize(AuthorizationUtils.java:158) ~[?:?]
	at org.elasticsearch.xpack.security.transport.ServerTransportFilter.lambda(ServerTransportFilter.java:167) ~[?:?]
	at org.elasticsearch.action.ActionListener.onResponse(ActionListener.java:60) ~[elasticsearch-6.1.3.jar:6.1.3]
	at org.elasticsearch.xpack.security.authc.AuthenticationService.lambda(AuthenticationService.java:195) ~[?:?]
	at org.elasticsearch.xpack.security.authc.AuthenticationService.lambda(AuthenticationService.java:228) ~[?:?]
	at org.elasticsearch.xpack.security.authc.AuthenticationService.lookForExistingAuthentication(AuthenticationService.java:239) ~[?:?]
	at org.elasticsearch.xpack.security.authc.AuthenticationService.authenticateAsync(AuthenticationService.java:193) ~[?:?]
	at org.elasticsearch.xpack.security.authc.AuthenticationService.access-zsh(AuthenticationService.java:147) ~[?:?]
	at org.elasticsearch.xpack.security.authc.AuthenticationService.authenticate(AuthenticationService.java:116) ~[?:?]
	at org.elasticsearch.xpack.security.transport.ServerTransportFilter.inbound(ServerTransportFilter.java:141) ~[?:?]
	at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor.messageReceived(SecurityServerTransportInterceptor.java:314) ~[?:?]
	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[elasticsearch-6.1.3.jar:6.1.3]
	at org.elasticsearch.transport.TransportService.doRun(TransportService.java:652) ~[elasticsearch-6.1.3.jar:6.1.3]
	at org.elasticsearch.common.util.concurrent.ThreadContext.doRun(ThreadContext.java:637) ~[elasticsearch-6.1.3.jar:6.1.3]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.1.3.jar:6.1.3]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_144]
	at java.util.concurrent.ThreadPoolExecutor.run(ThreadPoolExecutor.java:624) [?:1.8.0_144]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_144]
Caused by: java.lang.OutOfMemoryError: Java heap space

Metadata

Metadata

Assignees

No one assigned

    Labels

    :Core/Infra/ResiliencyKeep running when everything is ok. Die quickly if things go horribly wrong.>bug

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions