
Memory leak when shard fails to allocate on missing synonym dictionary #19879

@adrianocrestani

Description

Elasticsearch version: 2.1.1 and 2.3.5

Plugins installed: []

JVM version: Oracle JDK 1.8.0_60

OS version: Ubuntu 4.8.2 (Trusty 14.04)

Description of the problem including expected versus actual behavior:

We ran into a memory issue in our ES cluster when shards were allocated to nodes that were missing a synonym dictionary. Heap usage on those nodes went from 15GB to 30GB, and the behaviour was 100% reproducible.

Those nodes did not OOM, but spent a lot of time in GC. The whole cluster became very unresponsive: stats and search requests took a long time to respond and often timed out. The dictionary was missing on 4 of the 16 data nodes, and only those 4 nodes started running out of memory; the other 12 nodes had the synonym dictionary in place and were able to allocate shards.

Cluster configuration: ES 2.1.1, 16 data nodes, 2 masters, 6 proxy nodes.

We were able to reproduce the memory leak problem with the steps below.

Steps to reproduce:

  1. Install ES 2.3.5 on two different nodes

  2. Create a synonym dictionary using the command below on each node:
    mkdir $ES_HOME/config/analysis/ && cat /usr/share/dict/words | sed -r 's/(.*)/\1,\1_synonym/' > $ES_HOME/config/analysis/synonym.txt
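
Each line of the generated file maps a dictionary word to a fabricated synonym. A quick way to eyeball the format (the exact words depend on your system's word list):

    head -3 $ES_HOME/config/analysis/synonym.txt
    # e.g.:
    # A,A_synonym
    # Aachen,Aachen_synonym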

  3. Start both nodes with 256MB of heap.
    export ES_HEAP_SIZE=256m && $ES_HOME/bin/elasticsearch
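
Before going further, it is worth confirming that both nodes are up and have formed a single two-node cluster (standard cat APIs):

    curl -s 'localhost:9200/_cat/health?v'
    curl -s 'localhost:9200/_cat/nodes?v&h=name,heap.max'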

  4. Create an index using the following schema:

curl -XPUT localhost:9200/test_index -d '{
  "settings": {
    "similarity": {
      "ta_bm25": {
        "type": "BM25",
        "b": 0.1,
        "k1": 20
      }
    },
    "index": {
      "number_of_shards": 3,
      "analysis": {
        "filter": {
          "my_synonym": {
            "type": "synonym",
            "synonyms_path": "analysis/synonym.txt"
          },
          "word_delimiter_filter": {
            "type": "word_delimiter",
            "catenate_words": true,
            "catenate_all": true,
            "preserve_original": true,
            "stem_english_possessive": false
          },
          "abbreviation_filter": {
            "type": "pattern_replace",
            "pattern": "([a-zA-Z0-9]{1})[.]",
            "replacement": "$1"
          },
          "ngram": {
            "type": "edgeNGram",
            "min_gram": 1,
            "max_gram": 12
          },
          "stem_en": {
            "type": "stemmer",
            "name": "minimal_english"
          }
        },
        "analyzer": {
          "search_analyzer": {
            "filter": [
              "asciifolding",
              "lowercase",
              "abbreviation_filter",
              "word_delimiter_filter",
              "my_synonym",
              "stem_en"
            ],
            "tokenizer": "whitespace"
          },
          "index_analyzer": {
            "tokenizer": "whitespace",
            "filter": [
              "asciifolding",
              "lowercase",
              "abbreviation_filter",
              "word_delimiter_filter",
              "stem_en"
            ]
          }
        }
      },
      "number_of_replicas": 1
    }
  },
  "mappings": {
    "my_type": {
      "_source": {
        "enabled": false
      },
      "_all": {
        "enabled": false
      },
      "properties": {
        "name": {
          "type": "string",
          "store": "no",
          "index": "analyzed",
          "analyzer": "index_analyzer",
          "search_analyzer": "search_analyzer",
          "similarity": "BM25"
        }
      }
    }
  }
}'
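
With the dictionary in place, the analysis chain can optionally be sanity-checked with the analyze API before indexing (the word "apple" below is only an example and is assumed to be present in /usr/share/dict/words). The synonym filter should emit an extra apple_synonym token:

    curl -s 'localhost:9200/test_index/_analyze?analyzer=search_analyzer&text=apple&pretty'
    # expect both "apple" and "apple_synonym" among the returned tokens
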
  5. Index one million test documents:
    for k in {1..10000} ; do for i in {1..100} ; do printf "{\"index\": {\"_index\":\"test_index\", \"_type\":\"my_type\",\"_id\":\"${k}-${i}\"}}\n{\"name\":\"$(shuf -n 5 /usr/share/dict/words | tr '\n' ' ')\"}\n" ; done | curl -XPUT 'localhost:9200/_bulk' --data-binary @- ; done
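
To confirm the bulk loop finished cleanly, the standard count API can be used (allow a moment for the final refresh; it should report 1,000,000 documents):

    curl -s 'localhost:9200/test_index/_count?pretty'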

  6. Once the indexing is done, there should be one index called test_index with one million documents, 3 shards and one replica. With 3 primaries and 3 replicas spread over the two nodes, each node should hold 3 shard copies. The command below should show heap usage around 40-50%.
    curl -s 'localhost:9200/_cat/nodes?v&h=name,heap.current,heap.max,heap.percent'

  7. Shut down one of the nodes.

  8. Remove the dictionary file from that node and start it again:

rm -rf $ES_HOME/config/analysis/synonym.txt
export ES_HEAP_SIZE=256m && $ES_HOME/bin/elasticsearch
  9. At this point, the node without the dictionary file will attempt to allocate the index shards, but will fail with the exception pasted in the log section below. ES retries this over and over. Use the command below to keep track of the allocated heap.
    curl -s 'localhost:9200/_cat/nodes?v&h=name,heap.current,heap.max,heap.percent'
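
Rather than re-running that by hand, heap growth can be sampled in a plain shell loop (the 10-second interval is arbitrary):

    while true; do
      curl -s 'localhost:9200/_cat/nodes?h=name,heap.current,heap.max,heap.percent'
      sleep 10
    done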

  10. After about 15-20 minutes, the heap goes above 95% and GC no longer seems to help.
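
The same pressure is visible from the JVM side with jstat, a standard JDK tool (this assumes JDK tools are on the PATH and that pgrep can find the ES main class, org.elasticsearch.bootstrap.Elasticsearch on 2.x). If the leak is present, old-gen occupancy (O) should sit near 100% while the full-GC counters (FGC/FGCT) keep climbing:

    # sample GC utilization every 5 seconds
    jstat -gcutil $(pgrep -f org.elasticsearch.bootstrap.Elasticsearch) 5000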

  11. To make things even worse, restore the synonym dictionary (recreate it, copy it from a backup, or copy it from the other node).

  12. At this point, the node that is running out of memory will finally be able to allocate those shards. Unfortunately, it will OOM.

Provide logs (if relevant):

[2016-08-08 15:15:11,172][WARN ][indices.cluster ] [Xi'an Chi Xan] [[test_index][0]] marking and sending shard failed due to [failed to create index]
[test_index] IndexCreationException[failed to create index]; nested: IllegalArgumentException[IOException while reading synonyms_path_path: /home/acrestani/Downloads/elasticsearch-2.3.5/config/analysis/synonym.txt (No such file or directory)];
at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:360)
at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyNewIndices(IndicesClusterStateService.java:294)
at org.elasticsearch.indices.cluster.IndicesClusterStateService.clusterChanged(IndicesClusterStateService.java:163)
at org.elasticsearch.cluster.service.InternalClusterService.runTasksForExecutor(InternalClusterService.java:610)
at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:772)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:231)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:194)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalArgumentException: IOException while reading synonyms_path_path: /home/acrestani/Downloads/elasticsearch-2.3.5/config/analysis/synonym.txt (No such file or directory)
at org.elasticsearch.index.analysis.Analysis.getReaderFromFile(Analysis.java:288)
at org.elasticsearch.index.analysis.SynonymTokenFilterFactory.<init>(SynonymTokenFilterFactory.java:64)
at sun.reflect.GeneratedConstructorAccessor7.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at org.elasticsearch.common.inject.DefaultConstructionProxyFactory$1.newInstance(DefaultConstructionProxyFactory.java:50)
at org.elasticsearch.common.inject.ConstructorInjector.construct(ConstructorInjector.java:86)
at org.elasticsearch.common.inject.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:104)
at org.elasticsearch.common.inject.FactoryProxy.get(FactoryProxy.java:54)
at org.elasticsearch.common.inject.InjectorImpl$4$1.call(InjectorImpl.java:823)
at org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:886)
at org.elasticsearch.common.inject.InjectorImpl$4.get(InjectorImpl.java:818)
at org.elasticsearch.common.inject.assistedinject.FactoryProvider2.invoke(FactoryProvider2.java:236)
at com.sun.proxy.$Proxy15.create(Unknown Source)
at org.elasticsearch.index.analysis.AnalysisService.<init>(AnalysisService.java:151)
at org.elasticsearch.index.analysis.AnalysisService.<init>(AnalysisService.java:70)
at sun.reflect.GeneratedConstructorAccessor5.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at org.elasticsearch.common.inject.DefaultConstructionProxyFactory$1.newInstance(DefaultConstructionProxyFactory.java:50)
at org.elasticsearch.common.inject.ConstructorInjector.construct(ConstructorInjector.java:86)
at org.elasticsearch.common.inject.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:104)
at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:47)
at org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:886)
at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:43)
at org.elasticsearch.common.inject.Scopes$1$1.get(Scopes.java:59)
at org.elasticsearch.common.inject.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:46)
at org.elasticsearch.common.inject.SingleParameterInjector.inject(SingleParameterInjector.java:42)
at org.elasticsearch.common.inject.SingleParameterInjector.getAll(SingleParameterInjector.java:66)
at org.elasticsearch.common.inject.ConstructorInjector.construct(ConstructorInjector.java:85)
at org.elasticsearch.common.inject.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:104)
at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:47)
at org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:886)
at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:43)
at org.elasticsearch.common.inject.Scopes$1$1.get(Scopes.java:59)
at org.elasticsearch.common.inject.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:46)
at org.elasticsearch.common.inject.SingleParameterInjector.inject(SingleParameterInjector.java:42)
at org.elasticsearch.common.inject.SingleParameterInjector.getAll(SingleParameterInjector.java:66)
at org.elasticsearch.common.inject.ConstructorInjector.construct(ConstructorInjector.java:85)
at org.elasticsearch.common.inject.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:104)

After deploying the synonym dictionary, the node prints this:

java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid5205.hprof ...
Heap dump file created [317848006 bytes in 1.196 secs]
[2016-08-08 15:35:54,090][WARN ][threadpool ] [Xi'an Chi Xan] failed to run org.elasticsearch.discovery.zen.fd.MasterFaultDetection$MasterPinger@3eb528ce
java.lang.OutOfMemoryError: Java heap space
Exception in thread "elasticsearch[Xi'an Chi Xan][http_server_boss][T#1]" Exception in thread "elasticsearch[Xi'an Chi Xan][generic][T#5]" Exception in thread "elasticsearch[Xi'an Chi Xan][transport_client_boss][T#1]" java.lang.OutOfMemoryError: Java heap space
at java.lang.AbstractStringBuilder.<init>(AbstractStringBuilder.java:68)
at java.lang.StringBuilder.<init>(StringBuilder.java:101)
at org.elasticsearch.common.logging.support.LoggerMessageFormat.format(LoggerMessageFormat.java:52)
at org.elasticsearch.common.logging.support.AbstractESLogger.warn(AbstractESLogger.java:109)
at org.elasticsearch.transport.netty.NettyInternalESLogger.warn(NettyInternalESLogger.java:83)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:340)
at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
[2016-08-08 15:35:54,095][WARN ][transport.netty ] [Xi'an Chi Xan] exception caught on transport layer [[id: 0x20cb9904, /10.33.154.216:60026 => /10.96.160.99:9300]], closing connection
java.lang.IllegalStateException: Message not fully read (request) for requestId [69030], action [internal:discovery/zen/fd/ping], readerIndex [75] vs expected [181]; resetting
at org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:121)
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:75)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
