Description
Elasticsearch version (bin/elasticsearch --version):
Version: 7.4.1, Build: oss/docker/fc0eeb6e2c25915d63d871d344e3d0b45ea0ea1e/2019-10-22T17:16:35.176724Z, JVM: 13
(deployed in k8s with 3 data nodes and 3 dedicated masters)
Plugins installed:
analysis-url 1.0.0 (custom analysis plugin providing a token filter for URLs)
repository-gcs 7.4.1
repository-s3 7.4.1
JVM version (java -version):
openjdk version "13" 2019-09-17
OpenJDK Runtime Environment AdoptOpenJDK (build 13+33)
OpenJDK 64-Bit Server VM AdoptOpenJDK (build 13+33, mixed mode, sharing)
OS version (uname -a if on a Unix-like system):
Linux elasticsearch-0 4.14.138+ #1 SMP Tue Sep 3 02:58:08 PDT 2019 x86_64 x86_64 x86_64 GNU/Linux
Description of the problem including expected versus actual behavior:
Yesterday one of our ES clusters repeatedly went red because of merge failures. Each failure was caused by the error: java.lang.IllegalStateException: totalPointCount=4424 was passed when we were created, but we just hit 0329 values.
We saw this 12 times in the last 24 hours, across 3 different indices and 2 different nodes.
After the merge failure, ES does not attempt to reallocate the shard, and the cluster goes yellow/red.
Calling POST /_cluster/reroute?retry_failed=true allocates the shard immediately, and the cluster health returns to green.
A force merge on the index succeeds once the shard has been reallocated.
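For reference, the recovery steps above can be sketched in Python using only the standard library. This is a minimal sketch, not what we ran verbatim: the ES_URL value and the helper names (build_request, retry_failed_request, force_merge_request, send) are assumptions for illustration, and it assumes an unauthenticated HTTP endpoint.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the cluster is reachable here without authentication.
ES_URL = "http://localhost:9200"


def build_request(method, path):
    """Build (but do not send) an HTTP request against the cluster."""
    return urllib.request.Request(ES_URL + path, method=method)


def retry_failed_request():
    """Ask the master to retry allocation of shards whose allocation failed."""
    return build_request("POST", "/_cluster/reroute?retry_failed=true")


def force_merge_request(index):
    """Force merge a single index, with the index name URL-encoded."""
    return build_request("POST", "/" + urllib.parse.quote(index, safe="") + "/_forcemerge")


def send(request):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Usage against a live cluster (not executed here):
#   send(retry_failed_request())
#   send(force_merge_request("v27.tcpevent-shrink-000188"))
```

Building the request separately from sending it makes it easy to check the exact URL and method before anything touches the cluster.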
We have not found any other occurrences of this error in our historical logs (30 days) or in any of our other clusters (we have 2 other identically configured clusters in different regions, as well as multiple test clusters).
The error seems to come from Lucene, though I don't know enough about ES/Lucene internals to trace how this state arises.
We tried restarting one of the nodes that was getting into this state and have not seen a failure from it since. However, the failures have been so infrequent that I'm not sure how significant that is, and I would like to understand what the problem is before restarting the other nodes where this has been seen.
Provide logs (if relevant):
{"type": "server", "timestamp": "2019-12-25T15:32:55,877Z", "level": "WARN", "component": "o.e.i.e.Engine", "cluster.name": "elasticsearch", "node.name": "elasticsearch-0", "message": " [v27.tcpevent-shrink-000188][0] failed engine [merge failed]", "cluster.uuid": "Wk2FFMJbQGKUAPPeZjek2w", "node.id": "45jcOptdTjqnA_fG1punmg" ,
"stacktrace": ["org.apache.lucene.index.MergePolicy$MergeException: java.lang.IllegalStateException: totalPointCount=2051853 was passed when we were created, but we just hit 0782 values",
"at org.elasticsearch.index.engine.InternalEngine$EngineMergeScheduler$2.doRun(InternalEngine.java:2389) [elasticsearch-7.4.1.jar:7.4.1]",
"at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:773) [elasticsearch-7.4.1.jar:7.4.1]",
"at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.4.1.jar:7.4.1]",
"at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]",
"at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]",
"at java.lang.Thread.run(Thread.java:830) [?:?]",
"Caused by: java.lang.IllegalStateException: totalPointCount=2051853 was passed when we were created, but we just hit 0782 values",
"at org.apache.lucene.util.bkd.BKDWriter$OneDimensionBKDWriter.add(BKDWriter.java:562) ~[lucene-core-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:05:56]",
"at org.apache.lucene.util.bkd.BKDWriter.merge(BKDWriter.java:497) ~[lucene-core-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:05:56]",
"at org.apache.lucene.codecs.lucene60.Lucene60PointsWriter.merge(Lucene60PointsWriter.java:213) ~[lucene-core-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:05:56]",
"at org.apache.lucene.index.SegmentMerger.mergePoints(SegmentMerger.java:202) ~[lucene-core-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:05:56]",
"at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:162) ~[lucene-core-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:05:56]",
"at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4462) ~[lucene-core-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:05:56]",
"at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4056) ~[lucene-core-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:05:56]",
"at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:625) ~[lucene-core-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:05:56]",
"at org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:101) ~[elasticsearch-7.4.1.jar:7.4.1]",
"at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:662) ~[lucene-core-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:05:56]"] }
{"type": "server", "timestamp": "2019-12-25T15:32:55,885Z", "level": "WARN", "component": "o.e.i.c.IndicesClusterStateService", "cluster.name": "elasticsearch", "node.name": "elasticsearch-0", "message": "[v27.tcpevent-shrink-000188][0] marking and sending shard failed due to [shard failure, reason [merge failed]]", "cluster.uuid": "Wk2FFMJbQGKUAPPeZjek2w", "node.id": "45jcOptdTjqnA_fG1punmg" ,
"stacktrace": ["org.apache.lucene.index.MergePolicy$MergeException: java.lang.IllegalStateException: totalPointCount=2051853 was passed when we were created, but we just hit 0782 values",
"at org.elasticsearch.index.engine.InternalEngine$EngineMergeScheduler$2.doRun(InternalEngine.java:2389) ~[elasticsearch-7.4.1.jar:7.4.1]",
"at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:773) ~[elasticsearch-7.4.1.jar:7.4.1]",
"at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.4.1.jar:7.4.1]",
"at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]",
"at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]",
"at java.lang.Thread.run(Thread.java:830) [?:?]",
"Caused by: java.lang.IllegalStateException: totalPointCount=2051853 was passed when we were created, but we just hit 0782 values",
"at org.apache.lucene.util.bkd.BKDWriter$OneDimensionBKDWriter.add(BKDWriter.java:562) ~[lucene-core-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:05:56]",
"at org.apache.lucene.util.bkd.BKDWriter.merge(BKDWriter.java:497) ~[lucene-core-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:05:56]",
"at org.apache.lucene.codecs.lucene60.Lucene60PointsWriter.merge(Lucene60PointsWriter.java:213) ~[lucene-core-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:05:56]",
"at org.apache.lucene.index.SegmentMerger.mergePoints(SegmentMerger.java:202) ~[lucene-core-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:05:56]",
"at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:162) ~[lucene-core-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:05:56]",
"at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4462) ~[lucene-core-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:05:56]",
"at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4056) ~[lucene-core-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:05:56]",
"at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:625) ~[lucene-core-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:05:56]",
"at org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:101) ~[elasticsearch-7.4.1.jar:7.4.1]",
"at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:662) ~[lucene-core-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:05:56]"] }
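In case it helps anyone check their own logs for the same failure, here is a rough sketch of how the occurrences can be tallied per index and per node. It assumes the JSON server log format shown above, where the "failed engine [merge failed]" message and the "node.name" field appear on the same line. The function name tally_merge_failures and the regular expressions are my own, not part of any ES tooling.

```python
import re
from collections import Counter

# Matches the index name and shard number in the engine failure message, e.g.
# "[v27.tcpevent-shrink-000188][0] failed engine [merge failed]".
FAILED_ENGINE = re.compile(r"\[([^\]\[]+)\]\[(\d+)\] failed engine \[merge failed\]")
# Matches the node name field of the JSON log line.
NODE_NAME = re.compile(r'"node\.name":\s*"([^"]+)"')


def tally_merge_failures(lines):
    """Count merge-failure engine events per index and per node.

    Only the "failed engine" line is counted, so the follow-up
    "marking and sending shard failed" line is not counted twice.
    """
    by_index = Counter()
    by_node = Counter()
    for line in lines:
        match = FAILED_ENGINE.search(line)
        if match is None:
            continue
        by_index[match.group(1)] += 1
        node = NODE_NAME.search(line)
        by_node[node.group(1) if node else "unknown"] += 1
    return by_index, by_node


# Usage with a log file (the path is hypothetical):
#   with open("elasticsearch_server.json") as log:
#       by_index, by_node = tally_merge_failures(log)
```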