Description
Elasticsearch version: 6.6.1
If a dictionary file (e.g., the stop words file used by an analysis chain) is missing from the file system, the cluster can get stuck in a recovery loop, repeatedly trying and failing to start the shard:
[<master>] failing shard [failed shard, shard [<index_name>][0], node[FCTG63TLQKSKR-TP1mWnRA], relocating [AkawpiLAQrWjdscFm8wKRA], [R], recovery_source[peer recovery], s[INITIALIZING], a[id=AnHSDjEsRTmgxYlfGKVxyQ, rId=wQVYB77ES1ygVXvWFh6fLw], expected_shard_size[261], message [failed to create index], failure [IllegalArgumentException[IOException while reading stopwords_path: /app/config/<example>-stopwords.txt]; nested: NoSuchFileException[/app/config/<example>-stopwords.txt]; ], markAsStale [true]]
java.lang.IllegalArgumentException: IOException while reading stopwords_path: /app/config/<example>-stopwords.txt
at org.elasticsearch.index.analysis.Analysis.getWordList(Analysis.java:264) ~[elasticsearch-6.6.1.jar:6.6.1]
at org.elasticsearch.index.analysis.Analysis.getWordList(Analysis.java:231) ~[elasticsearch-6.6.1.jar:6.6.1]
at org.elasticsearch.index.analysis.Analysis.parseWords(Analysis.java:170) ~[elasticsearch-6.6.1.jar:6.6.1]
at org.elasticsearch.index.analysis.Analysis.parseStopWords(Analysis.java:194) ~[elasticsearch-6.6.1.jar:6.6.1]
at org.elasticsearch.index.analysis.StopTokenFilterFactory.<init>(StopTokenFilterFactory.java:47) ~[elasticsearch-6.6.1.jar:6.6.1]
at org.elasticsearch.index.analysis.AnalysisRegistry.buildMapping(AnalysisRegistry.java:355) ~[elasticsearch-6.6.1.jar:6.6.1]
at org.elasticsearch.index.analysis.AnalysisRegistry.buildTokenFilterFactories(AnalysisRegistry.java:178) ~[elasticsearch-6.6.1.jar:6.6.1]
at org.elasticsearch.index.analysis.AnalysisRegistry.build(AnalysisRegistry.java:159) ~[elasticsearch-6.6.1.jar:6.6.1]
at org.elasticsearch.index.IndexService.<init>(IndexService.java:164) ~[elasticsearch-6.6.1.jar:6.6.1]
at org.elasticsearch.index.IndexModule.newIndexService(IndexModule.java:397) ~[elasticsearch-6.6.1.jar:6.6.1]
at org.elasticsearch.indices.IndicesService.createIndexService(IndicesService.java:519) ~[elasticsearch-6.6.1.jar:6.6.1]
at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:473) ~[elasticsearch-6.6.1.jar:6.6.1]
at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:156) ~[elasticsearch-6.6.1.jar:6.6.1]
at org.elasticsearch.indices.cluster.IndicesClusterStateService.createIndices(IndicesClusterStateService.java:462) ~[elasticsearch-6.6.1.jar:6.6.1]
at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:232) ~[elasticsearch-6.6.1.jar:6.6.1]
at org.elasticsearch.cluster.service.ClusterApplierService.lambda$callClusterStateAppliers$6(ClusterApplierService.java:486) ~[elasticsearch-6.6.1.jar:6.6.1]
at java.lang.Iterable.forEach(Iterable.java:75) ~[?:1.8.0_144]
at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:483) ~[elasticsearch-6.6.1.jar:6.6.1]
at org.elasticsearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:470) ~[elasticsearch-6.6.1.jar:6.6.1]
at org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:421) ~[elasticsearch-6.6.1.jar:6.6.1]
at org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:165) ~[elasticsearch-6.6.1.jar:6.6.1]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:660) [elasticsearch-6.6.1.jar:6.6.1]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:244) [elasticsearch-6.6.1.jar:6.6.1]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:207) [elasticsearch-6.6.1.jar:6.6.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_144]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_144]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_144]
Caused by: java.nio.file.NoSuchFileException: /app/config/<example>-stopwords.txt
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86) ~[?:?]
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) ~[?:?]
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) ~[?:?]
at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214) ~[?:?]
at co.elastic.cloud.quotaawarefs.QuotaAwareFileSystemProvider.newByteChannel(QuotaAwareFileSystemProvider.java:264) ~[quota-aware-fs-1.1.1-SNAPSHOT.jar:?]
at java.nio.file.Files.newByteChannel(Files.java:361) ~[?:1.8.0_144]
at java.nio.file.Files.newByteChannel(Files.java:407) ~[?:1.8.0_144]
at java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:384) ~[?:1.8.0_144]
at java.nio.file.Files.newInputStream(Files.java:152) ~[?:1.8.0_144]
at java.nio.file.Files.newBufferedReader(Files.java:2784) ~[?:1.8.0_144]
at org.elasticsearch.index.analysis.Analysis.getWordList(Analysis.java:255) ~[elasticsearch-6.6.1.jar:6.6.1]
... 26 more
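The root cause in the trace above is deterministic: `Files.newBufferedReader` throws `NoSuchFileException`, which is rethrown as an `IllegalArgumentException`, so every allocation attempt on a node without the file fails in exactly the same way. The following is a minimal sketch of that failure path, not the actual `Analysis.getWordList` code; the class name `WordListLoader`, the UTF-8 charset, and the `#` comment handling are assumptions for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating the failure path; not Elasticsearch code.
class WordListLoader {
    // Reads one word per line, skipping blank lines and '#' comment lines
    // (the comment convention is an assumption of this sketch).
    static List<String> loadWordList(Path path, String settingName) {
        List<String> words = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty() && !word.startsWith("#")) {
                    words.add(word);
                }
            }
        } catch (IOException e) {
            // Mirror the shape of the logged error: an IllegalArgumentException
            // whose cause is the original IOException (here NoSuchFileException).
            throw new IllegalArgumentException(
                "IOException while reading " + settingName + ": " + path, e);
        }
        return words;
    }

    public static void main(String[] args) {
        Path missing = Paths.get("/nonexistent-dir/example-stopwords.txt");
        try {
            loadWordList(missing, "stopwords_path");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
            System.out.println("cause: " + e.getCause().getClass().getSimpleName());
        }
    }
}
```

Because nothing about the node changes between attempts, retrying this call cannot succeed until the file is put in place.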
This recovery loop has led to problems such as the memory leak bug (#48230), which is fixed in a later version. It also keeps the master node busy processing a never-ending stream of shard-failed tasks, which have higher priority than "normal" tasks such as snapshots. As a result, snapshot requests, for example, keep failing with a ProcessClusterEventTimeoutException until the IOException is resolved (or until the master node becomes less busy):
failed to create snapshot
org.elasticsearch.cluster.metadata.ProcessClusterEventTimeoutException: failed to process cluster event (create_snapshot [scheduled-1591143244-instance-0000000013]) within 5m
at org.elasticsearch.cluster.service.MasterService$Batcher.lambda$onTimeout$0(MasterService.java:129) ~[elasticsearch-6.6.1.ja....
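To see why a high-priority retry loop starves the snapshot, consider a toy model: a single-threaded master runs one queued task per tick, always picking the highest priority first, and each processed shard-failed task immediately produces another because the shard fails again. This is a hypothetical simulation, not Elasticsearch's `MasterService`; the two-level `Priority` enum, the tick-based timeout, and the `retryCap` parameter are assumptions.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Toy model of a master task queue; names and priorities are hypothetical.
class MasterQueueStarvation {
    // Simplified priority levels; declaration order = execution order.
    enum Priority { HIGH, NORMAL }

    static final class Task {
        final String source;
        final Priority priority;
        final long seq; // insertion order, used as a FIFO tie-breaker

        Task(String source, Priority priority, long seq) {
            this.source = source;
            this.priority = priority;
            this.seq = seq;
        }
    }

    /**
     * Runs one task per tick. If retryCap is negative, every shard-failed task
     * spawns another (the shard never starts); otherwise at most retryCap
     * shard-failed tasks are processed in total.
     * Returns the tick at which the snapshot task ran, or -1 on timeout.
     */
    static long runUntilSnapshot(int retryCap, long timeoutTicks) {
        PriorityQueue<Task> queue = new PriorityQueue<>(
            Comparator.comparing((Task t) -> t.priority).thenComparingLong(t -> t.seq));
        long seq = 0;
        queue.add(new Task("create_snapshot", Priority.NORMAL, seq++));
        queue.add(new Task("shard-failed", Priority.HIGH, seq++));
        int failures = 0;
        for (long tick = 0; tick < timeoutTicks; tick++) {
            Task next = queue.poll();
            if (next == null) {
                return -1; // queue empty; cannot happen while the snapshot is queued
            }
            if (next.priority == Priority.NORMAL) {
                return tick; // the snapshot finally gets to run
            }
            failures++;
            // The shard is reallocated, fails again, and enqueues a new shard-failed task.
            if (retryCap < 0 || failures < retryCap) {
                queue.add(new Task("shard-failed", Priority.HIGH, seq++));
            }
        }
        return -1; // analogous to ProcessClusterEventTimeoutException
    }

    public static void main(String[] args) {
        System.out.println("unbounded retries: " + runUntilSnapshot(-1, 300)); // prints -1
        System.out.println("retries capped at 3: " + runUntilSnapshot(3, 300)); // prints 3
    }
}
```

With unbounded retries there is always a higher-priority task ahead of the snapshot, so it times out; once the retries stop, the queue drains and the snapshot runs.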
Have we considered treating these as permanent failures, so that the cluster does not retry recovery indefinitely and cause other problems?
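One way to express that proposal is a policy that gives up immediately on failures that no retry can fix and caps retries for everything else. The sketch below is hypothetical: `ShardFailurePolicy`, `MAX_RETRIES = 5`, and the choice of `NoSuchFileException` as the only "permanent" signal are assumptions, not existing Elasticsearch behavior.

```java
import java.nio.file.NoSuchFileException;

// Hypothetical retry policy for shard failures; not Elasticsearch code.
class ShardFailurePolicy {
    enum Decision { RETRY, GIVE_UP }

    // Assumed cap for failures that cannot be classified up front.
    static final int MAX_RETRIES = 5;

    // Walks the cause chain (bounded depth) looking for failures that retrying cannot fix.
    static boolean isPermanent(Throwable failure) {
        int depth = 0;
        for (Throwable t = failure; t != null && depth < 32; t = t.getCause(), depth++) {
            if (t instanceof NoSuchFileException) {
                return true; // e.g. a missing stopwords_path file fails identically every time
            }
        }
        return false;
    }

    static Decision decide(Throwable failure, int previousFailures) {
        if (isPermanent(failure)) {
            return Decision.GIVE_UP;
        }
        return previousFailures < MAX_RETRIES ? Decision.RETRY : Decision.GIVE_UP;
    }

    public static void main(String[] args) {
        Throwable missingFile = new IllegalArgumentException(
            "IOException while reading stopwords_path: /app/config/example-stopwords.txt",
            new NoSuchFileException("/app/config/example-stopwords.txt"));
        Throwable other = new RuntimeException("some other deterministic failure");
        System.out.println(decide(missingFile, 0));      // GIVE_UP
        System.out.println(decide(other, 0));            // RETRY
        System.out.println(decide(other, MAX_RETRIES));  // GIVE_UP
    }
}
```

The cause-chain check covers the missing-file case from the first trace; the retry cap is what bounds failures that cannot be recognized in advance, such as the user dictionary error below. Giving up leaves the shard unassigned, so operators would still need a way to trigger a fresh attempt after fixing the file.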
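The same recovery loop can also be triggered by exceptions other than IOException, for example an invalid entry in a Kuromoji user dictionary (the underlying RuntimeException arrives wrapped in a NotSerializableExceptionWrapper):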
[node_name] [index_name][0] received shard failed for shard id [[index_name][0]], allocation id [TQL8ZQWpS5yfJvv9Amg-Xw], primary term [0], message [failed to create index], failure [NotSerializableExceptionWrapper[runtime_exception: Illegal user dictionary entry i am - the number of segmentations (2) does not the match number of readings (1)]]
org.elasticsearch.common.io.stream.NotSerializableExceptionWrapper: runtime_exception: Illegal user dictionary entry i am - the number of segmentations (2) does not the match number of readings (1)
at org.apache.lucene.analysis.ja.dict.UserDictionary.<init>(UserDictionary.java:112) ~[?:?]
at org.apache.lucene.analysis.ja.dict.UserDictionary.open(UserDictionary.java:81) ~[?:?]
at org.elasticsearch.index.analysis.KuromojiTokenizerFactory.getUserDictionary(KuromojiTokenizerFactory.java:65) ~[?:?]
at org.elasticsearch.index.analysis.KuromojiTokenizerFactory.<init>(KuromojiTokenizerFactory.java:52) ~[?:?]
at org.elasticsearch.index.analysis.AnalysisRegistry.buildMapping(AnalysisRegistry.java:342) ~[elasticsearch-5.6.9.jar:5.6.9]
at org.elasticsearch.index.analysis.AnalysisRegistry.buildTokenizerFactories(AnalysisRegistry.java:176) ~[elasticsearch-5.6.9.jar:5.6.9]
at org.elasticsearch.index.analysis.AnalysisRegistry.build(AnalysisRegistry.java:154) ~[elasticsearch-5.6.9.jar:5.6.9]
at org.elasticsearch.index.IndexService.<init>(IndexService.java:145) ~[elasticsearch-5.6.9.jar:5.6.9]
at org.elasticsearch.index.IndexModule.newIndexService(IndexModule.java:363) ~[elasticsearch-5.6.9.jar:5.6.9]
at org.elasticsearch.indices.IndicesService.createIndexService(IndicesService.java:448) ~[elasticsearch-5.6.9.jar:5.6.9]
at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:413) ~[elasticsearch-5.6.9.jar:5.6.9]
at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:147) ~[elasticsearch-5.6.9.jar:5.6.9]
at org.elasticsearch.indices.cluster.IndicesClusterStateService.createIndices(IndicesClusterStateService.java:444) ~[elasticsearch-5.6.9.jar:5.6.9]
at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:202) ~[elasticsearch-5.6.9.jar:5.6.9]
at org.elasticsearch.cluster.service.ClusterService.callClusterStateAppliers(ClusterService.java:814) ~[elasticsearch-5.6.9.jar:5.6.9]
at org.elasticsearch.cluster.service.ClusterService.publishAndApplyChanges(ClusterService.java:768) ~[elasticsearch-5.6.9.jar:5.6.9]
at org.elasticsearch.cluster.service.ClusterService.runTasks(ClusterService.java:587) ~[elasticsearch-5.6.9.jar:5.6.9]
at org.elasticsearch.cluster.service.ClusterService$ClusterServiceTaskBatcher.run(ClusterService.java:263) ~[elasticsearch-5.6.9.jar:5.6.9]
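The error above comes from Lucene's Kuromoji `UserDictionary`, which rejects an entry when its segmentation field and its readings field split into different numbers of space-separated tokens. Checking the dictionary file before deploying it would catch this before any shard fails. The sketch below assumes the CSV layout `surface,segmentation,readings,part_of_speech`, splits on plain commas (so, unlike Lucene's parser, it does not handle quoted fields), and uses made-up ASCII entries; the class `UserDictionaryCheck` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-flight validator for a Kuromoji-style user dictionary.
class UserDictionaryCheck {
    // Returns a description of each entry whose segmentation and reading counts differ.
    static List<String> findBadEntries(List<String> lines) {
        List<String> problems = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            // Strip '#' comments and surrounding whitespace; skip empty lines.
            String line = lines.get(i).replaceAll("#.*$", "").trim();
            if (line.isEmpty()) {
                continue;
            }
            String[] fields = line.split(",");
            if (fields.length < 4) {
                problems.add("line " + (i + 1) + ": expected 4 fields, got " + fields.length);
                continue;
            }
            int segments = fields[1].trim().split("\\s+").length;
            int readings = fields[2].trim().split("\\s+").length;
            if (segments != readings) {
                problems.add("line " + (i + 1) + ": entry '" + fields[0] + "' has "
                    + segments + " segmentations but " + readings + " readings");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        List<String> dict = List.of(
            "# custom entries",
            "nihonkeizaishinbun,nihon keizai shinbun,nihon keizai shinbun,custom_noun",
            "i am,i am,aiamu,custom_noun"); // hypothetical entry: 2 segmentations, 1 reading
        // prints: line 3: entry 'i am' has 2 segmentations but 1 readings
        findBadEntries(dict).forEach(System.out::println);
    }
}
```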