Skip to content

Mounting a previously searchable snapshotted index leads to extremely confusing errors about translog failure #68792

@dakrone

Description

@dakrone

I was manually testing (on master, b914994) a scenario it'd be nice to be able to use within ILM, and doing the following:

POST /_license/start_trial?acknowledge=true

PUT /_snapshot/repo1
{
  "type": "fs",
  "settings": {
    "location": "repo1"
  }
}

PUT /_snapshot/repo2
{
  "type": "fs",
  "settings": {
    "location": "repo2"
  }
}

POST /foo/_doc
{
  "foo": "bar"
}

PUT /_snapshot/repo1/snap1?wait_for_completion

POST /_snapshot/repo1/snap1/_mount?wait_for_completion
{
  "index": "foo",
  "renamed_index": "restored-foo",
  "index_settings": {
    "index.number_of_replicas": 0
  },
  "ignored_index_settings": [ "index.refresh_interval" ]
}

GET /restored-foo/_search

PUT /_snapshot/repo2/snap2?wait_for_completion

POST /_snapshot/repo2/snap2/_mount?wait_for_completion
{
  "index": "restored-foo",
  "renamed_index": "partial-restored-foo",
  "index_settings": {
    "index.number_of_replicas": 0
  },
  "ignored_index_settings": [ "index.refresh_interval" ]
}

Leads to restoring the partial-restored-foo index, but the index is red, and the ES log contains:

[2021-02-09T15:24:11,994][WARN ][o.e.i.CompositeIndexEventListener] [runTask-0] [partial-restored-foo] failed to invoke the listener before the shard recovery starts for [partial-restored-foo][0]
org.elasticsearch.index.translog.TranslogException: failed to associate a new translog
	at org.elasticsearch.xpack.searchablesnapshots.SearchableSnapshotIndexEventListener.associateNewEmptyTranslogWithIndex(SearchableSnapshotIndexEventListener.java:107) ~[?:?]
	at org.elasticsearch.xpack.searchablesnapshots.SearchableSnapshotIndexEventListener.beforeIndexShardRecovery(SearchableSnapshotIndexEventListener.java:68) ~[?:?]
	at org.elasticsearch.index.CompositeIndexEventListener.beforeIndexShardRecovery(CompositeIndexEventListener.java:254) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.index.shard.IndexShard.preRecovery(IndexShard.java:1385) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.index.shard.StoreRecovery.restore(StoreRecovery.java:444) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.index.shard.StoreRecovery.recoverFromRepository(StoreRecovery.java:269) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.index.shard.IndexShard.restoreFromRepository(IndexShard.java:1942) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$17(IndexShard.java:2671) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:62) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:728) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at java.lang.Thread.run(Thread.java:834) [?:?]
Caused by: org.elasticsearch.index.store.ImmutableDirectoryException: file [corrupted_dR7BxgMnTAq9SfcYqbk0hg] is not mutable
	at org.elasticsearch.index.store.InMemoryNoOpCommitDirectory.ensureMutable(InMemoryNoOpCommitDirectory.java:126) ~[?:?]
	at org.elasticsearch.index.store.InMemoryNoOpCommitDirectory.createOutput(InMemoryNoOpCommitDirectory.java:79) ~[?:?]
	at org.apache.lucene.store.FilterDirectory.createOutput(FilterDirectory.java:74) ~[lucene-core-8.8.0.jar:8.8.0 b10659f0fc18b58b90929cfdadde94544d202c4a - noble - 2021-01-25 19:07:45]
	at org.elasticsearch.index.store.ByteSizeCachingDirectory.createOutput(ByteSizeCachingDirectory.java:118) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.apache.lucene.store.FilterDirectory.createOutput(FilterDirectory.java:74) ~[lucene-core-8.8.0.jar:8.8.0 b10659f0fc18b58b90929cfdadde94544d202c4a - noble - 2021-01-25 19:07:45]
	at org.elasticsearch.index.store.Store.markStoreCorrupted(Store.java:1354) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.index.store.Store.readLastCommittedSegmentsInfo(Store.java:181) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.xpack.searchablesnapshots.SearchableSnapshotIndexEventListener.associateNewEmptyTranslogWithIndex(SearchableSnapshotIndexEventListener.java:100) ~[?:?]
	... 13 more
[2021-02-09T15:24:12,006][WARN ][o.e.i.c.IndicesClusterStateService] [runTask-0] [partial-restored-foo][0] marking and sending shard failed due to [failed recovery]
org.elasticsearch.indices.recovery.RecoveryFailedException: [partial-restored-foo][0]: Recovery failed on {runTask-0}{-S4QWJmoQ7m0bDCUz-y4nQ}{aqye9hlGSn66hYEOqSDJRQ}{127.0.0.1}{127.0.0.1:9300}{cdfhilmrstw}{ml.machine_memory=67405721600, xpack.installed=true, transform.node=true, testattr=test, ml.max_open_jobs=20, ml.max_jvm_size=518979584}
	at org.elasticsearch.index.shard.IndexShard.lambda$executeRecovery$21(IndexShard.java:2724) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.action.ActionListener$1.onFailure(ActionListener.java:125) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.index.shard.StoreRecovery.recoverFromRepository(StoreRecovery.java:274) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.index.shard.IndexShard.restoreFromRepository(IndexShard.java:1942) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$17(IndexShard.java:2671) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:62) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:728) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at java.lang.Thread.run(Thread.java:834) [?:?]
Caused by: org.elasticsearch.index.translog.TranslogException: failed to associate a new translog
	at org.elasticsearch.xpack.searchablesnapshots.SearchableSnapshotIndexEventListener.associateNewEmptyTranslogWithIndex(SearchableSnapshotIndexEventListener.java:107) ~[?:?]
	at org.elasticsearch.xpack.searchablesnapshots.SearchableSnapshotIndexEventListener.beforeIndexShardRecovery(SearchableSnapshotIndexEventListener.java:68) ~[?:?]
	at org.elasticsearch.index.CompositeIndexEventListener.beforeIndexShardRecovery(CompositeIndexEventListener.java:254) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.index.shard.IndexShard.preRecovery(IndexShard.java:1385) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.index.shard.StoreRecovery.restore(StoreRecovery.java:444) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.index.shard.StoreRecovery.recoverFromRepository(StoreRecovery.java:269) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	... 8 more
Caused by: org.elasticsearch.index.store.ImmutableDirectoryException: file [corrupted_dR7BxgMnTAq9SfcYqbk0hg] is not mutable
	at org.elasticsearch.index.store.InMemoryNoOpCommitDirectory.ensureMutable(InMemoryNoOpCommitDirectory.java:126) ~[?:?]
	at org.elasticsearch.index.store.InMemoryNoOpCommitDirectory.createOutput(InMemoryNoOpCommitDirectory.java:79) ~[?:?]
	at org.apache.lucene.store.FilterDirectory.createOutput(FilterDirectory.java:74) ~[lucene-core-8.8.0.jar:8.8.0 b10659f0fc18b58b90929cfdadde94544d202c4a - noble - 2021-01-25 19:07:45]
	at org.elasticsearch.index.store.ByteSizeCachingDirectory.createOutput(ByteSizeCachingDirectory.java:118) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.apache.lucene.store.FilterDirectory.createOutput(FilterDirectory.java:74) ~[lucene-core-8.8.0.jar:8.8.0 b10659f0fc18b58b90929cfdadde94544d202c4a - noble - 2021-01-25 19:07:45]
	at org.elasticsearch.index.store.Store.markStoreCorrupted(Store.java:1354) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.index.store.Store.readLastCommittedSegmentsInfo(Store.java:181) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.xpack.searchablesnapshots.SearchableSnapshotIndexEventListener.associateNewEmptyTranslogWithIndex(SearchableSnapshotIndexEventListener.java:100) ~[?:?]
	at org.elasticsearch.xpack.searchablesnapshots.SearchableSnapshotIndexEventListener.beforeIndexShardRecovery(SearchableSnapshotIndexEventListener.java:68) ~[?:?]
	at org.elasticsearch.index.CompositeIndexEventListener.beforeIndexShardRecovery(CompositeIndexEventListener.java:254) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.index.shard.IndexShard.preRecovery(IndexShard.java:1385) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.index.shard.StoreRecovery.restore(StoreRecovery.java:444) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.index.shard.StoreRecovery.recoverFromRepository(StoreRecovery.java:269) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	... 8 more
[2021-02-09T15:24:12,018][WARN ][o.e.c.r.a.AllocationService] [runTask-0] failing shard [failed shard, shard [partial-restored-foo][0], node[-S4QWJmoQ7m0bDCUz-y4nQ], [P], recovery_source[snapshot recovery [o-d6zJ3_RPyfFPy6yzO7Pw] from repo2:snap2/9GwyHaIwQq2-OL8u2v72SQ], s[INITIALIZING], a[id=Wqh2UCJwS_uZIHZa_GM5zQ], unassigned_info[[reason=NEW_INDEX_RESTORED], at[2021-02-09T22:24:11.891Z], delayed=false, details[restore_source[repo2/snap2]], allocation_status[fetching_shard_data]], expected_shard_size[477], message [failed recovery], markAsStale [true], failure [org.elasticsearch.indices.recovery.RecoveryFailedException: [partial-restored-foo][0]: Recovery failed on {runTask-0}{-S4QWJmoQ7m0bDCUz-y4nQ}{aqye9hlGSn66hYEOqSDJRQ}{127.0.0.1}{127.0.0.1:9300}{cdfhilmrstw}{ml.machine_memory=67405721600, xpack.installed=true, transform.node=true, testattr=test, ml.max_open_jobs=20, ml.max_jvm_size=518979584}
	at org.elasticsearch.index.shard.IndexShard.lambda$executeRecovery$21(IndexShard.java:2724)
	at org.elasticsearch.action.ActionListener$1.onFailure(ActionListener.java:125)
	at org.elasticsearch.index.shard.StoreRecovery.recoverFromRepository(StoreRecovery.java:274)
	at org.elasticsearch.index.shard.IndexShard.restoreFromRepository(IndexShard.java:1942)
	at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$17(IndexShard.java:2671)
	at org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:62)
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:728)
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: [partial-restored-foo/veKL0LHZS3WBR0o1L8Lsjw][[partial-restored-foo][0]] org.elasticsearch.index.translog.TranslogException: failed to associate a new translog
	at org.elasticsearch.xpack.searchablesnapshots.SearchableSnapshotIndexEventListener.associateNewEmptyTranslogWithIndex(SearchableSnapshotIndexEventListener.java:107)
	at org.elasticsearch.xpack.searchablesnapshots.SearchableSnapshotIndexEventListener.beforeIndexShardRecovery(SearchableSnapshotIndexEventListener.java:68)
	at org.elasticsearch.index.CompositeIndexEventListener.beforeIndexShardRecovery(CompositeIndexEventListener.java:254)
	at org.elasticsearch.index.shard.IndexShard.preRecovery(IndexShard.java:1385)
	at org.elasticsearch.index.shard.StoreRecovery.restore(StoreRecovery.java:444)
	at org.elasticsearch.index.shard.StoreRecovery.recoverFromRepository(StoreRecovery.java:269)
	... 8 more
Caused by: org.elasticsearch.index.store.ImmutableDirectoryException: file [corrupted_dR7BxgMnTAq9SfcYqbk0hg] is not mutable
	at org.elasticsearch.index.store.InMemoryNoOpCommitDirectory.ensureMutable(InMemoryNoOpCommitDirectory.java:126)
	at org.elasticsearch.index.store.InMemoryNoOpCommitDirectory.createOutput(InMemoryNoOpCommitDirectory.java:79)
	at org.apache.lucene.store.FilterDirectory.createOutput(FilterDirectory.java:74)
	at org.elasticsearch.index.store.ByteSizeCachingDirectory.createOutput(ByteSizeCachingDirectory.java:118)
	at org.apache.lucene.store.FilterDirectory.createOutput(FilterDirectory.java:74)
	at org.elasticsearch.index.store.Store.markStoreCorrupted(Store.java:1354)
	at org.elasticsearch.index.store.Store.readLastCommittedSegmentsInfo(Store.java:181)
	at org.elasticsearch.xpack.searchablesnapshots.SearchableSnapshotIndexEventListener.associateNewEmptyTranslogWithIndex(SearchableSnapshotIndexEventListener.java:100)
	... 13 more
]]

Especially concerning is the line:

org.elasticsearch.index.store.ImmutableDirectoryException: file [corrupted_dR7BxgMnTAq9SfcYqbk0hg] is not mutable

Since it named the file as "corrupted".

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions