Obey lock order if working with store to get metadata snapshots #24787

s1monw · 2017-05-19T09:04:48Z

Today, when we get a metadata snapshot from the index shard and no engine is
started on the shard, we lock the index writer (IW) before we fetch the store
metadata. Yet if we concurrently recover that shard, recovery finalization
might fail because it can't acquire the IW lock on the directory. This is
mainly due to acquiring the IW lock and the metadata lock in the wrong order.
Fetching store metadata without a started engine should block on the metadata
lock in Store.java, but since IndexShard locks the writer first, we get into a
failed recovery dance, especially in tests. In production this is less of an
issue, since we rarely, if ever, get into this situation.
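
To illustrate the intended order, here is a minimal sketch, not the actual change. `LockOrderSketch` and its methods are hypothetical names, and both locks are modelled with `ReentrantLock` as an assumption: the metadata lock blocks, while the writer lock fails immediately when another thread holds it.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

/** Hypothetical model of the lock order; not the actual Store or IndexShard code. */
final class LockOrderSketch {
    // Assumption: stands in for the metadata lock in Store.java.
    private final ReentrantLock metadataLock = new ReentrantLock();
    // Assumption: stands in for the IW lock on the directory.
    private final ReentrantLock writerLock = new ReentrantLock();

    /** Fails immediately instead of waiting when another thread holds the writer lock. */
    private void obtainWriterLock() {
        if (!writerLock.tryLock()) {
            throw new IllegalStateException("writer lock is held elsewhere");
        }
    }

    /** Blocks on the metadata lock first, and only then takes the writer lock. */
    <T> T underMetadataThenWriterLock(Supplier<T> action) {
        metadataLock.lock();
        try {
            obtainWriterLock();
            try {
                return action.get();
            } finally {
                writerLock.unlock();
            }
        } finally {
            metadataLock.unlock();
        }
    }

    /** True while any thread holds either lock; useful for checking cleanup. */
    boolean anyLockHeld() {
        return metadataLock.isLocked() || writerLock.isLocked();
    }
}
```

In this model, as long as every caller goes through `underMetadataThenWriterLock`, a second caller waits on the metadata lock instead of failing on the writer lock. Taking the writer lock first, outside the metadata lock, is what lets a concurrent caller hit the failure path.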

Closes #24481

bleskes

good catch

Today when we get a metadata snapshot directly from a store directory, we acquire a metadata lock, then acquire an IndexWriter lock. However, we create a CheckIndex in IndexShard without acquiring the metadata lock first. This causes recovery to fail because the IndexWriter lock can still be held by the method snapshotStoreMetadata. This commit makes sure to create a CheckIndex under the metadata lock. Closes #24481 Closes #27731 Relates #24787

s1monw added :Internal, :Distributed Indexing/Store, >bug, v5.4.2, v5.5.0, v6.0.0 labels May 19, 2017

s1monw requested a review from bleskes May 19, 2017 09:04

fix javadoc

4dacf1e

bleskes approved these changes May 19, 2017

s1monw merged commit b17d23d into elastic:master May 19, 2017

s1monw deleted the issues/24481 branch May 19, 2017 13:36

clintongormley added v5.4.1 and removed v5.4.2 labels May 29, 2017

clintongormley added v6.0.0-beta1 and removed v6.0.0 labels Jul 25, 2017

dnhatn mentioned this pull request Dec 12, 2017

Check and repair index under the store metadata lock #27768

Merged

clintongormley added :Distributed Indexing/Engine and removed :Distributed Indexing/Store labels Feb 13, 2018
