Obey lock order if working with store to get metadata snapshots #24787
Merged
Today, when we get a metadata snapshot from the index shard and no engine is started on the shard, we lock the index writer (IW) before we fetch the store metadata. Yet if we concurrently recover that shard, recovery finalization might fail because it can't acquire the IW lock on the directory. This is mainly due to acquiring the IW lock and the metadata lock in the wrong order. Fetching store metadata without a started engine should block on the metadata lock in Store.java, but since IndexShard locks the writer first, we get into a failed-recovery dance, especially in tests. In production this is less of an issue since we rarely, if ever, get into this situation. Closes elastic#24481
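To make the ordering argument concrete, here is a minimal sketch of the intended discipline: take the blocking metadata lock first, and only then the fail-fast IW lock. It is not the code from this PR. `LockOrderSketch`, `withBothLocks`, `obtainIwLock` and `runConcurrently` are hypothetical names, and the `AtomicBoolean` is only an assumed stand-in for a directory write lock that fails immediately when it is already held.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative only: hypothetical stand-ins, not Elasticsearch or Lucene classes.
class LockOrderSketch {
    // Stand-in for the store-level metadata lock; acquiring it blocks.
    static final ReentrantReadWriteLock metadataLock = new ReentrantReadWriteLock();
    // Stand-in for the directory's IW lock; obtaining it fails at once if it is held.
    static final AtomicBoolean iwLockHeld = new AtomicBoolean(false);

    static void obtainIwLock() {
        if (!iwLockHeld.compareAndSet(false, true)) {
            throw new IllegalStateException("IW lock is already held");
        }
    }

    // Every code path goes through here: metadata lock first, IW lock second.
    static void withBothLocks(Runnable action) {
        metadataLock.writeLock().lock();
        try {
            obtainIwLock();
            try {
                action.run();
            } finally {
                iwLockHeld.set(false); // release the IW lock before the metadata lock
            }
        } finally {
            metadataLock.writeLock().unlock();
        }
    }

    // Runs a "snapshot" thread and a "recovery" thread side by side and
    // returns how many times obtaining the IW lock failed.
    static int runConcurrently(int rounds) {
        AtomicInteger failures = new AtomicInteger();
        Runnable worker = () -> {
            for (int i = 0; i < rounds; i++) {
                try {
                    withBothLocks(() -> { /* read metadata or finalize recovery */ });
                } catch (IllegalStateException e) {
                    failures.incrementAndGet();
                }
            }
        };
        Thread snapshot = new Thread(worker, "snapshot");
        Thread recovery = new Thread(worker, "recovery");
        snapshot.start();
        recovery.start();
        try {
            snapshot.join();
            recovery.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return failures.get();
    }

    public static void main(String[] args) {
        System.out.println("IW lock failures: " + runConcurrently(1000));
    }
}
```

Because the fail-fast lock is only ever touched while the blocking lock is held, the two threads queue on the metadata lock and neither sees the IW lock as taken.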
bleskes
approved these changes
May 19, 2017
Contributor
bleskes
left a comment
good catch
s1monw
added a commit
that referenced
this pull request
May 19, 2017
Today, when we get a metadata snapshot from the index shard and no engine is started on the shard, we lock the index writer (IW) before we fetch the store metadata. Yet if we concurrently recover that shard, recovery finalization might fail because it can't acquire the IW lock on the directory. This is mainly due to acquiring the IW lock and the metadata lock in the wrong order. Fetching store metadata without a started engine should block on the metadata lock in Store.java, but since IndexShard locks the writer first, we get into a failed-recovery dance, especially in tests. In production this is less of an issue since we rarely, if ever, get into this situation. Closes #24481
s1monw
added a commit
that referenced
this pull request
May 19, 2017
Today, when we get a metadata snapshot from the index shard and no engine is started on the shard, we lock the index writer (IW) before we fetch the store metadata. Yet if we concurrently recover that shard, recovery finalization might fail because it can't acquire the IW lock on the directory. This is mainly due to acquiring the IW lock and the metadata lock in the wrong order. Fetching store metadata without a started engine should block on the metadata lock in Store.java, but since IndexShard locks the writer first, we get into a failed-recovery dance, especially in tests. In production this is less of an issue since we rarely, if ever, get into this situation. Closes #24481
dnhatn
added a commit
to dnhatn/elasticsearch
that referenced
this pull request
Dec 12, 2017
Today, when we get a metadata snapshot directly from a store directory, we acquire a metadata lock, then acquire an IW lock. However, we create a CheckIndex in IndexShard without acquiring the metadata lock first. This causes recovery to fail because the IW lock can still be held by `snapshotStoreMetadata`. This commit makes sure to create a CheckIndex under the metadata lock. Closes elastic#24481 Relates elastic#24787
dnhatn
added a commit
that referenced
this pull request
Dec 20, 2017
Today, when we get a metadata snapshot directly from a store directory, we acquire a metadata lock, then acquire an IndexWriter lock. However, we create a CheckIndex in IndexShard without acquiring the metadata lock first. This causes recovery to fail because the IndexWriter lock can still be held by the method snapshotStoreMetadata. This commit makes sure to create a CheckIndex under the metadata lock. Closes #24481 Closes #27731 Relates #24787
dnhatn
added a commit
that referenced
this pull request
Dec 20, 2017
Today, when we get a metadata snapshot directly from a store directory, we acquire a metadata lock, then acquire an IndexWriter lock. However, we create a CheckIndex in IndexShard without acquiring the metadata lock first. This causes recovery to fail because the IndexWriter lock can still be held by the method snapshotStoreMetadata. This commit makes sure to create a CheckIndex under the metadata lock. Closes #24481 Closes #27731 Relates #24787
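The follow-up fix described above can be illustrated with a small sketch of why the check has to run under the metadata lock. It is a toy model, not the actual IndexShard or Store code. `CheckUnderMetadataLockSketch`, `checkAttempt`, `checkDuringSnapshot` and the `Outcome` enum are hypothetical names, and `tryLock()` is used only so the sketch can report "would block" instead of really blocking.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative only: hypothetical stand-ins, not Elasticsearch or Lucene classes.
class CheckUnderMetadataLockSketch {
    enum Outcome { CHECKED, MUST_WAIT, FAILED }

    static final ReentrantReadWriteLock metadataLock = new ReentrantReadWriteLock();
    // Stand-in for the IndexWriter lock: obtaining it fails at once if it is held.
    static final AtomicBoolean iwLockHeld = new AtomicBoolean(false);

    // Models creating the index check, which needs the IndexWriter lock.
    static Outcome checkAttempt(boolean takeMetadataLockFirst) {
        if (takeMetadataLockFirst && !metadataLock.writeLock().tryLock()) {
            return Outcome.MUST_WAIT; // real code would block here until the snapshot is done
        }
        try {
            if (!iwLockHeld.compareAndSet(false, true)) {
                return Outcome.FAILED; // the snapshot still holds the IndexWriter lock
            }
            iwLockHeld.set(false);
            return Outcome.CHECKED;
        } finally {
            if (takeMetadataLockFirst) {
                metadataLock.writeLock().unlock();
            }
        }
    }

    // Models a metadata snapshot holding both locks while another thread tries to check.
    static Outcome checkDuringSnapshot(boolean takeMetadataLockFirst) {
        metadataLock.writeLock().lock();
        try {
            iwLockHeld.set(true);
            try {
                Outcome[] seen = new Outcome[1];
                Thread checker = new Thread(() -> seen[0] = checkAttempt(takeMetadataLockFirst));
                checker.start();
                try {
                    checker.join(); // join() makes the checker's write to seen[0] visible
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                }
                return seen[0];
            } finally {
                iwLockHeld.set(false);
            }
        } finally {
            metadataLock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println("without metadata lock: " + checkDuringSnapshot(false));
        System.out.println("with metadata lock:    " + checkDuringSnapshot(true));
    }
}
```

In this model the unguarded check hits the fail-fast lock and fails outright, while the guarded check merely waits its turn, which is the behavior the commit message asks for.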
Labels
>bug
:Distributed Indexing/Engine
Anything around managing Lucene and the Translog in an open shard.
v5.4.1
v5.5.0
v6.0.0-beta1