Tighten up concurrent store metadata listing and engine writes #19684
Conversation
@imotov can you please take a look at the snapshot and restore parts? I also think it's good to wait for @s1monw to have a look too, but it's good to start iterating now.
synchronized (mutex) {
    // if the engine is not running, we can access the store directly, but we need to make sure no one starts
    // the engine on us. If the engine is running, we can get a snapshot via the deletion policy which is initialized.
    // That can be done out of mutex, since the engine can be closed half way.
Hmm but what happens if the engine is closed just before, or while, we call deletionPolicy.snapshot() below?
I think the deletion policy is safe to use then. I agree this is all very icky, but at least it's in one place now so we can improve it.
OK, indeed I looked at SnapshotDeletionPolicy and it looks OK: if you pull a snapshot and then IW closes, you get the last commit before IW closed, and any files it references will remain until the next IW is created; if IW closes and you then pull a snapshot, you get whatever IW committed on close. And its methods are sync'd.
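The pattern under discussion can be sketched as follows. This is a hypothetical, simplified model, not the actual Elasticsearch classes: `Shard`, `Engine`, `Store`, and the string return values here are all illustrative stand-ins. The idea is the one from the diff above: if there is no engine, read the store directly while holding the mutex so nobody can start an engine underneath us; if there is an engine, grab its reference under the mutex but take the commit snapshot outside it, relying on the deletion policy staying safe across an engine close.

```java
import java.util.concurrent.atomic.AtomicInteger;

class Shard {
    // Illustrative stand-in for the real engine; a snapshot of the last commit.
    interface Engine { String snapshotLastCommit(); }

    // Illustrative stand-in for Store's reference counting and direct listing.
    static final class Store {
        private final AtomicInteger refs = new AtomicInteger(1);
        void incRef() { refs.incrementAndGet(); }
        void decRef() { refs.decrementAndGet(); }
        int refCount() { return refs.get(); }
        String listCommitsDirectly() { return "direct-store-listing"; }
    }

    private final Object mutex = new Object();
    volatile Engine engine;            // null while the shard has no engine
    final Store store = new Store();

    String snapshotStoreMetadata() {
        Engine current;
        synchronized (mutex) {
            current = engine;
            if (current == null) {
                // No engine: read the store directly. Holding the mutex makes
                // sure no one starts an engine (and commits) while we list.
                store.incRef();
                try {
                    return store.listCommitsDirectly();
                } finally {
                    store.decRef();
                }
            }
        }
        // Outside the mutex: per the discussion above, the (synchronized)
        // deletion policy is safe to use even if the engine closes
        // concurrently -- the snapshotted commit's files stay pinned.
        return current.snapshotLastCommit();
    }
}
```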
Thanks @bleskes ... I like the rename. I left some naive questions about concurrent behaviors of

Thanks @bleskes, LGTM.

@bleskes the snapshot/restore part looks reasonable. Thanks!
    // That can be done out of mutex, since the engine can be closed half way.
    Engine engine = getEngineOrNull();
    if (engine == null) {
        store.incRef();
since we incRef this store in both cases can we do it outside of the mutex instead?
+1
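The reviewer's suggestion above can be sketched like this. It is a hypothetical, illustrative fragment (the names and return strings are not the real Elasticsearch code): since both branches take a store reference anyway, `incRef` is hoisted outside the mutex and paired with a single `decRef` in a `finally` block, so the reference count is balanced on every path.

```java
import java.util.concurrent.atomic.AtomicInteger;

class StoreRefExample {
    // Illustrative stand-ins: a ref count and a nullable engine handle.
    static final AtomicInteger refs = new AtomicInteger(1);
    static final Object mutex = new Object();
    static Object engine = null;   // stand-in for getEngineOrNull()

    static String snapshotStoreMetadata() {
        refs.incrementAndGet();            // incRef once, outside the mutex
        try {
            synchronized (mutex) {
                if (engine == null) {
                    // Engine is down: read the store directly under the mutex.
                    return "direct-store-listing";
                }
            }
            // Engine is up: snapshot a commit via the deletion policy.
            return "deletion-policy-snapshot";
        } finally {
            refs.decrementAndGet();        // decRef exactly once on every path
        }
    }
}
```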
I left some comments, LGTM in general, thanks @bleskes
Thx @s1monw. I pushed an update
LGTM
* master:
  - Fix REST test documentation
  - [Test] move methods from bwc test to test package for use in plugins (elastic#19738)
  - package-info.java should be in src/main only.
  - Split regular histograms from date histograms. elastic#19551
  - Tighten up concurrent store metadata listing and engine writes (elastic#19684)
  - Plugins: Make NamedWriteableRegistry immutable and add extension point for named writeables
  - Add documentation for the 'elasticsearch-translog' tool
  - [TEST] Increase time waiting for all shards to move off/on to a node
  - Fixes the active shard count check in the case of (elastic#19760)
  - Fixes cat tasks operation in detailed mode
  - ignore some docker craziness in seccomp environment checks
In several places in our code we need to get a consistent list of files + metadata of the current index. We currently have a couple of ways to do this in the `Store` class, which also does the right things and tries to verify the integrity of the smaller files. Sadly, those methods can run into trouble if anyone writes into the folder while they are busy. Most notably, the index shard's engine may decide to commit halfway through and remove a `segment_N` file before the store got to checksum it (but after it was already listed). This race condition typically doesn't happen, as almost all of the places where we list files also happen to be places where the relevant shard doesn't yet have an engine. There is however an exception (of course :)), which is the API to list shard stores, used by the master when it is looking for shard copies to assign. I already took one shot at fixing this in #19416, but it turned out not to be enough - see for example https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-os-compatibility/os=sles/822.

The first inclination to fix this was to add more locking to the different `Store` methods: acquire the `IndexWriter` lock (thus preventing any engine from accessing the index) if the shard is offline, and use the index commit snapshotting logic already existing in `IndexShard` for when the engine is started. That turned out to be a bad idea, as it creates more subtleties where, for example, a store listing can prevent a shard from starting up (the writer lock doesn't wait if it can't get access, but fails immediately, which is good). Another example is running on a shared directory, where some other engine may actually hold the lock.

Instead I decided to take another approach: add a `snapshotStore` method to `IndexShard` that takes care of all the concurrency aspects with the engine, which is now possible because it's all in the same place. It's still a bit ugly, but at least it's all in one place and we can evaluate how to improve on this later on. I also renamed the `snapshotIndex` method to `acquireIndexCommit` to avoid confusion, and I think it communicates better what it does.
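The acquire/release contract behind the rename can be sketched as follows. This is a hypothetical, minimal model (class and method names other than `acquireIndexCommit` are illustrative, and the commit name `segments_42` is made up): acquiring a commit pins it so the files it references are not deleted, and the caller must release it when done.

```java
import java.util.ArrayList;
import java.util.List;

class CommitExample {
    // Commits currently pinned by callers; the deletion policy would keep
    // the files of every pinned commit alive.
    static final List<String> pinned = new ArrayList<>();

    static String acquireIndexCommit() {
        String commit = "segments_42";   // illustrative commit name
        pinned.add(commit);              // pin: files stay on disk
        return commit;
    }

    static void releaseIndexCommit(String commit) {
        pinned.remove(commit);           // unpin: files may now be deleted
    }
}
```

Callers would typically pair the two in a try/finally so a commit is never left pinned after, say, a failed snapshot.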