Tighten up concurrent store metadata listing and engine writes #19684

bleskes · 2016-07-29T12:44:38Z

In several places in our code we need to get a consistent list of files + metadata of the current index. We currently have a couple of ways to do so in the Store class, which also do the right things and try to verify the integrity of the smaller files. Sadly, those methods can run into trouble if anyone writes into the folder while they are busy. Most notably, the index shard's engine may decide to commit halfway through and remove a segments_N file before the store got to checksum it (but after it had already listed it). This race condition typically doesn't happen, as almost all of the places where we list files also happen to be places where the relevant shard doesn't yet have an engine. There is, however, an exception (of course :)): the API to list shard stores, used by the master when it is looking for shard copies to assign.
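
To make the race concrete, here is a minimal Java sketch under stated assumptions: it models neither Lucene nor the real `Store`, and `FakeDirectory`, `RacyListing.listThenChecksum` and the `Runnable` hook are hypothetical names. The hook runs between listing and checksumming to stand in for an engine commit that writes `segments_2` and deletes `segments_1` in that window.

```java
import java.io.UncheckedIOException;
import java.nio.file.NoSuchFileException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Hypothetical in-memory stand-in for a shard directory (not Lucene's Directory API).
class FakeDirectory {
    private final Map<String, byte[]> files = new TreeMap<>();

    synchronized void write(String name, byte[] data) {
        files.put(name, data);
    }

    synchronized void delete(String name) {
        files.remove(name);
    }

    synchronized List<String> listAll() {
        return new ArrayList<>(files.keySet());
    }

    synchronized byte[] read(String name) {
        byte[] data = files.get(name);
        if (data == null) {
            // Mirrors a file that was listed earlier but deleted before it could be read.
            throw new UncheckedIOException(new NoSuchFileException(name));
        }
        return data;
    }
}

class RacyListing {
    // Lists first and checksums afterwards, with no protection against concurrent writers.
    // The hook runs between the two steps to simulate an engine commit landing there.
    static Map<String, Long> listThenChecksum(FakeDirectory dir, Runnable betweenListAndRead) {
        List<String> names = dir.listAll();
        betweenListAndRead.run();
        Map<String, Long> checksums = new TreeMap<>();
        for (String name : names) {
            CRC32 crc = new CRC32();
            crc.update(dir.read(name)); // fails if the file vanished after listing
            checksums.put(name, crc.getValue());
        }
        return checksums;
    }
}
```

With a no-op hook the listing and checksums succeed; with the simulated commit, reading `segments_1` fails even though the listing returned it.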

I already took one shot at fixing this in #19416, but it turns out not to be enough; see for example https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-os-compatibility/os=sles/822.

The first inclination was to fix this by adding more locking to the different Store methods and acquiring the IndexWriter lock, thus preventing any engine from accessing the directory if a shard is offline, and to use the index commit snapshotting logic that already exists in IndexShard for when the engine is started. That turned out to be a bad idea, as it creates more subtleties where, for example, a store listing can prevent a shard from starting up (the writer lock doesn't wait if it can't get access but fails immediately, which is good). Another example is running on a shared directory, where some other engine may actually hold the lock.
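
The fail-fast behavior of the writer lock, and how a listing that holds it would make engine startup fail, can be modelled with a plain `java.util.concurrent.Semaphore`. This is an illustration only: `WriterLock` and `ShardStartup` are hypothetical classes, not Lucene's locking API, and the text above only tells us that the lock fails immediately rather than waiting.

```java
import java.util.concurrent.Semaphore;

// Hypothetical model of a non-reentrant, fail-fast writer lock (not Lucene's API).
class WriterLock {
    private final Semaphore permit = new Semaphore(1);

    // Never blocks: true if the lock was obtained, false if someone else holds it.
    boolean tryObtain() {
        return permit.tryAcquire();
    }

    void release() {
        permit.release();
    }
}

class ShardStartup {
    // Hypothetical engine start: needs the writer lock and fails fast if it is held.
    static void startEngine(WriterLock lock) {
        if (!lock.tryObtain()) {
            throw new IllegalStateException("writer lock held elsewhere; shard failed to start");
        }
        // The engine would open its IndexWriter here and keep the lock until it closes.
    }
}
```

A `Semaphore` with one permit is used instead of `ReentrantLock` because it is not reentrant, so the conflict shows up even in a single-threaded demo.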

Instead I decided to take another approach:

- Remove all the various methods on store and keep one, which accepts an index commit (which can be null) and also clearly communicates that the caller is responsible for concurrent access. This also tightens up the API, which is a plus.
- Add a snapshotStore method to IndexShard that takes care of all the concurrency aspects with the engine, which is now possible because it's all in the same place. It's still a bit ugly, but at least it's all in one place and we can evaluate how to improve on this later on. I also renamed the snapshotIndex method to acquireIndexCommit to avoid confusion; I think it better communicates what it does.
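
A rough sketch of the single-method shape described in the first point, assuming simplified hypothetical types: `Commit` stands in for Lucene's `IndexCommit`, `SimpleStore` for `Store`, and `getMetadata` is an illustrative name rather than the exact signature merged here. A `null` commit means "use the latest commit in the directory", and concurrency is explicitly the caller's problem.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for an index commit: just the files the commit point references.
final class Commit {
    final List<String> fileNames;

    Commit(List<String> fileNames) {
        this.fileNames = fileNames;
    }
}

// Hypothetical simplified store with a single metadata entry point.
class SimpleStore {
    private final Map<String, Long> fileLengths = new TreeMap<>();
    private Commit latestCommit;

    // Simulates a writer adding files and publishing a new commit point.
    void commit(Commit commit, Map<String, Long> newFileLengths) {
        fileLengths.putAll(newFileLengths);
        latestCommit = commit;
    }

    /**
     * Returns file name -> length for the given commit, or for the latest commit when
     * {@code commit} is null. Not safe against concurrent writers: callers must make sure
     * no engine modifies the directory, or pass a commit they have already pinned.
     */
    Map<String, Long> getMetadata(Commit commit) {
        Commit target = commit != null ? commit : latestCommit;
        if (target == null) {
            return Collections.emptyMap(); // nothing has been committed yet
        }
        Map<String, Long> metadata = new TreeMap<>();
        for (String file : target.fileNames) {
            Long length = fileLengths.get(file);
            if (length == null) {
                throw new IllegalStateException("commit references a missing file: " + file);
            }
            metadata.put(file, length);
        }
        return metadata;
    }
}
```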

bleskes · 2016-07-29T12:45:57Z

@imotov can you please take a look at the snapshot and restore parts?
@mikemccand we talked about this a bit, so can I ask you to look from a lucene perspective?

I also think it's good to wait for @s1monw to have a look too, but it's good to start iterating now.

mikemccand · 2016-07-29T13:31:03Z

core/src/main/java/org/elasticsearch/index/shard/IndexShard.java

+        synchronized (mutex) {
+            // if the engine is not running, we can access the store directly, but we need to make sure no one starts
+            // the engine on us. If the engine is running, we can get a snapshot via the deletion policy which is initialized.
+            // That can be done out of mutex, since the engine can be closed half way.


Hmm but what happens if the engine is closed just before, or while, we call deletionPolicy.snapshot() below?

I think the deletion policy is safe to use then. I agree this is all very icky, but at least it's in one place now so we can improve it.

OK, indeed: I looked at SnapshotDeletionPolicy and it looks OK if you pull a snapshot and then IW closes (you get the last commit before IW closed, and any files it references will continue to exist until the next IW is created), or if IW closes and then you pull a snapshot (you get whatever IW committed on close). And its methods are sync'd.
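
The guarantees described above can be modelled with a small ref-counting policy. `PinningDeletionPolicy` is hypothetical and much simpler than Lucene's `SnapshotDeletionPolicy`: commits are just generation numbers, and a released commit is pruned immediately, whereas the comment above says the real files remain until the next IndexWriter is created.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified snapshot-style deletion policy: a commit is kept while it is
// the latest one or while at least one snapshot pins it. All methods are synchronized.
class PinningDeletionPolicy {
    private final List<Long> liveCommits = new ArrayList<>();
    private final Map<Long, Integer> pinCounts = new HashMap<>();

    // Called by the writer after each commit (including the one on close).
    synchronized void onCommit(long generation) {
        liveCommits.add(generation);
        pruneUnpinned();
    }

    // Pins the latest commit so its files survive later commits or a writer close.
    synchronized long snapshot() {
        if (liveCommits.isEmpty()) {
            throw new IllegalStateException("no commit to snapshot");
        }
        long latest = liveCommits.get(liveCommits.size() - 1);
        pinCounts.merge(latest, 1, Integer::sum);
        return latest;
    }

    // Unpins a snapshot; simplification: the commit is pruned right away if nothing else pins it.
    synchronized void release(long generation) {
        Integer count = pinCounts.get(generation);
        if (count == null) {
            throw new IllegalArgumentException("commit " + generation + " is not snapshotted");
        }
        if (count == 1) {
            pinCounts.remove(generation);
        } else {
            pinCounts.put(generation, count - 1);
        }
        pruneUnpinned();
    }

    synchronized List<Long> liveCommits() {
        return new ArrayList<>(liveCommits);
    }

    // Drops every commit that is neither the latest nor pinned by a snapshot.
    private void pruneUnpinned() {
        long latest = liveCommits.get(liveCommits.size() - 1);
        liveCommits.removeIf(gen -> gen != latest && !pinCounts.containsKey(gen));
    }
}
```

Pinning before the writer's final commit keeps generation 1 alive alongside generation 2; releasing the pin lets it be pruned.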

mikemccand · 2016-07-29T13:44:39Z

Thanks @bleskes ... I like the rename. I left some naive questions about concurrent behaviors of IndexShard ...

mikemccand · 2016-07-29T14:24:04Z

Thanks @bleskes, LGTM.

imotov · 2016-07-30T01:56:10Z

@bleskes the snapshot/restore part looks reasonable. Thanks!

s1monw · 2016-08-02T10:21:19Z

core/src/main/java/org/elasticsearch/index/shard/IndexShard.java

+            // That can be done out of mutex, since the engine can be closed half way.
+            Engine engine = getEngineOrNull();
+            if (engine == null) {
+                store.incRef();


Since we incRef the store in both cases, can we do it outside of the mutex instead?
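
A sketch of the suggested restructuring, under explicit assumptions: `CountedStore`, `ShardSketch` and the `Supplier` hooks are hypothetical simplifications, not the real `IndexShard` code, and the engine hook is assumed to acquire and release its own commit snapshot. What it keeps from the diff is the `synchronized (mutex)` block, reading the store directly when no engine is running, and taking the store reference once, outside the mutex, with a matching `decRef` in `finally`.

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical ref-counted store (illustrative only, not the Elasticsearch Store class).
class CountedStore {
    final AtomicInteger refCount = new AtomicInteger(1);

    void incRef() {
        refCount.incrementAndGet();
    }

    void decRef() {
        refCount.decrementAndGet();
    }
}

// Hypothetical shard showing the store reference taken once, outside the mutex.
class ShardSketch {
    private final Object mutex = new Object();
    private final CountedStore store = new CountedStore();
    private final Supplier<Map<String, Long>> directStoreRead;
    private Supplier<Map<String, Long>> engineSnapshot; // null while no engine is running

    ShardSketch(Supplier<Map<String, Long>> directStoreRead) {
        this.directStoreRead = directStoreRead;
    }

    // Engine start goes through the mutex, so it cannot race with a direct store read.
    void startEngine(Supplier<Map<String, Long>> snapshotViaDeletionPolicy) {
        synchronized (mutex) {
            engineSnapshot = snapshotViaDeletionPolicy;
        }
    }

    Map<String, Long> snapshotStoreMetadata() {
        store.incRef(); // needed on both paths, so done once, outside the mutex
        try {
            Supplier<Map<String, Long>> engine;
            synchronized (mutex) {
                engine = engineSnapshot;
                if (engine == null) {
                    // No engine: read the directory while holding the mutex so none can start.
                    return directStoreRead.get();
                }
            }
            // Engine running: the deletion-policy snapshot is taken outside the mutex.
            return engine.get();
        } finally {
            store.decRef();
        }
    }

    int storeRefCount() {
        return store.refCount.get();
    }
}
```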

s1monw · 2016-08-02T10:42:14Z

I left some comments. LGTM in general, thanks @bleskes

bleskes · 2016-08-02T21:11:29Z

Thx @s1monw. I pushed an update.

s1monw · 2016-08-02T21:22:23Z

LGTM

* master: Fix REST test documentation [Test] move methods from bwc test to test package for use in plugins (elastic#19738) package-info.java should be in src/main only. Split regular histograms from date histograms. elastic#19551 Tighten up concurrent store metadata listing and engine writes (elastic#19684) Plugins: Make NamedWriteableRegistry immutable and add extenion point for named writeables Add documentation for the 'elasticsearch-translog' tool [TEST] Increase time waiting for all shards to move off/on to a node Fixes the active shard count check in the case of (elastic#19760) Fixes cat tasks operation in detailed mode ignore some docker craziness in scccomp environment checks

bleskes added 12 commits July 27, 2016 17:48

add locking to store access

dbe0f01

fix some tests

5eca202

more logging

ae49df4

add a direct access to the store, so engine maybe open or close

f4e2122

tests ands some fixing

c1981e9

fix shardow recovery

609bb3b

move recovery target service to new indexShard.snapshotStore

6c694f5

sigh

b5f75b4

sigh2

852b5ac

reduce unsafe methods in Store

7bb6085

fix npe

a055615

Merge remote-tracking branch 'upstream/master' into store_metadata_access

ebf5231

bleskes added >bug, :Distributed Indexing/Store and v5.0.0-beta1 labels Jul 29, 2016

mikemccand reviewed Jul 29, 2016

to @mikemccand with love

ff9df4c

s1monw reviewed Aug 2, 2016

bleskes added 4 commits August 2, 2016 22:49

feedback

18dad67

merge from master

66bc91d

rewrite the right thing

a4b275d

add UOE to ShadowIndexshards

dec2a26

bleskes merged commit f6aeb35 into elastic:master Aug 3, 2016

bleskes deleted the store_metadata_access branch August 3, 2016 06:34

clintongormley added v5.0.0-alpha5 and removed v5.0.0-beta1 labels Aug 4, 2016

clintongormley added :Distributed Indexing/Engine and removed :Distributed Indexing/Store labels Feb 13, 2018

5 participants
