@bleskes
Contributor

@bleskes bleskes commented Jul 29, 2016

In several places in our code we need to get a consistent list of files + metadata of the current index. We currently have a couple of ways to do this in the Store class, which also does the right thing and tries to verify the integrity of the smaller files. Sadly, those methods can run into trouble if anyone writes into the folder while they are busy. Most notably, the index shard's engine may decide to commit halfway through and remove a segments_N file before the store got to checksum it (but after the store had already listed it). This race condition typically doesn't happen, as almost all of the places where we list files also happen to be places where the relevant shard doesn't yet have an engine. There is however an exception (of course :)), which is the API to list shard stores, used by the master when it is looking for shard copies to assign to.
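
To make the race concrete, here is a minimal, deterministic sketch using plain `java.nio.file` rather than the actual Store code. The class and method names (`ListThenChecksumRace`, `firstVanishedFile`, `demo`) are hypothetical, and the `Runnable` merely stands in for an engine commit that happens between the listing and the checksum pass.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical illustration only; not the Elasticsearch Store implementation.
class ListThenChecksumRace {
    /** Returns the name of the first listed file that vanished before it could be read, or null. */
    static String firstVanishedFile(Path dir, Runnable writer) throws IOException {
        // Step 1: take the file listing.
        List<Path> listed = new ArrayList<>();
        try (Stream<Path> files = Files.list(dir)) {
            files.sorted().forEach(listed::add);
        }
        // Step 2: a writer (standing in for the engine) changes the directory in between.
        writer.run();
        // Step 3: read every listed file, as a checksum pass would.
        for (Path file : listed) {
            try {
                Files.readAllBytes(file);
            } catch (NoSuchFileException e) {
                return file.getFileName().toString();
            }
        }
        return null;
    }

    /** Simulates a commit that writes segments_2 and removes segments_1 after the listing. */
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("store");
            Path oldCommit = Files.write(dir.resolve("segments_1"), new byte[] {1});
            return firstVanishedFile(dir, () -> {
                try {
                    Files.write(dir.resolve("segments_2"), new byte[] {2});
                    Files.delete(oldCommit);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this toy setup `demo()` reports `segments_1` as the file that was listed but could no longer be read, which is the failure mode described above.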

I already took one shot at fixing this in #19416, but it turned out not to be enough - see for example https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-os-compatibility/os=sles/822.

The first inclination to fix this was to add more locking to the different Store methods and acquire the IndexWriter lock, thus preventing any engine from accessing the store if the shard is offline, and to use the current index commit snapshotting logic already existing in IndexShard for when the engine is started. That turned out to be a bad idea, as we create more subtleties where, for example, a store listing can prevent a shard from starting up (the writer lock doesn't wait if it can't get access, but fails immediately, which is good). Another example is running on a shared directory where some other engine may actually hold the lock.
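
The start-up subtlety can be sketched with a toy fail-fast lock. This is only an illustration under assumptions: `FailFastWriteLock` and `LockingSubtletyDemo` are hypothetical names, and the `AtomicBoolean` stands in for the IndexWriter lock, which in reality lives in the index directory.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a write lock that never waits.
class FailFastWriteLock {
    private final AtomicBoolean held = new AtomicBoolean();

    /** Acquires the lock or fails immediately; it never blocks. */
    void obtain(String owner) {
        if (!held.compareAndSet(false, true)) {
            throw new IllegalStateException("lock already held, cannot obtain it for " + owner);
        }
    }

    void release() {
        held.set(false);
    }
}

class LockingSubtletyDemo {
    /** Returns whether an engine could start while a store listing holds the lock. */
    static boolean engineCanStartDuringListing() {
        FailFastWriteLock writeLock = new FailFastWriteLock();
        writeLock.obtain("store listing");
        try {
            try {
                writeLock.obtain("engine start");
            } catch (IllegalStateException e) {
                return false; // the shard fails to start because of a mere listing
            }
            return true;
        } finally {
            writeLock.release();
        }
    }
}
```

Here `engineCanStartDuringListing()` returns `false`: because the lock fails fast instead of waiting, a read-only listing that happens to hold it turns into a failed shard start.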

Instead I decided to take another approach:

  1. Remove all the various methods on store and keep one, which accepts an index commit (which can be null) and also clearly communicates that the caller is responsible for concurrent access. This also tightens up the API which is a plus.
  2. Add a snapshotStore method to IndexShard that takes care of all the concurrency aspects with the engine, which is now possible because it's all in the same place. It's still a bit ugly but at least it's all in one place and we can evaluate how to improve on this later on. I also renamed the snapshotIndex method to acquireIndexCommit to avoid confusion and I think it communicates better what it does.
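
A rough sketch of the shape this gives, assuming simplified toy types: `ToyStore`, `ToyEngine` and `ToyShard` are hypothetical and file lists stand in for index commits and metadata; only the names `snapshotStore`, `acquireIndexCommit` and the mutex idea come from this PR. The real IndexShard code differs in many details.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical toy types; not the actual Elasticsearch classes.
class ToyStore {
    private int refCount = 1;
    private final List<String> filesOnDisk = Arrays.asList("segments_1", "_0.cfs");

    synchronized void incRef() { refCount++; }
    synchronized void decRef() { refCount--; }
    synchronized int refCount() { return refCount; }

    /** The single metadata method: commit may be null, and the caller handles concurrency. */
    List<String> getMetadata(List<String> commitOrNull) {
        return commitOrNull != null ? commitOrNull : filesOnDisk;
    }
}

class ToyEngine {
    private final List<String> lastCommit;

    ToyEngine(List<String> lastCommit) { this.lastCommit = lastCommit; }

    /** Stand-in for acquiring an index commit via the deletion policy. */
    List<String> acquireIndexCommit() { return lastCommit; }
}

class ToyShard {
    private final Object mutex = new Object();
    private final ToyStore store = new ToyStore();
    private ToyEngine engine; // null while the shard is offline; guarded by mutex

    void startEngine(ToyEngine newEngine) {
        synchronized (mutex) { engine = newEngine; }
    }

    int storeRefCount() { return store.refCount(); }

    List<String> snapshotStore() {
        store.incRef(); // keep the store open for the duration of the call
        try {
            ToyEngine current;
            synchronized (mutex) {
                current = engine;
                if (current == null) {
                    // No engine: read directly, holding the mutex so no engine starts meanwhile.
                    return store.getMetadata(null);
                }
            }
            // Engine running: pin a commit outside the mutex and read that commit's metadata.
            return store.getMetadata(current.acquireIndexCommit());
        } finally {
            store.decRef();
        }
    }
}
```

The point of the design is that callers of the store method no longer need to know whether an engine exists; the one place that does know makes the choice between a direct read and a pinned commit.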

@bleskes bleskes added the >bug, :Distributed Indexing/Store, and v5.0.0-beta1 labels Jul 29, 2016
@bleskes
Contributor Author

bleskes commented Jul 29, 2016

@imotov can you please take a look at the snapshot and restore parts?
@mikemccand we talked about this a bit, so can I ask you to look from a lucene perspective?

I also think it's good to wait for @s1monw to have a look too, but it's good to start iterating now.

synchronized (mutex) {
// if the engine is not running, we can access the store directly, but we need to make sure no one starts
// the engine on us. If the engine is running, we can get a snapshot via the deletion policy which is initialized.
// That can be done out of mutex, since the engine can be closed half way.
Contributor

Hmm but what happens if the engine is closed just before, or while, we call deletionPolicy.snapshot() below?

Contributor Author

I think the deletion policy is safe to use then. I agree this is all very icky, but at least it's in one place now so we can improve.

Contributor

OK indeed I looked at SnapshotDeletionPolicy and it looks OK if you pull a snapshot, and then IW closes (you get the last commit before IW closed, and any files it references will remain existing until the next IW is created), or if IW closes and you pull a snapshot (you get whatever IW committed on close). And its methods are sync'd.
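
The pinning behavior described here can be modeled with a small toy class. This is a simplified sketch, not Lucene's SnapshotDeletionPolicy: `ToySnapshotPolicy`, `onCommit` and `mayDelete` are hypothetical names, and commits are reduced to generation numbers.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a snapshotting deletion policy with synchronized methods.
class ToySnapshotPolicy {
    private final Map<Long, Integer> pinned = new HashMap<>();
    private long lastCommit = -1;

    /** Records a new commit, including the one a writer makes when it closes. */
    synchronized void onCommit(long generation) {
        lastCommit = generation;
    }

    /** Pins the most recent commit and returns its generation. */
    synchronized long snapshot() {
        if (lastCommit < 0) {
            throw new IllegalStateException("no commit yet");
        }
        pinned.merge(lastCommit, 1, Integer::sum);
        return lastCommit;
    }

    /** Drops one pin on the given commit. */
    synchronized void release(long generation) {
        Integer count = pinned.get(generation);
        if (count == null) {
            return;
        }
        if (count == 1) {
            pinned.remove(generation);
        } else {
            pinned.put(generation, count - 1);
        }
    }

    /** A commit's files may go only if it is neither the latest commit nor pinned. */
    synchronized boolean mayDelete(long generation) {
        return generation != lastCommit && !pinned.containsKey(generation);
    }
}
```

In this model, a snapshot taken before a later commit keeps the older commit's files around until `release` is called, and a snapshot taken after the writer closed simply pins whatever was committed last.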

@mikemccand
Contributor

Thanks @bleskes ... I like the rename. I left some naive questions about concurrent behaviors of IndexShard ...

@mikemccand
Contributor

Thanks @bleskes, LGTM.

@imotov
Contributor

imotov commented Jul 30, 2016

@bleskes the snapshot/restore part looks reasonable. Thanks!

// That can be done out of mutex, since the engine can be closed half way.
Engine engine = getEngineOrNull();
if (engine == null) {
store.incRef();
Contributor

Since we incRef this store in both cases, can we do it outside of the mutex instead?

Contributor Author

+1

@s1monw
Contributor

s1monw commented Aug 2, 2016

I left some comments. LGTM in general, thanks @bleskes

@bleskes
Contributor Author

bleskes commented Aug 2, 2016

Thx @s1monw. I pushed an update.

@s1monw
Contributor

s1monw commented Aug 2, 2016

LGTM

@bleskes bleskes merged commit f6aeb35 into elastic:master Aug 3, 2016
@bleskes bleskes deleted the store_metadata_access branch August 3, 2016 06:34
jasontedor added a commit to jaymode/elasticsearch that referenced this pull request Aug 3, 2016
* master:
  Fix REST test documentation
  [Test] move methods from bwc test to test package for use in plugins (elastic#19738)
  package-info.java should be in src/main only.
  Split regular histograms from date histograms. elastic#19551
  Tighten up concurrent store metadata listing and engine writes (elastic#19684)
  Plugins: Make NamedWriteableRegistry immutable and add extenion point for named writeables
  Add documentation for the 'elasticsearch-translog' tool
  [TEST] Increase time waiting for all shards to move off/on to a node
  Fixes the active shard count check in the case of (elastic#19760)
  Fixes cat tasks operation in detailed mode
  ignore some docker craziness in scccomp environment checks
@clintongormley clintongormley added the :Distributed Indexing/Engine label and removed the :Distributed Indexing/Store label Feb 13, 2018
Labels

>bug, :Distributed Indexing/Engine, v5.0.0-alpha5
