
Conversation

@original-brownbear
Contributor

Removing two pieces of the BlobContainer API here that don't apply well to all implementations and needlessly complicate the code:

  • Having both write and writeAtomic makes no sense. On Cloud providers all writes are atomic, and on FS and HDFS the added cost of a move and a directory fsync is hardly relevant (a rough sketch of that filesystem path follows after this list).
    • On the other hand, allowing for partial writes (though extremely unlikely in practice) simply has the potential of creating more garbage blobs.
  • Removing the fail-if-already-exists flag because it does not work on the most widely used implementation (S3) anyway.
    • It simply makes no sense to have this flag. Either it is relevant for the stable operation of the repository plugin, in which case S3 is broken by definition, or it isn't, in which case it's dead code.
    • The actual resiliency improvement from this flag is minor since we use UUIDs for almost all the blobs now anyway. The flag is in fact causing a bug (see SNAPSHOT: More Resilient Writes to Blob Stores #36927 for a detailed explanation).
  • closes Master failover during snapshotting could leave the snapshot incomplete #25281
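
For context on the first bullet, this is roughly what the atomic FS write path amounts to: stream to a temp file, fsync it, rename it into place, and fsync the parent directory. A hedged sketch only, not the actual Elasticsearch implementation; the temp-name prefix and method name are made up for illustration.

```java
// A minimal sketch of an atomic filesystem write, assuming POSIX semantics.
import java.io.IOException;
import java.io.InputStream;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

final class AtomicFsWriteSketch {

    static void writeBlobAtomic(Path containerDir, String blobName, InputStream data) throws IOException {
        Path tempBlob = containerDir.resolve("pending-" + blobName);
        Path targetBlob = containerDir.resolve(blobName);
        // Stream the data fully into a temporary file and force it to disk.
        Files.copy(data, tempBlob, StandardCopyOption.REPLACE_EXISTING);
        try (FileChannel channel = FileChannel.open(tempBlob, StandardOpenOption.WRITE)) {
            channel.force(true);
        }
        // The extra cost over a plain write is this atomic rename ...
        Files.move(tempBlob, targetBlob, StandardCopyOption.ATOMIC_MOVE);
        // ... plus an fsync of the parent directory so the rename itself is durable
        // (opening a directory for read works on Linux/macOS, not on Windows).
        try (FileChannel dir = FileChannel.open(containerDir, StandardOpenOption.READ)) {
            dir.force(true);
        }
    }
}
```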

@elasticmachine
Collaborator

Pinging @elastic/es-distributed

return unremoved;
}

// TODO: replace with constants class if needed (cf. org.apache.lucene.util.Constants)
Contributor Author

@original-brownbear Jun 5, 2019


See #42883, the failure described there actually comes up easily in tests here when testing concurrent access to the snapshot repository.

@tlrx
Member

tlrx commented Jun 13, 2019

Concerning the first point, I agree that using a single writeBlobAtomic makes sense but I'm a bit concerned about the potential performance impact on shared file systems like NFS. Do you have any idea? Today we only use atomic writes for the repository's index generation file and doing so for all data files might have a cost.

Concerning the second point, failing to write a blob if it already exists has been added recently and provides extra safety against concurrent snapshot finalizations, which could happen if multiple snapshots are created or deleted in quick succession. By removing this check we are now silently ignoring a potential concurrent blob modification, and I don't think we should do so.
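
To make the trade-off concrete, the following is roughly the guard that a fail-if-already-exists flag can give on a filesystem repository (a hedged sketch, not the actual Elasticsearch code; the method signature is made up). The point under debate is that S3 cannot enforce an equivalent check at write time, so the FS behavior is stronger than what the repository can rely on in general.

```java
// Illustrative only: CREATE_NEW makes the write fail with FileAlreadyExistsException
// if the blob is already present, surfacing a concurrent modification instead of
// silently overwriting it; without the flag the blob is simply replaced.
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.OpenOption;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class FailIfExistsWriteSketch {

    static void writeBlob(Path containerDir, String blobName, InputStream data,
                          boolean failIfAlreadyExists) throws IOException {
        Path targetBlob = containerDir.resolve(blobName);
        OpenOption[] options = failIfAlreadyExists
            ? new OpenOption[] { StandardOpenOption.WRITE, StandardOpenOption.CREATE_NEW }
            : new OpenOption[] { StandardOpenOption.WRITE, StandardOpenOption.CREATE,
                                 StandardOpenOption.TRUNCATE_EXISTING };
        // Throws java.nio.file.FileAlreadyExistsException when failIfAlreadyExists == true
        // and another writer got there first.
        try (OutputStream out = Files.newOutputStream(targetBlob, options)) {
            data.transferTo(out);
        }
    }
}
```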

@original-brownbear
Contributor Author

@tlrx

Thanks for taking a look!

I'm a bit concerned about the potential performance impact on shared file systems like NFS. Do you have any idea?

As I tried to argue above, I'd say it doesn't matter. All we're doing is adding one operation (the move) to each write. Writing the segment blobs takes longer than writing the small meta blobs in the real world (you have the request latency + the time it takes to stream the data), so at worst we're not even doubling the time per segment blob (and that would be in the weird edge case of very small segment blobs). In the real world, with 100MB+ segment blobs, I don't think this matters much (especially considering that the default write rate limit is 50MB/s :)).
=> yes, weird snapshots with a huge number of almost-empty segment files could theoretically take up to twice as long, but not more. Real-world snapshots will barely see a difference, if any (rough numbers below).
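
Rough numbers behind that claim (all of these are assumptions taken from the comment above, not measurements):

```java
// Back-of-envelope estimate: streaming a 100 MB segment blob at the 50 MB/s rate
// limit quoted above takes ~2 s, while a rename plus a directory fsync is assumed
// to be on the order of a few milliseconds, so the atomic-write overhead is noise.
public final class AtomicWriteOverheadEstimate {
    public static void main(String[] args) {
        final double blobSizeMb = 100.0;        // assumed typical segment blob size
        final double rateLimitMbPerSec = 50.0;  // write rate limit quoted in the comment
        final double moveAndFsyncSec = 0.005;   // assumed ~5 ms for move + directory fsync
        final double streamSec = blobSizeMb / rateLimitMbPerSec;
        final double overheadPct = 100.0 * moveAndFsyncSec / streamSec;
        System.out.printf("stream: %.1f s, atomic-write overhead: %.2f%%%n", streamSec, overheadPct);
    }
}
```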

Concerning the second point, failing to write a blob if it already exists has been added recently and provides extra safety against concurrent snapshot finalizations, which could happen if multiple snapshots are created or deleted in quick succession. By removing this check we are now silently ignoring a potential concurrent blob modification, and I don't think we should do so.

I would argue that this check doesn't work on S3 at all anyway, and on NFS it's not guaranteed to work either. So yes, this is an additional safety net, but we can't base the functionality on it since we don't have it in the most widely used plugin (S3). This is actually somewhat harmful even, because the majority of our tests run against the FS repository (and the test that I unmuted failed because of this check), so the tests behave "quasi safer" than the real world.
As far as the finalization goes, this isn't really an issue in the real world because a stuck master will simply write the exact same blob that the current master wrote concurrently. The only protection we get is for the theoretical case of a master that is stuck for the whole time it takes to run another snapshot. From analyzing logs for the snapshot resiliency work on Cloud I can say with confidence that this has never happened on Cloud ever (like literally never :)). Another way to think about it: the master still has the safety net (which also works on S3) of listing the index-N blobs before finalizing, so you'd have to get stuck after listing the index-N for (in the real world) 30s+ and then get unstuck and write the index-N (see the sketch below for what that listing amounts to).
And even then, all you're corrupting is the index.latest content, because the index-N would just have the old content => I don't think there is any value in this check at all anymore :)
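
For reference, the index-N safety net mentioned above boils down to something like this: list the existing index-N blobs, take the highest generation, and only write generation N+1. A hedged sketch; listBlobsByPrefix is only assumed to mirror the BlobContainer API here, and this is not the actual repository finalization code.

```java
// Illustrative helper that picks the latest index-N generation out of a blob listing;
// the caller would obtain the names from something like
// blobContainer.listBlobsByPrefix("index-").keySet() and then write index-(latest + 1).
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class IndexGenerationsSketch {

    private static final Pattern INDEX_N = Pattern.compile("^index-(\\d+)$");

    static long latestGeneration(Iterable<String> blobNames) {
        long latest = -1L; // -1 means no index-N blob has been written yet
        for (String name : blobNames) {
            Matcher m = INDEX_N.matcher(name);
            if (m.matches()) {
                latest = Math.max(latest, Long.parseLong(m.group(1)));
            }
        }
        return latest;
    }
}
```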

@tlrx
Member

tlrx commented Jun 18, 2019

I think that this PR could be split into two different PRs and discussed separately.

original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Jun 18, 2019
* Extracted from elastic#42886
* Having both write and writeAtomic makes no sense. On Cloud providers all writes are atomic and on FS and HDFS the added cost of a move and directory fsync operation is hardly relevant.
* On the other hand, allowing for partial writes (though extremely unlikely in practice) simply has the potential of creating more garbage blobs that aren't even deserializable. This change neatly ensures that every blob by a non-temp name has been fully written and fsynced (in case of NFS + HDFS).
@original-brownbear
Contributor Author

@tlrx yeah, that makes sense, sorry for mixing things up here. I extracted the removal of atomic writes to #43329; let's start there, it's probably less controversial? :)

@jpountz jpountz added v7.4.0 and removed v7.3.0 labels Jul 3, 2019
@colings86 colings86 added v7.5.0 and removed v7.4.0 labels Aug 30, 2019
@original-brownbear
Contributor Author

Closing this as it will become irrelevant as a result of #46250 and follow-ups (since we'll use the cluster state for consistency of the repository contents), and we can simply remove these methods once they become unused.
