
Conversation

@original-brownbear
Contributor

If a shard gets closed, we properly abort its snapshot
before closing it. In this case we should make sure not to
throw a confusing exception about trying to increment
the reference on an already-closed shard in the async tasks
if the snapshot is already aborted.
Also added an assertion to make sure that aborts are in
fact the only situation in which we run into a concurrently
closed store.
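
A minimal sketch of the behaviour described above (the names here, such as Store, SnapshotStatus, and uploadFile, are placeholders and not the actual Elasticsearch signatures): if the async task cannot take a reference on the store, it asserts that the snapshot was aborted and returns quietly instead of throwing.

```java
// Illustrative sketch only; Store and SnapshotStatus stand in for the real
// store and shard snapshot status types.
final class SnapshotFileTaskSketch {

    interface Store {
        boolean tryIncRef();
        void decRef();
    }

    interface SnapshotStatus {
        boolean isAborted();
    }

    void run(Store store, SnapshotStatus snapshotStatus, Runnable uploadFile) {
        if (store.tryIncRef() == false) {
            // The store can only have been concurrently closed if the snapshot was
            // aborted first, so bail out quietly instead of throwing an
            // "already closed" exception from the async task.
            assert snapshotStatus.isAborted() : "store closed but snapshot not aborted";
            return;
        }
        try {
            uploadFile.run();
        } finally {
            store.decRef();
        }
    }
}
```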

@elasticmachine
Collaborator

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

try {
if (alreadyFailed.get() == false) {
snapshotFile(snapshotFileInfo, indexId, shardId, snapshotId, snapshotStatus, store);
if (store.tryIncRef()) {
Contributor Author

This whole loop is kinda awkward to begin with ... makes me wonder if we shouldn't just run this on the generic pool and make the parallelism for snapshots configurable explicitly, exactly like we do for recoveries ...

Member

make the parallelism for snapshots configurable explicitly exactly like we do for recoveries

Not sure I follow you :(

Contributor Author

@original-brownbear original-brownbear Oct 4, 2019

Sorry, badly explained :)

I find this whole loop over all the files really strange. We currently create one Runnable for each file to upload individually and then enqueue all the runnables. That forces us to use the strange alreadyFailed flag to avoid crazy exceptions, and also to increment and decrement the ref count on the store for each file individually.
It seems like it would be more correct, simpler, and less hacky to have a queue of files and have workers pull from that queue until it's empty. Then each worker can just get that reference once, and we don't have to run all N tasks for N files even if the first file fails uploading.
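
A minimal sketch of that queue-based approach (all names here are hypothetical and not the actual BlobStoreRepository code): a fixed number of workers drain a shared queue of files, each worker takes the store reference exactly once, and a worker that cannot take the reference simply stops, so no per-task alreadyFailed flag is needed.

```java
import java.util.Queue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative sketch only; Store, uploadFile, and snapshotFiles are made-up names.
final class QueueBasedShardSnapshotSketch {

    interface Store {
        boolean tryIncRef();
        void decRef();
    }

    void snapshotFiles(Store store, Queue<String> fileNames, int workers) {
        ExecutorService executor = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            executor.execute(() -> {
                if (store.tryIncRef() == false) {
                    return; // store already closed, so the snapshot must have been aborted
                }
                try {
                    String fileName;
                    while ((fileName = fileNames.poll()) != null) {
                        uploadFile(fileName); // a failure here ends this worker's loop
                    }
                } finally {
                    store.decRef(); // one inc/dec pair per worker instead of per file
                }
            });
        }
        executor.shutdown();
    }

    private void uploadFile(String fileName) {
        // upload the given segment file to the repository
    }
}
```

How a failure in one worker should stop the other workers (for example by clearing the queue) is left open in this sketch; that is the part the current alreadyFailed flag handles.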

Member

Thanks for explaining, it makes sense, but I don't see this as a requirement to merge this PR. Let's keep this in mind for the rainy boring days ;)

Contributor Author

Yea, this was more of a general comment to justify the weird code :)

@original-brownbear
Contributor Author

Jenkins run elasticsearch-ci/bwc

@original-brownbear
Contributor Author

@tlrx I don't think this has much/any practical impact, but it is spamming the test logs all the time; that's the whole motivation here :)

@original-brownbear
Contributor Author

Jenkins run elasticsearch-ci/bwc

@original-brownbear
Contributor Author

Jenkins run elasticsearch-ci/packaging-sample

@original-brownbear
Contributor Author

Thanks Tanguy!

@original-brownbear original-brownbear merged commit 9141e05 into elastic:master Oct 4, 2019
@original-brownbear original-brownbear deleted the fix-messy-refcount-handling branch October 4, 2019 17:03
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Oct 4, 2019
)

If a shard gets closed, we properly abort its snapshot
before closing it. In this case we should make sure not to
throw a confusing exception about trying to increment
the reference on an already-closed shard in the async tasks
if the snapshot is already aborted.
Also added an assertion to make sure that aborts are in
fact the only situation in which we run into a concurrently
closed store.
original-brownbear added a commit that referenced this pull request Oct 5, 2019
…47594)

If a shard gets closed, we properly abort its snapshot
before closing it. In this case we should make sure not to
throw a confusing exception about trying to increment
the reference on an already-closed shard in the async tasks
if the snapshot is already aborted.
Also added an assertion to make sure that aborts are in
fact the only situation in which we run into a concurrently
closed store.
@original-brownbear original-brownbear restored the fix-messy-refcount-handling branch August 6, 2020 18:35