Removed duplicate deleteBlob methods #18813

gfyoung · 2016-06-10T10:39:07Z

Title is self-explanatory. Closes #18529.

abeyad · 2016-06-10T15:02:57Z

...src/main/java/org/elasticsearch/index/snapshots/blobstore/BlobStoreIndexShardRepository.java

Since we know the individual file here, we could give a more descriptive error message by saying "error deleting index file [" + blobName + "] during cleanup"

Ah, yes, given the lambda format, that certainly makes more sense. Done.
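For illustration, a minimal sketch of a per-file cleanup loop that uses the suggested message. `SimpleBlobContainer`, `CleanupSketch` and `deleteAll` are hypothetical names, not the real Elasticsearch types, and the sketch collects the messages in a list where the real code would log them.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical one-method stand-in for the real BlobContainer interface.
interface SimpleBlobContainer {
    void deleteBlob(String blobName) throws IOException;
}

class CleanupSketch {
    // Deletes each blob individually; a failure is recorded together with the
    // blob name and does not stop the remaining deletions.
    static List<String> deleteAll(SimpleBlobContainer container, List<String> blobNames) {
        List<String> errors = new ArrayList<>();
        for (String blobName : blobNames) {
            try {
                container.deleteBlob(blobName);
            } catch (IOException e) {
                // Assumption: the real code would log here instead of collecting.
                errors.add("error deleting index file [" + blobName + "] during cleanup");
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        // Simulated container that fails for any blob whose name starts with "bad".
        SimpleBlobContainer failing = name -> {
            if (name.startsWith("bad")) {
                throw new IOException("simulated failure");
            }
        };
        System.out.println(deleteAll(failing, Arrays.asList("index-1", "bad-2")));
    }
}
```

Because the blob name is known at the point of failure, each message identifies exactly which file could not be removed.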

abeyad · 2016-06-10T15:06:33Z

@gfyoung I reviewed the code and it looks good; I left one minor comment. Also, it would be better to have a more descriptive commit message. The title for the commit message is good, but there should be a body for the commit message enumerating the two methods that were removed from the interface and why they were removed (i.e. explaining that it's a cleaner interface to have just one method, and we gain nothing by the other two methods as they don't afford us atomic deletes anyway).

Thanks for taking this on!

gfyoung · 2016-06-10T15:15:54Z

@abeyad : Certainly. I've updated the commit message for the duplicate deleteBlob method removal, and added a more descriptive error message as requested. Ready to merge if there are no other concerns.

abeyad · 2016-06-10T15:19:45Z

LGTM

tlrx · 2016-06-10T18:54:36Z

...cs/src/main/java/org/elasticsearch/common/blobstore/gcs/GoogleCloudStorageBlobContainer.java

I think that there is now dead code behind this. Can you please verify?

I think I removed all of the dead code that I could possibly remove (see bottom of file).

I'm talking about GoogleCloudStorageBlobStore.deleteBlobs() method. If it's not used anymore in the BlobContainer I think it is useless now.

deleteBlobs is called by deleteBlobsByPrefix, which in turn is called by delete. I suppose I could refactor to put everything under delete?

tlrx · 2016-06-10T18:54:54Z

I left a comment.

s1monw · 2016-06-10T18:59:32Z

...src/main/java/org/elasticsearch/index/snapshots/blobstore/BlobStoreIndexShardRepository.java

can we please stick to the for loop. We should be consistent in our loops unless it's really necessary or beneficial

s1monw · 2016-06-10T19:01:03Z

awesome cleanup - I think we need a note in the migration guide and should deprecate these methods in 2.x @abeyad can you take care of this once it's in?

abeyad · 2016-06-10T19:03:04Z

@s1monw will do

jasontedor · 2016-06-10T19:04:19Z

...src/main/java/org/elasticsearch/index/snapshots/blobstore/BlobStoreIndexShardRepository.java

Why are we swallowing this? I know that's the way that it was before, but is it right?

@imotov could you shed some light on why we swallow the IOException here for data files?

I think a future enhancement (not part of this PR) should be to throw the IOException and make the callers handle it. Unless there is a compelling reason not to do it that way.

imotov · 2016-06-10T19:36:24Z

-1. As @tlrx said before this will have a significant negative impact on performance of the delete operation on slow repositories. I think we should instead take advantage of batch deletes on S3 the same way as we do it on GCE. Snapshot deletion can lead to potential deletion of a large number of files at once and we will have to wait for a roundtrip for each operation before proceeding to delete the next file.

s1monw · 2016-06-10T19:46:31Z

-1. As @tlrx said before this will have a significant negative impact on performance of the delete operation on slow repositories. I think we should instead take advantage of batch deletes on S3 the same way as we do it on GCE. Snapshot deletion can lead to potential deletion of a large number of files at once and we will have to wait for a roundtrip for each operation before proceeding to delete the next file.

what is significant? 1 second per file? 100ms per file? I think API simplicity should be preferred compared to batching. The interfaces we use for S/R are extremely polluted with optimizations; I really wonder if it's worth it.

imotov · 2016-06-10T20:31:02Z

what is significant?

I would expect the latency for the delete request to be anywhere between 30ms and 300ms per file depending on location. Assuming that we delete a snapshot with 100 shards, 20 files per shard, with an average latency of 50ms per file, we are talking about the difference between 5 sec and 1 minute 40 sec to delete the snapshot.
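The estimate can be reproduced with a few lines of arithmetic. This sketch assumes strictly sequential requests and reads the 5-second figure as one batched request per shard; that reading is an assumption, since the comment does not spell it out. The class and method names are illustrative.

```java
class DeleteLatencyEstimate {
    // Total time when every request waits for the previous round trip to finish.
    static long sequentialMillis(long requests, long latencyMillisPerRequest) {
        return requests * latencyMillisPerRequest;
    }

    public static void main(String[] args) {
        long shards = 100;
        long filesPerShard = 20;
        long latencyMillis = 50;

        // One delete request per file: 100 * 20 = 2000 round trips.
        long perFile = sequentialMillis(shards * filesPerShard, latencyMillis);
        // Assumed: one batched delete request per shard, so 100 round trips.
        long perShardBatch = sequentialMillis(shards, latencyMillis);

        System.out.println("per-file deletes:  " + perFile / 1000 + " s");
        System.out.println("per-shard batches: " + perShardBatch / 1000 + " s");
    }
}
```

With these inputs the per-file case comes to 100 seconds (1 minute 40 seconds) and the batched case to 5 seconds, a 20x difference.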

tlrx · 2016-06-13T06:16:32Z

-1. As @tlrx said before this will have a significant negative impact on performance of the delete operation on slow repositories.

@imotov That's right, I agree with this statement. But what convinced me to remove the multiple variations of deleteBlob is that batch deletions make exception and error handling complex. Now that we have a Task API I think it is OK to have longer snapshot deletion as long as we can regularly check the progress.

imotov · 2016-06-13T13:55:14Z

@tlrx not sure I see how the Task API would help with a potential 10-20x slowdown in snapshot deletion speed. We can get the performance back by running deletes in parallel, but that will increase complexity on other layers and might start triggering throttling on some providers that are not happy with too many small simultaneous requests rushing in. I don't see how this is a good change for S3 and GCE plugins, but if you and @dadoonet like it, I am not going to argue.

dadoonet · 2016-06-13T19:48:55Z

I was looking at this today. Actually this deleteBlobs() method has been added because of #12697.
If we remove this method, we can close #12697 as it won't be implemented.

We can remove deleteBlobs for now and reimplement it if we need to optimize at some point. It will be easier than years ago because plugins and core code now live in the same repo.

So I'm not against this cleanup.

gfyoung · 2016-06-15T23:28:03Z

Okay, let's quickly summarise what's going on here (easier for myself to understand):

@abeyad proposed the change to simplify the interface here and has given the green light to this PR

@imotov had concerns about latency for S3 and GCE but would not object if @tlrx and @dadoonet both gave the green light to this PR

@tlrx did not believe the latency impact would be too significant to outweigh the benefits of a simplified interface and AFAIU has approved this change

@dadoonet also appears to have given the green light for this change AFAIU

So...can we merge this ❓

tlrx · 2016-06-16T06:09:29Z

@tlrx did not believe the latency impact would be too significant and AFAIU has approved this change

Sorry, that's not what I said. I agree with @imotov and I also think that this will have a significant negative impact on performance. On the other hand, Ali, Simon and Robert have the argument that this change makes the API cleaner and easier to maintain. I do agree with this too, and error handling in batch requests can also be tricky to get right. These points make me think that we can sacrifice some perf at snapshot deletion time in favor of cleaner code if that makes our life (and users' lives too) better. That's why I'm OK with this change.

gfyoung · 2016-06-16T06:19:13Z

@tlrx : updated my comment to clarify based on what you said

My question still stands though.

s1monw · 2016-06-20T08:02:50Z

maybe we can add a compromise here and try to interpret a wildcard suffix. If somebody calls delete("foo*") we try to fetch all files matching the prefix / wildcard and delete them? This would simplify the interface and we can make batching an impl detail?
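One way to read this proposal, as a rough sketch only: a single `deleteBlob` method that treats a trailing `*` as a prefix match. `InMemoryBlobContainer` and its `TreeMap` storage are hypothetical stand-ins for a real blob store, and deleting a missing blob is assumed to be a no-op here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical in-memory container, used only to illustrate the wildcard idea.
class InMemoryBlobContainer {
    private final TreeMap<String, byte[]> blobs = new TreeMap<>();

    void writeBlob(String name, byte[] data) {
        blobs.put(name, data);
    }

    // Returns a copy so callers cannot modify the underlying map.
    Set<String> listBlobs() {
        return new TreeSet<>(blobs.keySet());
    }

    // Deletes one blob, or every blob sharing the prefix when the name ends with '*'.
    void deleteBlob(String name) {
        if (name.endsWith("*")) {
            String prefix = name.substring(0, name.length() - 1);
            // Collect matching names first so the map is not modified while iterating.
            List<String> matches = new ArrayList<>();
            for (String candidate : blobs.keySet()) {
                if (candidate.startsWith(prefix)) {
                    matches.add(candidate);
                }
            }
            // A store with batch support could send 'matches' as a single request here.
            for (String match : matches) {
                blobs.remove(match);
            }
        } else {
            blobs.remove(name);
        }
    }
}
```

The caller sees one method, while the decision to loop or to batch stays inside the implementation. A blob whose real name ends in `*` would be ambiguous under this scheme, which a real design would need to address.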

abeyad · 2016-06-20T14:08:21Z

+1


gfyoung · 2016-06-21T08:02:48Z

@s1monw : integrating deleteBlobsByPrefix functionality into deleteBlobs sounds like an interesting idea. Depending on how much traction it gets, I wonder though if it might be best served for a separate PR as follow-up?

abeyad · 2016-06-27T21:08:06Z

@gfyoung Sorry I just realized you didn't get a response to this. Since we are doing the cleanup in this PR, I think it makes more sense to have a deleteBlob with potentially a wildcard prefix string passed in for this PR, because then we won't be getting rid of the bulk updates code, only to reinsert it again in another PR. Does that make sense?

gfyoung · 2016-06-30T01:52:33Z

@abeyad : if you read your explanation again, you didn't really provide justification for doing it now.

On second look, while the idea proposed by @s1monw is interesting, if you look at how deleteBlobsByPrefix is implemented in AbstractBlobContainer, you can see that it's really just a for-loop in most cases that calls deleteBlob in the process. Suddenly AFAICT, integration is not so simple.

Unless we have functionality that can do regex deletions (which I don't think we can do across ALL implementations), essentially we will have to go through some gymnastics of refactoring the current deleteBlob code to make sure that it checks whether we have a wildcard and then somehow integrate a for-loop?

Consequently, my thought process is shifting towards just removing deleteBlobs but leaving deleteBlobsByPrefix unless a nice, more suitable solution can be proposed instead of the one I just described.

s1monw · 2016-06-30T07:04:57Z

@imotov will this solve your objection ^^

imotov · 2016-07-06T19:30:24Z

Sorry for the delay, I missed this ping somehow. I think deleteBlobsByPrefix is a leftover from the times when a blob store was used by gateways and it's not actually used by snapshot/restore. Therefore I don't see how we would be able to take advantage of support for wildcards in deleteBlob unless it also supports listing blobs using comma-separated syntax or full regex syntax, which we could use to list multiple blob files in a single call.

Anyway, we just doubled the time and increased by 10-50 times the price of a snapshot deletion on S3 by implementing file existence check in #18815 for a similar gain in resiliency. So, my objections here over losing potential savings on S3 don't seem to make much sense any more.

gfyoung · 2016-07-06T19:41:18Z

@imotov: Actually, #18815 (also my PR) got reverted but will be re-merged once a nagging test failure has been fixed. Not sure if that changes your stance.

IINM, are you no longer objecting to the deletions (i.e. we can even remove the byPrefix method as has already been done)?

Removed the following methods from the BlobContainer interface:

1) deleteBlobs
2) deleteBlobsByPrefix

These removals help to clean up the interface. In addition, these methods were not very useful because they did not allow for atomic deletions. Closes gh-18529.


gfyoung · 2016-07-13T14:34:08Z

@imotov : any response to my comments?

@everyone : can this be merged if there are no other concerns?

abeyad · 2016-07-13T17:43:31Z

None of the current blob container implementations takes advantage of batching with deleteBlobsByPrefix; the only one that will is @tlrx's Google implementation. As discussed with @s1monw, we will merge this PR now, and @tlrx can add an optional wildcard suffix to the single deleteBlob interface while working on the Google storage blob container implementation, so that batching can be done under the hood.

abeyad · 2016-07-13T19:58:00Z

@gfyoung thanks for your work on this!

abeyad reviewed Jun 10, 2016

tlrx reviewed Jun 10, 2016

s1monw reviewed Jun 10, 2016

s1monw added the :Distributed Coordination/Snapshot/Restore and v5.0.0-alpha4 labels Jun 10, 2016

jasontedor reviewed Jun 10, 2016

clintongormley added v5.0.0-alpha5 and removed v5.0.0-alpha4 labels Jun 22, 2016

gfyoung added 3 commits July 13, 2016 10:33

Added more descriptive error msg for BlobStoreIndexShardRepository.finalize

66d895c

Reverted lambda's to for-loops

83c4118

abeyad merged commit 3f2e106 into elastic:master Jul 13, 2016

gfyoung deleted the dedup-delete-blob branch July 14, 2016 04:05

dadoonet mentioned this pull request Jul 21, 2016

Optimize AzureBlobStore#delete method #12697

Closed

clintongormley added the >non-issue label Jul 29, 2016

abeyad mentioned this pull request Aug 1, 2016

Adds deprecation notices on removed BlobContainer methods #19729

Merged
