Removed duplicate deleteBlob methods #18813
Since we know the individual file here, we could give a more descriptive error message by saying "error deleting index file [" + blobName + "] during cleanup"
Ah, yes, given the lambda format, that certainly makes more sense. Done.
|
@gfyoung I reviewed the code and it looks good; I left one minor comment. Also, it would be better to have a more descriptive commit message. The title of the commit message is good, but there should be a body enumerating the two methods that were removed from the interface and why they were removed (i.e. explaining that it's a cleaner interface to have just one method, and that we gain nothing from the other two methods as they don't afford us atomic deletes anyway). Thanks for taking this on! |
|
@abeyad : Certainly. I've updated the commit message for the duplicate |
|
LGTM |
I think that there is now dead code behind this. Can you please verify?
I think I removed all of the dead code that I could possibly remove (see bottom of file).
I'm talking about the GoogleCloudStorageBlobStore.deleteBlobs() method. If it's no longer used in the BlobContainer, I think it is useless now.
deleteBlobs is called by deleteBlobsByPrefix, which in turn is called by delete. I suppose I could refactor to put everything under delete?
|
I left a comment. |
Can we please stick to the for loop? We should be consistent in our loops unless a different form is really necessary or beneficial.
Done.
|
awesome cleanup - I think we need a note in the migration guide and should deprecate these methods in 2.x @abeyad can you take care of this once it's in? |
|
@s1monw will do |
Why are we swallowing this? I know that's the way that it was before, but is it right?
@imotov could you shed some light on why we swallow the IOException here for data files?
I think a future enhancement (not part of this PR) should be to throw the IOException and make the callers handle it. Unless there is a compelling reason not to do it that way.
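To illustrate the enhancement suggested above, here is a minimal Java sketch of a cleanup loop that still attempts every delete but reports failures to the caller instead of swallowing them. `CleanupSketch`, `BlobDeleter`, and `deleteAll` are hypothetical names used only for illustration; they are not the types touched by this PR.

```java
import java.io.IOException;
import java.util.List;

final class CleanupSketch {
    /** Hypothetical stand-in for a single-blob delete operation. */
    interface BlobDeleter {
        void deleteBlob(String blobName) throws IOException;
    }

    /** Attempts every delete, then throws one IOException carrying each failure as a suppressed exception. */
    static void deleteAll(BlobDeleter deleter, List<String> blobNames) throws IOException {
        IOException failure = null;
        for (String blobName : blobNames) {
            try {
                deleter.deleteBlob(blobName);
            } catch (IOException e) {
                if (failure == null) {
                    failure = new IOException("error deleting data files during cleanup");
                }
                // Keep the blob name in the message so the caller can see which file failed.
                failure.addSuppressed(new IOException("error deleting data file [" + blobName + "]", e));
            }
        }
        if (failure != null) {
            throw failure;
        }
    }
}
```

Whether the callers should abort on such an exception or merely log it is the open question raised in the comment; the sketch only shows that a single failure need not stop the remaining deletes.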
|
-1. As @tlrx said before, this will have a significant negative impact on the performance of the delete operation on slow repositories. I think we should instead take advantage of batch deletes on S3 the same way as we do on GCE. Snapshot deletion can lead to the deletion of a large number of files at once, and we would have to wait for a round trip for each operation before proceeding to delete the next file. |
What is significant? 1 second per file? 100ms per file? I think API simplicity should be preferred over batching. The interfaces we use for S/R are extremely polluted with optimizations; I really wonder if it's worth it. |
I would expect the latency of a delete request to be anywhere between 30ms and 300ms per file, depending on location. Assuming that we delete a snapshot with 100 shards and 20 files per shard, at an average latency of 50ms per file, we are talking about the difference between 5 sec and 1 minute 40 sec to delete the snapshot. |
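The arithmetic behind that estimate can be written out as a small Java sketch. The class and method names are hypothetical, and the batched figure assumes one batched request (one round trip) per shard, which is an interpretation of the 5-second number rather than something stated in the thread.

```java
// Back-of-the-envelope estimate only; names are hypothetical.
final class DeleteLatencyEstimate {
    /** Total time in milliseconds when every file is deleted with its own blocking request. */
    static long sequentialMillis(int shards, int filesPerShard, long latencyMillis) {
        return (long) shards * filesPerShard * latencyMillis;
    }

    /** Total time in milliseconds assuming one batched request (one round trip) per shard. */
    static long batchedPerShardMillis(int shards, long latencyMillis) {
        return (long) shards * latencyMillis;
    }
}
```

With 100 shards, 20 files per shard, and 50ms per request, `sequentialMillis` gives 100,000 ms (1 minute 40 seconds) and `batchedPerShardMillis` gives 5,000 ms (5 seconds).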
@imotov That's right, I agree with this statement. But what convinced me to remove the multiple variations of |
|
@tlrx not sure I see how the Task API would help with a potential 10-20x slowdown in snapshot deletion speed. We can get the performance back by running deletes in parallel, but that will increase complexity in other layers and might start triggering throttling on some providers that are not happy with too many small simultaneous requests rushing in. I don't see how this is a good change for the S3 and GCE plugins, but if you and @dadoonet like it, I am not going to argue. |
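For readers unfamiliar with the trade-off mentioned here, this is a rough Java sketch of running single-blob deletes in parallel with a cap on concurrency. `ParallelDeleteSketch`, `BlobDeleter`, and `maxConcurrent` are hypothetical names, and nothing like this is part of the PR; it only shows the extra machinery that a parallel approach would bring in.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelDeleteSketch {
    /** Hypothetical stand-in for a single-blob delete operation. */
    interface BlobDeleter {
        void deleteBlob(String blobName) throws IOException;
    }

    /** Submits one delete task per blob; at most maxConcurrent tasks run at the same time. */
    static void deleteAll(BlobDeleter deleter, List<String> blobNames, int maxConcurrent)
            throws IOException, InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(maxConcurrent);
        try {
            List<Future<Void>> futures = new ArrayList<>();
            for (String blobName : blobNames) {
                Callable<Void> task = () -> {
                    deleter.deleteBlob(blobName);
                    return null;
                };
                futures.add(pool.submit(task));
            }
            for (Future<Void> future : futures) {
                try {
                    future.get(); // waits for the task and rethrows its failure, if any
                } catch (ExecutionException e) {
                    throw new IOException("parallel delete failed", e.getCause());
                }
            }
        } finally {
            // Stops accepting new tasks; tasks already submitted are allowed to finish.
            pool.shutdown();
        }
    }
}
```

The cap on threads is the only throttling control in this sketch; a real repository plugin would also need to consider provider rate limits, which is the concern raised in the comment.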
|
I was looking at this today. Actually this We can remove deleteBlobs for now and reimplement it if we need to optimize at some point. It will be easier than it was years ago because plugins and core code now live in the same repo. So I'm not against this cleanup. |
|
Okay, let's quickly summarise what's going on here (easier for myself to understand):
- @abeyad proposed the change to simplify the interface here and has given the green light to this PR
- @imotov had concerns about latency for S3 and GCE but would not object if @tlrx and @dadoonet both gave the green light to this PR
- @tlrx did not believe the latency impact would be too significant to outweigh the benefits of a simplified interface and AFAIU has approved this change
- @dadoonet also appears to have given the green light for this change AFAIU
So...can we merge this ❓ |
Sorry, that's not what I said. I agree with @imotov and I also think that this will have a significant negative impact on performance. On the other hand, Ali, Simon and Robert argue that this change makes the API cleaner and easier to maintain. I agree with this too, and error handling in batch requests can also be tricky to get right. These points make me think that we can sacrifice some perf at snapshot deletion time in favor of cleaner code if that makes our life (and users' lives too) better. That's why I'm OK with this change. |
|
@tlrx : updated my comment to clarify based on what you said. My question still stands though. |
|
maybe we can add a compromise here and try to interpret a wildcard suffix. If somebody calls |
|
+1 (in reply to Simon Willnauer, Mon, Jun 20, 2016 at 4:03 AM)
|
|
@s1monw : integrating |
|
@gfyoung Sorry I just realized you didn't get a response to this. Since we are doing the cleanup in this PR, I think it makes more sense to have a deleteBlob with potentially a wildcard prefix string passed in for this PR, because then we won't be getting rid of the bulk updates code, only to reinsert it again in another PR. Does that make sense? |
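The thread does not spell out the exact semantics of the wildcard idea, so the following Java sketch is only one possible reading: a single `deleteBlob` method that treats a trailing `*` as "delete every blob with this prefix". `InMemoryBlobContainer` and its methods are hypothetical and stand in for a real repository implementation.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

final class InMemoryBlobContainer {
    private final Map<String, byte[]> blobs = new TreeMap<>();

    void writeBlob(String name, byte[] data) {
        blobs.put(name, data);
    }

    boolean blobExists(String name) {
        return blobs.containsKey(name);
    }

    /**
     * Deletes a single blob, or every blob sharing the prefix when the name ends with '*'.
     * Returns the number of blobs removed.
     */
    int deleteBlob(String name) {
        if (name.endsWith("*")) {
            String prefix = name.substring(0, name.length() - 1);
            int removed = 0;
            for (Iterator<String> it = blobs.keySet().iterator(); it.hasNext(); ) {
                if (it.next().startsWith(prefix)) {
                    it.remove(); // a real store could issue one batched request here instead
                    removed++;
                }
            }
            return removed;
        }
        return blobs.remove(name) != null ? 1 : 0;
    }
}
```

Under this reading the interface keeps a single delete method, while an implementation that supports bulk requests remains free to batch the prefix case internally.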
|
@abeyad : if you read your explanation again, you didn't really provide justification for doing it now. On second look, while the idea proposed by @s1monw is interesting, if you look at how Unless we have functionality that can do Consequently, my thought process is shifting towards just removing |
|
@imotov will this solve your objection ^^ |
|
Sorry for the delay, I missed this ping somehow. I think Anyway, we just doubled the time and increased by 10-50 times the price of a snapshot deletion on S3 by implementing the file existence check in #18815 for a similar gain in resiliency. So my objections here over losing potential savings on S3 don't seem to make much sense any more. |
Removed the following methods from the BlobContainer interface:
1) deleteBlobs
2) deleteBlobsByPrefix
These removals help to clean up the interface. In addition, these methods were not very useful because they did not allow for atomic deletions.
Closes gh-18529.
|
Since none of the current blob container implementations take advantage of batching with deleteBlobsByPrefix, the only one that will is @tlrx 's Google implementation. As discussed with @s1monw, we will merge this PR now and @tlrx can add an optional wildcard suffix to the single |
|
@gfyoung thanks for your work on this! |
Title is self-explanatory. Closes #18529.