Skip to content

Conversation

@gfyoung
Copy link
Contributor

@gfyoung gfyoung commented Jun 10, 2016

Title is self-explanatory. Closes #18529.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we know the individual file here, we could give a more descriptive error message by saying "error deleting index file [" + blobName + "] during cleanup"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes, given the lambda format, that certainly makes more sense. Done.

@abeyad
Copy link

abeyad commented Jun 10, 2016

@gfyoung I reviewed the code and it looks good, I put one minor comment. Also, it would be better to have a more descriptive commit message. The title for the commit message is good, but there should be a body for the commit message enumerating the two methods that were removed from the interface and why they were removed (i.e. explaining that its a cleaner interface to have just one method, and we gain nothing by the other two methods as they don't afford us atomic deletes anyway).

Thanks for taking this on!

@gfyoung
Copy link
Contributor Author

gfyoung commented Jun 10, 2016

@abeyad : Certainly. I've updated the commit message for the duplicate deleteBlob method removal, and added a more descriptive error message as requested. Ready to merge if there are no other concerns.

@abeyad
Copy link

abeyad commented Jun 10, 2016

LGTM

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that there is now dead code behind this. Can you please verify?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I removed all of the dead code that I could possibly remove (see bottom of file).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm talking about GoogleCloudStorageBlobStore.deleteBlobs() method. If it's not used anymore in the BlobContainer I think it is useless now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deleteBlobs is called by deleteBlobsByPrefix which in turn is called by delete. I suppose I could refactor to put everything in under delete?

@tlrx
Copy link
Member

tlrx commented Jun 10, 2016

I left a comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we please stick to the for loop. We should be consistent in our loops unless it's really necessary or beneficial

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@s1monw
Copy link
Contributor

s1monw commented Jun 10, 2016

awesome cleanup - I think we need a note in the migration guide and should deprecate these methods in 2.x @abeyad can you take care of this once it's in?

@s1monw s1monw added :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs v5.0.0-alpha4 labels Jun 10, 2016
@abeyad
Copy link

abeyad commented Jun 10, 2016

@s1monw will do

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we swallowing this? I know that's the way that it was before, but is it right?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@imotov could you shed some light on why we swallow the IOException here for data files?

I think a future enhancement (not part of this PR) should be to throw the IOException and make the callers handle it. Unless there is a compelling reason not to do it that way.

@imotov
Copy link
Contributor

imotov commented Jun 10, 2016

-1. As @tlrx said before this will have a significant negative impact on performance of the delete operation on slow repositories. I think we should instead take advantage batch deletes on S3 the same way as we do it on GCE. Snapshot deletion can lead to potential deletion of a large number of files at ones and we will have to wait for roundtrip for each operation before preceding to delete the next file.

@s1monw
Copy link
Contributor

s1monw commented Jun 10, 2016

-1. As @tlrx said before this will have a significant negative impact on performance of the delete operation on slow repositories. I think we should instead take advantage batch deletes on S3 the same way as we do it on GCE. Snapshot deletion can lead to potential deletion of a large number of files at ones and we will have to wait for roundtrip for each operation before preceding to delete the next file.

what is significant? 1 second per file? 100ms per fiel? I think API simplicity should be preferred compared to batching. The interfaces we use for S/R are extremely polluted with optimizations I really wonder if it's worth it.

@imotov
Copy link
Contributor

imotov commented Jun 10, 2016

what is significant?

I would expect the latency for the delete request to be anywhere between 30ms and 300ms per file depending on location. Assuming, that we delete a snapshot with 100 shards, 20 files per shard with average latency of 50ms per file, we are talking about the difference between 5 sec and 1 minute 40 sec to delete the snapshot.

@tlrx
Copy link
Member

tlrx commented Jun 13, 2016

-1. As @tlrx said before this will have a significant negative impact on performance of the delete operation on slow repositories.

@imotov That's right, I agree with this statement. But what convinced me to remove the multiple variations of deleteBlob is that batch deletions make exception and error handling complex. Now we have a Task API I think it is OK to have longer snapshot deletion as long as we can regularly check the progress.

@imotov
Copy link
Contributor

imotov commented Jun 13, 2016

@tlrx not sure I see how Task API would help with potential 10-20 times slow down in snapshot deletion speed. We can get the performance back by running deletes in parallel but that will increase complexity on other layers and might start triggering throttling on some providers that are not happy with too many small simultaneous requests rushing in. I don't see how this is a good change for S3 and GCE plugins, but if you and @dadoonet like it, I am not going to argue.

@dadoonet
Copy link
Contributor

I was looking at this today. Actually this deleteBlobs() method has been added because of #12697.
If we remove this method, we can close #12697 as it won't be implemented.

We can remove deleteBlobs for now and reimplement if we need to optimize at some point. It will be easier than years ago because plugins and core code lives now in the same repo.

So I'm not against this cleanup.

@gfyoung
Copy link
Contributor Author

gfyoung commented Jun 15, 2016

Okay, let's quickly summarise what's going on here (easier for myself to understand):

@abeyad proposed the change to simplify the interface here and has given the green light to this PR

@imotov had concerns about latency for S3 and GCE but would not object if @tlrx and @dadoonet both gave the green light to this PR

@tlrx did not believe the latency impact would be too significant to outweigh the benefits of a simplified interface and AFAIU has approved this change

@dadoonet also appears to have given the green light for this change AFAIU

So...can we merge this ❓

@tlrx
Copy link
Member

tlrx commented Jun 16, 2016

@tlrx did not believe the latency impact would be too significant and AFAIU has approved this change

Sorry, that's not what I said. I agree with @imotov and I also think that this will have a significant negative impact on performance. On the other hand, Ali, Simon and Robert have the argument that this change make the API cleaner and easier to maintain. I do agree with this too, as well as error handling in batch requests can be tricky to handle correctly too. These points make me think that we can sacrifice some perf at snapshot deletion time in favor of cleaner code if that make our life (and user's life too) better. That's why I'm OK with this change.

@gfyoung
Copy link
Contributor Author

gfyoung commented Jun 16, 2016

@tlrx : updated my comment to clarify based on what you said

My question still stands though.

@s1monw
Copy link
Contributor

s1monw commented Jun 20, 2016

maybe we can add a compromise here and try to interpret a wildcard suffix. If somebody calls delete("foo*") we try to fetch all files matching the prefix / wildcard and delete them? This would simplify the interface and we can make batching an impl detail?

@abeyad
Copy link

abeyad commented Jun 20, 2016

+1

On Mon, Jun 20, 2016 at 4:03 AM, Simon Willnauer [email protected]
wrote:

maybe we can add a compromise here and try to interpret a wildcard suffix.
If somebody calls delete("foo*") we try to fetch all files matching the
prefix / wildcard and delete them? This would simplify the interface and we
can make batching an impl detail?


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#18813 (comment),
or mute the thread
https://github.com/notifications/unsubscribe/ABjkQdbQMqLKI1iNJ04JkBAp9yAsXxXcks5qNklngaJpZM4Iy1Kv
.

@gfyoung
Copy link
Contributor Author

gfyoung commented Jun 21, 2016

@s1monw : integrating deleteBlobsByPrefix functionality into deleteBlobs sounds like an interesting idea. Depending on how much traction it gets, I wonder though if it might be best served for a separate PR as follow-up?

@abeyad
Copy link

abeyad commented Jun 27, 2016

@gfyoung Sorry I just realized you didn't get a response to this. Since we are doing the cleanup in this PR, I think it makes more sense to have a deleteBlob with potentially a wildcard prefix string passed in for this PR, because then we won't be getting rid of the bulk updates code, only to reinsert it again in another PR. Does that make sense?

@gfyoung
Copy link
Contributor Author

gfyoung commented Jun 30, 2016

@abeyad : if you read your explanation again, you didn't really provide justification for doing it now.

On second look, while the idea proposed by @s1monw is interesting, if you look at how deleteBlobsByPrefix is implemented in AbstractBlobContainer, you can see that it's really just a for-loop in most cases that calls deleteBlob in the process. Suddenly AFAICT, integration is not so simple.

Unless we have functionality that can do regex deletions (which I don't think we can do across ALL implementations), essentially we will have to go through some gymnastics of refactoring the current deleteBlob code to make sure that it checks whether we have a wildcard and then somehow integrate a for-loop?

Consequently, my thought process is shifting towards just removing deleteBlobs but leaving deleteBlobsByPrefix unless a nice, more suitable solution can be proposed instead of the one I just described.

@s1monw
Copy link
Contributor

s1monw commented Jun 30, 2016

@imotov will this solve you objection ^^

@imotov
Copy link
Contributor

imotov commented Jul 6, 2016

Sorry for the delay, I missed this ping somehow. I think deleteBlobsByPrefix is a leftover from the times when a blob store was used by gateways and it's not actually used by snapshot/restore. Therefore I don't see how we would be able to take advantage of support for wildcards in deleteBlob unless it also supports listing blobs using comma-separated syntax or full regex syntax, which we could use to list multiple blob files in a single call.

Anyway, we just doubled the time and increased by 10-50 times the price of a snapshot deletion on S3 by implementing file existence check in #18815 for a similar gain in resiliency. So, my objections here over loosing potential savings on S3 don't seem to make much sense any more.

@gfyoung
Copy link
Contributor Author

gfyoung commented Jul 6, 2016

@imotov: Actually, #18815 (also my PR) got reverted but will be re-merged once a nagging test failure has been fixed. Not sure if that changes your stance.

IINM, are you no longer objecting to the deletions (i.e. we can even remove the byPrefix method as has already been done)?

gfyoung added 3 commits July 13, 2016 10:33
Removed the following methods from the
BlobContainer interface:

1) deleteBlobs
2) deleteBlobsByPrefix

These removals help to clean up the interface.
In addition, these methods were not very useful
because they did not allow for atomic deletions.

Closes gh-18529.
@gfyoung
Copy link
Contributor Author

gfyoung commented Jul 13, 2016

@imotov : any response to my comments?

@everyone : can this be merged if there are no other concerns?

@abeyad
Copy link

abeyad commented Jul 13, 2016

Since none of the current blob container implementations take advantage of batching with deleteBlobsByPrefix, the only one that will is @tlrx 's Google implementation. As discussed with @s1monw, we will merge this PR now and @tlrx can add an optional wildcard suffix to the single deleteBlob interface so batching can be done under the hood in working on the Google storage blob container implementation.

@abeyad abeyad merged commit 3f2e106 into elastic:master Jul 13, 2016
@abeyad
Copy link

abeyad commented Jul 13, 2016

@gfyoung thanks for your work on this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >non-issue v5.0.0-alpha5

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants