
Conversation

@gfyoung
Contributor

@gfyoung gfyoung commented Jun 10, 2016

Title is self-explanatory. Closes #18530.

@gfyoung
Contributor Author

gfyoung commented Jun 10, 2016

Suggestions on how to write tests for these changes?

Member

I don't think that a SecurityException should be re-thrown as an IOException.

Member

A DirectoryNotEmptyException is an IOException, why is it being wrapped in an IOException?

Contributor Author

@gfyoung gfyoung Jun 10, 2016

See my question in my comment here. Couldn't either exception be considered "unable to delete the blob"?

Member

@jasontedor jasontedor Jun 10, 2016

I think it's a mistake to wrap a SecurityException in an IOException; I think that the Javadocs need to be changed. And again, a DirectoryNotEmptyException is already an IOException: it doesn't need to be wrapped, and allowing it to bubble up does not violate what should be the contract for this method.

Contributor Author

@gfyoung gfyoung Jun 10, 2016

Regarding the DirectoryNotEmptyException, my bad; I misread your comment there. With regards to the SecurityException, I do see what you mean, though input from @abeyad would probably also be useful here, since he brought up the issue initially.


We should not catch the SecurityException at all. Let it propagate. We should not even have gotten to this point if the security manager did not give us access here, but in any case, it's not an exception we should handle at this level. It should just be propagated.

Member

We should not catch the SecurityException at all. Let it propagate.

Precisely.
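The stricter contract being argued for here (propagate rather than wrap) can be sketched with plain java.nio.file calls. This is a hypothetical stand-in, not the actual FsBlobContainer code; the class and method names are illustrative:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

public class DeleteBlobSketch {
    // Hypothetical stand-in for a blob container's deleteBlob under the
    // stricter contract discussed above: IOExceptions (NoSuchFileException,
    // DirectoryNotEmptyException) bubble up unwrapped, and a SecurityException
    // from the security manager is not caught at all.
    static void deleteBlob(Path container, String blobName) throws IOException {
        Files.delete(container.resolve(blobName));
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("blobs");
        Files.write(dir.resolve("blob1"), new byte[] {1, 2, 3});
        deleteBlob(dir, "blob1"); // succeeds
        boolean strict = false;
        try {
            deleteBlob(dir, "blob1"); // already gone: NoSuchFileException
        } catch (NoSuchFileException e) {
            strict = true;
        }
        System.out.println(strict); // prints "true"
        Files.delete(dir);
    }
}
```

Because the exceptions are already IOExceptions (or runtime exceptions like SecurityException), no catch block is needed in the delete path at all.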

@jasontedor
Member

Title is self-explanatory.

It explains the what, but not the why and the why is really important.

@gfyoung
Contributor Author

gfyoung commented Jun 10, 2016

@jasontedor : I understand that these exceptions are not IOExceptions, but is the issue I'm closing a real issue then? Also, if you look at the two implementations that already throw IOException (see here), the exceptions they catch are not IOExceptions either.

@gfyoung
Contributor Author

gfyoung commented Jun 10, 2016

@jasontedor : I think the issue that I am closing explains the importance. Do I need to repeat it in the PR?

@abeyad

abeyad commented Jun 10, 2016

@gfyoung Regarding writing tests, take a look at FsBlobStoreContainerTests. It extends ESBlobStoreContainerTestCase which contains useful general BlobContainer tests that can be used for each of the different BlobContainer implementations.

@jasontedor
Member

I understand that these exceptions are not IOExceptions, but is the issue I'm closing a real issue then?

Yes, the leniency is the real issue.

@jasontedor
Member

jasontedor commented Jun 10, 2016

I think the issue that I am closing explains the importance. Do I need to repeat it in the PR?

Yes, the why needs to be in the commit message, which ends up as the default body for the initial PR comment.

@gfyoung
Contributor Author

gfyoung commented Jun 10, 2016

@jasontedor : Regarding the leniency, sure thing. I suppose a more careful examination of the possible exceptions thrown needs to be done here. Regarding the commit body, fair enough. That can be added.

@gfyoung
Contributor Author

gfyoung commented Jun 12, 2016

  1. Added a deleteBlob test for the IOException.

  2. Added BlobContainer tests for AzureBlobContainer, HdfsBlobContainer, and S3BlobContainer, but I had trouble writing their newBlobStore methods as simply as, for example, the one in GoogleCloudStorageBlobStoreContainerTests.

Feedback on how to properly write them would be great!


I don't think this FileContext is initialized properly. Look at HdfsRepository for how it is initialized, with the appropriate security checks. You will need to create a temp dir for this; see the createTempDir() method that ESBlobStoreContainerTestCase inherits.

Did you run this test? What did it say, if anything?

Contributor Author

It fails at the moment because of permissions. However, I was pretty sure I hadn't properly initialized everything either.


Yeah, I figured... my feeling is these tests weren't written before because they require mocking of the S3/Azure/HDFS services (HDFS may actually work if you give it a createTempDir() and initialize it the same way HdfsRepository does). I will be interested to hear @imotov's and @tlrx's thoughts. The Google blob store repository has a mock storage client that mocks the behavior of what a real Google storage client would do, without actually connecting to an external service. Something similar would need to be done for S3 and Azure.

Contributor Author

Yeah, I was looking for some sort of mock object like the Google one, but I could not find any such thing. I should probably see if I can use Mockito somehow.


Right, for example, in the case of S3, you would have to create a mock implementation for the AmazonS3 interface. It's a pretty lengthy interface, but you would only need to provide mock implementations for the methods that S3BlobStore uses, which are just a few. The rest of the methods could just return null or some dummy values.
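The pattern being suggested can be illustrated without the AWS SDK. BlobClient below is a hypothetical miniature of a lengthy client interface like AmazonS3 (the names and methods are invented for illustration); the mock implements only the methods the blob store would actually use and stubs the rest with dummy values:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a lengthy client interface such as AmazonS3.
interface BlobClient {
    void putObject(String bucket, String key, byte[] data);
    byte[] getObject(String bucket, String key);
    void deleteObject(String bucket, String key);
    // ...imagine many more methods, as on the real AmazonS3 interface...
    String getRegionName();
}

public class MockClientSketch implements BlobClient {
    // Back the mock with an in-memory map instead of a remote service.
    private final Map<String, byte[]> store = new HashMap<>();

    @Override public void putObject(String bucket, String key, byte[] data) {
        store.put(bucket + "/" + key, data);
    }
    @Override public byte[] getObject(String bucket, String key) {
        return store.get(bucket + "/" + key);
    }
    @Override public void deleteObject(String bucket, String key) {
        store.remove(bucket + "/" + key);
    }
    // Unused methods can just return null or dummy values, as suggested above.
    @Override public String getRegionName() { return null; }

    public static void main(String[] args) {
        BlobClient client = new MockClientSketch();
        client.putObject("bucket", "blob", new byte[] {42});
        System.out.println(client.getObject("bucket", "blob").length); // prints "1"
        client.deleteObject("bucket", "blob");
        System.out.println(client.getObject("bucket", "blob")); // prints "null"
    }
}
```

The real mock would be handed to S3BlobStore in place of the real client, so the container tests exercise the blob store logic without any external service.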

In plugins/repository-hdfs/src/test/java/org/elasticsearch/repositories/hdfs/HdfsBlobStoreContainerTests.java (#18815 (comment)):

    package org.elasticsearch.repositories.hdfs;

    import org.apache.hadoop.fs.FileContext;
    import org.elasticsearch.common.blobstore.BlobStore;
    import org.elasticsearch.repositories.ESBlobStoreContainerTestCase;

    import java.io.IOException;

    public class HdfsBlobStoreContainerTests extends ESBlobStoreContainerTestCase {

        @Override
        protected BlobStore newBlobStore() throws IOException {
            return new HdfsBlobStore(FileContext.getFileContext(), "", 100);

@clintongormley clintongormley added :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >enhancement labels Jun 14, 2016
@gfyoung
Contributor Author

gfyoung commented Jun 22, 2016

Is there any update on whether or not we can use actual repositories for Azure and S3 in these tests? If that can be done, it will save the work of having to mock everything for both repositories.

@jasontedor
Member

Is there any update on whether or not we can use actual repositories for both Azure or S3 for these tests?

We cannot. These tests need to be able to run from developer machines.

@gfyoung
Contributor Author

gfyoung commented Jun 22, 2016

@jasontedor : Ah, I see. Hmm... then perhaps some more input from others on how we might mock the Azure and S3 repositories? The mocking strategy suggested by @abeyad looked promising, but further discussion suggests it will only hit a dead end AFAICT.

@gfyoung
Contributor Author

gfyoung commented Jun 23, 2016

Sometimes a fresh look at a problem is needed. I took a different approach to mocking the S3-related components and found that I really just needed to mock the client, as @abeyad had mentioned at some point. Onto Azure and HDFS now!

@gfyoung
Contributor Author

gfyoung commented Jun 23, 2016

AzureStorageServiceMock made the AzureBlobStoreContainer tests a lot easier than I had expected. Only HDFS is left now!

@gfyoung
Contributor Author

gfyoung commented Jun 23, 2016

Having some issues with HDFS: instantiating the BlobStore results in a permissions error (even when I create a temp directory and instantiate the FileContext with it).

An exception is thrown because, when FileContext.getFileContext(...) is called, it cannot call UserGroupInformation.getCurrentUser() via Subject.getSubject() (in the Hadoop source code) due to permission denials. How do I get around this?

@abeyad

abeyad commented Jun 23, 2016

@gfyoung great news, let me know when it's ready and I can give it another review.

@gfyoung
Contributor Author

gfyoung commented Jun 23, 2016

@abeyad : see my comment above; some assistance with the HDFS permissions issue would be great!

@abeyad

abeyad commented Jun 23, 2016

I'm not familiar with HDFS, but perhaps @jbaiera could help?

@jbaiera
Member

jbaiera commented Jun 23, 2016

@abeyad Re HDFS: authentication for many of the Hadoop services is based around Hadoop's UserGroupInformation object, which is poked at even if auth is turned off. When you first request the UserGroupInformation object in the Hadoop client, it goes through its internal setup steps. One of those steps is registering with a metrics-gathering system so that the UGI can emit quantiles for the number of login successes and failures. This is for the benefit of reporting and monitoring systems like Ganglia.

The snag occurs when UGI registers its metrics object. The system tries accessing the UgiMetrics object's fields using reflection to collect them as metrics. When the wrapper code asks for all declared fields on UgiMetrics, it trips on the SecurityManager because it does not have the accessDeclaredMembers permission. I'm afraid I'm not familiar enough with the security side of Java to help much further than that...
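Not stated in the thread, but one conventional workaround for this class of SecurityManager failure is to wrap the offending call in AccessController.doPrivileged, so the permission check stops at the (suitably granted) calling code instead of walking the whole stack. A minimal runnable sketch under that assumption, with System.getProperty standing in for the FileContext.getFileContext() call that trips the check (the real fix would also need the right grants in the plugin's security policy):

```java
import java.security.AccessController;
import java.security.PrivilegedAction;

public class DoPrivilegedSketch {
    public static void main(String[] args) {
        // doPrivileged truncates the stack walk at this frame, so only this
        // code's own protection domain needs the permission being checked.
        // System.getProperty stands in for the security-sensitive Hadoop call.
        String result = AccessController.doPrivileged(
                (PrivilegedAction<String>) () ->
                        System.getProperty("java.version") == null ? "missing" : "ok");
        System.out.println(result); // prints "ok"
    }
}
```

Whether this is appropriate here depends on whether the test's protection domain can be granted accessDeclaredMembers, which the thread does not settle.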

@gfyoung
Contributor Author

gfyoung commented Jun 23, 2016

Hmmm... seems like a stripped-down mock of the FileContext might be needed, but I'm not sure yet how to go about that...
EDIT: Arghhh... FileContext has only a private constructor, so no extending 😢

gfyoung added 7 commits June 30, 2016 21:36

Added BlobContainer tests for Azure storage and caught a bug at the same time in which deleteBlob was not raising an IOException when the blobName did not exist.

Added BlobContainer tests for HDFS storage and caught a bug at the same time in which deleteBlob was not raising an IOException when the blobName did not exist.
@gfyoung
Contributor Author

gfyoung commented Jul 1, 2016

@abeyad : Change has been made. Ready to merge if there are no other concerns.

@abeyad abeyad merged commit d24cc65 into elastic:master Jul 1, 2016
@gfyoung gfyoung deleted the stricter-delete-blob branch July 1, 2016 03:09
javanna added a commit to javanna/elasticsearch that referenced this pull request Jul 1, 2016
This reverts commit d24cc65 as it seems to be causing test failures.
@javanna
Member

javanna commented Jul 1, 2016

@abeyad I reverted this commit as it was causing consistent test failures, here is an example:

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-intake/942/console

@gfyoung gfyoung restored the stricter-delete-blob branch July 1, 2016 13:10
@gfyoung
Contributor Author

gfyoung commented Jul 1, 2016

@javanna @abeyad :

IMHO reverting the entire commit was a bit overkill. In such situations, I have seen maintainers simply disable tests temporarily with the expectation that the committer (in this case, me) would address them. Almost never have I seen anyone shy away from the task.

A closer look at the test failures suggests that we have a conflict of behaviours here.

The two test failures are coming from SharedClusterSnapshotRestoreIT.java at the following tests:

  1. testDeleteSnapshotWithMissingMetadata
  2. testDeleteSnapshotWithCorruptedSnapshotFile

Both test failures were triggered by the change from Files.deleteIfExists to Files.delete in FsBlobContainer.java; Files.delete is a lot stricter than Files.deleteIfExists because it also throws a NoSuchFileException, which is what is thrown in both cases.

In the first test, because the metadata file has already been deleted, FsBlobStoreContainer can no longer delete it again during the snapshot deletion. In the second test, the file corruption makes the file unreadable, again leading to the exception.

Thus, the question is: what needs to change? Does FsBlobStoreContainer need to be more lenient again, or do changes need to be made to the snapshot deletion behaviour (where is the code for that, BTW)?
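The leniency difference at the heart of both failures can be demonstrated with the java.nio.file API directly, outside of any Elasticsearch code:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

public class DeleteStrictness {
    public static void main(String[] args) throws IOException {
        Path missing = Files.createTempDirectory("repo").resolve("no-such-blob");
        // deleteIfExists is lenient: a missing file simply returns false.
        System.out.println(Files.deleteIfExists(missing)); // prints "false"
        // delete is strict: the same missing path throws NoSuchFileException.
        try {
            Files.delete(missing);
        } catch (NoSuchFileException e) {
            System.out.println("NoSuchFileException"); // prints "NoSuchFileException"
        }
    }
}
```

Any caller that previously relied on the silent false return now sees an exception, which is exactly what the two snapshot tests tripped over.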

@javanna
Member

javanna commented Jul 1, 2016

Yes @gfyoung, I probably could have annotated the test with @AwaitsFix and opened an issue about it. I went for backing it out because the PR was very recently merged, those tests were consistently failing, and the whole team couldn't run tests (and hence couldn't push anything tested). Don't worry though: once those tests run successfully, the PR will for sure be merged back.

@gfyoung
Contributor Author

gfyoung commented Jul 1, 2016

@javanna : no worries, any thoughts on the rest of what I wrote?

@abeyad

abeyad commented Jul 1, 2016

IMHO reverting the entire commit was a bit overkill. In such situations, I have seen maintainers simply disable tests temporarily with the expectation that the committer (in this case, me) would address them. Almost never have I seen anyone shy away from the task.

A commit that causes test failures generally means a bug somewhere that was not thought through. Anyone could create a snapshot build off of master at any time; we don't want master introducing new bugs from a recent PR, to the extent possible, and failing tests are an indication of that. Hence, I believe @javanna made the right call. Once it's resolved, everything will be merged back in, so it's not an issue.

testDeleteSnapshotWithMissingMetadata

We should call expectThrows(IOException.class, () -> client.admin().cluster().prepareDeleteSnapshot("test-repo", "test-snap-1").get())

The line below it, asserting that the snapshot is deleted, will probably not be true either; you'll have to check and change that assertion as well.
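The expectThrows helper suggested here comes from the Elasticsearch test framework; a minimal stand-in shows its behaviour, with the prepareDeleteSnapshot call replaced by a throwing stand-in since no real cluster is available in a sketch:

```java
import java.io.IOException;

public class ExpectThrowsSketch {
    interface ThrowingRunnable {
        void run() throws Throwable;
    }

    // Minimal stand-in for the expectThrows helper available in Elasticsearch
    // test cases: fails if nothing (or the wrong type) is thrown, and returns
    // the caught exception so further assertions can be made on it.
    static <T extends Throwable> T expectThrows(Class<T> expected, ThrowingRunnable runnable) {
        try {
            runnable.run();
        } catch (Throwable t) {
            if (expected.isInstance(t)) {
                return expected.cast(t);
            }
            throw new AssertionError("unexpected exception type: " + t, t);
        }
        throw new AssertionError("expected " + expected.getSimpleName() + " was not thrown");
    }

    public static void main(String[] args) {
        // Stand-in for the failing delete-snapshot call (hypothetical message).
        IOException e = expectThrows(IOException.class, () -> {
            throw new IOException("failed to delete snapshot");
        });
        System.out.println(e.getMessage()); // prints "failed to delete snapshot"
    }
}
```

In the real test, the lambda body would be the client.admin().cluster().prepareDeleteSnapshot(...).get() call quoted above.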

testDeleteSnapshotWithCorruptedSnapshotFile

What's the stack trace here? If corrupted files (which can happen) cause the entire snapshot to be undeletable, then I think we have a problem. If that's the case, we should open a separate issue indicating that snapshot/restore has to handle corrupted files more robustly, and put an @AwaitsFix on this test linking to the newly created GitHub issue.

The thing is, the snap-*.dat file contains all the info about everything else in the snapshot, so in this test's case, I think the delete should throw an exception if it is undeletable due to corruption. However, if other actual snapshot files could not be deleted due to corruption, but the rest of the snapshot's files could be deleted, I think we still want to delete the snapshot and all the files we can.

@abeyad

abeyad commented Jul 1, 2016

@gfyoung Also, run gradle clean check once your commit is ready; that will run all of the tests (including the ones for the plugins, etc.).

@gfyoung
Contributor Author

gfyoung commented Jul 1, 2016

What about following what BlobStoreIndexShardRepository.java does: catching the IOException on each deletion call and then logging it with logger.debug?

That might be a nicer solution for both test failures (and perhaps for users too), as we could still clean up all the files that are deletable while making the user aware of the failures.
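The catch-and-log-per-deletion idea can be sketched as follows. This is a hypothetical helper, not the actual BlobStoreIndexShardRepository code, and System.err stands in for logger.debug:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class LenientBulkDelete {
    // Try every deletion, log any IOException at debug level, and keep going
    // so that all deletable files are still cleaned up. Returns the number of
    // failures so the caller can decide whether to surface them.
    static int deleteAll(Path dir, String... names) {
        int failures = 0;
        for (String name : names) {
            try {
                Files.delete(dir.resolve(name));
            } catch (IOException e) {
                // Stands in for logger.debug("failed to delete [{}]", name, e).
                System.err.println("debug: failed to delete [" + name + "]: " + e);
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("snap");
        Files.write(dir.resolve("a"), new byte[0]);
        Files.write(dir.resolve("c"), new byte[0]);
        // "b" does not exist, so its deletion fails, but "a" and "c" still go.
        System.out.println(deleteAll(dir, "a", "b", "c")); // prints "1"
        Files.delete(dir);
    }
}
```

The design trade-off is exactly the one under discussion: the per-file strictness is preserved at the deleteBlob level, while the bulk snapshot-deletion path stays resilient to individual failures.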

@gfyoung
Contributor Author

gfyoung commented Jul 1, 2016

Also, your proposed change for testDeleteSnapshotWithMissingMetadata does not appear to work, at least when I run tests locally.

@abeyad

abeyad commented Jul 1, 2016

@gfyoung I'll pull down your branch and experiment.

@abeyad

abeyad commented Jul 3, 2016

@gfyoung I've created a branch here: https://github.com/abeyad/elasticsearch/tree/stricter-delete-blob. I'd like to issue a PR against your branch, but I don't have access to your elasticsearch fork.

Before incorporating it, I'd like feedback from @imotov on what I've done.

@gfyoung
Contributor Author

gfyoung commented Jul 3, 2016

@abeyad : My original branch is here. You should be able to access that hopefully. Thanks for doing this! Hopefully we can get it right with Jenkins this time around. 😄

@abeyad

abeyad commented Jul 8, 2016

@gfyoung I haven't forgotten about this; I've discussed some changes with @imotov that should be done in order to avoid issues revealed with the two broken tests. I'm working on those now and will hopefully have a PR for review in the next couple days. Then both our work can be merged together.

Can I request that you remove your latest commit from this branch and push? My commits will cause merge conflicts with it.

Lastly, I still can't issue a PR against your branch. I believe you need to add me as a collaborator on your elasticsearch fork (https://github.com/gfyoung/elasticsearch/settings/collaboration). Alternatively, we can set up a feature branch that incorporates both of our commits; this may be the best approach, so I'll set it up and let you know when it's there so you can push your commits to it.

@gfyoung
Contributor Author

gfyoung commented Jul 8, 2016

@abeyad : I reverted that last commit on my branch; it was my attempt to patch the errors. Strange that you can't put up PRs against my fork (I haven't seen that issue before), but yes, do let me know once the feature branch is up!

@gfyoung
Contributor Author

gfyoung commented Jul 13, 2016

@abeyad : any updates on the feature branch?
