
Conversation

@gfyoung
Contributor

@gfyoung gfyoung commented Jun 10, 2016

Title is self-explanatory. Closes #18530.

@gfyoung
Contributor Author

gfyoung commented Jun 10, 2016

Suggestions on how to write tests for these changes?

Member

I don't think that a SecurityException should be re-thrown as an IOException.

Member

A DirectoryNotEmptyException is an IOException, why is it being wrapped in an IOException?

Contributor Author

@gfyoung gfyoung Jun 10, 2016

See my question in my comment here. Couldn't either exception be considered "unable to delete the blob"?

Member

@jasontedor jasontedor Jun 10, 2016

I think it's a mistake to wrap a SecurityException in an IOException; I think that the Javadocs need to be changed. And again, a DirectoryNotEmptyException is already an IOException: it doesn't need to be wrapped, and allowing it to bubble up does not violate what should be the contract for this method.

Contributor Author

@gfyoung gfyoung Jun 10, 2016

Regarding the DirectoryNotEmptyException, my bad; I misread your comment there. With regards to the SecurityException, I do see what you mean, though input from @abeyad would probably also be useful here, since he brought up the issue initially.


We should not catch the SecurityException at all. Let it propagate. We should not even have gotten to this point if the security manager did not give us access here, but in any case, it's not an exception we should handle at this level. It should just be propagated.

Member

We should not catch the SecurityException at all. Let it propagate.

Precisely.
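The stricter contract being argued for here (propagate rather than wrap) can be sketched with plain java.nio.file calls. This is a hypothetical stand-in, not the actual FsBlobContainer code; the class and method names are illustrative:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

public class DeleteBlobSketch {
    // Hypothetical stand-in for a blob container's deleteBlob under the
    // stricter contract discussed above: IOExceptions (NoSuchFileException,
    // DirectoryNotEmptyException) bubble up unwrapped, and a SecurityException
    // from the security manager is not caught at all.
    static void deleteBlob(Path container, String blobName) throws IOException {
        Files.delete(container.resolve(blobName));
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("blobs");
        Files.write(dir.resolve("blob1"), new byte[] {1, 2, 3});
        deleteBlob(dir, "blob1"); // succeeds
        boolean strict = false;
        try {
            deleteBlob(dir, "blob1"); // already gone: NoSuchFileException
        } catch (NoSuchFileException e) {
            strict = true;
        }
        System.out.println(strict); // prints "true"
        Files.delete(dir);
    }
}
```

Because the exceptions are already IOExceptions (or runtime exceptions like SecurityException), no catch block is needed in the delete path at all.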

@jasontedor
Member

Title is self-explanatory.

It explains the what, but not the why and the why is really important.

@gfyoung
Contributor Author

gfyoung commented Jun 10, 2016

@jasontedor : I understand that these exceptions are not IOExceptions, but is the issue I'm closing a real issue then? Also, if you look at the two implementations that already throw IOException (see here), the exceptions they catch are not IOExceptions either.

@gfyoung
Contributor Author

gfyoung commented Jun 10, 2016

@jasontedor : I think the issue that I am closing explains the importance. Do I need to repeat it in the PR?

@abeyad

abeyad commented Jun 10, 2016

@gfyoung Regarding writing tests, take a look at FsBlobStoreContainerTests. It extends ESBlobStoreContainerTestCase which contains useful general BlobContainer tests that can be used for each of the different BlobContainer implementations.

@jasontedor
Member

I understand that these exceptions are not IOExceptions, but is the issue I'm closing a real issue then?

Yes, the leniency is the real issue.

@jasontedor
Member

jasontedor commented Jun 10, 2016

I think the issue that I am closing explains the importance. Do I need to repeat it in the PR?

Yes, the why needs to be in the commit message, which ends up as the default body for the initial PR comment.

@gfyoung
Contributor Author

gfyoung commented Jun 10, 2016

@jasontedor : Regarding the leniency, sure thing. I suppose a more careful examination of the possible exceptions thrown needs to be done here. Regarding the commit body, fair enough. That can be added.

@gfyoung
Contributor Author

gfyoung commented Jun 12, 2016

  1. Added a deleteBlob test for the IOException.

  2. Added BlobContainer tests for AzureBlobContainer, HdfsBlobContainer, and S3BlobContainer, but I had trouble writing their newBlobStore methods as simply as, for example, the one in GoogleCloudStorageBlobStoreContainerTests.

Feedback on how to properly write them would be great!


I don't think this FileContext is initialized properly. Look at HdfsRepository for how it is initialized, with the appropriate security checks. You will need to create a temp dir for this; see the createTempDir() method that ESBlobStoreContainerTestCase inherits.

Did you run this test? What did it say, if anything?

Contributor Author

It fails at the moment because of permissions. However, I was pretty sure I hadn't properly initialized everything either.


Yeah, I figured... my feeling is these tests weren't written before because they require mocking of the S3/Azure/HDFS services (HDFS may actually work if you give it a createTempDir() and initialize it the same way HdfsRepository does). I will be interested to hear @imotov's and @tlrx's thoughts. The Google blob store repository has a mock storage client that mocks the behavior of what a real Google storage client would do, without actually connecting to an external service. Something similar would need to be done for S3 and Azure.

Contributor Author

Yeah, I was looking for some sort of mock object like the Google one, but I could not find any such thing. I should probably see if I can use Mockito somehow.


Right, for example, in the case of S3, you would have to create a mock implementation for the AmazonS3 interface. It's a pretty lengthy interface, but you would only need to provide mock implementations for the methods that S3BlobStore uses, which are just a few. The rest of the methods could just return null or some dummy values.
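The pattern being suggested can be illustrated without the AWS SDK. BlobClient below is a hypothetical miniature of a lengthy client interface like AmazonS3 (the names and methods are invented for illustration); the mock implements only the methods the blob store would actually use and stubs the rest with dummy values:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a lengthy client interface such as AmazonS3.
interface BlobClient {
    void putObject(String bucket, String key, byte[] data);
    byte[] getObject(String bucket, String key);
    void deleteObject(String bucket, String key);
    // ...imagine many more methods, as on the real AmazonS3 interface...
    String getRegionName();
}

public class MockClientSketch implements BlobClient {
    // Back the mock with an in-memory map instead of a remote service.
    private final Map<String, byte[]> store = new HashMap<>();

    @Override public void putObject(String bucket, String key, byte[] data) {
        store.put(bucket + "/" + key, data);
    }
    @Override public byte[] getObject(String bucket, String key) {
        return store.get(bucket + "/" + key);
    }
    @Override public void deleteObject(String bucket, String key) {
        store.remove(bucket + "/" + key);
    }
    // Unused methods can just return null or dummy values, as suggested above.
    @Override public String getRegionName() { return null; }

    public static void main(String[] args) {
        BlobClient client = new MockClientSketch();
        client.putObject("bucket", "blob", new byte[] {42});
        System.out.println(client.getObject("bucket", "blob").length); // prints "1"
        client.deleteObject("bucket", "blob");
        System.out.println(client.getObject("bucket", "blob")); // prints "null"
    }
}
```

The real mock would be handed to S3BlobStore in place of the real client, so the container tests exercise the blob store logic without any external service.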

In plugins/repository-hdfs/src/test/java/org/elasticsearch/repositories/hdfs/HdfsBlobStoreContainerTests.java (#18815 (comment)):

    package org.elasticsearch.repositories.hdfs;

    import org.apache.hadoop.fs.FileContext;
    import org.elasticsearch.common.blobstore.BlobStore;
    import org.elasticsearch.repositories.ESBlobStoreContainerTestCase;

    import java.io.IOException;

    public class HdfsBlobStoreContainerTests extends ESBlobStoreContainerTestCase {

        @Override
        protected BlobStore newBlobStore() throws IOException {
            return new HdfsBlobStore(FileContext.getFileContext(), "", 100);

@clintongormley clintongormley added :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >enhancement labels Jun 14, 2016
@gfyoung
Contributor Author

gfyoung commented Jun 22, 2016

Is there any update on whether or not we can use actual repositories for Azure and S3 in these tests? If that can be done, it will save the work of having to mock everything for both repositories.

@jasontedor
Member

Is there any update on whether or not we can use actual repositories for both Azure or S3 for these tests?

We cannot. These tests need to be able to run from developer machines.

@gfyoung
Contributor Author

gfyoung commented Jun 22, 2016

@jasontedor : Ah, I see. Hmm... then perhaps some more input from others on how we might mock the Azure and S3 repositories? The mocking strategy suggested by @abeyad looked promising, but further discussion suggests it will only hit a dead end AFAICT.

@gfyoung
Contributor Author

gfyoung commented Jun 23, 2016

Sometimes a fresh look at a problem is needed. I took a different approach to mocking the S3-related components and found that I really just needed to mock the client, as @abeyad had mentioned at some point. Onto Azure and HDFS now!

@gfyoung
Contributor Author

gfyoung commented Jun 23, 2016

AzureStorageServiceMock made the AzureBlobStoreContainer tests a lot easier than I had expected. Only HDFS is left now!

@gfyoung
Contributor Author

gfyoung commented Jun 23, 2016

Having some issues with HDFS: instantiating the BlobStore results in a permissions error (even when I create a temp directory and instantiate the FileContext with it).

An exception is thrown because, when FileContext.getFileContext(...) is called, it cannot call UserGroupInformation.getCurrentUser() via Subject.getSubject() (in the Hadoop source code) due to permission denials. How do I get around this?

@abeyad

abeyad commented Jun 23, 2016

@gfyoung great news, let me know when it's ready and I can give it another review.

@gfyoung
Contributor Author

gfyoung commented Jun 23, 2016

@abeyad : see my comment above; some assistance with the HDFS permissions issue would be great!

@abeyad

abeyad commented Jun 23, 2016

I'm not familiar with HDFS, but perhaps @jbaiera could help?

@jbaiera
Member

jbaiera commented Jun 23, 2016

@abeyad Re HDFS: authentication for many of the Hadoop services is based around Hadoop's UserGroupInformation object, which is poked at even if auth is turned off. When you first request the UserGroupInformation object in the Hadoop client, it goes through its internal setup steps. One of those steps is registering with a metrics-gathering system so that the UGI can emit quantiles for the number of login successes and failures. This is for the benefit of reporting and monitoring systems like Ganglia.

The snag occurs when UGI registers its metrics object. The system tries accessing the UgiMetrics object's fields using reflection to collect them as metrics. When the wrapper code asks for all declared fields on UgiMetrics, it trips on the SecurityManager because it does not have the accessDeclaredMembers permission. I'm afraid I'm not familiar enough with the security side of Java to help much further than that...
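Not stated in the thread, but one conventional workaround for this class of SecurityManager failure is to wrap the offending call in AccessController.doPrivileged, so the permission check stops at the (suitably granted) calling code instead of walking the whole stack. A minimal runnable sketch under that assumption, with System.getProperty standing in for the FileContext.getFileContext() call that trips the check (the real fix would also need the right grants in the plugin's security policy):

```java
import java.security.AccessController;
import java.security.PrivilegedAction;

public class DoPrivilegedSketch {
    public static void main(String[] args) {
        // doPrivileged truncates the stack walk at this frame, so only this
        // code's own protection domain needs the permission being checked.
        // System.getProperty stands in for the security-sensitive Hadoop call.
        String result = AccessController.doPrivileged(
                (PrivilegedAction<String>) () ->
                        System.getProperty("java.version") == null ? "missing" : "ok");
        System.out.println(result); // prints "ok"
    }
}
```

Whether this is appropriate here depends on whether the test's protection domain can be granted accessDeclaredMembers, which the thread does not settle.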

@gfyoung
Contributor Author

gfyoung commented Jun 23, 2016

Hmmm... seems like a stripped-down mock of the FileContext might be needed, but I'm not sure yet how to go about that...
EDIT: Arghhh... FileContext has only a private constructor, so no extending 😢

gfyoung added 7 commits June 30, 2016 21:36

Added BlobContainer tests for Azure storage and caught a bug at the same time in which deleteBlob was not raising an IOException when the blobName did not exist.

Added BlobContainer tests for HDFS storage and caught a bug at the same time in which deleteBlob was not raising an IOException when the blobName did not exist.
@gfyoung
Contributor Author

gfyoung commented Jul 1, 2016

@abeyad : Change has been made. Ready to merge if there are no other concerns.

@abeyad abeyad merged commit d24cc65 into elastic:master Jul 1, 2016
@gfyoung gfyoung deleted the stricter-delete-blob branch July 1, 2016 03:09
javanna added a commit to javanna/elasticsearch that referenced this pull request Jul 1, 2016
This reverts commit d24cc65 as it seems to be causing test failures.
@javanna
Member

javanna commented Jul 1, 2016

@abeyad I reverted this commit as it was causing consistent test failures, here is an example:

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-intake/942/console

@gfyoung gfyoung restored the stricter-delete-blob branch July 1, 2016 13:10
@gfyoung
Contributor Author

gfyoung commented Jul 1, 2016

@javanna @abeyad :

IMHO reverting the entire commit was a bit overkill. In such situations, I have seen maintainers simply disable tests temporarily with the expectation that the committer (in this case, me) would address them. Almost never have I seen anyone shy away from the task.

A closer look at the test failures suggests that we have a conflict of behaviours here.

The two test failures are coming from SharedClusterSnapshotRestoreIT.java at the following tests:

  1. testDeleteSnapshotWithMissingMetadata
  2. testDeleteSnapshotWithCorruptedSnapshotFile

Both test failures were triggered by the change from Files.deleteIfExists to Files.delete in FsBlobContainer.java; Files.delete is a lot stricter than Files.deleteIfExists because it also throws a NoSuchFileException, which is what is thrown in both cases.

In the first test, because the metadata file has already been deleted, FsBlobStoreContainer can no longer delete it again during the snapshot deletion. In the second test, the file corruption makes the file unreadable, again leading to the exception.

Thus, the question is: what needs to change? Does FsBlobStoreContainer need to be more lenient again, or do changes need to be made to the snapshot deletion behaviour (where is the code for that, BTW)?
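The leniency difference at the heart of both failures can be demonstrated with the java.nio.file API directly, outside of any Elasticsearch code:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

public class DeleteStrictness {
    public static void main(String[] args) throws IOException {
        Path missing = Files.createTempDirectory("repo").resolve("no-such-blob");
        // deleteIfExists is lenient: a missing file simply returns false.
        System.out.println(Files.deleteIfExists(missing)); // prints "false"
        // delete is strict: the same missing path throws NoSuchFileException.
        try {
            Files.delete(missing);
        } catch (NoSuchFileException e) {
            System.out.println("NoSuchFileException"); // prints "NoSuchFileException"
        }
    }
}
```

Any caller that previously relied on the silent false return now sees an exception, which is exactly what the two snapshot tests tripped over.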

@javanna
Member

javanna commented Jul 1, 2016

Yes @gfyoung, I probably could have annotated the test with @AwaitsFix and opened an issue about it. I went for backing it out because the PR was very recently merged, those tests were consistently failing, and the whole team couldn't run tests (and hence couldn't push anything tested). Don't worry though: once those tests run successfully, the PR will for sure be merged back.

@gfyoung
Contributor Author

gfyoung commented Jul 1, 2016

@javanna : no worries, any thoughts on the rest of what I wrote?

@abeyad

abeyad commented Jul 1, 2016

IMHO reverting the entire commit was a bit overkill. In such situations, I have seen maintainers simply disable tests temporarily with the expectation that the committer (in this case, me) would address them. Almost never have I seen anyone shy away from the task.

A commit that causes test failures generally means a bug somewhere that was not thought through. Anyone could create a snapshot build off of master at any time; we don't want master introducing new bugs from a recent PR, to the extent possible, and failing tests are an indication of that. Hence, I believe @javanna made the right call. Once it's resolved, everything will be merged back in, so it's not an issue.

testDeleteSnapshotWithMissingMetadata

We should call expectThrows(IOException.class, () -> client.admin().cluster().prepareDeleteSnapshot("test-repo", "test-snap-1").get())

The line below it, asserting that the snapshot is deleted, will probably not be true either; you'll have to check and change that assertion as well.
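The expectThrows helper suggested here comes from the Elasticsearch test framework; a minimal stand-in shows its behaviour, with the prepareDeleteSnapshot call replaced by a throwing stand-in since no real cluster is available in a sketch:

```java
import java.io.IOException;

public class ExpectThrowsSketch {
    interface ThrowingRunnable {
        void run() throws Throwable;
    }

    // Minimal stand-in for the expectThrows helper available in Elasticsearch
    // test cases: fails if nothing (or the wrong type) is thrown, and returns
    // the caught exception so further assertions can be made on it.
    static <T extends Throwable> T expectThrows(Class<T> expected, ThrowingRunnable runnable) {
        try {
            runnable.run();
        } catch (Throwable t) {
            if (expected.isInstance(t)) {
                return expected.cast(t);
            }
            throw new AssertionError("unexpected exception type: " + t, t);
        }
        throw new AssertionError("expected " + expected.getSimpleName() + " was not thrown");
    }

    public static void main(String[] args) {
        // Stand-in for the failing delete-snapshot call (hypothetical message).
        IOException e = expectThrows(IOException.class, () -> {
            throw new IOException("failed to delete snapshot");
        });
        System.out.println(e.getMessage()); // prints "failed to delete snapshot"
    }
}
```

In the real test, the lambda body would be the client.admin().cluster().prepareDeleteSnapshot(...).get() call quoted above.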

testDeleteSnapshotWithCorruptedSnapshotFile

What's the stack trace here? If corrupted files (which can happen) cause the entire snapshot to be undeletable, then I think we have a problem. If that's the case, we should open a separate issue indicating that snapshot/restore has to handle corrupted files more robustly, and put an @AwaitsFix on this test linking to the newly created GitHub issue.

The thing is, the snap-*.dat file contains all the info about everything else in the snapshot, so in this test's case, I think the delete should throw an exception if it is undeletable due to corruption. However, if other actual snapshot files could not be deleted due to corruption, but the rest of the snapshot's files could be deleted, I think we still want to delete the snapshot and all the files we can.

@abeyad

abeyad commented Jul 1, 2016

@gfyoung Also, run gradle clean check once your commit is ready; that will run all of the tests (including the ones for the plugins, etc.).

@gfyoung
Contributor Author

gfyoung commented Jul 1, 2016

What about following what BlobStoreIndexShardRepository.java does: catching the IOException on each deletion call and then logging it with logger.debug?

That might be a nicer solution for both test failures (and perhaps for users too), as we could still clean up all the files that are deletable while making the user aware of the failures.
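The catch-and-log-per-deletion idea can be sketched as follows. This is a hypothetical helper, not the actual BlobStoreIndexShardRepository code, and System.err stands in for logger.debug:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class LenientBulkDelete {
    // Try every deletion, log any IOException at debug level, and keep going
    // so that all deletable files are still cleaned up. Returns the number of
    // failures so the caller can decide whether to surface them.
    static int deleteAll(Path dir, String... names) {
        int failures = 0;
        for (String name : names) {
            try {
                Files.delete(dir.resolve(name));
            } catch (IOException e) {
                // Stands in for logger.debug("failed to delete [{}]", name, e).
                System.err.println("debug: failed to delete [" + name + "]: " + e);
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("snap");
        Files.write(dir.resolve("a"), new byte[0]);
        Files.write(dir.resolve("c"), new byte[0]);
        // "b" does not exist, so its deletion fails, but "a" and "c" still go.
        System.out.println(deleteAll(dir, "a", "b", "c")); // prints "1"
        Files.delete(dir);
    }
}
```

The design trade-off is exactly the one under discussion: the per-file strictness is preserved at the deleteBlob level, while the bulk snapshot-deletion path stays resilient to individual failures.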

@gfyoung
Contributor Author

gfyoung commented Jul 1, 2016

Also, your proposed change for testDeleteSnapshotWithMissingMetadata does not appear to work, at least when I run tests locally.

@abeyad

abeyad commented Jul 1, 2016

@gfyoung I'll pull down your branch and experiment.

@abeyad

abeyad commented Jul 3, 2016

@gfyoung I've created a branch here: https://github.com/abeyad/elasticsearch/tree/stricter-delete-blob. I'd like to issue a PR against your branch, but I don't have access to your elasticsearch fork.

Before incorporating it, I'd like feedback from @imotov on what I've done.

@gfyoung
Contributor Author

gfyoung commented Jul 3, 2016

@abeyad : My original branch is here. You should be able to access that hopefully. Thanks for doing this! Hopefully we can get it right with Jenkins this time around. 😄

@abeyad

abeyad commented Jul 8, 2016

@gfyoung I haven't forgotten about this; I've discussed some changes with @imotov that should be done in order to avoid issues revealed with the two broken tests. I'm working on those now and will hopefully have a PR for review in the next couple days. Then both our work can be merged together.

Can I request that you remove your latest commit from this branch and push? My commits will cause merge conflicts with it.

Lastly, I still can't issue a PR against your branch. I believe you need to add me as a collaborator on your elasticsearch fork (https://github.com/gfyoung/elasticsearch/settings/collaboration). Alternatively, we can set up a feature branch that incorporates both of our commits; this may be the best approach, so I'll set it up and let you know when it's there so you can push your commits to it.

@gfyoung
Contributor Author

gfyoung commented Jul 8, 2016

@abeyad : I reverted that last commit on my branch; it was my attempt to patch the errors. Strange that you can't put up PRs against my fork (I haven't seen that issue before), but yes, do let me know once the feature branch is up!

@gfyoung
Contributor Author

gfyoung commented Jul 13, 2016

@abeyad : any updates on the feature branch?
