HADOOP-16711 Add way to skip verifyBuckets check in S3A fs init() #1838
Conversation
Force-pushed 853c6fc to dd6122e.
🎊 +1 overall (automatically generated test report)
LGTM, but given that there was some ongoing discussion in https://issues.apache.org/jira/browse/HADOOP-16711, I think @steveloughran should also take a look.
(Just a nit: you don't need to specify the bucket you've tested against; specify the endpoint instead.)
Production code looks fine; I've commented more on the test suite, which isn't going to do what you want without disabling FS instance caching.
- I'd like to see a stack trace from something like getFileStatus() on failure here, as I don't want "no bucket" to be interpreted as "nothing at this specific path".
For example, FileSystem.exists() catches that FNFE and will return false, mapping "bucket not found" to something which delays the outcome (sketched below).
- We'll need a new entry in troubleshooting, or a review of any existing entry on missing buckets. And in the performance section, make clear that if you set the check to 0, missing buckets will surface later as FileNotFoundExceptions, and so may confuse applications. That is: while it is an optimization, whoever uses it will have moved the failure point around, and gets to deal with that problem.
Actually, on that topic: what if you set fs.s3a.endpoint to "unknownhost.example.org" with check = 0 and instantiate an FS? We won't see any problems until that first IO operation (presumably), so again, errors will surface later.
The docs should make that clear: "Any problems with connectivity, authentication or the bucket's existence will surface when method calls are made of the filesystem."
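To illustrate the exists() point: the Hadoop FileSystem base class swallows the FileNotFoundException from getFileStatus(), so once the startup probe is skipped, "no such bucket" and "no such path" look identical to callers. A simplified sketch of that pattern (not the exact Hadoop source):

```java
// Simplified sketch of the FileSystem.exists() pattern under discussion.
public boolean exists(Path f) throws IOException {
  try {
    getFileStatus(f);   // with the bucket check skipped, a missing
    return true;        // bucket first fails here...
  } catch (FileNotFoundException e) {
    return false;       // ...and is silently mapped to "not found"
  }
}
```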
Resolved review threads (marked outdated) on:
- hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java (three threads)
- hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3ABucketExistence.java (three threads)
What's the full stack here? Because I don't want the FNFE from a missing path to be confused with a getFileStatus failure, as that could go on to confuse other things
Added the expected text to the contained-text check for verification.
Potentially brittle, but we can deal with that when the text changes. We have found in the past that any test coded to look for AWS error messages is brittle against SDK updates.
Let's just go with this and catch up when things break.
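For reference, a sketch of this kind of assertion using Hadoop's LambdaTestUtils.intercept; the expected-text string here is illustrative, and is exactly the part that is brittle against SDK message changes:

```java
import java.io.FileNotFoundException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.test.LambdaTestUtils;

// Fails unless the lambda throws a FileNotFoundException whose
// toString() contains the given text; the text match is the part
// that breaks when the AWS SDK rewords its error messages.
LambdaTestUtils.intercept(FileNotFoundException.class,
    "bucket",                                 // illustrative expected text
    () -> fs.getFileStatus(new Path("/")));
```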
You get a problem in this code because the FileSystem cache keys on the URI only; if there's an FS in the cache, your settings aren't picked up and you will always get the previous instance. That often causes intermittent problems with test runs.
- Use S3ATestUtils.disableFilesystemCaching to turn off caching of the filesystems you get via FileSystem.get (see the sketch below)
- and close() them at the end of each test case. You can do this with try/finally or a try-with-resources clause.
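A sketch covering both points, assuming the hadoop-aws test utilities; fs.s3a.bucket.probe is the option this PR introduces, and the URI is a placeholder:

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.s3a.S3ATestUtils;

Configuration conf = new Configuration();
conf.setInt("fs.s3a.bucket.probe", 0);        // the per-test setting under test
S3ATestUtils.disableFilesystemCaching(conf);  // bypass the shared FS cache
// try-with-resources guarantees the instance is closed after the test body
try (FileSystem fs = FileSystem.get(new URI("s3a://example-bucket/"), conf)) {
  // assertions against the freshly configured, uncached instance
}
```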
I was worried about this, but somehow new conf settings were getting picked up; I need to figure out how. Anyway, I have disabled filesystem caching so that we don't see intermittent failures.
Unless you want to share across test cases (you don't), or want cleanup in the teardown code, move this into a local variable in each test case.
Actually, given it's non-static, it will be unique to each test case. You can just override the teardown() method and add an IOUtils.cleanupWithLogger(LOG, fs) call, so closing the fs variable robustly if it is set. Do call the superclass after; a sketch follows.
FWIW, the S3A base test suite already retrieves an FS instance for each test case, so you can pick that up; it's just a bit fiddlier to configure. Don't worry about it here, but you will eventually have to learn your way around that test code.
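A sketch of that teardown pattern, assuming an S3A test base class with a teardown() method and a LOG field (names taken from the comment above):

```java
import org.apache.hadoop.io.IOUtils;

@Override
public void teardown() throws Exception {
  // Close the per-test filesystem robustly: cleanupWithLogger is a
  // no-op on null and logs, rather than throws, on close failures.
  IOUtils.cleanupWithLogger(LOG, fs);
  // Let the superclass do its own cleanup afterwards.
  super.teardown();
}
```

Overriding the existing method, rather than adding a second @After-annotated one, also avoids the double-teardown problem that surfaces later in this thread.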
Force-pushed dd6122e to a5798ac.
Resolved review thread (marked outdated) on:
- hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/performance.md
Adding this extra cleanup throws "FileSystem is closed!" because of the AbstractFSContractTestBase.deleteTestDirInTeardown() call in the superclass teardown after each test.
OK. So what's happening then is that your tests are picking up a shared FS instance, not the one you've just configured with different bucket-init settings. Your tests aren't doing what you think they are.
My tests are running against a new FS instance only; I confirmed that using the IDE debugger. I think what is happening is that we are calling fs.close() twice, once on the shared instance and once on my private instance, which stops all the services for that FS and leads to the mismatch.
Found the issue. Rather than overriding the teardown() method, I implemented a separate one, which caused JUnit to call teardown() twice, causing all the above problems.
Sorry, my bad. :(
no worries
Force-pushed a5798ac to 81c4162.
🎊 +1 overall (automatically generated test report)
Force-pushed 81c4162 to b01cfad.
OK, production code all LGTM; just that test tuning if closing the …
Force-pushed b01cfad to 2107f34.
All review comments addressed.
🎊 +1 overall (automatically generated test report)
Testing myself: setting validation to 0 for the entire test run, to field-test it better.
Also, once I fix that by adding a trailing /, getFileStatus("/") fails to raise an FNFE. This is because S3Guard is enabled for all buckets on my test setup, and the S3Guard DDB metastore will create a stub FileStatus for a root entry. We could consider shortcutting some of the getFileStatus queries against / in the S3A FS itself; it's always a dir, after all.
Two tests are failing; will debug. Also will rebase from trunk and fix merge conflicts.
Adds a new exception UnknownStoreException to indicate "there's no store there":
* raised in verify-bucket-existence checks
* and when translating AWS exceptions into IOEs
* the S3A retry policy fails fast on this
* and s3GetFileStatus recognises the same failure and raises it

Except when the metastore short-circuits S3 IO, this means all operations against a nonexistent store will fail with a unique exception.

ITestS3ABucketExistence is extended to:
* disable the metastore (getFileStatus(/) was returning a value)
* always create new instances
* invoke all the operations which catch and swallow FNFEs (exists, isFile, isDir, delete)

Change-Id: Ide630ec9738ef971eba603b618bd612456fa064b
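A hedged sketch of what such an exception class could look like; the constructor set is an assumption, and the real class lives in the hadoop-aws module:

```java
import java.io.IOException;

/**
 * Sketch: an IOException subtype meaning "the store/bucket itself is
 * missing", so callers can tell it apart from a missing path and the
 * S3A retry policy can fail fast rather than retrying.
 */
public class UnknownStoreException extends IOException {

  public UnknownStoreException(String message) {
    super(message);
  }

  public UnknownStoreException(String message, Throwable cause) {
    super(message, cause);
  }
}
```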
Created a new class org.apache.hadoop.fs.s3a.impl.ErrorTranslation; future work related to mapping from AWS exceptions to IOEs etc. can go in there rather than S3AUtils. Moved the checks for an AmazonServiceException being caused by a missing bucket there; this cleans up uses of the probe. Added a unit test for the recognition/translation.

Change-Id: If81573b0c379def4bae715e4395f3ac19857c08e
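A minimal sketch of the kind of probe such a class centralises; the method name and the exact error-code string are assumptions here, built on AWS SDK v1's AmazonServiceException:

```java
import com.amazonaws.AmazonServiceException;

/** Sketch: one home for AWS-exception-to-IOE recognition logic. */
public final class ErrorTranslation {

  private ErrorTranslation() {
  }

  /**
   * Does this service exception mean "the bucket does not exist"?
   * A 404 with the NoSuchBucket error code (assumed string), as
   * opposed to a plain key-not-found 404.
   */
  public static boolean isUnknownBucket(AmazonServiceException e) {
    return e.getStatusCode() == 404
        && "NoSuchBucket".equals(e.getErrorCode());
  }
}
```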
Force-pushed 4cf556b to 9448a8f.
Fixed merge conflicts. The above tests are succeeding now.
…or fixes

Change-Id: I379afa2a10dc7691abb2bd09014fd52a73e3f7f6
💔 -1 overall (automatically generated test report)
+1; merged, with a couple of final fixups to the test class (adding @deprecated) to shut up checkstyle. Thanks!
Test bucket used: https://mthakur-data.s3.ap-south-1.amazonaws.com/file2
Ran all the UTs and ITs using default settings with S3Guard.