HADOOP-16371: Option to disable GCM for SSL connections when running on Java 8 #970
Conversation
🎊 +1 overall
This message was automatically generated.
Use SLF4J {} expansion rather than inline string concatenation; there are a lot of commons-logging era log statements, and changing a line is the time to upgrade them.
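For illustration, a minimal before/after sketch of the kind of upgrade being asked for; the logger and message here are invented, not the patch's actual statement:

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class LoggingStyleExample {
  private static final Logger LOG =
      LoggerFactory.getLogger(LoggingStyleExample.class);

  void demo(String channelMode) {
    // commons-logging era style: the string is concatenated even when DEBUG is disabled
    LOG.debug("Initializing SSL channel mode " + channelMode);

    // SLF4J {} expansion: formatting is deferred until the level is known to be enabled
    LOG.debug("Initializing SSL channel mode {}", channelMode);
  }
}
```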
I don't like having to remember the case of all these options, especially as every other s3a config option is all lower case. These should all be case insensitive.
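A minimal sketch of case-insensitive handling, assuming an enum-backed option; the enum name and value set are placeholders built from names mentioned in this thread, not the patch's real types:

```java
/** Hypothetical channel-mode enum, used only to illustrate case-insensitive lookup. */
enum ChannelMode {
  DEFAULT_JSSE, DEFAULT_JSSE_WITH_GCM, DEFAULT, OPENSSL;

  /** Accept any capitalisation of the configured value. */
  static ChannelMode parse(String configured) {
    for (ChannelMode mode : values()) {
      if (mode.name().equalsIgnoreCase(configured)) {
        return mode;
      }
    }
    throw new IllegalArgumentException("Unknown channel mode: " + configured);
  }
}
```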
Better: a parameterized JUnit test (see ITestS3AMetadataPersistenceException) where the parameters aren't the enum values but the string values we expect to see. I know hard-coding strings is not what I normally like, but doing it here would ensure that we'd have a regression test against the values changing.
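A rough sketch of that idea, reusing the hypothetical ChannelMode enum from the sketch above; the class name and expected strings are illustrative, not the real test:

```java
import static org.junit.Assert.assertEquals;

import java.util.Arrays;
import java.util.Collection;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;

/** Regression test pinning the externally visible option strings. */
@RunWith(Parameterized.class)
public class TestChannelModeNames {

  @Parameterized.Parameters(name = "{0}")
  public static Collection<Object[]> params() {
    // Hard-coded strings on purpose: if the public values ever drift, this fails.
    return Arrays.asList(new Object[][] {
        {"default_jsse", ChannelMode.DEFAULT_JSSE},
        {"openssl", ChannelMode.OPENSSL},
    });
  }

  private final String configured;
  private final ChannelMode expected;

  public TestChannelModeNames(String configured, ChannelMode expected) {
    this.configured = configured;
    this.expected = expected;
  }

  @Test
  public void testParse() {
    assertEquals(expected, ChannelMode.parse(configured));
  }
}
```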
I worry about this as it then ensures that the s3a code will only load when the shaded aws JAR is on the CP; people won't be able to switch to the unshaded JAR with the unshaded httpclient code.
I'm also trying to move off adding more stuff to the S3AUtils class, as it's become a merge conflict point with no real structure: every patch adds something to it. See "Refactoring S3A" for my thoughts there.
As this is adding new network setup, I'm going to propose:
- putting this into a new class in o.a.h.fs.s3a.impl - something like "NetworkBinding", where we can add more stuff later on
- whatever is done to set up the AWS networking is done through reflection; if the shaded class can't be found on the CP, then just skip trying to configure it (a rough sketch follows below).
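A hedged sketch of what that reflection-based skip could look like; the class name NetworkBinding comes from the comment above, while the shaded factory name, the method signature, and the package layout are assumptions rather than the final patch:

```java
package org.apache.hadoop.fs.s3a.impl;

/**
 * Sketch of a NetworkBinding-style helper: only attempt the custom SSL/socket
 * factory setup when the shaded httpclient classes are actually present.
 */
public final class NetworkBinding {

  /** Shaded httpclient factory inside the aws-sdk-bundle JAR (name is an assumption). */
  private static final String SHADED_FACTORY =
      "com.amazonaws.thirdparty.apache.http.conn.ssl.SSLConnectionSocketFactory";

  private NetworkBinding() {
  }

  /**
   * Configure the AWS client's SSL socket factory if possible; skip the setup
   * entirely when the shaded JAR is not on the CP, so the S3A connector still
   * loads against the unshaded SDK.
   */
  public static void bindSSLChannelMode(Object awsClientConfiguration) {
    try {
      Class.forName(SHADED_FACTORY);
      // ...reflectively build the custom socket factory and attach it to the
      // client configuration here...
    } catch (ClassNotFoundException e) {
      // Shaded httpclient absent: quietly skip the custom SSL setup.
    }
  }
}
```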
communicate with AWS Services.
hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/performance.md
You are going to have to do the S3Guard tests here, because that talks to another AWS endpoint and it's important to know if there are any regressions. I'm not worried about the KMS stuff or the assumed role code; there's enough testing of AWS Session credentials that talking to AWS STS is handled. FWIW, we directly talk to the following services in production code:
We need to make sure we are going near all of them. I've just been looking to see where there's already a good parameterized test we could (ab)use to add another option to the config set, guaranteeing coverage of interaction with these services without adding new tests (and new test delays). Nothing immediately springs to mind, though. Sahil: what would you think about replacing …
I don't want a warning in default mode; log at debug unless OpenSSL was explicitly asked for.
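One way to read that suggestion, as a hedged sketch; the flag name and messages are invented:

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class SslFallbackLogging {
  private static final Logger LOG =
      LoggerFactory.getLogger(SslFallbackLogging.class);

  /** Only warn when the user explicitly asked for OpenSSL; stay quiet otherwise. */
  static void logFallback(boolean opensslExplicitlyRequested) {
    if (opensslExplicitlyRequested) {
      LOG.warn("OpenSSL requested but unavailable; falling back to the JSSE provider");
    } else {
      LOG.debug("OpenSSL unavailable; using the JSSE provider");
    }
  }
}
```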
make it 'NoSuchAlgorithmException' so current code can catch it
hadoop-tools/hadoop-azure/pom.xml
Unsure about this, as it is an existing dependency.
Leave as is.
@sahilTakiar I've left some comments. It'd be good for @DadanielZ to review too.
Force-pushed f4b8f9d to 7fcdf14.
💔 -1 overall
This message was automatically generated.
Thanks for updating this. Was this just a rebase + push, or have you made other changes?
Yes, I rebased this and addressed your comments. Having some trouble running all the tests, though. I ran against …. I'm working on getting S3Guard access, but wanted to update the PR in the meantime; once I get access I will re-run the tests with …. Still trying to understand how to test against …
Re-ran tests with: …. In my … I re-ran ….
By calling wildfly methods without using reflection, this class can only be loaded or used if wildfly is on the CP. While this is already a requirement of hadoop-azure, I don't want to add it to hadoop-aws. Somehow reflection is going to be needed here.
The check in NetworkBinding#bindSSLChannelMode explicitly prevents S3A users from setting fs.s3a.ssl.channel.mode to default or OpenSSL, so there should be no way an S3A user can cause the Wildfly jar to actually be used.
If I understand Java correctly, this class should still be loadable without Wildfly on the classpath: Java only looks for the Wildfly classes when a Wildfly class is initialized (in this case OpenSSLProvider). The import statements are only used during compilation. ref: https://stackoverflow.com/a/12620773/11511572
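A standalone illustration of that argument (not code from the patch): OpenSSLProvider.register() is Wildfly's real entry point, everything else here is made up. Compiling this needs wildfly-openssl on the classpath, but running without it works as long as the guarded branch is never taken, because the JVM resolves the reference lazily:

```java
/** Hypothetical demo class showing lazy resolution of the Wildfly reference. */
public class LazyLoadDemo {

  static void maybeRegisterOpenSsl(boolean useOpenSsl) {
    if (useOpenSsl) {
      // Wildfly is only needed if this branch executes; OpenSSLProvider is
      // resolved and initialized lazily, not when LazyLoadDemo itself loads.
      org.wildfly.openssl.OpenSSLProvider.register();
    }
  }

  public static void main(String[] args) {
    // With wildfly-openssl absent from the runtime classpath this still runs,
    // because the reference above is never executed.
    maybeRegisterOpenSsl(false);
    System.out.println("Loaded without touching Wildfly");
  }
}
```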
Thanks.
I always had the belief that integers and other simple constants may just get compiled in, but that the class load will kick off as soon as you reference a string in the class. I should do more experiments. Maybe even learn Java assembly language. Anyway, as long as we don't try referencing things like strings from the class, we should be okay. And as the little Wildfly JAR is not currently on the hadoop-aws test CP, we are implicitly verifying this. We will probably have to be careful once we add more support for it - we need to make sure we've not accidentally made it mandatory.
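A quick experiment along those lines, with made-up class names: javac inlines compile-time constants (primitives and String literals alike) into the referencing class, so reading them never triggers the declaring class's initialization:

```java
class Constants {
  // Compile-time constants: javac copies these values into call sites,
  // so reading them does not initialize this class at runtime.
  static final int MAX_RETRIES = 3;
  static final String MODE = "default_jsse";

  static {
    System.out.println("Constants initialized"); // never printed by main() below
  }
}

public class InliningDemo {
  public static void main(String[] args) {
    // Neither reference runs the static initializer above: both values were
    // inlined at compile time, the String included.
    System.out.println(Constants.MAX_RETRIES + " " + Constants.MODE);
  }
}
```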
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java
these should go up into the same import block as com.amazonaws
Done
...p-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractSeek.java
While it's good to see this coverage, we do now have a test which is taking 4 min, and I'm trying to keep the execution time of a non-scale test run with a runner pool of 12 down to < 20 mins - this isn't going to help.
I propose only having random and normal as the seek policies for the GCM tests, and removing these from the default JSSE run. Yes, it's a simpler matrix, but as the default_jsse option gets implicitly tested everywhere, we aren't losing coverage.
Done
Patch is coming together nicely - nearly there. Done CLI tests as well as the -aws suite. A big fear of mine is that the current patch will, through transitive references, fail if the wildfly JAR isn't on the classpath. But I couldn't actually create that failure condition when I tried on the CLI: first, I extended my patched cloudstore s3a diagnostics to look for the new class, tested IO against a store - all good - and when I switch to an unsupported mode I get the expected stack trace. Which is telling me that my fears are misguided? What do others say?
BTW, @bgaborg has been having problems with STS tests too - try setting a region for the endpoint. Starting to suspect the latest SDK needs this now.
Did a full test run with ….
Thanks for the feedback and running all the tests, Steve! I left a comment above about why I think everything will still work without wildfly on the classpath. Working on addressing the other comments.
Force-pushed 7fcdf14 to f9a550e.
Addressed comments and re-ran tests; the only additional failure is …
💔 -1 overall
This message was automatically generated.
Is it? That's not good. That test uses the file system metrics to make assertions about how many operations actually go to S3. If it's failing, either there is something wrong with the assertions in your environment, or the connector is potentially making too many or too few calls to S3. What are your test S3Guard settings?
Thanks for the final revision and the explanation of why classloading wasn't going to be a problem. +1, committed to trunk. Thanks!
HADOOP-16371: Option to disable GCM for SSL connections when running on Java 8
Changes:
- Renamed `SSLSocketFactoryEx`: the new name would be `DelegatingSSLSocketFactory`, because the class is not OpenSSL specific (e.g. it is capable of just delegating to the JSSE)
- … `DelegatingSSLSocketFactory` …
- Documented `fs.s3a.ssl.channel.mode` in `performance.md` and `core-default.xml`
- For unsupported values of `fs.s3a.ssl.channel.mode`, an `UnsupportedOperationException` is thrown.

Testing Done:
- Ran `mvn verify` and the S3 scale tests (`mvn verify -Dparallel-tests -Dscale -DtestsThreadCount=16`); did not have S3Guard or kms tests set up
- Ran `TestDelegatingSSLSocketFactory` on Ubuntu and OSX with `-Pnative` and confirmed the test passes on both systems (on OSX it is skipped, on Ubuntu it actually runs)
- Only failure: `ITestGetNameSpaceEnabled.testNonXNSAccount` (known issue)
- Ran `mvn package -Pdist -DskipTests -Dmaven.javadoc.skip=true -DskipShade`, un-tarred `hadoop-dist/target/hadoop-3.3.0-SNAPSHOT.tar.gz`, ran `./bin/hadoop fs -ls s3a://[my-bucket-name]/` successfully, and confirmed that I could upload and read a file to S3 using the CLI without the wildfly jar on the classpath
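For anyone trying the option out, a hedged usage sketch: the property name fs.s3a.ssl.channel.mode comes from this PR, but the value string and the bucket are placeholders that should be checked against performance.md / core-default.xml:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ChannelModeExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Ask S3A for the plain JSSE socket factory (GCM disabled on Java 8);
    // "default_jsse" is the value discussed in this thread - confirm the exact
    // spelling against core-default.xml / performance.md.
    conf.set("fs.s3a.ssl.channel.mode", "default_jsse");

    // Replace with a real bucket; this just forces the S3A client to be built
    // with the configured SSL channel mode.
    FileSystem fs = FileSystem.get(new java.net.URI("s3a://my-bucket/"), conf);
    System.out.println(fs.getFileStatus(new Path("/")));
  }
}
```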