Conversation

@sidseth
Contributor

@sidseth sidseth commented Aug 21, 2019

This is very similar to the original patch.
In terms of testing - I have run tests against a bucket in us-east-1 (including with the patch posted on HADOOP-16477). Struggling a bit with failures, though, which seem completely unrelated to the patch. Still trying to get my test configuration file to a state where the tests pass.

@sidseth sidseth force-pushed the HADOOP-16445 branch 2 times, most recently from 27e1939 to 586cff1 Compare September 15, 2019 05:27
@apache apache deleted a comment from hadoop-yetus Sep 16, 2019
@steveloughran
Contributor

we also use the credentials for talking to STS (session credentials) and I have an ambition to talk to Amazon SQS to subscribe to changes in an S3 bucket for spark streaming. As a result, I'm thinking about how to make that possible after your changes.

I'm also slowly trying to stop S3AUtils getting any worse as the one-stop "throw all our static stuff in here" class. S3AUtils is a merge troublespot and has grown too big. I'm reluctant to move existing stuff, because it makes backporting so hard, but I'd like to keep the amount of new stuff we add there nearly non-existent. In particular #970 is adding a new class org.apache.hadoop.fs.s3a.impl.NetworkBinding where networking stuff can go; I'm not sure of the best place for authentication stuff. Maybe a class alongside that: org.apache.hadoop.fs.s3a.impl.AwsConfigurationFactory. What do you think?

Also, rather than adding a new method createAwsConfForDdb alongside the existing one, I'd rather the existing one was extended to take a string declaring what the configuration is for, e.g. "s3", "ddb", "sts" ... I'm proposing a string over an enum to maintain binary compatibility for applications which add/use a new service. If the name is unknown, we could just warn and return the "default" configuration.

So how about adding the new operations into some AwsConfigurationFactory.createAwsConf(String purpose, ...) method, with S3AUtils.createAwsConf() tagged as deprecated and invoking the new method. I'm reluctant to cut it, as I suspect some people (me) have been using it elsewhere.
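A minimal sketch of that dispatch, purely illustrative: the class and method names follow the suggestion above and are hypothetical, and a local ClientConfiguration class stands in for the AWS SDK's com.amazonaws.ClientConfiguration.

```java
import java.util.logging.Logger;

/**
 * Sketch of the proposed AwsConfigurationFactory.createAwsConf(String, ...)
 * dispatch. Hypothetical names; ClientConfiguration here is a stand-in for
 * the AWS SDK class.
 */
public class AwsConfigurationFactory {

  private static final Logger LOG =
      Logger.getLogger(AwsConfigurationFactory.class.getName());

  /** Stand-in for com.amazonaws.ClientConfiguration. */
  public static class ClientConfiguration {
    public final String purpose;
    public ClientConfiguration(String purpose) { this.purpose = purpose; }
  }

  /**
   * Create a client configuration for the named service. Unknown purposes
   * warn and fall back to the default configuration, preserving binary
   * compatibility for callers that name a new service.
   */
  public static ClientConfiguration createAwsConf(String purpose) {
    switch (purpose) {
    case "s3":
    case "ddb":
    case "sts":
      return new ClientConfiguration(purpose);
    default:
      LOG.warning("Unknown AWS service purpose \"" + purpose
          + "\"; returning default configuration");
      return new ClientConfiguration("default");
    }
  }
}
```

The warn-and-fall-back branch is what keeps the string approach binary-compatible: an older release handed "sqs" simply behaves as if no per-service tuning exists.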

With that, I'm now going to do some minor review of the patch at the line-by-line level. Those comments come secondary to what I've just suggested.

@sidseth
Contributor Author

sidseth commented Sep 18, 2019

Removed the new method in S3AUtils, and introduced a SignerManager (more changes coming to this soon). Also incorporated one of the changes from the patch on HADOOP-16505, which uses the correct way to initialize signers.

On the AwsConfigurationFactory - that makes sense. However, I don't think it should be in this patch. It's unrelated, and will end up moving more code from S3AUtils (including some public static methods). That's better handled in a separate refactoring-only patch.

@sidseth
Contributor Author

sidseth commented Sep 18, 2019

Updated. Have left the timeouts on the test, since these tests don't need the default 10-minute timeout.


@Rule
public Timeout testTimeout = new Timeout(
    10_000L, TimeUnit.MILLISECONDS);
Contributor

indentation.
FWIW, for any ITest, the values in S3ATestConstants are the ones to refer to, not any others. Why so? When you are debugging things in the IDE you can edit those values in one place and know that your debug session won't suddenly finish 5-10 minutes in. I usually end up doing exactly that after my first debug session times out.

Contributor Author

That is set to 10 minutes, which is a little too high for simple tests like this. A test which actually gets stuck would end up wasting a lot of time and resources when running on Jenkins.

import com.amazonaws.auth.Signer;
import com.amazonaws.auth.SignerFactory;
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.classification.InterfaceAudience.Private;
Contributor

All org.apache stuff needs to go in its own block after the block of non-ASF imports.

Contributor Author

Fixed in the next patch. I couldn't find anything on the coding standards, or a standard IntelliJ template.

@steveloughran steveloughran self-assigned this Sep 18, 2019
@sidseth
Contributor Author

sidseth commented Sep 18, 2019

Had forgotten how difficult it is to get a patch into Hadoop. Are there any published coding standards or templates for IntelliJ / Eclipse that you're aware of?

There are multiple revisions of the patch just fixing checkstyle issues, which is rather annoying.

@steveloughran
Contributor

> Had forgotten how difficult it is to get a patch into Hadoop. Are there any published coding standards or templates for IntelliJ / Eclipse that you're aware of?

Not AFAIK. Spark has some automated checks for import ordering, but as there's no IDE config which works 100% of the time there either, it's also a PITA. But as it is 100% consistent, it is probably better at avoiding a lot of the backport-merge-hell which imports always seem to generate.

import com.amazonaws.auth.SignerFactory;
import java.io.Closeable;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
Contributor

imports still a bit confused here

Contributor Author

Ouch. Hopefully fixed in the latest commit.

@steveloughran
Contributor

I think you've managed to get stuff into the latest commit which wasn't there earlier (SignerManager). Was that your intent? Because that's a bigger change. I'm happy with the smaller patch and its details; let's get that in before the next step.

@sidseth
Contributor Author

sidseth commented Sep 19, 2019

The change in the latest commit (SignerManager) removes the code from S3AUtils and moves it to a SignerManager class (non-static). The actual signer registration has picked up bits from the patch on HADOOP-16505 (a duplicate of this jira), since that was doing the registration in a better way (once per unique signer per JVM, rather than each time an FS instance is created).

I do plan to make additional changes to this class in a follow-up patch, but it's essentially the same as the old patch, with a non-static Closeable class, without adding an extra static method to S3AUtils. Please let me know if there are specific concerns around this (i.e. change back to what?)
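The once-per-JVM registration described above can be sketched as follows. The class and method names here are illustrative, not the actual SignerManager API; in the real code the registration call would be com.amazonaws.auth.SignerFactory.registerSigner(...) rather than the comment shown.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Sketch of once-per-JVM signer registration (hypothetical names).
 * A concurrent set records which signer names have already been
 * registered, so repeated FS instantiation is a no-op.
 */
public class SignerRegistry {

  /** Signer names already registered in this JVM. */
  private static final Set<String> REGISTERED =
      ConcurrentHashMap.newKeySet();

  /**
   * Register a signer exactly once per JVM.
   * @return true iff this call performed the registration.
   */
  public static boolean registerOnce(String signerName) {
    if (!REGISTERED.add(signerName)) {
      return false; // already registered by an earlier FS instance
    }
    // Real code: SignerFactory.registerSigner(signerName, signerClass);
    return true;
  }
}
```

Set.add returning false on a duplicate is what makes the check-and-register step atomic without explicit locking.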

import com.amazonaws.services.securitytoken.model.Credentials;
import com.amazonaws.services.securitytoken.model.GetSessionTokenRequest;
import com.google.common.base.Preconditions;
import org.apache.hadoop.fs.s3a.Constants;
Contributor

import location

@hadoop-yetus

🎊 +1 overall

| Vote | Subsystem | Runtime | Comment |
|:----:|:----------|--------:|:--------|
| 0 | reexec | 98 | Docker mode activated. |
| | _ Prechecks _ | | |
| +1 | dupname | 0 | No case conflicting files found. |
| +1 | @author | 0 | The patch does not contain any @author tags. |
| +1 | test4tests | 0 | The patch appears to include 4 new or modified test files. |
| | _ trunk Compile Tests _ | | |
| +1 | mvninstall | 1192 | trunk passed |
| +1 | compile | 32 | trunk passed |
| +1 | checkstyle | 24 | trunk passed |
| +1 | mvnsite | 36 | trunk passed |
| +1 | shadedclient | 793 | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 25 | trunk passed |
| 0 | spotbugs | 56 | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 55 | trunk passed |
| | _ Patch Compile Tests _ | | |
| +1 | mvninstall | 34 | the patch passed |
| +1 | compile | 27 | the patch passed |
| +1 | javac | 27 | the patch passed |
| +1 | checkstyle | 19 | hadoop-tools/hadoop-aws: The patch generated 0 new + 42 unchanged - 1 fixed = 42 total (was 43) |
| +1 | mvnsite | 31 | the patch passed |
| +1 | whitespace | 0 | The patch has no whitespace issues. |
| +1 | shadedclient | 825 | patch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 22 | the patch passed |
| +1 | findbugs | 60 | the patch passed |
| | _ Other Tests _ | | |
| +1 | unit | 81 | hadoop-aws in the patch passed. |
| +1 | asflicense | 29 | The patch does not generate ASF License warnings. |
| | | 3485 | |

| Subsystem | Report/Notes |
|:----------|:-------------|
| Docker | Client=18.09.7 Server=18.09.7 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1332/10/artifact/out/Dockerfile |
| GITHUB PR | #1332 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 15cf078856e3 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / f260b5a |
| Default Java | 1.8.0_222 |
| Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-1332/10/testReport/ |
| Max. process+thread count | 308 (vs. ulimit of 5500) |
| modules | C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws |
| Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-1332/10/console |
| versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 |
| Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |

This message was automatically generated.

@sidseth
Contributor Author

sidseth commented Sep 19, 2019

> tests all went well for me
> Do you have fs.s3a.metadatastore.authoritative, fs.s3a.metadatastore.impl set in core-site.xml or auth-keys.xml?

The MR tests seem quite broken to me otherwise. They use the cluster configuration, which is created from new JobConf() - this will not pick up the test properties. I can get at least 2 of them to pass (haven't tried the other 2) with these 2 config keys set in core-site.xml or auth-keys.xml.
That said, I'm seeing references to "part-m-00000", "part-m-00001", "_SUCCESS" in S3Guard - with a 0 size, and marked as deleted - and cannot explain where these come from. Maybe some of the committer code runs on the client.

@apache apache deleted a comment from hadoop-yetus Sep 20, 2019
*/
public static final String CUSTOM_SIGNERS = "fs.s3a.custom.signers";

/**
Contributor

You are going to have to add these to the AWS docs, I'm afraid. This is probably the time to start a new "signing" section.
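For when those docs get written, a minimal configuration sketch. The Name:ClassName list format and the class name here are my reading of the patch, so verify against the final merged code before copying this anywhere:

```xml
<!-- Register a custom signer (hypothetical class name). -->
<property>
  <name>fs.s3a.custom.signers</name>
  <value>CustomSigner:com.example.auth.CustomSignerClass</value>
</property>

<!-- Then select it by name as the signing algorithm. -->
<property>
  <name>fs.s3a.signing-algorithm</name>
  <value>CustomSigner</value>
</property>
```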

import com.amazonaws.auth.AWSSessionCredentials;
import com.amazonaws.services.securitytoken.AWSSecurityTokenService;
import com.google.common.annotations.VisibleForTesting;
import org.apache.hadoop.fs.s3a.Constants;
Contributor

import location

import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.google.common.base.Preconditions;
import org.apache.commons.lang3.StringUtils;
import org.apache.hadoop.fs.s3a.Constants;
Contributor

import location

import com.amazonaws.auth.AWSCredentials;
import com.amazonaws.auth.Signer;
import com.amazonaws.auth.SignerFactory;
import java.util.concurrent.TimeUnit;
Contributor

go on, pull this up into its own block...

/**
* Tests for the SignerManager.
*/
public class TestSignerManager {
Contributor

Going to highlight that the completely optional org.apache.hadoop.test.HadoopTestBase class can act as a base class, offering setup of the test timeouts and automatic naming of the JUnit thread to the current method. I would encourage your use of it.


import com.amazonaws.auth.Signer;
import com.amazonaws.auth.SignerFactory;
import java.io.Closeable;
Contributor

usual and predictable comments about imports

* algorithm. fs.s3a.signing-algorithm - This property has existed for the
* longest time. If specified, without either of the other 2 properties being
* specified, this signing algorithm will be used for S3 and DDB (S3Guard).
* The other 2 properties override this value for S3 or DDB.
Contributor

@steveloughran steveloughran Sep 20, 2019

Also: other uses like STS. Maybe say "non-S3 requests, such as to DDB (for S3Guard) or to STS."
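To make the precedence concrete, a configuration sketch as I read the patch. The per-service property names are taken from this PR's changes and should be verified against the merged code:

```xml
<!-- Base algorithm: applies to S3 and DDB unless overridden below. -->
<property>
  <name>fs.s3a.signing-algorithm</name>
  <value>AWS4SignerType</value>
</property>

<!-- Per-service overrides win over the base property. -->
<property>
  <name>fs.s3a.s3.signing-algorithm</name>
  <value>AWS4SignerType</value>
</property>
<property>
  <name>fs.s3a.ddb.signing-algorithm</name>
  <value>AWS4SignerType</value>
</property>
```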

@steveloughran
Contributor

Patch LGTM; the only thing needed is to get the imports consistent. I know it's a PITA, but it's to keep merge hell vaguely under control.

You are going to need to document the new signing options in the markdown, rather than just the source code - either here or in a follow-up JIRA. That would also be a good time to list what the standard AWS signers are, as I don't see any real docs of them at all.

@sidseth
Contributor Author

sidseth commented Sep 20, 2019

Updated the imports again. They should be consistent now. These are the formatting options I'm using:

```
java.*
(blank line)
org.apache.*
(blank line)
everything else
(blank line)
import static *
```
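Applied to the imports flagged in the review comments above, that ordering would look like this (illustrative only; the org.apache and com.amazonaws classes obviously need the real project on the classpath):

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.s3a.Constants;

import com.amazonaws.auth.Signer;
import com.amazonaws.auth.SignerFactory;

import static org.junit.Assert.assertEquals;
```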

I'll update the docs in the next jira, which will make some changes to the value set in some of the config options.

Does the LGTM qualify as a +1? I'll go ahead and squash-merge the patch if that's the case.

@hadoop-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Comment |
|:----:|:----------|--------:|:--------|
| 0 | reexec | 39 | Docker mode activated. |
| | _ Prechecks _ | | |
| +1 | dupname | 1 | No case conflicting files found. |
| +1 | @author | 0 | The patch does not contain any @author tags. |
| +1 | test4tests | 0 | The patch appears to include 4 new or modified test files. |
| | _ trunk Compile Tests _ | | |
| +1 | mvninstall | 1041 | trunk passed |
| +1 | compile | 36 | trunk passed |
| +1 | checkstyle | 29 | trunk passed |
| +1 | mvnsite | 40 | trunk passed |
| -1 | shadedclient | 110 | branch has errors when building and testing our client artifacts. |
| +1 | javadoc | 31 | trunk passed |
| 0 | spotbugs | 60 | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 57 | trunk passed |
| | _ Patch Compile Tests _ | | |
| +1 | mvninstall | 34 | the patch passed |
| +1 | compile | 28 | the patch passed |
| +1 | javac | 28 | the patch passed |
| +1 | checkstyle | 21 | hadoop-tools/hadoop-aws: The patch generated 0 new + 42 unchanged - 1 fixed = 42 total (was 43) |
| +1 | mvnsite | 32 | the patch passed |
| +1 | whitespace | 0 | The patch has no whitespace issues. |
| -1 | shadedclient | 29 | patch has errors when building and testing our client artifacts. |
| +1 | javadoc | 24 | the patch passed |
| +1 | findbugs | 59 | the patch passed |
| | _ Other Tests _ | | |
| +1 | unit | 82 | hadoop-aws in the patch passed. |
| +1 | asflicense | 26 | The patch does not generate ASF License warnings. |
| | | 1815 | |

| Subsystem | Report/Notes |
|:----------|:-------------|
| Docker | Client=19.03.1 Server=19.03.1 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1332/11/artifact/out/Dockerfile |
| GITHUB PR | #1332 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux e64d223753e4 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / b3173e1 |
| Default Java | 1.8.0_222 |
| Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-1332/11/testReport/ |
| Max. process+thread count | 342 (vs. ulimit of 5500) |
| modules | C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws |
| Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-1332/11/console |
| versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 |
| Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |

This message was automatically generated.

@sidseth
Contributor Author

sidseth commented Sep 20, 2019

Also, the *MRJob tests pass after applying HADOOP-16591 or setting the configs in auth-keys.xml.

@sidseth
Contributor Author

sidseth commented Sep 20, 2019

/retest

@steveloughran
Contributor

+1

@sidseth
Contributor Author

sidseth commented Sep 21, 2019

Thanks for the reviews @steveloughran . Merging.
The shadedclient failure isn't related. See: #1432 (comment)

@sidseth sidseth merged commit e02b102 into apache:trunk Sep 21, 2019
smengcl pushed a commit to smengcl/hadoop that referenced this pull request Oct 8, 2019
… S3 and DDB (apache#1332)

(cherry picked from commit e02b102)
Change-Id: I5ae5b5631341812ca9d6972f3922bcdbafc27cb5