HADOOP-19044: AWS SDK V2 - Update S3A region logic #6482
Conversation
If both fs.s3a.endpoint and fs.s3a.endpoint.region are empty, Spark sets fs.s3a.endpoint to s3.amazonaws.com here: https://github.com/apache/spark/blob/9a2f39318e3af8b3817dc5e4baf52e548d82063c/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L540

HADOOP-18908 updated the region logic so that if fs.s3a.endpoint.region is set, or if a region can be parsed from fs.s3a.endpoint (which happens in this case, with the region resolving to US_EAST_1), cross-region access is not enabled. This causes 400 errors if the bucket is not in US_EAST_1.

Proposed: update the logic so that if the endpoint is the global s3.amazonaws.com, cross-region access is enabled.
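A minimal sketch of the proposed check, assuming CENTRAL_ENDPOINT is the "s3.amazonaws.com" constant referenced in the diff below and wiring it directly onto the AWS SDK V2 S3ClientBuilder; the class and helper names are illustrative, not the actual S3AFileSystem code:

    import software.amazon.awssdk.services.s3.S3ClientBuilder;

    public final class CentralEndpointSketch {
      // Assumed value of the CENTRAL_ENDPOINT constant from the diff.
      static final String CENTRAL_ENDPOINT = "s3.amazonaws.com";

      static void maybeEnableCrossRegion(S3ClientBuilder builder, String endpoint) {
        if (CENTRAL_ENDPOINT.equals(endpoint)) {
          // Global endpoint: don't pin the client to US_EAST_1; let the SDK
          // follow the bucket's actual region to avoid 400 errors.
          builder.crossRegionAccessEnabled(true);
        }
      }
    }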
🎊 +1 overall

This message was automatically generated.
Thanks @HarshitGupta11, I think we should go with @virajjasani's implementation in #6479. Let me know what you think.
    region = getS3RegionFromEndpoint(parameters.getEndpoint());
    if (region != null) {
      origin = "endpoint";
      if (parameters.getEndpoint().equals(CENTRAL_ENDPOINT)) {
If someone configures fs.s3a.endpoint to s3.amazonaws.com and sets the region in fs.s3a.endpoint.region to eu-west-1, this code will start ignoring what we have in fs.s3a.endpoint.region and enable cross-region access for everything. I think this could be risky, as people could have fs.s3a.endpoint set to s3.amazonaws.com in their core-site.xml, since that makes no difference as long as the region is set correctly.
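A hedged sketch of the ordering this comment implies, where an explicitly configured fs.s3a.endpoint.region always wins and the global-endpoint check only applies when no region is set; configuredRegion, builder, and the class name are illustrative, not the code under review:

    import software.amazon.awssdk.regions.Region;
    import software.amazon.awssdk.services.s3.S3ClientBuilder;

    public final class RegionPrecedenceSketch {
      static final String CENTRAL_ENDPOINT = "s3.amazonaws.com";

      static void resolveRegion(S3ClientBuilder builder, String endpoint,
          String configuredRegion) {
        if (configuredRegion != null && !configuredRegion.isEmpty()) {
          // fs.s3a.endpoint.region is set: honour it, never override it.
          builder.region(Region.of(configuredRegion));
        } else if (CENTRAL_ENDPOINT.equals(endpoint)) {
          // Only when no explicit region is configured does the global
          // endpoint trigger cross-region access.
          builder.crossRegionAccessEnabled(true);
        }
      }
    }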
Cool, I will close this pull request.
Description of PR

If fs.s3a.endpoint is the global endpoint s3.amazonaws.com and fs.s3a.endpoint.region is not set, enable cross-region access so that buckets outside US_EAST_1 resolve correctly (full summary above).
How was this patch tested?
It is currently being tested against us-west-2 by explicitly setting the endpoint to s3.amazonaws.com.
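Illustrative wiring for such a test, assuming a test bucket that lives in us-west-2; the class and method names are placeholders, not the actual ITest code:

    import org.apache.hadoop.conf.Configuration;

    public final class CentralEndpointTestSetup {
      static Configuration newConf() {
        Configuration conf = new Configuration();
        // Force the global endpoint while the bucket itself is in us-west-2.
        conf.set("fs.s3a.endpoint", "s3.amazonaws.com");
        // Leave the region unset so the new cross-region logic kicks in.
        conf.unset("fs.s3a.endpoint.region");
        return conf;
      }
    }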
For code changes:
If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?