
Conversation

@HarshitGupta11
Contributor

If both fs.s3a.endpoint and fs.s3a.endpoint.region are empty, Spark will set fs.s3a.endpoint to s3.amazonaws.com here: https://github.com/apache/spark/blob/9a2f39318e3af8b3817dc5e4baf52e548d82063c/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L540

HADOOP-18908 updated the region logic so that if fs.s3a.endpoint.region is set, or if a region can be parsed from fs.s3a.endpoint (which will happen in this case: the region will resolve to US_EAST_1), cross-region access is not enabled. This causes 400 errors if the bucket is not in US_EAST_1.

Proposed: update the logic so that if the endpoint is the global s3.amazonaws.com, cross-region access is enabled.
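
A minimal sketch of the proposed resolution order, assuming the AWS SDK v2 builder API; the class and helper names here are illustrative, not the actual S3A client-factory code:

import org.apache.hadoop.conf.Configuration;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3ClientBuilder;

final class RegionResolutionSketch {
  static final String CENTRAL_ENDPOINT = "s3.amazonaws.com";

  static void configureRegion(S3ClientBuilder builder, Configuration conf) {
    String region = conf.getTrimmed("fs.s3a.endpoint.region", "");
    String endpoint = conf.getTrimmed("fs.s3a.endpoint", "");

    if (!region.isEmpty()) {
      // An explicitly configured region always wins.
      builder.region(Region.of(region));
    } else if (CENTRAL_ENDPOINT.equals(endpoint)) {
      // Proposed change: the global endpoint carries no region information,
      // so let the SDK locate the bucket rather than pinning US_EAST_1,
      // which fails with a 400 for buckets in any other region.
      builder.crossRegionAccessEnabled(true);
    } else if (!endpoint.isEmpty()) {
      // Region-specific endpoint, e.g. s3.eu-west-1.amazonaws.com:
      // derive the region from the hostname (simplified parser below).
      builder.region(Region.of(parseRegionFromEndpoint(endpoint)));
    }
  }

  private static String parseRegionFromEndpoint(String endpoint) {
    // Simplified stand-in for the real S3A endpoint parsing.
    return endpoint.replaceFirst("^s3[.-]", "")
        .replaceFirst("\\.amazonaws\\.com$", "");
  }
}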

Description of PR

How was this patch tested?

It is currently being tested against us-west-2 by explicitly setting the endpoint to s3.amazonaws.com.
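
A sketch of what that test setup looks like, assuming a bucket in us-west-2; the class and bucket names are placeholders:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CentralEndpointProbe {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Pin the global endpoint and leave the region unset, mirroring what
    // Spark does when both properties are empty.
    conf.set("fs.s3a.endpoint", "s3.amazonaws.com");
    conf.unset("fs.s3a.endpoint.region");

    // "example-us-west-2-bucket" is a placeholder for a bucket in us-west-2.
    FileSystem fs = FileSystem.get(new URI("s3a://example-us-west-2-bucket/"), conf);
    // With the proposed change this succeeds via cross-region access;
    // without it, S3 returns a 400 because the region is pinned to US_EAST_1.
    System.out.println(fs.getFileStatus(new Path("/")));
  }
}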

For code changes:

  • Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 21s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 31m 45s trunk passed
+1 💚 compile 0m 24s trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 compile 0m 18s trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 checkstyle 0m 21s trunk passed
+1 💚 mvnsite 0m 29s trunk passed
+1 💚 javadoc 0m 16s trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 19s trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 spotbugs 0m 45s trunk passed
+1 💚 shadedclient 22m 31s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 20s the patch passed
+1 💚 compile 0m 21s the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javac 0m 21s the patch passed
+1 💚 compile 0m 16s the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 javac 0m 16s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 11s /results-checkstyle-hadoop-tools_hadoop-aws.txt hadoop-tools/hadoop-aws: The patch generated 1 new + 2 unchanged - 0 fixed = 3 total (was 2)
+1 💚 mvnsite 0m 19s the patch passed
+1 💚 javadoc 0m 11s the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 17s the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 spotbugs 0m 41s the patch passed
+1 💚 shadedclient 21m 53s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 23s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 23s The patch does not generate ASF License warnings.
Total: 87m 1s
Subsystem Report/Notes
Docker ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6482/1/artifact/out/Dockerfile
GITHUB PR #6482
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux a47a17ac58b1 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / ffedbcf
Default Java Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6482/1/testReport/
Max. process+thread count 561 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6482/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@ahmarsuhail
Contributor

Thanks @HarshitGupta11, I think we should go with @virajjasani's implementation in #6479. Let me know what you think.

region = getS3RegionFromEndpoint(parameters.getEndpoint());
if (region != null) {
  origin = "endpoint";
  if (parameters.getEndpoint().equals(CENTRAL_ENDPOINT)) {

If someone configures fs.s3a.endpoint to s3.amazonaws.com and sets the region in fs.s3a.endpoint.region to eu-west-1, this code will start ignoring what we have in fs.s3a.endpoint.region and just enable cross-region access for everything. I think this could be risky, as people could simply have fs.s3a.endpoint set to s3.amazonaws.com in their core-site.xml, since it makes no difference as long as the region is set correctly.
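
A hedged sketch of the ordering this comment argues for (the direction #6479 takes): an explicit fs.s3a.endpoint.region must win before any central-endpoint fallback is considered. The conf, parameters, and builder variables are carried over from the hunk above and are illustrative, not the actual patch:

String configuredRegion = conf.getTrimmed("fs.s3a.endpoint.region", "");
String endpoint = parameters.getEndpoint();

if (!configuredRegion.isEmpty()) {
  // An explicit fs.s3a.endpoint.region always wins, even when the
  // endpoint is the global s3.amazonaws.com, so an eu-west-1 user
  // keeps eu-west-1.
  builder.region(Region.of(configuredRegion));
} else if (CENTRAL_ENDPOINT.equals(endpoint)) {
  // Only fall back to cross-region access when no region was configured.
  builder.crossRegionAccessEnabled(true);
}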

@HarshitGupta11
Contributor Author

Thanks @HarshitGupta11, I think we should go with @virajjasani's implementation in #6479. Let me know what you think.

Cool, I will close this pull request.

