-
Couldn't load subscription status.
- Fork 9.1k
HADOOP-18938. AWS SDK v2: Fix endpoint region parsing for vpc endpoints. #6466
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HADOOP-18938. AWS SDK v2: Fix endpoint region parsing for vpc endpoints. #6466
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what I'd really like, as it'd make so many support calls go away, is to go from an endpoint like
<property>
<name>fs.s3a.bucket.stevel2-usw2.endpoint</name>
<value>https://bucket.vpce-0117f6033eaf7aee5.s3.us-west-2.vpce.amazonaws.com/</value>
</property>to knowing that the region is us-west-2. It's there, and we've modified the v1 sdk override json to make it go away for some people.
can we have that same logic here?
| /** | ||
| * The vpc endpoint :{@value}. | ||
| */ | ||
| public static final String VPC_ENDPOINT = "vpce.amazonaws.com"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- move to InternalConstants
- is there a .cn equivalent?
|
💔 -1 overall
This message was automatically generated. |
1c39495 to
b32bb52
Compare
|
Thanks for the review! Added parsing of S3 VPC endpoints to help reduce the support calls. |
|
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this, just thinking through about whether we should ever handle the case "someone uses .vpce." in an internal s3 server. Nobody does that I know of, but...
|
|
||
| // S3 VPC endpoint parsing | ||
| Matcher matcher = VPC_ENDPOINT_PATTERN.matcher(endpoint); | ||
| if(matcher.find()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, add a space after if
| // S3 VPC endpoint parsing | ||
| Matcher matcher = VPC_ENDPOINT_PATTERN.matcher(endpoint); | ||
| if(matcher.find()) { | ||
| LOG.debug("Endpoint {} is vpc endpoint; parsing", endpoint); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so this is going to match on anything with .vpce. isn't it? I think it should include amazonaws.{com,com.cn} at the end so if someone ever sets up an internal host called vpce there's no confusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated PR to match entire endpoint to avoid the confusion.
| Matcher matcher = VPC_ENDPOINT_PATTERN.matcher(endpoint); | ||
| if(matcher.find()) { | ||
| LOG.debug("Endpoint {} is vpc endpoint; parsing", endpoint); | ||
| return Region.of(matcher.group(1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a debug log saying "mapping to vpce"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you log the group(1) value in the debug log so we can see exactly which one it is. region and endpoint bindings are the cause of many of our support calls.
|
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the new pattern...there's scope to add a simple unit test as well as an integration test -could you have a go?
| * @param endpoint the configure endpoint. | ||
| * @return the S3 region, null if unable to resolve from endpoint. | ||
| */ | ||
| private static Region getS3RegionFromEndpoint(String endpoint) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this could be made package private/visible for testing and then have some unit tests which will run every time yetus does its work. this should include a test which passes an endpoint which shouldn't match the regexp -and verifies that it is rejected
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some unit tests on endpoint parsing.
| Matcher matcher = VPC_ENDPOINT_PATTERN.matcher(endpoint); | ||
| if(matcher.find()) { | ||
| LOG.debug("Endpoint {} is vpc endpoint; parsing", endpoint); | ||
| return Region.of(matcher.group(1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you log the group(1) value in the debug log so we can see exactly which one it is. region and endpoint bindings are the cause of many of our support calls.
| private static final String S3_SERVICE_NAME = "s3"; | ||
|
|
||
| private static final Pattern VPC_ENDPOINT_PATTERN = | ||
| Pattern.compile("^(?:.+\\.)?([a-z0-9-]+)\\.vpce\\.amazonaws\\.(?:com|com\\.cn)$"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does aws govcloud have a different pattern? or is it just .cn
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Govcloud is amazonaws.com aswell.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shintaroonuma could you do a quick rebase and revise of this...the faster it goes in the better it is for all.
487921f to
5580fa2
Compare
|
Thanks for the comments, rebased and updated the pr. |
|
💔 -1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
|
sorry, forgotten about this...don't be afraid to remind me! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tested the new/modified tests against s3 london; all happy
|
merged. @shintaroonuma what is your jira account, so we can assign the issue to you there too. |
Thanks for merging the commit in. @steveloughran , my jira account username is shintaro |
|
thanks. credited you there. also backported with testing onto branches 3.4 and 3.4.1; the next RC of that release is coming soon! |
…ache#6466) Contributed by Shintaro Onuma
…ache#6466) Contributed by Shintaro Onuma
Description of PR
JIRA: https://issues.apache.org/jira/browse/HADOOP-18938
Perform endpoint parsing for regions when vpc endpoint is configured. Previous endpoint parsing behaviour would parse vpc endpoints incorrectly, setting the region as "vpce".
How was this patch tested?
Tested with a bucket in eu-west-1 with
mvn -Dparallel-tests -DtestsThreadCount=16 clean verifyAll tests passing other than 2 failures:
For code changes:
LICENSE,LICENSE-binary,NOTICE-binaryfiles?