HADOOP-17105: S3AFS - Do not attempt to resolve symlinks in globStatus #2113
Looks like the javadoc failures would be fixed by #2094.
S3AFS does not support symlinks, so attempting to resolve symlinks in globStatus causes wasted S3 calls and worse performance. Removing it will speed up some calls to globStatus. JIRA link: https://issues.apache.org/jira/browse/HADOOP-17105
Revision 2: Fixed checkstyle warnings on line length.
💔 -1 overall
This message was automatically generated.
Patch LGTM. Which endpoint (e.g. us-west-2) and what build CLI options did you use? We don't need that much detail, though if tests are failing that's good to call out so you can get some assistance debugging.
Oh, and +1 pending that S3 endpoint declaration.
  }

  @Test
  public void testCostOfGlobStatusNoSymlinkResolution() throws Throwable {
Both these tests look almost the same in terms of operation counts, and symlink resolution is always disabled.
Can you please tell me why we need two tests here? Thanks.
So these test two different things: a directory with multiple objects and a directory with one object. The directory with a single object is the special case that triggers attempted symlink resolution, so I wanted to carve out that special case in its own test.
The multiple-objects-in-a-directory test is a general test that it felt like globStatus should have, whereas the second one was specifically made to catch the regression.
If you don't think that the multiple objects test is justified, I can remove it. Thoughts?
I got it.
hadoop/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/Globber.java
Line 292 in f77bbc2
if (children.length == 1) {
Thanks.
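For readers following the thread, the single-child special case can be illustrated with a toy model. This is a minimal sketch, not the Hadoop code: `GlobCostSketch`, `CountingStore`, and `globChildren` are hypothetical names, and the extra probe is modeled as one status call on the listed path, which is an assumption about what the `children.length == 1` branch does. It only shows why a `resolveSymlinks` flag changes the request count for a one-object directory and not for a multi-object one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model (not Hadoop code): counts simulated store requests made by a one-level glob. */
public class GlobCostSketch {

  /** Hypothetical stand-in for an object store client that counts its requests. */
  static class CountingStore {
    private final Map<String, List<String>> dirs = new TreeMap<>();
    int listCalls = 0;
    int statusCalls = 0;

    void addFile(String dir, String name) {
      dirs.computeIfAbsent(dir, d -> new ArrayList<>()).add(name);
    }

    /** One simulated LIST request. */
    List<String> listStatus(String dir) {
      listCalls++;
      return dirs.getOrDefault(dir, new ArrayList<>());
    }

    /** One simulated status probe on a single path. */
    String getFileStatus(String path) {
      statusCalls++;
      return path;
    }
  }

  /**
   * Expands "dir/*". Assumption: when resolveSymlinks is true and the listing
   * returns exactly one child, one extra status probe is issued to disambiguate
   * the listed path. With the flag off, the probe is skipped.
   */
  static List<String> globChildren(CountingStore store, String dir, boolean resolveSymlinks) {
    List<String> children = store.listStatus(dir);
    if (resolveSymlinks && children.size() == 1) {
      store.getFileStatus(dir); // the wasted call on a store without symlinks
    }
    List<String> results = new ArrayList<>();
    for (String child : children) {
      results.add(dir + "/" + child);
    }
    return results;
  }

  public static void main(String[] args) {
    CountingStore store = new CountingStore();
    store.addFile("/single", "only");

    globChildren(store, "/single", true);
    System.out.println("resolve on:  list=" + store.listCalls + " status=" + store.statusCalls);

    store.listCalls = 0;
    store.statusCalls = 0;
    globChildren(store, "/single", false);
    System.out.println("resolve off: list=" + store.listCalls + " status=" + store.statusCalls);
  }
}
```

In this model a directory with several objects never reaches the probe, which is why a separate single-object test is the one that would catch the regression.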
I ran with the following test settings, no tests failed by the way:
+1, merged to trunk. Nice catch.
#2113) Contributed by Jimmy Zuber. Change-Id: I2f247c2d2ab4f38214073e55f5cfbaa15aeaeb11
apache#2113) Contributed by Jimmy Zuber. Change-Id: Iad87ee5f9dd0e88af96088f592f22fde8768beb5