HADOOP-17023 Tune S3AFileSystem.listStatus() api. #2257
S3AFileSystem.listStatus() to perform list operations directly and then fall back to HEAD checks for files
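To illustrate the idea, here is a minimal sketch of a "list first, HEAD only as fallback" lookup. It is not the S3AFileSystem code from this PR: `ListFirstSketch`, `ToyStore`, and their methods are hypothetical names, the store is an in-memory stand-in that only counts requests, and directory markers, paging, and S3Guard are all ignored.

```java
import java.io.FileNotFoundException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

class ListFirstSketch {

    /** Hypothetical in-memory object store that counts requests; not the AWS SDK. */
    static class ToyStore {
        final TreeSet<String> keys = new TreeSet<>();
        int listCalls = 0;
        int headCalls = 0;

        /** Simulated LIST: returns every key that starts with the prefix. */
        List<String> list(String prefix) {
            listCalls++;
            List<String> out = new ArrayList<>();
            // Keys sharing a prefix are contiguous in sorted order.
            for (String k : keys.tailSet(prefix)) {
                if (!k.startsWith(prefix)) {
                    break;
                }
                out.add(k);
            }
            return out;
        }

        /** Simulated HEAD: does an object exist at exactly this key? */
        boolean head(String key) {
            headCalls++;
            return keys.contains(key);
        }
    }

    /**
     * Assumed flow: treat the path as a directory and LIST it first.
     * Only when the listing is empty is a HEAD issued to see whether
     * the path is a file.
     */
    static List<String> listStatus(ToyStore store, String path) {
        String prefix = path.endsWith("/") ? path : path + "/";
        List<String> children = store.list(prefix);
        if (!children.isEmpty()) {
            return children;            // non-empty directory: one LIST, no HEAD
        }
        if (store.head(path)) {
            return Collections.singletonList(path);   // path is a file
        }
        throw new UncheckedIOException(new FileNotFoundException(path));
    }
}
```

In this sketch a non-empty directory costs a single LIST request and no HEAD; the HEAD is only paid for when the path turns out to be a file or does not exist.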
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AFileOperationCost.java
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AFileOperationCost.java
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/performance/OperationCost.java
checkstyle is complaining, with fixable issues
Yes, I am going to fix the checkstyle issues. How is the rest of the code?
I like the move to remote iterator. But see the comment about moving from Listing to an interface/function for the S3Guard callbacks. Doing that will simplify the test where you had to mock the context, listing, etc., because they won't be needed any more, and it stops S3Guard having intimate knowledge of what should be the layer above it.
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/S3Guard.java
LGTM. +1. Thank you for a great piece of work here!
S3AFileSystem.listStatus() is optimized for invocations where the path supplied is a non-empty directory. The number of S3 requests is significantly reduced, saving time, money, and reducing the risk of S3 throttling. Contributed by Mukund Thakur. Change-Id: I7cc5f87aa16a4819e245e0fbd2aad226bd500f3f
Surprisingly large amount of merge problems surfacing here, not in the production code but in the tests. A sign of my patches all going near the same code you've worked on. I'm going to move the new inner classes in S3ATestUtils out of there and into their own classes in o.a.h.fs.s3a.test; that reduces the diff to S3ATestUtils (which is often a source of spurious merge pain), so once isolated, less long-term suffering. Doing that in #2310, which is a high-priority change for me.
Oh yes, I refactored some methods into S3ATestUtils thinking that would be the right place for them. Isolating them in o.a.h.fs.s3a.test seems a better idea.
S3AUtils and S3ATestUtils were where we stuck everything; that creates lots of spurious merge pain. Isolation will ultimately improve life.
S3AFileSystem.listStatus() is optimized for invocations where the path supplied is a non-empty directory. The number of S3 requests is significantly reduced, saving time, money, and reducing the risk of S3 throttling. Contributed by Mukund Thakur. Change-Id: Ib4daf53dc74586390b379cc93000d61839c48a2e
S3AFileSystem.listStatus() to perform list operations
directly and then fall back to HEAD checks for files
Tested using ap-south-1 bucket in following modes along with scale tests:
All good.