Skip to content

Conversation

@mukund-thakur
Copy link
Contributor

S3AFileSystem.listStatus() to perform list operations
directly and then fallback to head checks for files

Tested using ap-south-1 bucket in following modes along with scale tests:

  1. Raw s3.
  2. Auth S3guard.
  3. Non auth s3guard.
    All good.

S3AFileSystem.listStatus() to perform list operations
directly and then fallback to head checks for files
@apache apache deleted a comment from hadoop-yetus Sep 8, 2020
@steveloughran
Copy link
Contributor

checkstyle complaining with fixable issues

./hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Listing.java:370:    Listing.FileStatusListingIterator filesItr = createFileStatusListingIterator(: Line is longer than 80 characters (found 81). [LineLength]
./hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java:34:import java.util.Arrays;:8: Unused import - java.util.Arrays. [UnusedImports]
./hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java:2672:  private RemoteIterator<S3AFileStatus> innerListStatus(Path f) throws FileNotFoundException,: Line is longer than 80 characters (found 93). [LineLength]
./hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java:2686:    }:5: '}' at column 5 should be on the same line as the next part of a multi-block statement (one that directly contains multiple blocks: if/else-if/else, do/while or try/catch/finally). [RightCurly]
./hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/S3ATestUtils.java:45:import org.apache.hadoop.fs.s3a.impl.TestPartialDeleteFailures;:8: Unused import - org.apache.hadoop.fs.s3a.impl.TestPartialDeleteFailures. [UnusedImports]
./hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/TestS3Guard.java:138:    S3AFileStatus[] result = S3AUtils.iteratorToStatuses(resultItr, new HashSet<>());: Line is longer than 80 characters (found 85). [LineLength]
./hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/TestS3Guard.java:205:    S3AFileStatus[] result = S3AUtils.iteratorToStatuses(resultItr, new HashSet<>());: Line is longer than 80 characters (found 85). [LineLength]

@apache apache deleted a comment from hadoop-yetus Sep 8, 2020
@mukund-thakur
Copy link
Contributor Author

checkstyle complaining with fixable issues

./hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Listing.java:370:    Listing.FileStatusListingIterator filesItr = createFileStatusListingIterator(: Line is longer than 80 characters (found 81). [LineLength]
./hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java:34:import java.util.Arrays;:8: Unused import - java.util.Arrays. [UnusedImports]
./hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java:2672:  private RemoteIterator<S3AFileStatus> innerListStatus(Path f) throws FileNotFoundException,: Line is longer than 80 characters (found 93). [LineLength]
./hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java:2686:    }:5: '}' at column 5 should be on the same line as the next part of a multi-block statement (one that directly contains multiple blocks: if/else-if/else, do/while or try/catch/finally). [RightCurly]
./hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/S3ATestUtils.java:45:import org.apache.hadoop.fs.s3a.impl.TestPartialDeleteFailures;:8: Unused import - org.apache.hadoop.fs.s3a.impl.TestPartialDeleteFailures. [UnusedImports]
./hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/TestS3Guard.java:138:    S3AFileStatus[] result = S3AUtils.iteratorToStatuses(resultItr, new HashSet<>());: Line is longer than 80 characters (found 85). [LineLength]
./hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/TestS3Guard.java:205:    S3AFileStatus[] result = S3AUtils.iteratorToStatuses(resultItr, new HashSet<>());: Line is longer than 80 characters (found 85). [LineLength]

Yes I am going to fix the checkstyle. How is the rest of code?

Copy link
Contributor

@steveloughran steveloughran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the move to remote iterator. But see the comment about moving from Listing to an interface/function for the S3Guard callbacks. Doing that will simplify the Test where you had to do the mock context, listing etc, because they aren't going to be needed any more -and we remove S3Guard having intimate knowledge of what should be the layer above it

@apache apache deleted a comment from hadoop-yetus Sep 18, 2020
@apache apache deleted a comment from hadoop-yetus Sep 21, 2020
@steveloughran
Copy link
Contributor

LGTM. +1

-thank you for a great piece of work here!

@steveloughran steveloughran merged commit 83c7c2b into apache:trunk Sep 21, 2020
asfgit pushed a commit that referenced this pull request Sep 21, 2020
S3AFileSystem.listStatus() is optimized for invocations
where the path supplied is a non-empty directory.
The number of S3 requests is significantly reduced, saving
time, money, and reducing the risk of S3 throttling.

Contributed by Mukund Thakur.

Change-Id: I7cc5f87aa16a4819e245e0fbd2aad226bd500f3f
@steveloughran
Copy link
Contributor

Surprisingly large mount of merge problems surfacing here, not in the production code but in the tests. A sign of my patches all going near the same code you've worked on.

I'm going to move the new inner classes in S3ATestUtils out of there and into their own classes in o.a.h.fs.s3a.test; reduces the diff to S3ATestUtils (which is often a source of spurious merge pain), so once isolated, less long term suffering.

Doing that in #2310 which is a high priority change for me

@apache apache deleted a comment from hadoop-yetus Sep 22, 2020
@mukund-thakur
Copy link
Contributor Author

Surprisingly large mount of merge problems surfacing here, not in the production code but in the tests. A sign of my patches all going near the same code you've worked on.

I'm going to move the new inner classes in S3ATestUtils out of there and into their own classes in o.a.h.fs.s3a.test; reduces the diff to S3ATestUtils (which is often a source of spurious merge pain), so once isolated, less long term suffering.

Doing that in #2310 which is a high priority change for me

Oh Yes I refactored some methods to S3ATestUtils thinking that would be the right place for them. Isolating them to o.a.h.fs.s3a.test seems a better idea.
Sorry for the inconvenience though.

@steveloughran
Copy link
Contributor

S3AUtils and S3ATestUtils were where we stuck everything; creates lots of spurious merge pain. Isolation will ultimately improve life

bilaharith pushed a commit to bilaharith/hadoop that referenced this pull request Sep 27, 2020
S3AFileSystem.listStatus() is optimized for invocations
where the path supplied is a non-empty directory.
The number of S3 requests is significantly reduced, saving
time, money, and reducing the risk of S3 throttling.

Contributed by Mukund Thakur.
jojochuang pushed a commit to jojochuang/hadoop that referenced this pull request May 23, 2023
S3AFileSystem.listStatus() is optimized for invocations
where the path supplied is a non-empty directory.
The number of S3 requests is significantly reduced, saving
time, money, and reducing the risk of S3 throttling.

Contributed by Mukund Thakur.

Change-Id: Ib4daf53dc74586390b379cc93000d61839c48a2e
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants