-
Notifications
You must be signed in to change notification settings - Fork 9.1k
HADOOP-18246. Reduce lower limit on s3a prefetching/caching block size to 1 byte. #5120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good, some suggested changes for the readme. Can you also update the documentation in https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md#general-s3a-client-configuration for the block size property.
| Although, default size of the block for prefetching the input stream is 8 MB, | ||
| minimum size allowed to set is 1 byte for a block. | ||
| User should set the block size with the understanding that smaller block sizes increases the number of blocks. | ||
| Thus, smaller block size affects the performance by increasing the overhead for reading and prefetching | ||
| each block. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Although, default size of the block for prefetching the input stream is 8 MB, | |
| minimum size allowed to set is 1 byte for a block. | |
| User should set the block size with the understanding that smaller block sizes increases the number of blocks. | |
| Thus, smaller block size affects the performance by increasing the overhead for reading and prefetching | |
| each block. | |
| The default size of a block is 8MB, and the minimum allowed block size is 1 byte. | |
| Decreasing block size will increase the number of blocks to be read for a file. | |
| A smaller block size may negatively impact performance as the number of prefetches required will increase. |
|
Thanks @ahmarsuhail for the feedback. I have made the recommended changes. |
|
💔 -1 overall
This message was automatically generated. |
| Default value is 8 MB. | ||
| Lower limit for the block size is 1 byte. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I should have updated my comment. We discussed offline, you can either revert this change completely, or change to just the following:
| Default value is 8 MB. | |
| Lower limit for the block size is 1 byte. | |
| Decreasing this will increase the number of prefetches required, and may negatively impact performance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good, +1. Please update the description of the PR with your test results.
|
💔 -1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 (non-binding), lgtm.
Yetus -1 for no new tests is to be expected.
|
Hey @steveloughran , please review this PR about reducing the lower limit on prefetching/caching block size. Reduced it to 1 byte. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. no reason for a test for this that I can see
…yte. (apache#5120) The minimum value of fs.s3a.prefetch.block.size is now 1 Contributed by Ankit Saurabh
…yte. (#5120) The minimum value of fs.s3a.prefetch.block.size is now 1 Contributed by Ankit Saurabh
Description of PR
Set the lower limit on s3a prefetching/caching block size as 1 byte. Not removed the limit as it would essentially result in infinite number of prefetching/caching request when the block size is 0 byte.
How was this patch tested?
Tested in eu-west-1 by running
mvn -Dparallel-tests clean verify. Not written a separate test as the change was trivial. Following is the result -[INFO] Results:
[INFO]
[WARNING] Tests run: 1154, Failures: 0, Errors: 0, Skipped: 182
[INFO]
[INFO]
[INFO] --- maven-failsafe-plugin:3.0.0-M1:integration-test (sequential-integration-tests) @ hadoop-aws ---
[INFO]
[INFO] Results:
[INFO]
[WARNING] Tests run: 124, Failures: 0, Errors: 0, Skipped: 84
[INFO]
[INFO]
[INFO] --- maven-enforcer-plugin:3.0.0:enforce (depcheck) @ hadoop-aws ---
[INFO]
[INFO] --- maven-failsafe-plugin:3.0.0-M1:verify (default-integration-test) @ hadoop-aws ---
[INFO]
[INFO] --- maven-failsafe-plugin:3.0.0-M1:verify (sequential-integration-tests) @ hadoop-aws ---
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 13:52 min
[INFO] Finished at: 2022-11-10T15:35:44Z
[INFO] Final Memory: 68M/1402M
[INFO] ------------------------------------------------------------------------
For code changes:
LICENSE,LICENSE-binary,NOTICE-binaryfiles?