[SPARK-26700][CORE] enable fetch-big-block-to-disk by default #23625
Conversation
Test build #101577 has finished for PR 23625 at commit
// to the block data itself (in particular UploadBlock has a lot of metadata), so we leave
// extra room.
.createWithDefault(Int.MaxValue - 512)
.checkValue(_ <= Int.MaxValue - 512, "maxRemoteBlockSizeFetchToMem must be less than 2GB.")
If someone specifies '2g', this will fail, right? That might be surprising given the message. What about reusing that lower limit in the message?
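For illustration, here is a hypothetical sketch of that suggestion, assuming Spark's internal ConfigBuilder API; the MAX_FETCH_TO_MEM_LIMIT constant name is invented, not from the PR. Defining the limit once and interpolating it into the message keeps the check and the error text from drifting apart:

```scala
import org.apache.spark.internal.config.ConfigBuilder
import org.apache.spark.network.util.ByteUnit

// Hypothetical constant name; the value matches the PR's limit. Protocol
// messages share the 2GB frame with the block data (UploadBlock in particular
// carries a lot of metadata), hence the 512-byte headroom.
val MAX_FETCH_TO_MEM_LIMIT: Long = Int.MaxValue - 512

val MAX_REMOTE_BLOCK_SIZE_FETCH_TO_MEM =
  ConfigBuilder("spark.maxRemoteBlockSizeFetchToMem")
    .bytesConf(ByteUnit.BYTE)
    // Reuse the limit in the message, so that e.g. '2g' fails with the real
    // bound instead of a misleading "must be less than 2GB".
    .checkValue(_ <= MAX_FETCH_TO_MEM_LIMIT,
      s"maxRemoteBlockSizeFetchToMem cannot be larger than $MAX_FETCH_TO_MEM_LIMIT bytes.")
    .createWithDefault(MAX_FETCH_TO_MEM_LIMIT)
```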
| "in bytes. This is to avoid a giant request takes too much memory. Note this " + | ||
| "configuration will affect both shuffle fetch and block manager remote block fetch. " + | ||
| "For users who enabled external shuffle service, this feature can only work when " + | ||
| "external shuffle service is newer than Spark 2.2.") |
newer than 2.2 -> at least 2.3.0?
Test build #101614 has finished for PR 23625 at commit
retest this please
Test build #101627 has finished for PR 23625 at commit
</tr>
<tr>
  <td><code>spark.maxRemoteBlockSizeFetchToMem</code></td>
  <td>Int.MaxValue - 512</td>
just to clarify, you intentionally moved this from the shuffle section to the network section, since it affects both shuffle fetch and block manager fetches?
yea
.createWithDefault(Int.MaxValue - 512)
.checkValue(
  _ <= Int.MaxValue - 512,
  "maxRemoteBlockSizeFetchToMem must be less than (Int.MaxValue - 512) bytes.")
nit. less than or equal to?
Test build #101742 has finished for PR 23625 at commit
retest this please
"fetch-big-block-to-memory" maybe "fetch-big-block-to-disk" in title?
Test build #101750 has finished for PR 23625 at commit
thanks, merging to master!
Closes apache#23625 from cloud-fan/minor. Authored-by: Wenchen Fan <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
What changes were proposed in this pull request?
This is a followup of #16989.
The fetch-big-block-to-disk feature is disabled by default because it is not compatible with external shuffle services prior to Spark 2.2: the client sends a stream request to fetch block chunks, and older shuffle services cannot handle it.
After two years, Spark 2.2 has reached EOL, so it is now safe to turn this feature on by default.
How was this patch tested?
existing tests
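For context, a minimal usage sketch (not from the PR) of how a user tunes this threshold: remote blocks larger than the configured size are streamed to disk instead of being held in memory. The app name, master, and the 200m value are arbitrary examples; only the config key comes from the docs above. As the doc string warns, the client and an external shuffle service must both understand stream requests for this to take effect safely.

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("fetch-to-disk-example") // arbitrary example name
  .setMaster("local[2]")               // arbitrary example master
  // Remote blocks bigger than 200m are fetched to disk, not memory; this
  // affects both shuffle fetch and block manager remote block fetch.
  .set("spark.maxRemoteBlockSizeFetchToMem", "200m")

val sc = new SparkContext(conf)
```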