
Commit ed71a82

[SPARK-26700][CORE] enable fetch-big-block-to-disk by default
## What changes were proposed in this pull request?

This is a followup of #16989.

The fetch-big-block-to-disk feature is disabled by default, because it's not compatible with external shuffle services prior to Spark 2.2. The client sends a stream request to fetch block chunks, and old shuffle services can't support it.

After 2 years, Spark 2.2 has reached EOL, so it is now safe to turn this feature on by default.

## How was this patch tested?

Existing tests.

Closes #23625 from cloud-fan/minor.

Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
1 parent: bd027f6

File tree: 2 files changed, +19 −19 lines


core/src/main/scala/org/apache/spark/internal/config/package.scala

Lines changed: 8 additions & 6 deletions
@@ -699,17 +699,19 @@ package object config {
   private[spark] val MAX_REMOTE_BLOCK_SIZE_FETCH_TO_MEM =
     ConfigBuilder("spark.maxRemoteBlockSizeFetchToMem")
       .doc("Remote block will be fetched to disk when size of the block is above this threshold " +
-        "in bytes. This is to avoid a giant request takes too much memory. We can enable this " +
-        "config by setting a specific value(e.g. 200m). Note this configuration will affect " +
-        "both shuffle fetch and block manager remote block fetch. For users who enabled " +
-        "external shuffle service, this feature can only be worked when external shuffle" +
-        "service is newer than Spark 2.2.")
+        "in bytes. This is to avoid a giant request takes too much memory. Note this " +
+        "configuration will affect both shuffle fetch and block manager remote block fetch. " +
+        "For users who enabled external shuffle service, this feature can only work when " +
+        "external shuffle service is at least 2.3.0.")
       .bytesConf(ByteUnit.BYTE)
       // fetch-to-mem is guaranteed to fail if the message is bigger than 2 GB, so we might
       // as well use fetch-to-disk in that case. The message includes some metadata in addition
       // to the block data itself (in particular UploadBlock has a lot of metadata), so we leave
       // extra room.
-      .createWithDefault(Int.MaxValue - 512)
+      .checkValue(
+        _ <= Int.MaxValue - 512,
+        "maxRemoteBlockSizeFetchToMem cannot be larger than (Int.MaxValue - 512) bytes.")
+      .createWithDefaultString("200m")

   private[spark] val TASK_METRICS_TRACK_UPDATED_BLOCK_STATUSES =
     ConfigBuilder("spark.taskMetrics.trackUpdatedBlockStatuses")
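With the default dropping from roughly 2 GB to 200m, applications that relied on the old behaviour need to raise the threshold explicitly. Below is a minimal sketch of such an override, assuming a plain SparkConf-based setup; the app name and the 1g value are illustrative, not from the patch.

```scala
import org.apache.spark.SparkConf

// Sketch: raise the fetch-to-memory threshold above the new 200m default.
// The config is a bytes value, so size suffixes such as "m" and "g" are accepted.
val conf = new SparkConf()
  .setAppName("fetch-to-disk-example")              // illustrative name
  .set("spark.maxRemoteBlockSizeFetchToMem", "1g")  // only blocks above 1 GB spill to disk
// Per the checkValue guard added above, values larger than
// Int.MaxValue - 512 bytes are rejected when the config is read.
```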

docs/configuration.md

Lines changed: 11 additions & 13 deletions
@@ -626,19 +626,6 @@ Apart from these, the following properties are also available, and may be useful
   You can mitigate this issue by setting it to a lower value.
   </td>
 </tr>
-<tr>
-  <td><code>spark.maxRemoteBlockSizeFetchToMem</code></td>
-  <td>Int.MaxValue - 512</td>
-  <td>
-    The remote block will be fetched to disk when size of the block is above this threshold in bytes.
-    This is to avoid a giant request that takes too much memory. By default, this is only enabled
-    for blocks > 2GB, as those cannot be fetched directly into memory, no matter what resources are
-    available. But it can be turned down to a much lower value (eg. 200m) to avoid using too much
-    memory on smaller blocks as well. Note this configuration will affect both shuffle fetch
-    and block manager remote block fetch. For users who enabled external shuffle service,
-    this feature can only be used when external shuffle service is newer than Spark 2.2.
-  </td>
-</tr>
 <tr>
   <td><code>spark.shuffle.compress</code></td>
   <td>true</td>
@@ -1519,6 +1506,17 @@ Apart from these, the following properties are also available, and may be useful
   you can set larger value.
   </td>
 </tr>
+<tr>
+  <td><code>spark.maxRemoteBlockSizeFetchToMem</code></td>
+  <td>200m</td>
+  <td>
+    Remote block will be fetched to disk when size of the block is above this threshold
+    in bytes. This is to avoid a giant request takes too much memory. Note this
+    configuration will affect both shuffle fetch and block manager remote block fetch.
+    For users who enabled external shuffle service, this feature can only work when
+    external shuffle service is at least 2.3.0.
+  </td>
+</tr>
 </table>

 ### Scheduling
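As a quick sanity check of the numbers involved, here is a standalone sketch (plain Scala arithmetic, not code from this patch) relating the new 200m default to the Int.MaxValue - 512 upper bound enforced in the Scala change, assuming the usual binary interpretation of the "m" suffix.

```scala
// Upper bound enforced by the new checkValue guard: just under 2 GiB.
val upperBound: Long = Int.MaxValue.toLong - 512   // 2147483135 bytes
// New default "200m", assuming 1m = 1024 * 1024 bytes.
val newDefault: Long = 200L * 1024 * 1024          // 209715200 bytes
assert(newDefault <= upperBound)                   // the default passes the guard
```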
