Commit c6f4e70

sryza authored and pwendell committed
SPARK-4230. Doc for spark.default.parallelism is incorrect
Author: Sandy Ryza <[email protected]>

Closes #3107 from sryza/sandy-spark-4230 and squashes the following commits:

37a1d19 [Sandy Ryza] Clear up a couple things
34d53de [Sandy Ryza] SPARK-4230. Doc for spark.default.parallelism is incorrect
1 parent c5db8e2 commit c6f4e70

1 file changed, with 5 additions and 2 deletions

docs/configuration.md

Lines changed: 5 additions & 2 deletions
@@ -562,15 +562,18 @@ Apart from these, the following properties are also available, and may be useful
 <tr>
 <td><code>spark.default.parallelism</code></td>
 <td>
+For distributed shuffle operations like <code>reduceByKey</code> and <code>join</code>, the
+largest number of partitions in a parent RDD. For operations like <code>parallelize</code>
+with no parent RDDs, it depends on the cluster manager:
 <ul>
 <li>Local mode: number of cores on the local machine</li>
 <li>Mesos fine grained mode: 8</li>
 <li>Others: total number of cores on all executor nodes or 2, whichever is larger</li>
 </ul>
 </td>
 <td>
-Default number of tasks to use across the cluster for distributed shuffle operations
-(<code>groupByKey</code>, <code>reduceByKey</code>, etc) when not set by user.
+Default number of partitions in RDDs returned by transformations like <code>join</code>,
+<code>reduceByKey</code>, and <code>parallelize</code> when not set by user.
 </td>
 </tr>
 <tr>
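
The defaults described in the new wording can be sketched in plain Python. This is only an illustration of the documented rule, not Spark's implementation: the function names, the mode strings, and the `configured` parameter are hypothetical, and treating an explicitly configured value as taking precedence is an assumption based on the "when not set by user" wording.

```python
import os


def default_parallelism(mode, total_executor_cores=0, local_cores=None):
    """Documented default for operations with no parent RDDs (e.g. parallelize).

    `mode` is a hypothetical label, not a real Spark master URL.
    """
    if mode == "local":
        # Local mode: number of cores on the local machine.
        return local_cores if local_cores is not None else (os.cpu_count() or 1)
    if mode == "mesos-fine-grained":
        # Mesos fine grained mode: fixed at 8.
        return 8
    # Others: total cores on all executor nodes or 2, whichever is larger.
    return max(total_executor_cores, 2)


def shuffle_partitions(parent_partition_counts, configured=None):
    """Documented default for shuffle operations like reduceByKey and join."""
    if configured is not None:
        # Assumption: a user-supplied value overrides the default.
        return configured
    # Otherwise: the largest number of partitions in a parent RDD.
    return max(parent_partition_counts)
```

For example, under these assumptions a join of parents with 4 and 10 partitions would default to 10 partitions, while a `parallelize` call on a single-core standalone cluster would default to 2.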