[SPARK-30918][SQL] improve the splitting of skewed partitions #27669

cloud-fan · 2020-02-21T13:27:10Z

What changes were proposed in this pull request?

Use the average size of the non-skewed partitions as the target size when splitting skewed partitions, instead of ADAPTIVE_EXECUTION_SKEWED_PARTITION_SIZE_THRESHOLD

Why are the changes needed?

The goal of skew join optimization is to make the data distribution move even. So it makes more sense the use the average size of the non-skewed partitions as the target size.

Does this PR introduce any user-facing change?

no

How was this patch tested?

existing tests

cloud-fan · 2020-02-21T13:27:22Z

@JkSelf @maryannxue

cloud-fan · 2020-02-21T13:29:00Z

sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala

not related to this PR but a small fix: we shouldn't coalesce partitions if the config is off.

nit: can we combine this if with the one below: if (nonSkewPartitionIndices.length == 1)

maryannxue · 2020-02-21T15:51:02Z

sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala

nit: can we combine this if with the one below: if (nonSkewPartitionIndices.length == 1)

sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala

maryannxue · 2020-02-21T16:05:44Z

sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala

nit: when ... is enabled.

sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala

maryannxue · 2020-02-21T16:12:03Z

sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala

A rule to optimize skewed joins to avoid straggler tasks whose share of data are significantly larger than those of the rest of the tasks.

maryannxue · 2020-02-21T16:28:55Z

sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala

It's not accurate to say "sub-join" here. How about:
The general idea is to divide each skew partition into smaller partitions and replicate its matching partition on the other side of the join so that they can run in parallel tasks. Note that when matching partitions from the left side and the right side both have skew, it will become a cartesian product of splits from left and right joining together.

And let's replace the term "sub-join" accordingly in the example below as well.

sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala

SparkQA · 2020-02-21T18:40:28Z

Test build #118793 has finished for PR 27669 at commit 4a64c0e.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-02-24T13:57:34Z

Test build #118870 has finished for PR 27669 at commit 1b38b71.

This patch fails to build.
This patch merges cleanly.
This patch adds no public classes.

maryannxue · 2020-02-24T16:09:14Z

sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala

      nonSkewPartitionIndices: Seq[Int]): Seq[ShufflePartitionSpec] = {
    assert(nonSkewPartitionIndices.nonEmpty)
-    if (nonSkewPartitionIndices.length == 1) {
+    val isEnabled = conf.getConf(SQLConf.REDUCE_POST_SHUFFLE_PARTITIONS_ENABLED)


nit: isCoalesceEnabled

maryannxue · 2020-02-24T16:15:01Z

sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala

    val avgPartitionSize = mapPartitionSizes.sum / mapPartitionSizes.length
-    val advisoryPartitionSize = math.max(avgPartitionSize,
-      conf.getConf(SQLConf.ADAPTIVE_EXECUTION_SKEWED_PARTITION_SIZE_THRESHOLD))
+    val advisoryPartitionSize = math.max(avgPartitionSize, targetSize)


do we still need avgPartitionSize? we shouldn't have a problem even if targetSize smaller than average, right? coz the worse case would always be one mapper per task.

SparkQA · 2020-02-24T16:40:44Z

Test build #118877 has finished for PR 27669 at commit 5ef5837.

This patch fails to build.
This patch merges cleanly.
This patch adds no public classes.

maryannxue

LGTM pending jenkins. And you need to fix the compilation errors first, @cloud-fan :)

JkSelf · 2020-02-25T01:01:45Z

sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala

+   * target post-shuffle partition size if avg size is smaller than it.
+   */
+  private def targetSize(stats: MapOutputStatistics, medianSize: Long): Long = {
+    val targetPostShuffleSize = conf.getConf(SQLConf.SHUFFLE_TARGET_POSTSHUFFLE_INPUT_SIZE)


When user enable skewed join optimization and want to change the skewed condition by adjusting the targetPostShuffleSize. If we use the SHUFFLE_TARGET_POSTSHUFFLE_INPUT_SIZE here, it may also effect the task numbers in map stage. It is better to use the ADAPTIVE_EXECUTION_SKEWED_PARTITION_SIZE_THRESHOLD config to set the targetPostShuffleSize in skewed join optimization?

Why would user want the new partition size after split to be different from the sizes of non-skew partition size? The goal of this rule is to coordinate all partitions to be around the same size if possible...

The problem with the old approach was the new skew partition size after split can be much smaller than that of the non-skew partition size. Being small itself is not a problem, but having more splits may come with a price, esp. with both side skews, and meanwhile if non-skew partitions take longer to finish, it wouldn't be worth that price.

@JkSelf do you have any real-world use cases for it? I noticed it as well but have the same feeling with @maryannxue : why would users set a different value?

After coming across the config description of ADAPTIVE_EXECUTION_SKEWED_PARTITION_SIZE_THRESHOLD, I probably get @JkSelf 's point. In the description, it is meant to test if a partition is skewed... but the way it is actually used here in this class, it is more like the target size for splitting the skewed partitions.
So we need to changes here:

bring this conf back and use it in isSkewed instead.

if eventually the entire "skewed" partition is not split at all because the size is smaller than the target size, we need to avoid adding the SkewDesc for that partition.

SparkQA · 2020-02-25T08:05:02Z

Test build #118901 has finished for PR 27669 at commit 4755526.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2020-02-25T08:58:31Z

retest this please

SparkQA · 2020-02-25T14:31:27Z

Test build #118916 has finished for PR 27669 at commit 4755526.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

### What changes were proposed in this pull request? Use the average size of the non-skewed partitions as the target size when splitting skewed partitions, instead of ADAPTIVE_EXECUTION_SKEWED_PARTITION_SIZE_THRESHOLD ### Why are the changes needed? The goal of skew join optimization is to make the data distribution move even. So it makes more sense the use the average size of the non-skewed partitions as the target size. ### Does this PR introduce any user-facing change? no ### How was this patch tested? existing tests Closes #27669 from cloud-fan/aqe. Authored-by: Wenchen Fan <[email protected]> Signed-off-by: Xiao Li <[email protected]> (cherry picked from commit 8f247e5) Signed-off-by: Xiao Li <[email protected]>

gatorsmile · 2020-02-25T22:11:13Z

Thanks! Merged to master/3.0

dongjoon-hyun · 2020-02-25T22:21:24Z

sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala

+ * right: [R1, R2, R3, R4]
+ *
+ * Let's say L2, L4 and R3, R4 are skewed, and each of them get split into 2 sub-partitions. This
+ * is scheduled to run 4 tasks at the beginning: (L1, R1), (L2, R2), (L2, R2), (L2, R2).


This seems to be a mistake. Did you want to say the following?

- (L1, R1), (L2, R2), (L2, R2), (L2, R2). + (L1, R1), (L2, R2), (L3, R3), (L4, R4).

ah yes! will fix it soon

dongjoon-hyun · 2020-02-25T22:25:11Z

cc @dbtsai

### What changes were proposed in this pull request? This is a follow up of #27669 in order to fix a typo. ### Why are the changes needed? N/A ### Does this PR introduce any user-facing change? no ### How was this patch tested? N/A Closes #27714 from cloud-fan/typo. Authored-by: Wenchen Fan <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]> (cherry picked from commit eced932) Signed-off-by: Dongjoon Hyun <[email protected]>

### What changes were proposed in this pull request? Use the average size of the non-skewed partitions as the target size when splitting skewed partitions, instead of ADAPTIVE_EXECUTION_SKEWED_PARTITION_SIZE_THRESHOLD ### Why are the changes needed? The goal of skew join optimization is to make the data distribution move even. So it makes more sense the use the average size of the non-skewed partitions as the target size. ### Does this PR introduce any user-facing change? no ### How was this patch tested? existing tests Closes apache#27669 from cloud-fan/aqe. Authored-by: Wenchen Fan <[email protected]> Signed-off-by: Xiao Li <[email protected]>

### What changes were proposed in this pull request? This is a follow up of apache#27669 in order to fix a typo. ### Why are the changes needed? N/A ### Does this PR introduce any user-facing change? no ### How was this patch tested? N/A Closes apache#27714 from cloud-fan/typo. Authored-by: Wenchen Fan <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>

cloud-fan commented Feb 21, 2020

View reviewed changes

maryannxue reviewed Feb 21, 2020

View reviewed changes

improve the splitting of skewed partitions

1b38b71

cloud-fan force-pushed the aqe branch from 4a64c0e to 1b38b71 Compare February 24, 2020 13:44

maryannxue reviewed Feb 24, 2020

View reviewed changes

address comments

5ef5837

maryannxue approved these changes Feb 24, 2020

View reviewed changes

JkSelf reviewed Feb 25, 2020

View reviewed changes

fix tests

4755526

gatorsmile closed this in 8f247e5 Feb 25, 2020

dongjoon-hyun reviewed Feb 25, 2020

View reviewed changes

dongjoon-hyun added the SQL label Feb 25, 2020

cloud-fan mentioned this pull request Feb 26, 2020

[SPARK-30918][SQL][followup] Fix typo in OptimizeSkewedJoin #27714

Closed

[SPARK-30918][SQL] improve the splitting of skewed partitions #27669

[SPARK-30918][SQL] improve the splitting of skewed partitions #27669

Uh oh!

Conversation

cloud-fan commented Feb 21, 2020

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

Uh oh!

cloud-fan commented Feb 21, 2020

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

SparkQA commented Feb 21, 2020

Uh oh!

SparkQA commented Feb 24, 2020

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

SparkQA commented Feb 24, 2020

Uh oh!

maryannxue left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

SparkQA commented Feb 25, 2020

Uh oh!

cloud-fan commented Feb 25, 2020

Uh oh!

SparkQA commented Feb 25, 2020

Uh oh!

gatorsmile commented Feb 25, 2020

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dongjoon-hyun commented Feb 25, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

maryannxue left a comment •

edited

Loading