Skip to content

Conversation

@cloud-fan
Copy link
Contributor

What changes were proposed in this pull request?

Use the average size of the non-skewed partitions as the target size when splitting skewed partitions, instead of ADAPTIVE_EXECUTION_SKEWED_PARTITION_SIZE_THRESHOLD

Why are the changes needed?

The goal of skew join optimization is to make the data distribution move even. So it makes more sense the use the average size of the non-skewed partitions as the target size.

Does this PR introduce any user-facing change?

no

How was this patch tested?

existing tests

@cloud-fan
Copy link
Contributor Author

@JkSelf @maryannxue

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not related to this PR but a small fix: we shouldn't coalesce partitions if the config is off.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we combine this if with the one below: if (nonSkewPartitionIndices.length == 1)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we combine this if with the one below: if (nonSkewPartitionIndices.length == 1)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: when ... is enabled.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A rule to optimize skewed joins to avoid straggler tasks whose share of data are significantly larger than those of the rest of the tasks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not accurate to say "sub-join" here. How about:
The general idea is to divide each skew partition into smaller partitions and replicate its matching partition on the other side of the join so that they can run in parallel tasks. Note that when matching partitions from the left side and the right side both have skew, it will become a cartesian product of splits from left and right joining together.

And let's replace the term "sub-join" accordingly in the example below as well.

@SparkQA
Copy link

SparkQA commented Feb 21, 2020

Test build #118793 has finished for PR 27669 at commit 4a64c0e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Feb 24, 2020

Test build #118870 has finished for PR 27669 at commit 1b38b71.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

nonSkewPartitionIndices: Seq[Int]): Seq[ShufflePartitionSpec] = {
assert(nonSkewPartitionIndices.nonEmpty)
if (nonSkewPartitionIndices.length == 1) {
val isEnabled = conf.getConf(SQLConf.REDUCE_POST_SHUFFLE_PARTITIONS_ENABLED)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: isCoalesceEnabled

val avgPartitionSize = mapPartitionSizes.sum / mapPartitionSizes.length
val advisoryPartitionSize = math.max(avgPartitionSize,
conf.getConf(SQLConf.ADAPTIVE_EXECUTION_SKEWED_PARTITION_SIZE_THRESHOLD))
val advisoryPartitionSize = math.max(avgPartitionSize, targetSize)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we still need avgPartitionSize? we shouldn't have a problem even if targetSize smaller than average, right? coz the worse case would always be one mapper per task.

@SparkQA
Copy link

SparkQA commented Feb 24, 2020

Test build #118877 has finished for PR 27669 at commit 5ef5837.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

@maryannxue maryannxue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM pending jenkins. And you need to fix the compilation errors first, @cloud-fan :)

* target post-shuffle partition size if avg size is smaller than it.
*/
private def targetSize(stats: MapOutputStatistics, medianSize: Long): Long = {
val targetPostShuffleSize = conf.getConf(SQLConf.SHUFFLE_TARGET_POSTSHUFFLE_INPUT_SIZE)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When user enable skewed join optimization and want to change the skewed condition by adjusting the targetPostShuffleSize. If we use the SHUFFLE_TARGET_POSTSHUFFLE_INPUT_SIZE here, it may also effect the task numbers in map stage. It is better to use the ADAPTIVE_EXECUTION_SKEWED_PARTITION_SIZE_THRESHOLD config to set the targetPostShuffleSize in skewed join optimization?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would user want the new partition size after split to be different from the sizes of non-skew partition size? The goal of this rule is to coordinate all partitions to be around the same size if possible...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with the old approach was the new skew partition size after split can be much smaller than that of the non-skew partition size. Being small itself is not a problem, but having more splits may come with a price, esp. with both side skews, and meanwhile if non-skew partitions take longer to finish, it wouldn't be worth that price.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JkSelf do you have any real-world use cases for it? I noticed it as well but have the same feeling with @maryannxue : why would users set a different value?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After coming across the config description of ADAPTIVE_EXECUTION_SKEWED_PARTITION_SIZE_THRESHOLD, I probably get @JkSelf 's point. In the description, it is meant to test if a partition is skewed... but the way it is actually used here in this class, it is more like the target size for splitting the skewed partitions.
So we need to changes here:

  1. bring this conf back and use it in isSkewed instead.
  2. if eventually the entire "skewed" partition is not split at all because the size is smaller than the target size, we need to avoid adding the SkewDesc for that partition.

@SparkQA
Copy link

SparkQA commented Feb 25, 2020

Test build #118901 has finished for PR 27669 at commit 4755526.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Feb 25, 2020

Test build #118916 has finished for PR 27669 at commit 4755526.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

gatorsmile pushed a commit that referenced this pull request Feb 25, 2020
### What changes were proposed in this pull request?

Use the average size of the non-skewed partitions as the target size when splitting skewed partitions, instead of ADAPTIVE_EXECUTION_SKEWED_PARTITION_SIZE_THRESHOLD

### Why are the changes needed?

The goal of skew join optimization is to make the data distribution move even. So it makes more sense the use the average size of the non-skewed partitions as the target size.

### Does this PR introduce any user-facing change?

no

### How was this patch tested?

existing tests

Closes #27669 from cloud-fan/aqe.

Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Xiao Li <[email protected]>
(cherry picked from commit 8f247e5)
Signed-off-by: Xiao Li <[email protected]>
@gatorsmile
Copy link
Member

Thanks! Merged to master/3.0

* right: [R1, R2, R3, R4]
*
* Let's say L2, L4 and R3, R4 are skewed, and each of them get split into 2 sub-partitions. This
* is scheduled to run 4 tasks at the beginning: (L1, R1), (L2, R2), (L2, R2), (L2, R2).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be a mistake. Did you want to say the following?

- (L1, R1), (L2, R2), (L2, R2), (L2, R2).
+ (L1, R1), (L2, R2), (L3, R3), (L4, R4).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah yes! will fix it soon

@dongjoon-hyun
Copy link
Member

cc @dbtsai

dongjoon-hyun pushed a commit that referenced this pull request Feb 26, 2020
### What changes were proposed in this pull request?

This is a follow up of #27669 in order to fix a typo.

### Why are the changes needed?

N/A

### Does this PR introduce any user-facing change?

no

### How was this patch tested?

N/A

Closes #27714 from cloud-fan/typo.

Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit eced932)
Signed-off-by: Dongjoon Hyun <[email protected]>
sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
### What changes were proposed in this pull request?

Use the average size of the non-skewed partitions as the target size when splitting skewed partitions, instead of ADAPTIVE_EXECUTION_SKEWED_PARTITION_SIZE_THRESHOLD

### Why are the changes needed?

The goal of skew join optimization is to make the data distribution move even. So it makes more sense the use the average size of the non-skewed partitions as the target size.

### Does this PR introduce any user-facing change?

no

### How was this patch tested?

existing tests

Closes apache#27669 from cloud-fan/aqe.

Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Xiao Li <[email protected]>
sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
### What changes were proposed in this pull request?

This is a follow up of apache#27669 in order to fix a typo.

### Why are the changes needed?

N/A

### Does this PR introduce any user-facing change?

no

### How was this patch tested?

N/A

Closes apache#27714 from cloud-fan/typo.

Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants