[SPARK-33472][SQL] Adjust RemoveRedundantSorts rule order #30373
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
Test build #131085 has finished for PR 30373 at commit
|
| RemoveRedundantProjects, | ||
| RemoveRedundantSorts, | ||
| EnsureRequirements, | ||
| RemoveRedundantSorts, |
Could you leave some comments here about why we need to put this rule after EnsureRequirements?
+1 for @maropu's comment.
| } | ||
| } | ||
|
|
||
| test("shuffled join with different left and right side partition numbers") { |
nit: could you add the prefix: SPARK-33183:
| (0 to 100).toDF("key").createOrReplaceTempView("t2") | ||
|
|
||
| // left side partitioning: RangePartitioning(key ASC, 2) | ||
| // right side partitioning: UnknownPartitioning(0) |
Could you add an assert to check that the query below has the output partitionings described above?
@allisonwang-db, please split this PR.
- Please create a new JIRA for `RemoveRedundantSorts`. This looks worth doing.
- The version update PR can be a follow-up.
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
cc @cloud-fan |
|
Test build #131240 has finished for PR 30373 at commit
|
|
retest this please |
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Test build #131256 has finished for PR 30373 at commit
|
|
retest this please |
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
Test build #131282 has finished for PR 30373 at commit
|
fa6050a to
4e684df
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
Test build #131319 has finished for PR 30373 at commit
|
|
thanks, merging to master! |
|
@allisonwang-db can you create backport PRs for 2.4 and 3.0? |
### What changes were proposed in this pull request?
This PR switched the order for the rule `RemoveRedundantSorts` and `EnsureRequirements` so that `EnsureRequirements` will be invoked before `RemoveRedundantSorts` to avoid IllegalArgumentException when instantiating PartitioningCollection.

### Why are the changes needed?
`RemoveRedundantSorts` rule uses SparkPlan's `outputPartitioning` to check whether a sort node is redundant. Currently, it is added before `EnsureRequirements`. Since `PartitioningCollection` requires left and right partitioning to have the same number of partitions, which is not necessarily true before applying `EnsureRequirements`, the rule can fail with the following exception:
```
IllegalArgumentException: requirement failed: PartitioningCollection requires all of its partitionings have the same numPartitions.
```

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Unit test

Closes apache#30373 from allisonwang-db/sort-follow-up.

Authored-by: allisonwang-db <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit a03c540)
Signed-off-by: allisonwang-db <[email protected]>
|
Thank you, @allisonwang-db , @maropu , @cloud-fan ! |
Backport #30373 for branch-3.0.

### What changes were proposed in this pull request?
This PR switched the order for the rule `RemoveRedundantSorts` and `EnsureRequirements` so that `EnsureRequirements` will be invoked before `RemoveRedundantSorts` to avoid IllegalArgumentException when instantiating PartitioningCollection.

### Why are the changes needed?
`RemoveRedundantSorts` rule uses SparkPlan's `outputPartitioning` to check whether a sort node is redundant. Currently, it is added before `EnsureRequirements`. Since `PartitioningCollection` requires left and right partitioning to have the same number of partitions, which is not necessarily true before applying `EnsureRequirements`, the rule can fail with the following exception:
```
IllegalArgumentException: requirement failed: PartitioningCollection requires all of its partitionings have the same numPartitions.
```

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Unit test

Closes #30438 from allisonwang-db/spark-33472-3.0.

Authored-by: allisonwang-db <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
Backport #30373 for branch-2.4.

### What changes were proposed in this pull request?
This PR switched the order for the rule `RemoveRedundantSorts` and `EnsureRequirements` so that `EnsureRequirements` will be invoked before `RemoveRedundantSorts` to avoid IllegalArgumentException when instantiating PartitioningCollection.

### Why are the changes needed?
`RemoveRedundantSorts` rule uses SparkPlan's `outputPartitioning` to check whether a sort node is redundant. Currently, it is added before `EnsureRequirements`. Since `PartitioningCollection` requires left and right partitioning to have the same number of partitions, which is not necessarily true before applying `EnsureRequirements`, the rule can fail with the following exception:
```
IllegalArgumentException: requirement failed: PartitioningCollection requires all of its partitionings have the same numPartitions.
```

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Unit test

Closes #30437 from allisonwang-db/spark-33472-2.4.

Authored-by: allisonwang-db <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
| RemoveRedundantProjects, | ||
| RemoveRedundantSorts, | ||
| EnsureRequirements, | ||
| // `RemoveRedundantSorts` needs to be added before `EnsureRequirements` to guarantee the same |
before -> after ?
Ah, I missed it. @allisonwang-db could you fix it?
Thanks for catching it! Will create a fix.
### What changes were proposed in this pull request?
This PR is a follow-up for #30373 that updates the comment for RemoveRedundantSorts in QueryExecution.

### Why are the changes needed?
To update an incorrect comment.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
N/A

Closes #30584 from allisonwang-db/spark-33472-followup.

Authored-by: allisonwang-db <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
### What changes were proposed in this pull request?
This PR is a follow-up for #30373 that updates the comment for RemoveRedundantSorts in QueryExecution.

### Why are the changes needed?
To update an incorrect comment.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
N/A

Closes #30584 from allisonwang-db/spark-33472-followup.

Authored-by: allisonwang-db <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 960d6af)
Signed-off-by: Dongjoon Hyun <[email protected]>
What changes were proposed in this pull request?
This PR switches the order of the rules `RemoveRedundantSorts` and `EnsureRequirements` so that `EnsureRequirements` is invoked before `RemoveRedundantSorts`, which avoids an `IllegalArgumentException` when instantiating `PartitioningCollection`.
Why are the changes needed?
The `RemoveRedundantSorts` rule uses SparkPlan's `outputPartitioning` to check whether a sort node is redundant. Currently, it is added before `EnsureRequirements`. Since `PartitioningCollection` requires the left and right partitionings to have the same number of partitions, which is not necessarily true before applying `EnsureRequirements`, the rule can fail with the following exception: `IllegalArgumentException: requirement failed: PartitioningCollection requires all of its partitionings have the same numPartitions.`
Does this PR introduce any user-facing change?
No
How was this patch tested?
Unit test
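To make the ordering problem concrete, here is a minimal, Spark-free sketch in plain Scala. Every type and function in it (`Scan`, `Exchange`, `Join`, `Sort`, `RuleOrderDemo`, and the simplified partitioning classes) is a hypothetical stand-in that only mimics the shape of the real classes; it is not Spark's implementation, and the actual redundancy check of `RemoveRedundantSorts` is elided. The partition counts 2 and 0 are borrowed from the test comments in this PR; the target of 5 is arbitrary.

```scala
// Toy model only: these types mimic the shape of Spark's physical-plan
// classes but are NOT Spark's actual implementation.
sealed trait Partitioning { def numPartitions: Int }
final case class UnknownPartitioning(numPartitions: Int) extends Partitioning
final case class RangePartitioning(numPartitions: Int) extends Partitioning
final case class HashPartitioning(numPartitions: Int) extends Partitioning
final case class PartitioningCollection(partitionings: Seq[Partitioning]) extends Partitioning {
  // Same invariant as the one quoted in the exception message.
  require(
    partitionings.map(_.numPartitions).distinct.length == 1,
    "PartitioningCollection requires all of its partitionings have the same numPartitions.")
  def numPartitions: Int = partitionings.head.numPartitions
}

sealed trait Plan { def outputPartitioning: Partitioning }
final case class Scan(outputPartitioning: Partitioning) extends Plan
final case class Exchange(numPartitions: Int, child: Plan) extends Plan {
  def outputPartitioning: Partitioning = HashPartitioning(numPartitions)
}
final case class Join(left: Plan, right: Plan) extends Plan {
  // A def, so the invariant is only checked when something reads it.
  def outputPartitioning: Partitioning =
    PartitioningCollection(Seq(left.outputPartitioning, right.outputPartitioning))
}
final case class Sort(child: Plan) extends Plan {
  def outputPartitioning: Partitioning = child.outputPartitioning
}

object RuleOrderDemo {
  // Stand-in for EnsureRequirements: shuffle both join sides to `target` partitions.
  def ensureRequirements(plan: Plan, target: Int): Plan = plan match {
    case Join(l, r) =>
      Join(Exchange(target, ensureRequirements(l, target)),
           Exchange(target, ensureRequirements(r, target)))
    case Sort(c) => Sort(ensureRequirements(c, target))
    case other => other
  }

  // Stand-in for RemoveRedundantSorts: it only reads the child's
  // outputPartitioning; the real redundancy check is elided.
  def removeRedundantSorts(plan: Plan): Plan = plan match {
    case s @ Sort(child) =>
      child.outputPartitioning // throws if the join sides disagree
      s
    case other => other
  }

  def main(args: Array[String]): Unit = {
    // Left side has 2 partitions, right side has 0, as in the test comments.
    val plan = Sort(Join(Scan(RangePartitioning(2)), Scan(UnknownPartitioning(0))))

    // Old order: the sort rule runs first and hits the failed requirement.
    println(scala.util.Try(removeRedundantSorts(plan)).isFailure)

    // New order: partition counts are aligned first, so the read succeeds.
    println(scala.util.Try(removeRedundantSorts(ensureRequirements(plan, 5))).isSuccess)
  }
}
```

In this model the failure comes purely from reading `outputPartitioning` on a join whose children have not yet been aligned, which is why running the alignment step first is enough to avoid it.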