-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-17271] [SQL] Planner adds un-necessary Sort even if child ordering is semantically same as required ordering #14841
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ring is semantically same as required ordering
|
ok to test |
|
Test build #64512 has finished for PR 14841 at commit
|
|
cc'ing @rxin and @hvanhovell for review |
|
LGTM - merging to master. Thanks! |
|
|
||
| def isAscending: Boolean = direction == Ascending | ||
|
|
||
| def semanticEquals(other: SortOrder): Boolean = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Expression has a default implementation of semanticEquals, doesn't it work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cloud-fan : If you look at the old version of EnsureRequirements below at L253, it compared raw SortOrder objects which will use equals() generated for it. In scala, equals() for case classes is merely doing equals() over all its fields so that lead to Expression's equals() being used instead of its semanticEquals().
My fix here was to introduce a semanticEquals in SortOrder which compares the underlying Expression semantically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea I understand in EnsureRequirements we should use semanticEquals instead of == to compare SortOrder, but why we need to implement samanticEquals again in SortOrder? What's wrong with the default implementation?
I mean, there is no need to "introduce" a semanticEquals in SortOrder, it already has, because SortOrder is also Expression
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cloud-fan : I see what you were trying to say before. I tried that and it worked. I have created a PR to clean it up : #14910 Thanks for pointing this out !!
## What changes were proposed in this pull request? Removing `semanticEquals()` from `SortOrder` because it can use the `semanticEquals()` provided by its parent class (`Expression`). This was as per suggestion by cloud-fan at https://github.com/apache/spark/pull/14841/files/7192418b3a26a14642fc04fc92bf496a954ffa5d#r77106801 ## How was this patch tested? Ran the test added in apache#14841 Author: Tejas Patil <[email protected]> Closes apache#14910 from tejasapatil/SPARK-17271_remove_semantic_ordering.
|
Should this go into 2.0? |
|
yea, I think so, it's pretty safe. @tejasapatil do you mind sending a new PR against branch 2.0 to backport this PR and the follow-up one? thanks! |
## What changes were proposed in this pull request? Ports #14841 and #14910 from `master` to `branch-2.0` Jira : https://issues.apache.org/jira/browse/SPARK-17271 Planner is adding un-needed SORT operation due to bug in the way comparison for `SortOrder` is done at https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala#L253 `SortOrder` needs to be compared semantically because `Expression` within two `SortOrder` can be "semantically equal" but not literally equal objects. eg. In case of `sql("SELECT * FROM table1 a JOIN table2 b ON a.col1=b.col1")` Expression in required SortOrder: ``` AttributeReference( name = "col1", dataType = LongType, nullable = false ) (exprId = exprId, qualifier = Some("a") ) ``` Expression in child SortOrder: ``` AttributeReference( name = "col1", dataType = LongType, nullable = false ) (exprId = exprId) ``` Notice that the output column has a qualifier but the child attribute does not but the inherent expression is the same and hence in this case we can say that the child satisfies the required sort order. This PR includes following changes: - Added a `semanticEquals` method to `SortOrder` so that it can compare underlying child expressions semantically (and not using default Object.equals) - Fixed `EnsureRequirements` to use semantic comparison of SortOrder ## How was this patch tested? - Added a test case to `PlannerSuite`. Ran rest tests in `PlannerSuite` Author: Tejas Patil <[email protected]> Closes #14920 from tejasapatil/SPARK-17271_2.0_port.
What changes were proposed in this pull request?
Jira : https://issues.apache.org/jira/browse/SPARK-17271
Planner is adding un-needed SORT operation due to bug in the way comparison for
SortOrderis done at https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala#L253SortOrderneeds to be compared semantically becauseExpressionwithin twoSortOrdercan be "semantically equal" but not literally equal objects.eg. In case of
sql("SELECT * FROM table1 a JOIN table2 b ON a.col1=b.col1")Expression in required SortOrder:
Expression in child SortOrder:
Notice that the output column has a qualifier but the child attribute does not but the inherent expression is the same and hence in this case we can say that the child satisfies the required sort order.
This PR includes following changes:
semanticEqualsmethod toSortOrderso that it can compare underlying child expressions semantically (and not using default Object.equals)EnsureRequirementsto use semantic comparison of SortOrderHow was this patch tested?
PlannerSuite. Ran rest tests inPlannerSuite