-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-12594] [SQL] Outer Join Elimination by Filter Conditions #10567
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-12594] [SQL] Outer Join Elimination by Filter Conditions #10567
Conversation
|
Test build #48633 has finished for PR 10567 at commit
|
|
Test build #48634 has finished for PR 10567 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you document the arguments here?
@sameeragarwal it would be great if we could replace this function with your more general null propagation information
/cc @yhuai
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gatorsmile now that we propagate IsNotNull constraints in the logical plan, you should be able to eliminate outer joins by simply looking into the constraints of the parent filter operator. I believe something along the lines this should work:
object OuterJoinElimination extends Rule[LogicalPlan] with PredicateHelper {
private def buildNewJoin(filter: Filter, join: Join): Join = {
val leftHasNonNullPredicate = filter.constraints.filter(_.isInstanceOf[IsNotNull])
.exists(expr => join.left.outputSet.intersect(expr.references).nonEmpty)
val rightHasNonNullPredicate = filter.constraints.filter(_.isInstanceOf[IsNotNull])
.exists(expr => join.right.outputSet.intersect(expr.references).nonEmpty)
join.joinType match {
case RightOuter if leftHasNonNullPredicate =>
Join(join.left, join.right, Inner, join.condition)
case LeftOuter if rightHasNonNullPredicate =>
Join(join.left, join.right, Inner, join.condition)
case FullOuter if leftHasNonNullPredicate && rightHasNonNullPredicate =>
Join(join.left, join.right, Inner, join.condition)
case FullOuter if leftHasNonNullPredicate =>
Join(join.left, join.right, LeftOuter, join.condition)
case FullOuter if rightHasNonNullPredicate =>
Join(join.left, join.right, RightOuter, join.condition)
case _ =>
join
}
}
def apply(plan: LogicalPlan): LogicalPlan = plan transform {
case f @ Filter(condition, j@ Join(_, _, RightOuter | LeftOuter | FullOuter, _)) =>
Filter(condition, buildNewJoin(f, j))
}
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much! Will do the changes.
|
Test build #51088 has finished for PR 10567 at commit
|
|
Test build #51098 has started for PR 10567 at commit |
|
jenkins, test this please |
|
Test build #51115 has finished for PR 10567 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can support all the expressions that will return null or false, if the inputs are null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With your changes, we can support all the expressions whose attributes are all from left or right, no matter how complicated they are!!!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With @sameeragarwal changes, we can support the expressions containing both left and right attributes but the types are limited to EqualTo, GreaterThan, GreaterThanOrEqual, LessThan, LessThanOrEqual, and EqualNullSafe.
We need both!
|
Test build #51499 has finished for PR 10567 at commit
|
11b3214 to
6977fdf
Compare
|
Test build #51520 has finished for PR 10567 at commit
|
| } | ||
|
|
||
| def apply(plan: LogicalPlan): LogicalPlan = plan transform { | ||
| case f @ Filter(condition, j @ Join(_, _, RightOuter | LeftOuter | FullOuter, _)) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return the original f if it's not changed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! will do.
|
Test build #51570 has finished for PR 10567 at commit
|
| } | ||
|
|
||
| def apply(plan: LogicalPlan): LogicalPlan = plan transform { | ||
| case f @ Filter(condition, j @ Join(_, _, RightOuter | LeftOuter | FullOuter, _)) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better to return JoinType from buildNewJoin
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, will do.
|
LGTM, pending tests. |
|
Test build #51587 has finished for PR 10567 at commit
|
|
Merging this into master, thanks! |
Conversion of outer joins, if the predicates in filter conditions can restrict the result sets so that all null-supplying rows are eliminated.
full outer->innerif both sides have such predicatesleft outer->innerif the right side has such predicatesright outer->innerif the left side has such predicatesfull outer->left outerif only the left side has such predicatesfull outer->right outerif only the right side has such predicatesIf applicable, this can greatly improve the performance, since outer join is much slower than inner join, full outer join is much slower than left/right outer join.
The original PR is #10542