-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-42500][SQL] ConstantPropagation support more cases #40268
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-42500][SQL] ConstantPropagation support more cases #40268
Conversation
|
Tests look ok, removed WIP flag. cc @wangyum, @cloud-fan |
| // The keys are always canonicalized `AttributeReference`s, but it is easier to use `Expression` | ||
| // type keys here instead of casting `AttributeReference.canonicalized` to `AttributeReference` at | ||
| // the calling sites. | ||
| type EqualityPredicates = mutable.Map[Expression, (Literal, BinaryComparison)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think AttributeReference is enough.
Can we use AttributeMap ? In order to avoid the use of x.canonicalized later:
type EqualityPredicates = AttributeMap[(Literal, BinaryComparison)]There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I deliberately used mutable map here to improve their addition (equalityPredicates ++= equalityPredicatesRight) later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a small improvement in cabb083
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
Outdated
Show resolved
Hide resolved
| private def replaceConstants( | ||
| condition: Expression, | ||
| equalityPredicates: EqualityPredicates): Expression = { | ||
| val predicates = equalityPredicates.values.map(_._2).toSet |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we keep the constantsMap?
val constantsMap = AttributeMap(equalityPredicates.map { case (attr, (lit, _)) => attr -> lit })There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason for changing EqualityPredicates to mutable.Map earlier was to avoid building a map here.
The main point of that map is that we store only one Literal (and its original BinaryComparision) assigned to an attribute key. So if we have 2 or more conflicting EqualTo then in replaceConstants() we keep only one's original form and rewrite the other conflicing ones. E.g. a = 1 AND a = 2 we store only a -> (2, a = 2) in the map and rewrite the expression to 2 = 1 AND a = 2.
| case a: AttributeReference if equalityPredicates.contains(a.canonicalized) => | ||
| equalityPredicates(a.canonicalized)._1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
case a: AttributeReference => constantsMap.getOrElse(a, a)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kept the current mutable.Map based EqualityPredicates so I chage it to case a: AttributeReference => equalityPredicates.get(a.canonicalized).map(_._1).getOrElse(a).
But if we decide to use AttributeMap then I can change it to the suggested.
| * resulted false | ||
| * @return A tuple including: | ||
| * 1. Option[Expression]: optional changed condition after traversal | ||
| * 1. Expression: optional changed condition after traversal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove the optional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I forgot to update that. Fixed in 0244034
|
@wangyum, @cloud-fan, do you think this PR is good enough or shall we go further with improvements?
@cloud-fan, I know you had concerns: #40093 (comment) and actually I'm not sure either how much performance improvement we could expect from the above 4 in real-life usecases... |
99d6fea to
b9f3fbb
Compare
…500-constantpropagation-support-more-cases
|
@cloud-fan, @wangyum please let me know if this PR needs further improvements. |
| } else { | ||
| f | ||
| } | ||
| f.mapExpressions(traverse(_, nullIsFalse = true, None)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is our internal change:
object ConstantPropagation extends Rule[LogicalPlan] {
def apply(plan: LogicalPlan): LogicalPlan = plan.transformUpWithPruning(
_.containsAllPatterns(LITERAL, FILTER), ruleId) {
case f: Filter =>
val (newCondition, _) = traverse(f.condition, replaceChildren = true, nullIsFalse = true)
if (newCondition.isDefined) {
f.copy(condition = newCondition.get)
} else {
f
}
}
/**
* Traverse a condition as a tree and replace attributes with constant values.
* - On matching [[And]], recursively traverse each children and get propagated mappings.
* If the current node is not child of another [[And]], replace all occurrences of the
* attributes with the corresponding constant values.
* - If a child of [[And]] is [[EqualTo]] or [[EqualNullSafe]], propagate the mapping
* of attribute => constant.
* - On matching [[Or]] or [[Not]], recursively traverse each children, propagate empty mapping.
* - Otherwise, stop traversal and propagate empty mapping.
* @param condition condition to be traversed
* @param replaceChildren whether to replace attributes with constant values in children
* @param nullIsFalse whether a boolean expression result can be considered to false e.g. in the
* case of `WHERE e`, null result of expression `e` means the same as if it
* resulted false
* @return A tuple including:
* 1. Option[Expression]: optional changed condition after traversal
* 2. AttributeMap: propagated mapping of attribute => constant
*/
private def traverse(condition: Expression, replaceChildren: Boolean, nullIsFalse: Boolean)
: (Option[Expression], AttributeMap[(Literal, BinaryComparison)]) =
condition match {
case e @ EqualTo(left: AttributeReference, right: Literal)
if safeToReplace(left, nullIsFalse) =>
(None, AttributeMap(Map(left -> (right, e))))
case e @ EqualTo(left: Literal, right: AttributeReference)
if safeToReplace(right, nullIsFalse) =>
(None, AttributeMap(Map(right -> (left, e))))
case e @ EqualNullSafe(left: AttributeReference, right: Literal)
if safeToReplace(left, nullIsFalse) =>
(None, AttributeMap(Map(left -> (right, e))))
case e @ EqualNullSafe(left: Literal, right: AttributeReference)
if safeToReplace(right, nullIsFalse) =>
(None, AttributeMap(Map(right -> (left, e))))
case a: And =>
val (newLeft, equalityPredicatesLeft) =
traverse(a.left, replaceChildren = false, nullIsFalse)
val (newRight, equalityPredicatesRight) =
traverse(a.right, replaceChildren = false, nullIsFalse)
val equalityPredicates = equalityPredicatesLeft ++ equalityPredicatesRight
val newSelf = if (equalityPredicates.nonEmpty && replaceChildren) {
Some(And(replaceConstants(newLeft.getOrElse(a.left), equalityPredicates),
replaceConstants(newRight.getOrElse(a.right), equalityPredicates)))
} else {
if (newLeft.isDefined || newRight.isDefined) {
Some(And(newLeft.getOrElse(a.left), newRight.getOrElse(a.right)))
} else {
None
}
}
(newSelf, equalityPredicates)
case o: Or =>
// Ignore the EqualityPredicates from children since they are only propagated through And.
val (newLeft, _) = traverse(o.left, replaceChildren = true, nullIsFalse)
val (newRight, _) = traverse(o.right, replaceChildren = true, nullIsFalse)
val newSelf = if (newLeft.isDefined || newRight.isDefined) {
Some(Or(left = newLeft.getOrElse(o.left), right = newRight.getOrElse((o.right))))
} else {
None
}
(newSelf, AttributeMap.empty)
case n: Not =>
// Ignore the EqualityPredicates from children since they are only propagated through And.
val (newChild, _) = traverse(n.child, replaceChildren = true, nullIsFalse = false)
(newChild.map(Not), AttributeMap.empty)
case _ => (None, AttributeMap.empty)
}
// We need to take into account if an attribute is nullable and the context of the conjunctive
// expression. E.g. `SELECT * FROM t WHERE NOT(c = 1 AND c + 1 = 1)` where attribute `c` can be
// substituted into `1 + 1 = 1` if 'c' isn't nullable. If 'c' is nullable then the enclosing
// NOT prevents us to do the substitution as NOT flips the context (`nullIsFalse`) of what a
// null result of the enclosed expression means.
private def safeToReplace(ar: AttributeReference, nullIsFalse: Boolean) =
!ar.nullable || nullIsFalse
private def replaceConstants(
condition: Expression,
equalityPredicates: AttributeMap[(Literal, BinaryComparison)]): Expression = {
val constantsMap = AttributeMap(equalityPredicates.map { case (attr, (lit, _)) => attr -> lit })
val predicates = equalityPredicates.values.map(_._2).toSet
condition transform {
case b: BinaryComparison if !predicates.contains(b) => b transform {
case a: AttributeReference => constantsMap.getOrElse(a, a)
}
}
}
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is very similar to a previous state of this PR but your version uses an immutable AttributeMap[(Literal, BinaryComparison)] instead of a mutable Map[Expression, (Literal, BinaryComparison)] in my PR. The addition of maps val equalityPredicates = equalityPredicatesLeft ++ equalityPredicatesRight inside handling Ands is slower when we use immutable maps. That's why I proposed to use a mutable one.
But since that version, I changed my PR to eliminate map addition at all and I use the mutable map as an accumulator from this commit: b9f3fbb. So the current version of this PR is even more optimized.
|
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. |
What changes were proposed in this pull request?
This PR improves
ConstantPropagationrule, the new implementation:e.g.
a = 1 AND a = 2=>a = 1 AND 1 = 2Before this PR this unsatisfiable filter was not reduced:
EqualToandEqualNullSafeBinaryComparisonexpressions:e.g.
a = 5 AND b < a + 3=>a = 5 AND b < 8Optionwrapper from the return type ofConstantPropagation.traverse()as modern.mapExpressions(),.withNewChildren()and.mapChildren()methods can recognize if there was no change in arguments.Why are the changes needed?
Improve performance.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
New UTs are taken from @wangyum's PR #40093