-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-19309][SQL] disable common subexpression elimination for conditional expressions #16659
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| outputExternalType, | ||
| bufferDeserializer :: Nil) | ||
|
|
||
| val serializeExprs = outputSerializer.map(_.transform { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: outputSerializeExprs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lazy val?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's always used, no need to make it lazy val.
| * @param result The expression that contains [[BoundReference]] and produces the final output. | ||
| * @param children The expressions that used as input values for [[BoundReference]]. | ||
| */ | ||
| case class ReferenceToExpressions(result: Expression, children: Seq[Expression]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice.
| // e.g. `CaseWhen`, we should support them. | ||
| // 2. conditional expressions: common subexpressions will always be evaluated at the | ||
| // beginning, so we should not recurse into condition expressions, | ||
| // whole children may not get evaluated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe rephrase it? whole children may not get evaluated looks not easy to understand.
| */ | ||
| def addExprTree( | ||
| root: Expression, | ||
| ignoreLeaf: Boolean = true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the code change, I don't see any place other than tests using ignoreLeaf = false. Curious why we have it.
|
This looks good to me. Just few comments. |
|
Test build #71720 has finished for PR 16659 at commit
|
|
|
||
| override lazy val initialValues: Seq[Expression] = { | ||
| val zero = Literal.fromObject(aggregator.zero, bufferExternalType) | ||
| bufferSerializer.map(ReferenceToExpressions(_, zero :: Nil)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why bufferSerializer now replaced with bufferDeserializer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, typo...
| bufferDeserializer :: inputDeserializer.get :: Nil) | ||
|
|
||
| bufferSerializer.map(ReferenceToExpressions(_, reduced :: Nil)) | ||
| deserializeToBuffer(reduced) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
| leftBuffer :: rightBuffer :: Nil) | ||
|
|
||
| bufferSerializer.map(ReferenceToExpressions(_, merged :: Nil)) | ||
| deserializeToBuffer(merged) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
|
Test build #71752 has finished for PR 16659 at commit
|
|
The child expression in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this's cool.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just found that not all the children of AtLeastNNonNulls get accessed during evaluation too. Do we need to add it here too?
|
LGTM |
|
I reran the |
|
Test build #71759 has finished for PR 16659 at commit
|
|
Test build #71763 has finished for PR 16659 at commit
|
| // when it is code generated. This decision should be a cost based one. | ||
| // | ||
| // The cost of doing subexpression elimination is: | ||
| // 1. Extra function call, although this is probably *good* as the JIT can decide to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: we removed 2. and 3.. We do not need 1., right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but we do have an extra function call to evaluate common subexpression at the beginning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
: ) Just removed 1.. Not the whole sentence
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh i see :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we should still keep it, to make the indent consistent between the "cost" part and the "benefit" part. It also makes it more obvious that we only have one "cost".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fine to keep it.
| equivalence.addExprTree(price, false) | ||
| equivalence.addExprTree(discount, false) | ||
| // quantity, price, discount and (price * (1 - discount)) | ||
| assert(equivalence.getAllEquivalentExprs.count(_.size > 1) == 4) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To other reviewers: the new addExprTree always ignores the leaf nodes. Thus, these test cases are not needed.
| // expression. We should only recurse into the predicate expression. | ||
| // 3. CaseWhen: like `If`, the children of `CaseWhen` only get accessed in a certain | ||
| // condition. We should only recurse into the first condition expression as it | ||
| // will always get accessed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CaseWhen could be very deep.
CASE WHEN expr1 THEN expr2 [WHEN expr3 THEN expr4]* [ELSE expr5] END
Whenexpr1= true, returnsexpr2; whenexpr3= true, returnexpr4; else returnexpr5.
Compared with the previous impl, will we miss some expression elimination chances?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nvm, CaseWhen implements CodegenFallback. Thus, the previous impl skips it.
| def childrenToRecurse: Seq[Expression] = expr match { | ||
| case _: CodegenFallback => Nil | ||
| case i: If => i.predicate :: Nil | ||
| case c: CaseWhenBase => c.children.head :: Nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This case is not reachable, could we leave a comment above this?
| // condition. We should only recurse into the first condition expression as it | ||
| // will always get accessed. | ||
| // 4. Coalesce: it's also a conditional expression, we should only recurse into the first | ||
| // children, because others may not get accessed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although Coalesce might miss some expression elimination chances, I think it is very rare when users use the same expressions in Coalesce.
Could you update the comments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Coalesce may be just a small part of the whole expression tree, and the children of Coalesce may be same with other expressions inside the expression tree.
|
LGTM except a few comments. |
|
LGTM pending test. |
|
Test build #71818 has finished for PR 16659 at commit
|
|
thanks, merging to master! |
…tional expressions ## What changes were proposed in this pull request? As I pointed out in apache#15807 (comment) , the current subexpression elimination framework has a problem, it always evaluates all common subexpressions at the beginning, even they are inside conditional expressions and may not be accessed. Ideally we should implement it like scala lazy val, so we only evaluate it when it gets accessed at lease once. apache#15837 tries this approach, but it seems too complicated and may introduce performance regression. This PR simply stops common subexpression elimination for conditional expressions, with some cleanup. ## How was this patch tested? regression test Author: Wenchen Fan <[email protected]> Closes apache#16659 from cloud-fan/codegen.
…tional expressions ## What changes were proposed in this pull request? As I pointed out in apache#15807 (comment) , the current subexpression elimination framework has a problem, it always evaluates all common subexpressions at the beginning, even they are inside conditional expressions and may not be accessed. Ideally we should implement it like scala lazy val, so we only evaluate it when it gets accessed at lease once. apache#15837 tries this approach, but it seems too complicated and may introduce performance regression. This PR simply stops common subexpression elimination for conditional expressions, with some cleanup. ## How was this patch tested? regression test Author: Wenchen Fan <[email protected]> Closes apache#16659 from cloud-fan/codegen.
What changes were proposed in this pull request?
As I pointed out in #15807 (comment) , the current subexpression elimination framework has a problem, it always evaluates all common subexpressions at the beginning, even they are inside conditional expressions and may not be accessed.
Ideally we should implement it like scala lazy val, so we only evaluate it when it gets accessed at lease once. #15837 tries this approach, but it seems too complicated and may introduce performance regression.
This PR simply stops common subexpression elimination for conditional expressions, with some cleanup.
How was this patch tested?
regression test