-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-15647] [SQL] Fix Boundary Cases in OptimizeCodegen Rule #13392
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@dongjoon-hyun FYI, this PR is just to fix the boundary cases. I knew this issue was not introduced in your PR: #12353. Thanks! |
|
cc @cloud-fan @rxin Could you verify if my understanding is right? Thanks! |
|
Thank you for making me up-to-date, @gatorsmile ! By the way, there is one correction. My PR is about parameterizing the following previous code. :) |
|
Test build #59585 has finished for PR 13392 at commit
|
|
@dongjoon-hyun Yeah. As pointed out above, I knew it was not introduced by your PR. Thanks! |
| def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions { | ||
| case e @ CaseWhen(branches, _) if branches.size < conf.maxCaseBranchesForCodegen => | ||
| case e @ CaseWhen(branches, elseBranch) | ||
| if branches.size + elseBranch.size <= conf.maxCaseBranchesForCodegen => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading the case takes a while and and I think it'd greatly benefit from introducing a local def - a predicate - for the condition (I can't figure out a name for this, sorry)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not good at naming. How about canCodegen? : )
|
Sorry, pushed to a wrong branch. : ) |
|
Test build #59623 has finished for PR 13392 at commit
|
|
Test build #59626 has finished for PR 13392 at commit
|
| def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions { | ||
| case e @ CaseWhen(branches, _) if branches.size < conf.maxCaseBranchesForCodegen => | ||
| case e: CaseWhen if canCodeGen(e) => | ||
| e.toCodegen() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this can fit in the previous line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, will do. Thanks!
| withTable("tab1") { | ||
| spark | ||
| .range(10) | ||
| .select('id as 'a, 'id as 'b, 'id as 'c, 'id as 'd) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we only need a? or just spark.range(10).write.saveAsTable, then we can use id in the case when
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, will do it. Thanks!
|
LGTM except some style comments |
|
Test build #59634 has finished for PR 13392 at commit
|
|
Test build #59636 has finished for PR 13392 at commit
|
| def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions { | ||
| case e @ CaseWhen(branches, _) if branches.size < conf.maxCaseBranchesForCodegen => | ||
| e.toCodegen() | ||
| case e: CaseWhen if canCodeGen(e) => e.toCodegen() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for nitpicking, but could you use canCodegen instead (to follow the name of the method to call)? Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, let me fix it. Thanks!
|
Test build #59653 has finished for PR 13392 at commit
|
|
retest this please |
|
Test build #59654 has finished for PR 13392 at commit
|
#### What changes were proposed in this pull request? The following condition in the Optimizer rule `OptimizeCodegen` is not right. ```Scala branches.size < conf.maxCaseBranchesForCodegen ``` - The number of branches in case when clause should be `branches.size + elseBranch.size`. - `maxCaseBranchesForCodegen` is the maximum boundary for enabling codegen. Thus, we should use `<=` instead of `<`. This PR is to fix this boundary case and also add missing test cases for verifying the conf `MAX_CASES_BRANCHES`. #### How was this patch tested? Added test cases in `SQLConfSuite` Author: gatorsmile <[email protected]> Closes #13392 from gatorsmile/maxCaseWhen. (cherry picked from commit d67c82e) Signed-off-by: Wenchen Fan <[email protected]>
|
thanks, mering to master and 2.0! |
What changes were proposed in this pull request?
The following condition in the Optimizer rule
OptimizeCodegenis not right.branches.size < conf.maxCaseBranchesForCodegenbranches.size + elseBranch.size.maxCaseBranchesForCodegenis the maximum boundary for enabling codegen. Thus, we should use<=instead of<.This PR is to fix this boundary case and also add missing test cases for verifying the conf
MAX_CASES_BRANCHES.How was this patch tested?
Added test cases in
SQLConfSuite