@gatorsmile
Member

What changes were proposed in this pull request?

The following condition in the Optimizer rule OptimizeCodegen is incorrect.

branches.size < conf.maxCaseBranchesForCodegen
  • The number of branches in a CASE WHEN clause should be branches.size + elseBranch.size.
  • maxCaseBranchesForCodegen is the maximum number of branches for which codegen is enabled, so the comparison should use <= instead of <.

This PR fixes this boundary case and adds the missing test cases for verifying the conf MAX_CASES_BRANCHES.
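
To illustrate the two points above, here is a minimal, self-contained sketch. CaseWhenModel and BranchLimit are hypothetical stand-ins, not Spark's actual CaseWhen expression or optimizer rule; only the two conditions are taken from the description.

```scala
// Simplified stand-in for Catalyst's CaseWhen (an assumption for illustration only).
final case class CaseWhenModel(
    branches: Seq[(String, String)], // (condition, value) pairs
    elseBranch: Option[String])      // optional ELSE value

object BranchLimit {
  // Condition before this PR: ignores the ELSE branch and excludes the limit itself.
  def oldCondition(e: CaseWhenModel, maxBranches: Int): Boolean =
    e.branches.size < maxBranches

  // Condition proposed in this PR: counts the ELSE branch and includes the limit.
  def newCondition(e: CaseWhenModel, maxBranches: Int): Boolean =
    e.branches.size + e.elseBranch.size <= maxBranches
}
```

With two WHEN branches plus an ELSE, the old condition rejects a limit of 2 even though only the WHEN branches are counted, while the new condition accepts a limit of 3 (the real branch count) and rejects a limit of 2.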

How was this patch tested?

Added test cases in SQLConfSuite

@gatorsmile
Member Author

@dongjoon-hyun FYI, this PR just fixes the boundary cases. I know this issue was not introduced by your PR #12353. Thanks!

@gatorsmile
Member Author

cc @cloud-fan @rxin Could you verify if my understanding is right? Thanks!

@dongjoon-hyun
Member

Thank you for keeping me up to date, @gatorsmile!

By the way, one correction: my PR only parameterized the following pre-existing code. :)

def shouldCodegen: Boolean =
  branches.length < CaseWhen.MAX_NUM_CASES_FOR_CODEGEN

@SparkQA

SparkQA commented May 29, 2016

Test build #59585 has finished for PR 13392 at commit f351c10.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member Author

@dongjoon-hyun Yeah. As pointed out above, I knew it was not introduced by your PR. Thanks!

  def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
-   case e @ CaseWhen(branches, _) if branches.size < conf.maxCaseBranchesForCodegen =>
+   case e @ CaseWhen(branches, elseBranch)
+     if branches.size + elseBranch.size <= conf.maxCaseBranchesForCodegen =>
Contributor

Reading the case takes a while, and I think it would greatly benefit from introducing a local def (a predicate) for the condition. (I can't figure out a name for it, sorry.)

Member Author

I am not good at naming. How about canCodegen? : )
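
A rough sketch of what extracting the guard into a canCodegen predicate could look like. The Expr hierarchy, CodegenCaseWhen, and OptimizeCodegenSketch are made up for illustration and do not mirror Catalyst's real classes; only the condition and the canCodegen name come from this thread.

```scala
// Hypothetical, simplified expression tree (not Catalyst's real classes).
sealed trait Expr
final case class Literal(value: Int) extends Expr
final case class CaseWhenExpr(
    branches: Seq[(Expr, Expr)],
    elseBranch: Option[Expr]) extends Expr
// Stand-in for the codegen-enabled variant that toCodegen() would produce.
final case class CodegenCaseWhen(
    branches: Seq[(Expr, Expr)],
    elseBranch: Option[Expr]) extends Expr

final class OptimizeCodegenSketch(maxCaseBranchesForCodegen: Int) {
  // The predicate keeps the pattern-match guard short and gives the condition a name.
  private def canCodegen(e: CaseWhenExpr): Boolean =
    e.branches.size + e.elseBranch.size <= maxCaseBranchesForCodegen

  def apply(expr: Expr): Expr = expr match {
    case e: CaseWhenExpr if canCodegen(e) => CodegenCaseWhen(e.branches, e.elseBranch)
    case other => other // anything else is left untouched
  }
}
```

Naming the predicate means the boundary logic lives in one place, so a later change to how branches are counted does not touch the match itself.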

@gatorsmile
Member Author

Sorry, pushed to a wrong branch. : )

@SparkQA

SparkQA commented May 31, 2016

Test build #59623 has finished for PR 13392 at commit ecc4318.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 31, 2016

Test build #59626 has finished for PR 13392 at commit 414e116.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

  def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
-   case e @ CaseWhen(branches, _) if branches.size < conf.maxCaseBranchesForCodegen =>
+   case e: CaseWhen if canCodeGen(e) =>
      e.toCodegen()
Contributor

nit: this can fit in the previous line?

Member Author

Sure, will do. Thanks!

withTable("tab1") {
spark
.range(10)
.select('id as 'a, 'id as 'b, 'id as 'c, 'id as 'd)
Contributor

I think we only need column a. Or just use spark.range(10).write.saveAsTable; then we can use id in the CASE WHEN.

Member Author

Sure, will do it. Thanks!

@cloud-fan
Contributor

LGTM except for some style comments

@SparkQA

SparkQA commented May 31, 2016

Test build #59634 has finished for PR 13392 at commit 4306c4f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 31, 2016

Test build #59636 has finished for PR 13392 at commit b2849e8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

  def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
-   case e @ CaseWhen(branches, _) if branches.size < conf.maxCaseBranchesForCodegen =>
-     e.toCodegen()
+   case e: CaseWhen if canCodeGen(e) => e.toCodegen()
Contributor

Sorry for nitpicking, but could you use canCodegen instead (to follow the name of the method to call)? Thanks!

Member Author

Sure, let me fix it. Thanks!

@SparkQA

SparkQA commented May 31, 2016

Test build #59653 has finished for PR 13392 at commit 9830e31.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member Author

retest this please

@SparkQA

SparkQA commented May 31, 2016

Test build #59654 has finished for PR 13392 at commit 9830e31.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request May 31, 2016
#### What changes were proposed in this pull request?

The following condition in the Optimizer rule `OptimizeCodegen` is not right.
```Scala
branches.size < conf.maxCaseBranchesForCodegen
```

- The number of branches in case when clause should be `branches.size + elseBranch.size`.
- `maxCaseBranchesForCodegen` is the maximum boundary for enabling codegen. Thus, we should use `<=` instead of `<`.

This PR is to fix this boundary case and also add missing test cases for verifying the conf `MAX_CASES_BRANCHES`.

#### How was this patch tested?
Added test cases in `SQLConfSuite`

Author: gatorsmile <[email protected]>

Closes #13392 from gatorsmile/maxCaseWhen.

(cherry picked from commit d67c82e)
Signed-off-by: Wenchen Fan <[email protected]>

@cloud-fan
Contributor

Thanks, merging to master and 2.0!

@asfgit asfgit closed this in d67c82e May 31, 2016