@yaooqinn commented Jan 12, 2021

What changes were proposed in this pull request?

https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133928/testReport/org.apache.spark.sql.execution/LogicalPlanTagInSparkPlanSuite/q41/

Removing the unnecessary CONCAT expression reduces the generated code by more than 8000 bytes.
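To see roughly why dropping the CONCAT helps, consider a hypothetical mini-model. The `Expr` classes below are illustrative stand-ins, not Spark's catalyst API, and the message text is made up. An error message assembled at runtime needs a CONCAT node plus a length computation and a cast, while a message fixed up front is a single literal. The sketch assumes each node contributes some generated code, so node count is only a crude proxy; the real savings depend on Spark's code generator.

```scala
// Hypothetical mini-model (NOT Spark's catalyst API): each node in an expression
// tree is assumed to contribute some generated code, so fewer nodes ~ less code.
object ErrorMessageModel {
  sealed trait Expr
  final case class Lit(value: String) extends Expr
  final case class LengthOf(column: String) extends Expr // length of the checked string
  final case class CastToString(child: Expr) extends Expr
  final case class Concat(children: Seq[Expr]) extends Expr

  // Count nodes as a crude stand-in for generated-code size.
  def nodeCount(e: Expr): Int = e match {
    case Lit(_)          => 1
    case LengthOf(_)     => 1
    case CastToString(c) => 1 + nodeCount(c)
    case Concat(cs)      => 1 + cs.map(nodeCount).sum
  }

  // Message assembled at runtime: CONCAT over literals and a computed, cast length.
  def dynamicMessage(column: String, limit: Int): Expr =
    Concat(Seq(
      Lit("length "),
      CastToString(LengthOf(column)),
      Lit(s" exceeds the length limit of $limit")))

  // Message fixed when the plan is built: a single literal, no CONCAT.
  def constantMessage(limit: Int): Expr =
    Lit(s"Exceeds the length limit of $limit")
}
```

In this model `nodeCount(dynamicMessage("c", 10))` is 5 while `nodeCount(constantMessage(10))` is 1.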

With this fix, for TPC-DS q41 with "Using TPCDS original definitions for char/varchar columns" (#31012) applied, the stage code-gen size drops from 22523 to 14369:

14369  - 22523 = - 8154
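The figures above can be checked directly. This small sketch (the object name is arbitrary) also works out the relative reduction, which comes to about 36% of the original size.

```scala
// Worked arithmetic for the q41 code-gen sizes quoted in the PR description.
object Q41CodegenDelta {
  val before: Int = 22523 // stage code-gen size without the fix
  val after: Int = 14369  // stage code-gen size with the fix

  // Negative delta means the generated code shrank.
  val delta: Int = after - before

  // Fraction of the original size that was removed.
  val relativeReduction: Double = -delta.toDouble / before
}
```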

Why are the changes needed?

To fix a performance regression: if codegen fails, performance degrades severely. (Other improvements are still needed for q41 to work.)

Does this PR introduce any user-facing change?

no

How was this patch tested?

Modified unit tests.

@yaooqinn

cc @cloud-fan @maropu @HyukjinKwon @gatorsmile @dongjoon-hyun, thanks for reviewing

@github-actions bot added the SQL label Jan 12, 2021
  override def foldable: Boolean = false
  override def nullable: Boolean = true
- override def dataType: DataType = NullType
+ override def dataType: DataType = returnType
Contributor

nit: the constructor parameter can be dataType directly, then we don't need this override.
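The nit relies on a Scala rule: a case-class constructor parameter is a public `val`, and a `val` can implement an abstract `def` without an explicit `override`. A minimal sketch with hypothetical stand-in types (not Spark's `Expression` or `DataType`):

```scala
// Hypothetical stand-ins for catalyst types, used only to show the Scala rule.
object ConstructorParamDemo {
  sealed trait DataType
  case object NullType extends DataType
  case object StringType extends DataType

  trait Expression {
    def dataType: DataType // abstract member every expression must provide
  }

  // Before the nit: a separately named parameter plus an override forwarding to it.
  case class RaiseErrorWithOverride(child: String, returnType: DataType) extends Expression {
    override def dataType: DataType = returnType
  }

  // After the nit: the case-class parameter (a public val) implements `dataType` directly.
  case class RaiseErrorDirect(child: String, dataType: DataType) extends Expression
}
```

Both variants expose the same `dataType`; the second simply drops the extra name and the forwarding method.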

Member Author

sgtm, updated

SparkQA commented Jan 12, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38555/

@cloud-fan left a comment

makes sense to me.

SparkQA commented Jan 12, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38555/

SparkQA commented Jan 12, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38557/

SparkQA commented Jan 12, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38557/

SparkQA commented Jan 12, 2021

Test build #133968 has finished for PR 31150 at commit d86490a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

  since = "3.1.0",
  group = "misc_funcs")
- case class RaiseError(child: Expression) extends UnaryExpression with ImplicitCastInputTypes {
+ case class RaiseError private[spark] (child: Expression, dataType: DataType)
Member

catalyst is already a private module. We can just remove private[spark].

Contributor

+1

SparkQA commented Jan 12, 2021

Test build #133970 has finished for PR 31150 at commit 5c043e9.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jan 13, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38578/

SparkQA commented Jan 13, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38578/

SparkQA commented Jan 13, 2021

Test build #133990 has finished for PR 31150 at commit 1dc7776.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class RaiseError(child: Expression, dataType: DataType)

cloud-fan commented Jan 13, 2021

thanks, merging to master!

@cloud-fan closed this in 04f031a Jan 13, 2021
@cloud-fan

It conflicts with branch 3.1. @yaooqinn, can you send a new PR?

@yaooqinn

w/ pleasure~

cloud-fan pushed a commit that referenced this pull request Jan 14, 2021
…ils codegen in length check for char varchar

A backport for #31150 to branch 3.1

### What changes were proposed in this pull request?

https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133928/testReport/org.apache.spark.sql.execution/LogicalPlanTagInSparkPlanSuite/q41/

Removing the unnecessary CONCAT expression reduces the generated code by more than 8000 bytes.

With this fix, for TPC-DS q41 with [Using TPCDS original definitions for char/varchar columns](#31012) applied, the stage code-gen size drops from 22523 to 14369:
```
14369  - 22523 = - 8154
```

### Why are the changes needed?

To fix a performance regression: if codegen fails, performance degrades severely. (Other improvements are still needed for q41 to work.)

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

Modified unit tests.

Closes #31168 from yaooqinn/SPARK-34086-31.

Authored-by: Kent Yao
Signed-off-by: Wenchen Fan