Skip to content

Conversation

@wangyum
Copy link
Member

@wangyum wangyum commented May 24, 2020

What changes were proposed in this pull request?

  1. Make more expressions extend NullIntolerant.
  2. Add a checker(in ExpressionInfoSuite) to identify whether the expression is NullIntolerant.

Why are the changes needed?

Avoid skew join if the join column has many null values and can improve query performance. For examples:

CREATE TABLE t1(c1 string, c2 string) USING parquet;
CREATE TABLE t2(c1 string, c2 string) USING parquet;
EXPLAIN SELECT t1.* FROM t1 JOIN t2 ON upper(t1.c1) = upper(t2.c1);

Before and after this PR:

== Physical Plan ==
*(2) Project [c1#5, c2#6]
+- *(2) BroadcastHashJoin [upper(c1#5)], [upper(c1#7)], Inner, BuildLeft
   :- BroadcastExchange HashedRelationBroadcastMode(List(upper(input[0, string, true]))), [id=#41]
   :  +- *(1) ColumnarToRow
   :     +- FileScan parquet default.t1[c1#5,c2#6]
   +- *(2) ColumnarToRow
      +- FileScan parquet default.t2[c1#7] 


== Physical Plan ==
*(2) Project [c1#5, c2#6]
+- *(2) BroadcastHashJoin [upper(c1#5)], [upper(c1#7)], Inner, BuildRight
   :- *(2) Project [c1#5, c2#6]
   :  +- *(2) Filter isnotnull(c1#5)
   :     +- *(2) ColumnarToRow
   :        +- FileScan parquet default.t1[c1#5,c2#6]
   +- BroadcastExchange HashedRelationBroadcastMode(List(upper(input[0, string, true]))), [id=#59]
      +- *(1) Project [c1#7]
         +- *(1) Filter isnotnull(c1#7)
            +- *(1) ColumnarToRow
               +- FileScan parquet default.t2[c1#7]

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit test.

@SparkQA
Copy link

SparkQA commented May 24, 2020

Test build #123041 has finished for PR 28626 at commit c3625ca.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum
Copy link
Member Author

wangyum commented May 24, 2020

retest this please.

Copy link
Member

@maropu maropu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SparkQA
Copy link

SparkQA commented May 24, 2020

Test build #123053 has finished for PR 28626 at commit c3625ca.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 24, 2020

Test build #123059 has finished for PR 28626 at commit 7fe3490.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member

maropu commented May 25, 2020

also cc: @viirya

wangyum added 2 commits May 26, 2020 11:42
# Conflicts:
#	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
@SparkQA
Copy link

SparkQA commented May 26, 2020

Test build #123105 has finished for PR 28626 at commit b127c41.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member

viirya commented May 26, 2020

retest this please

@SparkQA
Copy link

SparkQA commented May 26, 2020

Test build #123113 has finished for PR 28626 at commit b127c41.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 26, 2020

Test build #123124 has finished for PR 28626 at commit 82e97e3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum
Copy link
Member Author

wangyum commented May 28, 2020

@gatorsmile @cloud-fan


final override def eval(input: InternalRow): Any = {
val input2 = right.eval(input)
if (input2 == null || input2 == 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah I see the difference now. Previously we can skip evaluating input1 if input2 is 0. Can we change it back and add comment to explain it? sorry for the back and forth!

Copy link
Member

@maropu maropu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks okay. Thanks for the updates.

@SparkQA
Copy link

SparkQA commented May 28, 2020

Test build #123233 has finished for PR 28626 at commit 1bdff95.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 28, 2020

Test build #123237 has finished for PR 28626 at commit 265abd3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 91148f4 May 29, 2020
@wangyum wangyum deleted the SPARK-28481 branch May 29, 2020 07:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants