[SPARK-42500][SQL] ConstantPropagation support more case #42038

TongWei1105 · 2023-07-17T08:47:21Z

What changes were proposed in this pull request?

This PR enhances ConstantPropagation to support more cases.

Propagated through other binary comparisons.
Propagated across equality comparisons. This can be further optimized to false.

Why are the changes needed?

Improve query performance. Denodo also has a similar optimization. For example:

CREATE TABLE t1(a int, b int) using parquet;
CREATE TABLE t2(x int, y int) using parquet;

CREATE TEMP VIEW v1 AS                                        
SELECT * FROM t1 JOIN t2 WHERE a = x AND a = 0                
UNION ALL                                                     
SELECT * FROM t1 JOIN t2 WHERE a = x AND (a IS NULL OR a <> 0);

SELECT * FROM v1 WHERE x > 1;

Before this PR:

== Optimized Logical Plan ==
Union false, false
:- Project [a#0 AS a#12, b#1 AS b#13, x#2 AS x#14, y#3 AS y#15]
:  +- Join Inner
:     :- Filter (isnotnull(a#0) AND (a#0 = 0))
:     :  +- Relation spark_catalog.default.t1[a#0,b#1] parquet
:     +- Filter (isnotnull(x#2) AND ((0 = x#2) AND (x#2 > 1)))
:        +- Relation spark_catalog.default.t2[x#2,y#3] parquet
+- Join Inner, (a#16 = x#18)
   :- Filter ((isnull(a#16) OR NOT (a#16 = 0)) AND ((a#16 > 1) AND isnotnull(a#16)))
   :  +- Relation spark_catalog.default.t1[a#16,b#17] parquet
   +- Filter ((isnotnull(x#18) AND (x#18 > 1)) AND (isnull(x#18) OR NOT (x#18 = 0)))
      +- Relation spark_catalog.default.t2[x#18,y#19] parquet

After this PR:

== Optimized Logical Plan ==
Join Inner, (a#16 = x#18)
:- Filter ((isnull(a#16) OR NOT (a#16 = 0)) AND ((a#16 > 1) AND isnotnull(a#16)))
:  +- Relation spark_catalog.default.t1[a#16,b#17] parquet
+- Filter ((isnotnull(x#18) AND (x#18 > 1)) AND (isnull(x#18) OR NOT (x#18 = 0)))
   +- Relation spark_catalog.default.t2[x#18,y#19] parquet

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit test.

wangyum · 2023-07-17T09:09:24Z

cc @cloud-fan @peter-toth

peter-toth · 2023-07-17T09:48:18Z

I think this PR is very similar to my #40268 so I'm fine with this this change.
But my PR uses a mutable map to avoid the might be costly val equalityPredicates = equalityPredicatesLeft ++ equalityPredicatesRight map addition so please consider that PR too.

wangyum · 2023-08-06T15:15:22Z

@peter-toth This map should usually be small. This change is small and easy to maintain.

wangyum · 2023-08-06T15:23:10Z

Merged to master.

…apache#256) ### What changes were proposed in this pull request? This PR enhances ConstantPropagation to support more cases. Propagated through other binary comparisons. Propagated across equality comparisons. This can be further optimized to false. ### Why are the changes needed? Improve query performance. [Denodo](https://community.denodo.com/docs/html/browse/latest/en/vdp/administration/optimizing_queries/automatic_simplification_of_queries/removing_redundant_branches_of_queries_partitioned_unions) also has a similar optimization. For example: ``` CREATE TABLE t1(a int, b int) using parquet; CREATE TABLE t2(x int, y int) using parquet; CREATE TEMP VIEW v1 AS SELECT * FROM t1 JOIN t2 WHERE a = x AND a = 0 UNION ALL SELECT * FROM t1 JOIN t2 WHERE a = x AND (a IS NULL OR a <> 0); SELECT * FROM v1 WHERE x > 1; ``` Before this PR: ``` == Optimized Logical Plan == Union false, false :- Project [a#0 AS a#12, b#1 AS b#13, x#2 AS x#14, y#3 AS y#15] : +- Join Inner : :- Filter (isnotnull(a#0) AND (a#0 = 0)) : : +- Relation spark_catalog.default.t1[a#0,b#1] parquet : +- Filter (isnotnull(x#2) AND ((0 = x#2) AND (x#2 > 1))) : +- Relation spark_catalog.default.t2[x#2,y#3] parquet +- Join Inner, (a#16 = x#18) :- Filter ((isnull(a#16) OR NOT (a#16 = 0)) AND ((a#16 > 1) AND isnotnull(a#16))) : +- Relation spark_catalog.default.t1[a#16,b#17] parquet +- Filter ((isnotnull(x#18) AND (x#18 > 1)) AND (isnull(x#18) OR NOT (x#18 = 0))) +- Relation spark_catalog.default.t2[x#18,y#19] parquet ``` After this PR: ``` == Optimized Logical Plan == Join Inner, (a#16 = x#18) :- Filter ((isnull(a#16) OR NOT (a#16 = 0)) AND ((a#16 > 1) AND isnotnull(a#16))) : +- Relation spark_catalog.default.t1[a#16,b#17] parquet +- Filter ((isnotnull(x#18) AND (x#18 > 1)) AND (isnull(x#18) OR NOT (x#18 = 0))) +- Relation spark_catalog.default.t2[x#18,y#19] parquet ``` ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Unit test. Closes apache#42038 from TongWei1105/SPARK-42500. Authored-by: TongWei1105 <[email protected]> Signed-off-by: Yuming Wang <[email protected]> (cherry picked from commit 74ae1e3) Co-authored-by: TongWei1105 <[email protected]>

github-actions bot added the SQL label Jul 17, 2023

TongWei1105 force-pushed the SPARK-42500 branch 3 times, most recently from 5712bb5 to ce86f25 Compare July 17, 2023 08:59

TongWei1105 changed the title ~~SPARK-42500: ConstantPropagation support more case~~ [SPARK-42500][SQL] ConstantPropagation support more case Jul 17, 2023

wangyum approved these changes Jul 17, 2023

View reviewed changes

TongWei1105 force-pushed the SPARK-42500 branch 3 times, most recently from 20d286f to 4d1e3a9 Compare July 28, 2023 05:46

SPARK-42500: ConstantPropagation support more case

ca42914

TongWei1105 force-pushed the SPARK-42500 branch from fca478e to ca42914 Compare August 2, 2023 09:04

wangyum closed this in 74ae1e3 Aug 6, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[SPARK-42500][SQL] ConstantPropagation support more case #42038

[SPARK-42500][SQL] ConstantPropagation support more case #42038

Uh oh!

TongWei1105 commented Jul 17, 2023

Uh oh!

wangyum commented Jul 17, 2023

Uh oh!

peter-toth commented Jul 17, 2023

Uh oh!

wangyum commented Aug 6, 2023

Uh oh!

wangyum commented Aug 6, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

[SPARK-42500][SQL] ConstantPropagation support more case #42038

[SPARK-42500][SQL] ConstantPropagation support more case #42038

Uh oh!

Conversation

TongWei1105 commented Jul 17, 2023

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

Uh oh!

wangyum commented Jul 17, 2023

Uh oh!

peter-toth commented Jul 17, 2023

Uh oh!

wangyum commented Aug 6, 2023

Uh oh!

wangyum commented Aug 6, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants