Skip to content

Conversation

@gatorsmile
Copy link
Member

What changes were proposed in this pull request?

In the query, Optimizer keeps adding duplicate constraints in Filter constraints by applying the rule PushPredicateThroughProject.

SELECT unionsrc1.key, unionsrc1.value, unionsrc2.key, unionsrc2.value FROM (select 'tst1' as key, cast(count(1) as string) as value from parquet_t1 s1 UNION ALL select s2.key as key, s2.value as value from parquet_t1 s2 where s2.key < 10) unionsrc1 JOIN (select 'tst1' as key, cast(count(1) as string) as value from parquet_t1 s3 UNION  ALL select s4.key as key, s4.value as value from parquet_t1 s4 where s4.key < 10) unionsrc2 ON (unionsrc1.key = unionsrc2.key)

The generated optimized plan is like

Join Inner, Some((key#30 = key#34))
:- Filter isnotnull(key#30)
:  +- Union
:     :- Aggregate [tst1 AS key#30,cast((count(1),mode=Complete,isDistinct=false) as string) AS value#31]
:     :  +- Project
:     :     +- Relation[key#38L,value#39] ParquetFormat part: struct<>, data: struct<key:bigint,value:string>
:     +- Project [cast(key#38L as string) AS key#46,value#39 AS value#33]
:        +- Filter ((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((isnotnull(key#38L) && (key#38L < 10)) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string)))
:           +- Relation[key#38L,value#39] ParquetFormat part: struct<>, data: struct<key:bigint,value:string>
+- Filter isnotnull(key#34)
   +- Union
      :- Aggregate [tst1 AS key#34,cast((count(1),mode=Complete,isDistinct=false) as string) AS value#35]
      :  +- Project
      :     +- Relation[key#38L,value#39] ParquetFormat part: struct<>, data: struct<key:bigint,value:string>
      +- Project [cast(key#38L as string) AS key#47,value#39 AS value#37]
         +- Filter ((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((isnotnull(key#38L) && (key#38L < 10)) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string))) && isnotnull(cast(key#38L as string)))
            +- Relation[key#38L,value#39] ParquetFormat part: struct<>, data: struct<key:bigint,value:string>

Due to this issue, it also hits the max iteration, as shown in the log output https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53176/consoleFull.

This PR uses the constraints to avoid pushing any predicate that already exists in its child's Constraints. Also, it will not pushing any predicate that does not contain any reference, since it could introduce the same issue.

Will introduce the same idea in the similar rules:

  •   PushPredicateThroughJoin,
    
  •   PushPredicateThroughGenerate,
    
  •   PushPredicateThroughAggregate,
    
  •   SetOperationPushDown
    

Should I do it in the same PR? or different PRs? @marmbrus

How was this patch tested?

Added a test case and also manually tested the case that causes the exception in #11714

@gatorsmile gatorsmile changed the title Constraint filter [SPARK-13936] [SQL] Avoid Predicate Pushdown Using Constraints in PushPredicateThroughProject Mar 16, 2016
@SparkQA
Copy link

SparkQA commented Mar 16, 2016

Test build #53328 has finished for PR 11765 at commit 6a0fd8a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 16, 2016

Test build #53329 has finished for PR 11765 at commit da6238e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member Author

@davies
Copy link
Contributor

davies commented Mar 18, 2016

@gatorsmile Should we fix this in InferFiltersFromConstraints itself?

@gatorsmile
Copy link
Member Author

@davies, changing InferFiltersFromConstraints is unable to fix all the issues. InferFiltersFromConstraints is to infer the filter conditions based on the Constraints of a specific node. Fixing InferFiltersFromConstraints is needed but it does not cover all the cases. If you checking the Constraint generation, it will not pop up all the conditions to their parents. It depends on the operator type. This is the fundamental reason why we should use the Constraints to avoid filter condition pushdown.

@sameeragarwal
Copy link
Member

@gatorsmile isn't the issue that you pointed out in the PR description just an artifact of union not propagating constraints correctly when it encounters a Cast in a child?

@gatorsmile
Copy link
Member Author

@sameeragarwal That is just an example that exposes this issue. I think Cast is just one of multiple compound cases. If we can avoid pushing down any filters that have been included in the children's Constraints. Then, it will avoid useless predicate pushdown.

@srowen
Copy link
Member

srowen commented Jun 23, 2016

Same, is this still an issue, active?

@gatorsmile
Copy link
Member Author

The issue has been resolved. Let me close it. Thanks!

@gatorsmile gatorsmile closed this Jun 23, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants