Skip to content

Conversation

@AngersZhuuuu
Copy link
Contributor

What changes were proposed in this pull request?

As I show in https://issues.apache.org/jira/browse/SPARK-29769
we can't use EXISTS/NOT EXISTS as on condition in LEFTE OUTER JOIN/ FULL OUTER JOIN / LEFT ANTI JOIN
This pr is to support this.

Why are the changes needed?

Support EXISTS/NOT EXISTS as join's on condition

Does this PR introduce any user-facing change?

People can use exits like

SELECT s1.id FROM s1
LEFT OUTER  JOIN s2 ON s1.id = s2.id
 AND EXISTS (SELECT * from s3 where s3.id > 6)

SELECT s1.id FROM s1
RIGHT OUTER JOIN  s2 ON s1.id = s2.id
AND EXISTS (SELECT * from s3 where s3.id > 6)

SELECT s1.id FROM s1
LEFT SEMI  JOIN s2 ON s1.id = s2.id
AND EXISTS (SELECT * from s3 where s3.id > 6)


SELECT s1.id FROM s1
LEFT ANTI  JOIN s2 ON s1.id = s2.id
AND EXISTS (SELECT * from s3 where s3.id > 6)

SELECT s1.id FROM s1
FULL OUTER JOIN  s2 ON s1.id = s2.id
AND EXISTS (SELECT * from s3 where s3.id > 6)

How was this patch tested?

Added UT

@AngersZhuuuu
Copy link
Contributor Author

AngersZhuuuu commented Nov 8, 2019

@cloud-fan @maropu @viirya @dilipbiswal
I make a change in PushPredicateThroughJoin to make it split condition according to JoinType.
For left/right join, we should have different split order or we will lose some case.
Since I wasn't familiar with the full details of catalyst's process. Hope your advise.

@cloud-fan
Copy link
Contributor

ok to test

@cloud-fan
Copy link
Contributor

do you mean the optimizer rule PushPredicateThroughJoin is the culprit that forbids "exists/not exists" condition in outer joins?

@AngersZhuuuu
Copy link
Contributor Author

do you mean the optimizer rule PushPredicateThroughJoin is the culprit that forbids "exists/not exists" condition in outer joins?

My last pr make it can be analyzed since I make it can resolve Subqueries in join condition.
But for exists, in current master, you can use inner/right outer but you can't use left outer/full outer/ left ant/ left semi , since in origin PushPredicateThroughJoin , origin split() will solve leftEvaluateCondition first, in our test query exists is not about left or right, then will be be pased to leftEvaluateCondition, then if you meet RIGHT OUTER JOIN, it will build a new LeftChild. then it change exists condition to a LeftSemi join. But if you are LEFT OUTER JOIN it will build a new RightChild, exists condition was passed to leftEvaluateCondition, so it won't be solved, and it will be a unresolved PhysicalPlan as I show in jira.

@AngersZhuuuu
Copy link
Contributor Author

do you mean the optimizer rule PushPredicateThroughJoin is the culprit that forbids "exists/not exists" condition in outer joins?

You can see my change in PushPredicateThroughJoin I add a new split method to make it fit for join type. it will build left/right EvaluateCondition in different case. Such as LeftOuterJoin, we should build rightEvaluateCondition first. Since it will build a new Right child.

@cloud-fan
Copy link
Contributor

cloud-fan commented Nov 8, 2019

I'm confused. How can we leave the subquery unresolved in analyzer and fix it in optimizer? @dilipbiswal can you take a look?

@AngersZhuuuu
Copy link
Contributor Author

AngersZhuuuu commented Nov 8, 2019

I'm confused. How can we leave the subquery unresolved in analyzer and fix it in optimizer?@dilipbiswal can you take a look?

Origin catalyst can't resolve this, but my last pr's change on here


Make it can be resolved.

@cloud-fan
Copy link
Contributor

OK I was misled by the JIRA ticket. You don't need to show the query plan assuming a commit(your last PR) is reverted.

@SparkQA
Copy link

SparkQA commented Nov 8, 2019

Test build #113420 has finished for PR 26431 at commit 9b510c4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

So the direct cause is that we can't plan the subquery. What's the problem in the planner?

@AngersZhuuuu
Copy link
Contributor Author

So the direct cause is that we can't plan the subquery. What's the problem in the planner?

I will see if there are any place I can plan subquery in exists .

@SparkQA
Copy link

SparkQA commented Nov 8, 2019

Test build #113427 has finished for PR 26431 at commit 17dccd0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AngersZhuuuu
Copy link
Contributor Author

So the direct cause is that we can't plan the subquery. What's the problem in the planner?

Checked and try a lot, seems we can't support EXISTS/NOT EXISTS as on's join condition with FULL OUTER JOIN. Since for FULL OUTER JOIN, we can't push down exists/not exists condition to left/right and we can't build a LeftSemi Join with FULL OUTER JOIN and EXISTS subqueries

@cloud-fan
Copy link
Contributor

I think we need to update PlanSubqueries to fai if it hit subqueries that it can't plan. @AngersZhuuuu can you send another PR to do it first?

@AngersZhuuuu
Copy link
Contributor Author

I think we need to update PlanSubqueries to fai if it hit subqueries that it can't plan. @AngersZhuuuu can you send another PR to do it first?

PlanSubqueries is what I want to find, Thanks a lot for you suggestion, checking on this.

@AngersZhuuuu AngersZhuuuu changed the title [SPARK-29769][SQL] Support "exists/not exists" condition as all type Join's on cpndition [SPARK-29769][SQL] Support non-correlate "exists/not exists" condition as all type Join's on condition Nov 11, 2019
@AngersZhuuuu
Copy link
Contributor Author

Solved in #26437
Close this.
cc @cloud-fan

@cloud-fan
Copy link
Contributor

There is still an issue about correlated EXISTS in join condition. We can keep thinking about how to fix it.

@AngersZhuuuu
Copy link
Contributor Author

There is still an issue about correlated EXISTS in join condition. We can keep thinking about how to fix it.

Yea, keeping find way to solve correlated EXISTS in join condition.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants