[SPARK-15916][SQL] Correctly pushdown top level AND operators with parenthesis in JDBC data source #13640
I remember this was written by you, @viirya. Could you take a look, please?
Test build #60399 has finished for PR 13640 at commit
cc @rxin, do you mind if I ask you to review this, please?
I added this to my team's backlog.
Thanks!
LGTM
   */
  private val filterWhereClause: String =
-    filters.flatMap(JDBCRDD.compileFilter).mkString(" AND ")
+    filters.flatMap(JDBCRDD.compileFilter).map(p => s"($p)").mkString(" AND ")
How about `getWhereClause`? There is also an `AND` at https://github.com/apache/spark/pull/13640/files#diff-5a29ce8f760092fb4a9c1f190cc2f61cR315
…edence
## What changes were proposed in this pull request?
This PR fixes a problem where operator precedence is not preserved when where-clause expressions are pushed down to the JDBC layer.
**Case 1:**
For the SQL `select * from table where (a or b) and c`, the where-clause is wrongly converted to the JDBC where-clause `a or (b and c)` after filter pushdown. As a consequence, JDBC may return fewer or more rows than expected.
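Case 1 can be reproduced with plain strings. The sketch below is a hypothetical stand-in rather than the actual `JDBCRDD` code: `TopLevelAndSketch`, `joinUnwrapped`, and `joinWrapped` are made-up names, and the input strings are assumed to be filters that have already been compiled to SQL fragments.

```scala
object TopLevelAndSketch {
  // Joins compiled filters with AND and no extra parentheses
  // (the behaviour that loses precedence).
  def joinUnwrapped(compiled: Seq[String]): String =
    compiled.mkString(" AND ")

  // Wraps every compiled filter in parentheses before joining,
  // so each top-level filter stays a single operand of AND.
  def joinWrapped(compiled: Seq[String]): String =
    compiled.map(p => s"($p)").mkString(" AND ")

  def main(args: Array[String]): Unit = {
    // Two top-level filters: (a = 1 OR b = 2) and (c = 3).
    val compiled = Seq("a = 1 OR b = 2", "c = 3")
    println(joinUnwrapped(compiled)) // a = 1 OR b = 2 AND c = 3
    println(joinWrapped(compiled))   // (a = 1 OR b = 2) AND (c = 3)
  }
}
```

Because SQL binds `AND` more tightly than `OR`, the first string is evaluated as `a = 1 OR (b = 2 AND c = 3)`, which is not the condition the user wrote.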
**Case 2:**
For sql `select * from table where always_false_condition`, the result table may not be empty if the JDBC RDD is partitioned using where-clause:
```
spark.read.jdbc(url, table, predicates = Array("partition 1 where clause", "partition 2 where clause", ...), connectionProperties)
```
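To see why Case 2 can return rows, consider how a pushed-down filter might be combined with one partition predicate. This is a minimal sketch under assumptions: `PartitionWhereSketch` and `whereClause` are hypothetical names, the `parenthesize` flag exists only to contrast the two behaviours, and the predicates are illustrative.

```scala
object PartitionWhereSketch {
  // Hypothetical helper: builds the WHERE clause for one partition from
  // the pushed-down filter clause and an optional partition predicate.
  def whereClause(
      filterClause: String,
      partitionClause: Option[String],
      parenthesize: Boolean): String = {
    def wrap(s: String): String = if (parenthesize) s"($s)" else s
    // Drop an empty filter clause, keep whatever clauses remain.
    val parts = Seq(Some(filterClause).filter(_.nonEmpty), partitionClause).flatten
    if (parts.isEmpty) "" else parts.map(wrap).mkString("WHERE ", " AND ", "")
  }

  def main(args: Array[String]): Unit = {
    val alwaysFalse = "1 = 0"
    val partition = Some("id < 10 OR id IS NULL")
    // Without parentheses the OR escapes the always-false filter.
    println(whereClause(alwaysFalse, partition, parenthesize = false))
    // With parentheses the always-false filter applies to the whole partition.
    println(whereClause(alwaysFalse, partition, parenthesize = true))
  }
}
```

In the unparenthesized form, `WHERE 1 = 0 AND id < 10 OR id IS NULL` is evaluated as `(1 = 0 AND id < 10) OR id IS NULL`, so rows with a null `id` are still returned even though the filter is always false.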
## How was this patch tested?
Unit test.
This PR also closes #13640
Author: hyukjinkwon
Author: Sean Zhong
Closes #13743 from clockfly/SPARK-15916.
(cherry picked from commit ebb9a3b)
Signed-off-by: Cheng Lian
## What changes were proposed in this pull request?
This PR inserts the correct parentheses between top-level `AND` operators.
For example, the where clause below:
is being parsed as below:
This is fine for the sub-filters within each element of `Array[Filter]`, but it does not account for parentheses and precedence with `AND` between the elements of `Array[Filter]`.
This PR produces the correct condition as below:
## How was this patch tested?
Unit test in `JDBCSuite`.