
Conversation

@HyukjinKwon (Member) commented Jun 13, 2016

What changes were proposed in this pull request?

This PR adds the parentheses needed to preserve operator precedence when the top-level filters are joined with AND.

For example, the where clause below:

WHERE (NAME = 'fred' OR THEID = 100) AND THEID < 1

is currently compiled to the JDBC condition below:

WHERE (NAME = 'fred') OR (THEID = 100) AND (THEID < 1)

Each individual element of Array[Filter] is compiled with correct parentheses, but the parentheses and precedence of the AND that joins the elements of Array[Filter] are not taken into account.

This PR produces the correct condition below:

WHERE ((NAME = 'fred') OR (THEID = 100)) AND (THEID < 1)
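
As a minimal standalone sketch of the behaviour described above (the fragment strings below are stand-ins for what `JDBCRDD.compileFilter` returns for the two top-level filters; this is not the actual Spark source):

```scala
object WhereClausePrecedenceSketch {
  def main(args: Array[String]): Unit = {
    // Stand-ins for the fragments produced by compiling each pushed-down filter.
    val compiledFilters = Seq("(NAME = 'fred') OR (THEID = 100)", "THEID < 1")

    // Old behaviour: fragments joined directly. SQL gives AND higher
    // precedence than OR, so the condition regroups.
    println("WHERE " + compiledFilters.mkString(" AND "))
    // WHERE (NAME = 'fred') OR (THEID = 100) AND THEID < 1

    // Behaviour after this PR: each fragment is wrapped before joining.
    println("WHERE " + compiledFilters.map(p => s"($p)").mkString(" AND "))
    // WHERE ((NAME = 'fred') OR (THEID = 100)) AND (THEID < 1)
  }
}
```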

How was this patch tested?

Unit test in JDBCSuite.

@HyukjinKwon (Member, Author) commented Jun 13, 2016

I remember this was written by you, @viirya. Could you take a look, please?

@HyukjinKwon HyukjinKwon changed the title [SPARK-15916][SQL] Correctly pushdown top level and operators with parenthesis in JDBC data source [SPARK-15916][SQL] Correctly pushdown top level AND operators with parenthesis in JDBC data source Jun 13, 2016
@SparkQA commented Jun 13, 2016

Test build #60399 has finished for PR 13640 at commit 6d2580a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon (Member, Author)

cc @rxin, do you mind reviewing this, please?

@rxin (Contributor) commented Jun 15, 2016

I added this to my team's backlog.

@HyukjinKwon (Member, Author)

Thanks!

@viirya (Member) commented Jun 16, 2016

LGTM

```diff
   */
   private val filterWhereClause: String =
-    filters.flatMap(JDBCRDD.compileFilter).mkString(" AND ")
+    filters.flatMap(JDBCRDD.compileFilter).map(p => s"($p)").mkString(" AND ")
```

@asfgit asfgit closed this in ebb9a3b Jun 18, 2016
asfgit pushed a commit that referenced this pull request Jun 18, 2016
…edence

## What changes were proposed in this pull request?

This PR fixes the problem that operator precedence is lost when a where-clause expression is pushed down to the JDBC layer.

**Case 1:**

For the SQL `select * from table where (a or b) and c`, the where clause is wrongly converted to the JDBC where clause `a or (b and c)` after filter push down. As a consequence, JDBC may return fewer or more rows than expected; for example, a row with `a` true and `c` false satisfies `a or (b and c)` but not `(a or b) and c`, so it is wrongly included in the result.

**Case 2:**

For the SQL `select * from table where always_false_condition`, the result may not be empty if the JDBC RDD is partitioned using per-partition where clauses:
```
spark.read.jdbc(url, table, predicates = Array("partition 1 where clause", "partition 2 where clause", ...))
```
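
For context, a hedged sketch of what such a predicates-partitioned read looks like; the JDBC URL, table name, and credentials are placeholders (not from this PR), and `spark` is assumed to be an existing `SparkSession`, e.g. in `spark-shell`:

```scala
import java.util.Properties

// Placeholder connection details, assumed for illustration only.
val props = new Properties()
props.setProperty("user", "sa")
props.setProperty("password", "")

// Each element of `predicates` becomes the WHERE clause of one partition's
// query, so any pushed-down filter has to be AND-ed onto it with the correct
// parentheses for the combined condition to stay correct.
val df = spark.read.jdbc(
  "jdbc:h2:mem:testdb",              // placeholder JDBC URL
  "TEST.PEOPLE",                     // placeholder table name
  Array("THEID < 2", "THEID >= 2"),  // one WHERE fragment per partition
  props)

// With the precedence fix, an always-false filter yields an empty result
// from every partition.
df.filter("1 = 0").show()
```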

## How was this patch tested?

Unit test.

This PR also closes #13640.

Author: hyukjinkwon <[email protected]>
Author: Sean Zhong <[email protected]>

Closes #13743 from clockfly/SPARK-15916.

(cherry picked from commit ebb9a3b)
Signed-off-by: Cheng Lian <[email protected]>
@HyukjinKwon HyukjinKwon deleted the SPARK-15916 branch January 2, 2018 03:42