
Conversation

@HyukjinKwon (Member) commented Jun 13, 2016

What changes were proposed in this pull request?

This PR adds the parentheses needed to preserve operator precedence when the top-level filters are joined with AND.

For example, the where clause below:

WHERE (NAME = 'fred' OR THEID = 100) AND THEID < 1

is currently compiled to the JDBC condition below:

WHERE (NAME = 'fred') OR (THEID = 100) AND (THEID < 1)

Each individual element of Array[Filter] is compiled with correct parentheses, but the parentheses and precedence of the AND that joins the elements of Array[Filter] are not taken into account.

This PR produces the correct condition below:

WHERE ((NAME = 'fred') OR (THEID = 100)) AND (THEID < 1)
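
As a minimal standalone sketch of the behaviour described above (the fragment strings below are stand-ins for what `JDBCRDD.compileFilter` returns for the two top-level filters; this is not the actual Spark source):

```scala
object WhereClausePrecedenceSketch {
  def main(args: Array[String]): Unit = {
    // Stand-ins for the fragments produced by compiling each pushed-down filter.
    val compiledFilters = Seq("(NAME = 'fred') OR (THEID = 100)", "THEID < 1")

    // Old behaviour: fragments joined directly. SQL gives AND higher
    // precedence than OR, so the condition regroups.
    println("WHERE " + compiledFilters.mkString(" AND "))
    // WHERE (NAME = 'fred') OR (THEID = 100) AND THEID < 1

    // Behaviour after this PR: each fragment is wrapped before joining.
    println("WHERE " + compiledFilters.map(p => s"($p)").mkString(" AND "))
    // WHERE ((NAME = 'fred') OR (THEID = 100)) AND (THEID < 1)
  }
}
```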

How was this patch tested?

Unit test in JDBCSuite.

@HyukjinKwon (Member, Author) commented Jun 13, 2016

I remember this was written by you, @viirya. Could you take a look, please?

@HyukjinKwon HyukjinKwon changed the title [SPARK-15916][SQL] Correctly pushdown top level and operators with parenthesis in JDBC data source [SPARK-15916][SQL] Correctly pushdown top level AND operators with parenthesis in JDBC data source Jun 13, 2016
@SparkQA commented Jun 13, 2016

Test build #60399 has finished for PR 13640 at commit 6d2580a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon (Member, Author)

cc @rxin, do you mind reviewing this, please?

@rxin (Contributor) commented Jun 15, 2016

I added this to my team's backlog.

@HyukjinKwon (Member, Author)

Thanks!

@viirya (Member) commented Jun 16, 2016

LGTM

```diff
   */
   private val filterWhereClause: String =
-    filters.flatMap(JDBCRDD.compileFilter).mkString(" AND ")
+    filters.flatMap(JDBCRDD.compileFilter).map(p => s"($p)").mkString(" AND ")
```

@asfgit asfgit closed this in ebb9a3b Jun 18, 2016
asfgit pushed a commit that referenced this pull request Jun 18, 2016
…edence

## What changes were proposed in this pull request?

This PR fixes the problem that operator precedence is lost when a where-clause expression is pushed down to the JDBC layer.

**Case 1:**

For the SQL `select * from table where (a or b) and c`, the where clause is wrongly converted to the JDBC where clause `a or (b and c)` after filter push down. As a consequence, JDBC may return fewer or more rows than expected; for example, a row with `a` true and `c` false satisfies `a or (b and c)` but not `(a or b) and c`, so it is wrongly included in the result.

**Case 2:**

For the SQL `select * from table where always_false_condition`, the result may not be empty if the JDBC RDD is partitioned using per-partition where clauses:
```
spark.read.jdbc(url, table, predicates = Array("partition 1 where clause", "partition 2 where clause", ...))
```
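
For context, a hedged sketch of what such a predicates-partitioned read looks like; the JDBC URL, table name, and credentials are placeholders (not from this PR), and `spark` is assumed to be an existing `SparkSession`, e.g. in `spark-shell`:

```scala
import java.util.Properties

// Placeholder connection details, assumed for illustration only.
val props = new Properties()
props.setProperty("user", "sa")
props.setProperty("password", "")

// Each element of `predicates` becomes the WHERE clause of one partition's
// query, so any pushed-down filter has to be AND-ed onto it with the correct
// parentheses for the combined condition to stay correct.
val df = spark.read.jdbc(
  "jdbc:h2:mem:testdb",              // placeholder JDBC URL
  "TEST.PEOPLE",                     // placeholder table name
  Array("THEID < 2", "THEID >= 2"),  // one WHERE fragment per partition
  props)

// With the precedence fix, an always-false filter yields an empty result
// from every partition.
df.filter("1 = 0").show()
```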

## How was this patch tested?

Unit test.

This PR also closes #13640.

Author: hyukjinkwon <[email protected]>
Author: Sean Zhong <[email protected]>

Closes #13743 from clockfly/SPARK-15916.

(cherry picked from commit ebb9a3b)
Signed-off-by: Cheng Lian <[email protected]>
@HyukjinKwon HyukjinKwon deleted the SPARK-15916 branch January 2, 2018 03:42