[SPARK-18141][SQL] Fix to quote column names in the predicate clause of the JDBC RDD generated sql statement #15662

sureshthalamati · 2016-10-27T18:42:49Z

What changes were proposed in this pull request?

The SQL query generated for the JDBC data source does not quote column names in the predicate clause. When the source table has quoted column names, a Spark JDBC read incorrectly fails with a column-not-found error.

Error:
org.h2.jdbc.JdbcSQLException: Column "ID" not found;
Source SQL statement:
SELECT "Name","Id" FROM TEST."mixedCaseCols" WHERE (Id < 1)

This PR fixes the problem by quoting column names in the predicate clause of the generated SQL when filters are pushed down to the data source.

Source SQL statement after the fix:
SELECT "Name","Id" FROM TEST."mixedCaseCols" WHERE ("Id" < 1)
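To make the failure concrete, here is a minimal sketch of how a case-folding database might treat the two forms of the identifier. It assumes the behaviour suggested by the error message above, namely that an unquoted identifier is folded to upper case while a quoted one is matched exactly. `IdentifierFoldingSketch` and `resolveColumn` are hypothetical names for illustration, not H2 or Spark APIs.

```scala
import java.util.Locale

object IdentifierFoldingSketch {
  // Columns as stored by the database. They were created with quoted,
  // mixed-case names, so their case is preserved.
  val storedColumns: Set[String] = Set("Name", "Id")

  // Hypothetical model of identifier resolution (an assumption, not H2 code):
  // a double-quoted identifier is matched exactly, an unquoted one is
  // folded to upper case before the lookup.
  def resolveColumn(identifier: String): Either[String, String] = {
    val isQuoted =
      identifier.length >= 2 && identifier.startsWith("\"") && identifier.endsWith("\"")
    val lookupName =
      if (isQuoted) identifier.substring(1, identifier.length - 1)
      else identifier.toUpperCase(Locale.ROOT)
    if (storedColumns.contains(lookupName)) Right(lookupName)
    else Left(s"""Column "$lookupName" not found""")
  }

  def main(args: Array[String]): Unit = {
    println(resolveColumn("Id"))     // unquoted: looked up as ID, which is not stored
    println(resolveColumn("\"Id\"")) // quoted: looked up as Id, which is stored
  }
}
```

Under this model the unquoted predicate column cannot match a column stored as `Id`, which is why quoting it in the WHERE clause matters as much as quoting it in the SELECT list.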

How was this patch tested?

Added a new test case to JDBCSuite.

SparkQA · 2016-10-27T20:28:17Z

Test build #67662 has finished for PR 15662 at commit 0944e05.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

sureshthalamati · 2016-10-27T21:12:44Z

The failed test is org.apache.spark.sql.streaming.StreamingQuerySuite, which is unrelated to this change. It might have been fixed in commit 79fd0cc.

sureshthalamati · 2016-10-27T21:12:56Z

retest this please

SparkQA · 2016-10-27T23:17:54Z

Test build #67669 has finished for PR 15662 at commit 0944e05.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

sureshthalamati · 2016-11-01T19:25:26Z

@rxin @gatorsmile

gatorsmile · 2016-11-02T21:41:58Z

sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala

Add a nested function in compileFilter

def quote(colName: String): String = dialect.quoteIdentifier(colName)

Then, your code changes can look cleaner.
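A rough sketch of what that suggestion could look like is below. It is self-contained and does not use Spark's classes: the `Dialect` trait, the three filter case classes, and the default double-quote rule are simplified stand-ins assumed for illustration, and the real `compileFilter` in JDBCRDD.scala handles more filter types and value kinds.

```scala
object CompileFilterSketch {
  // Hypothetical stand-in for a JDBC dialect. Wrapping the name in double
  // quotes is an assumed default; a dialect could override it.
  trait Dialect {
    def quoteIdentifier(colName: String): String = "\"" + colName + "\""
  }

  // Simplified stand-ins for pushed-down filters (not Spark's classes).
  sealed trait Filter
  final case class EqualTo(attribute: String, value: Int) extends Filter
  final case class LessThan(attribute: String, value: Int) extends Filter
  final case class IsNull(attribute: String) extends Filter

  def compileFilter(f: Filter, dialect: Dialect): String = {
    // Nested helper, so every branch quotes the column name the same way.
    def quote(colName: String): String = dialect.quoteIdentifier(colName)
    f match {
      case EqualTo(attr, value)  => s"${quote(attr)} = $value"
      case LessThan(attr, value) => s"${quote(attr)} < $value"
      case IsNull(attr)          => s"${quote(attr)} IS NULL"
    }
  }

  def main(args: Array[String]): Unit = {
    val defaultDialect = new Dialect {}
    // A dialect that quotes with backticks instead, to show the override point.
    val backtickDialect = new Dialect {
      override def quoteIdentifier(colName: String): String = "`" + colName + "`"
    }
    println(compileFilter(LessThan("Id", 1), defaultDialect))
    println(compileFilter(LessThan("Id", 1), backtickDialect))
  }
}
```

Keeping the quoting behind one nested helper means a new filter branch only needs to call `quote(attr)`, and the dialect stays the single place that decides the quoting character.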

sureshthalamati · 2016-11-03T03:39:58Z

Thank you very much for the feedback, @gatorsmile. Addressed the review comments.

gatorsmile · 2016-11-03T04:05:30Z

sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRelation.scala

remove this empty line

I will fix it.

gatorsmile · 2016-11-03T04:08:10Z

sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala

What is the purpose of the above two statements?

Those two statements test the string StartsWith and Contains filters. They are pushed down to the JDBC data source and mapped to SQL LIKE expressions.

I will fix the inconsistent column name in the above two statements.
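As an illustration of that mapping, the sketch below builds LIKE predicates with a quoted column name. The helper names and the double-quote rule are assumptions for this example rather than the code in the PR, and the sketch does not escape quotes or LIKE wildcards inside the value.

```scala
object LikeFilterSketch {
  // Assumed quoting rule: wrap the column name in double quotes.
  def quote(colName: String): String = "\"" + colName + "\""

  // Prefix match: column LIKE 'value%'
  // Note: the value is inserted as-is; escaping is out of scope here.
  def startsWith(attr: String, value: String): String =
    s"${quote(attr)} LIKE '$value%'"

  // Substring match: column LIKE '%value%'
  def contains(attr: String, value: String): String =
    s"${quote(attr)} LIKE '%$value%'"

  def main(args: Array[String]): Unit = {
    println(startsWith("Name", "fr"))
    println(contains("Name", "re"))
  }
}
```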

gatorsmile · 2016-11-03T04:17:56Z

This sounds like a correct and critical fix to me; otherwise we are unable to resolve the columns in predicates against case-sensitive JDBC sources.

@sureshthalamati Could you post the following exception in your PR description?

org.h2.jdbc.JdbcSQLException: Column "ID" not found; SQL statement:
SELECT "Name","Id" FROM TEST."mixedCaseCols" WHERE (Id < 1) [42122-183]

cc @srowen Could you please check it? Any comment? Thanks!

SparkQA · 2016-11-03T05:41:17Z

Test build #68044 has finished for PR 15662 at commit 2afe990.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

sureshthalamati · 2016-11-03T07:36:56Z

Thank you for reviewing, @gatorsmile. Updated the PR description and addressed all the review comments.

SparkQA · 2016-11-03T10:05:51Z

Test build #68056 has finished for PR 15662 at commit 4e22e3c.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

sureshthalamati · 2016-11-10T19:21:44Z

@gatorsmile I addressed all the review comments. Can you please take a look?

gatorsmile · 2016-11-11T23:40:05Z

@srowen Any comment on this?

gatorsmile · 2016-11-26T05:53:54Z

sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala

This is an unnecessary change, right?

gatorsmile · 2016-11-26T05:55:23Z

@sureshthalamati Could you resolve the conflict? Thanks!

SparkQA · 2016-11-29T01:41:05Z

Test build #69269 has finished for PR 15662 at commit 2178e3f.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

sureshthalamati · 2016-11-29T06:53:05Z

Thanks, @gatorsmile. Resolved the conflicts and also added a test case for an empty IN clause with a mixed-case column name.

gatorsmile · 2016-11-30T18:41:14Z

sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala

    assert(sql("SELECT * FROM mixedCaseCols WHERE Name IS NULL").collect().size == 1)
    assert(sql("SELECT * FROM mixedCaseCols WHERE Name IS NOT NULL").collect().size == 2)
+    assert(sql("SELECT * FROM mixedCaseCols")
+      .filter($"Name".isin(Array[String]() : _*)).collect().size == 0)


.filter($"Name".isin(Array[String]() : _*)).collect().size == 0)

->

.filter($"Name".isin()).collect().size == 0)

Thanks, @gatorsmile. Fixed it.
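The simplification works because a Scala varargs method can be called with no arguments at all, which passes the same empty sequence as expanding an empty array. A small sketch, using a hypothetical `countValues` method rather than Spark's `isin`:

```scala
object EmptyVarargsSketch {
  // Hypothetical varargs method, only loosely modelled on the shape of isin.
  def countValues(values: String*): Int = values.length

  def main(args: Array[String]): Unit = {
    println(countValues(Array[String](): _*)) // empty array expanded as varargs
    println(countValues())                    // no arguments at all
  }
}
```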

gatorsmile · 2016-11-30T18:46:28Z

LGTM except a minor comment

cc @cloud-fan

SparkQA · 2016-11-30T22:10:54Z

Test build #69427 has finished for PR 15662 at commit f0d731f.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2016-11-30T23:01:00Z

retest this please

SparkQA · 2016-12-01T01:25:59Z

Test build #69434 has finished for PR 15662 at commit f0d731f.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2016-12-02T03:05:06Z

LGTM

…of the JDBC RDD generated sql statement

## What changes were proposed in this pull request?

SQL query generated for the JDBC data source is not quoting columns in the predicate clause. When the source table has quoted column names, spark jdbc read fails with column not found error incorrectly.

Error:
org.h2.jdbc.JdbcSQLException: Column "ID" not found;
Source SQL statement:
SELECT "Name","Id" FROM TEST."mixedCaseCols" WHERE (Id < 1)

This PR fixes by quoting column names in the generated SQL for predicate clause when filters are pushed down to the data source.

Source SQL statement after the fix:
SELECT "Name","Id" FROM TEST."mixedCaseCols" WHERE ("Id" < 1)

## How was this patch tested?

Added new test case to the JdbcSuite

Author: sureshthalamati <[email protected]>

Closes #15662 from sureshthalamati/filter_quoted_cols-SPARK-18141.

(cherry picked from commit 70c5549)
Signed-off-by: gatorsmile <[email protected]>

gatorsmile · 2016-12-02T03:14:48Z

Merging to master/2.1! Thanks!

sureshthalamati · 2016-12-03T00:10:49Z

Thank you, @gatorsmile @cloud-fan.

sureshthalamati force-pushed the filter_quoted_cols-SPARK-18141 branch from 0944e05 to 2afe990 Compare November 3, 2016 03:35

sureshthalamati added 4 commits November 28, 2016 11:23

66f7999 [SPARK-18141][SQL] Fix to quote column names in the predicate clause of the JDBC RDD generated sql statement

5dd575e Addressed review comments. Simplified code using a nested function

1765332 Addressed review comments. Minor fix to test, and remove extra empty line

2178e3f Adding test case for in with empty value list filter using mixed case column

sureshthalamati force-pushed the filter_quoted_cols-SPARK-18141 branch from 4e22e3c to 2178e3f Compare November 28, 2016 23:00

Addressing review comments. simplified isin test case

f0d731f

gatorsmile mentioned this pull request Nov 30, 2016

[SPARK-18593][SQL] JDBCRDD returns incorrect results for filters on CHAR of PostgreSQL #16021

Closed

asfgit closed this in 70c5549 Dec 2, 2016

cloud-fan mentioned this pull request May 25, 2017

[SPARK-14460] [SQL] properly handling of column name contains space #12252

Closed
