[SPARK-17860][SQL] SHOW COLUMN's database conflict check should respect case sensitivity configuration #15423

dilipbiswal · 2016-10-10T23:42:35Z

What changes were proposed in this pull request?

The SHOW COLUMNS command validates the user-supplied database name against the database name from the qualified table name to make sure the two are consistent. This comparison should respect the case sensitivity configuration.
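The comparison described above can be sketched as a name resolver chosen from the case sensitivity setting. This is a minimal, self-contained sketch, not Spark's code. `NameResolution`, `resolverFor`, and `sameDatabase` are hypothetical names, and the `Resolver` alias only mirrors the shape of Spark's `(String, String) => Boolean` resolver type.

```scala
// Hypothetical, self-contained model of a case-aware database name comparison.
// The Resolver alias mirrors the shape of Spark's resolver type but does not use Spark.
object NameResolution {
  type Resolver = (String, String) => Boolean

  // Exact comparison: "showdb" and "SHOWDB" are different names.
  val caseSensitive: Resolver = (a, b) => a == b

  // Comparison that ignores letter case: "showdb" and "SHOWDB" match.
  val caseInsensitive: Resolver = (a, b) => a.equalsIgnoreCase(b)

  // Pick a resolver from a flag analogous to spark.sql.caseSensitive.
  def resolverFor(caseSensitiveAnalysis: Boolean): Resolver =
    if (caseSensitiveAnalysis) caseSensitive else caseInsensitive

  // True when both names refer to the same database under the chosen setting.
  def sameDatabase(a: String, b: String, caseSensitiveAnalysis: Boolean): Boolean =
    resolverFor(caseSensitiveAnalysis)(a, b)
}
```

With case sensitivity off, `showdb` and `SHOWDB` name the same database; with it on, they do not, so a check built on this resolver reports a conflict only in the second case.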

How was this patch tested?

Added tests in DDLSuite, and moved the existing tests to the new SQL-based test infrastructure.

SparkQA · 2016-10-11T01:55:47Z

Test build #66692 has finished for PR 15423 at commit 3acd08f.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- case class ShowColumnsCommand(

viirya · 2016-10-11T02:35:57Z

sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala

Why is ShowColumnsCommand sorted?

@viirya I added the case here because of
https://github.com/dilipbiswal/spark/blob/3acd08f9431d6cdfe4d043aa342d09fc0ebfa497/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala#L222

I didn't want the output of ShowColumnsCommand to be sorted before comparison.

Will it affect the comparison result? I think the result is generated, right?

@viirya It seemed odd to me for the generated output file to list the column names in sorted order, which doesn't reflect the column order in the CREATE TABLE statement. In the test case, I created the table as follows:

CREATE TABLE showcolumn2 (price int, qty int) partitioned by (year int, month int)

It seemed odd to me for the generated output file to report the columns as month, price, qty, year instead of price, qty, year, month. Let me know if I am missing something here.

Personally, I don't think it is odd, because we just want to compare the results. Adding ShowColumnsCommand to the list of sorted operators looks odder to me. cc @cloud-fan

Does it break the test if we don't mark it as sorted?

@cloud-fan @viirya Actually, it does not break the test if we don't mark it as sorted. What happens is that when we generate the expected output file, the results appear sorted, as follows:

-- !query 7
SHOW COLUMNS IN showcolumn2 IN showdb
-- !query 7 schema
struct<col_name:string>
-- !query 7 output
month
price
qty
year

When I was going through the expected output file to make sure it's correct, I noticed this, because the output above is not what you would see if you cut and pasted the SQL snippets from the test file and ran them in the spark-sql shell.

If you think it's okay to have the output in sorted form in the expected file, I will change it back.
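The effect being discussed can be modeled roughly as follows: when the harness does not treat a query's plan as ordered, it sorts the result rows before writing or comparing the golden file. This is a simplified, hypothetical model; `GoldenOutput.normalize` and the `planIsSorted` flag are illustrative stand-ins for the plan-based check in `SQLQueryTestSuite` linked earlier.

```scala
// Hypothetical model of golden-file normalization (not SQLQueryTestSuite itself).
// Rows from a query that is not treated as ordered are sorted, so the comparison
// does not depend on the order in which rows happen to come back.
object GoldenOutput {
  def normalize(rows: Seq[String], planIsSorted: Boolean): Seq[String] =
    if (planIsSorted) rows else rows.sorted

  // Column order of: CREATE TABLE showcolumn2 (price int, qty int)
  //                  partitioned by (year int, month int)
  val showcolumn2Columns: Seq[String] = Seq("price", "qty", "year", "month")
}
```

Leaving `ShowColumnsCommand` unmarked therefore yields `month, price, qty, year` in the expected output, which is harmless because the file is only used for comparison.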

Marking ShowColumnsCommand as sorted is weirder; I'd like to leave the result sorted.

+1 as mentioned in previous comment.

@cloud-fan @viirya Thanks :-) I will change it.

viirya · 2016-10-11T03:56:32Z

sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala

Do we need to explicitly set the current database here? If the current database is not default, the test no longer makes sense.

@viirya OK.. I agree. I will make the change

cloud-fan · 2016-10-11T08:21:53Z

sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala

FYI, MySQL will treat SHOW COLUMNS FROM db1.tbl1 FROM db2 as SHOW COLUMNS FROM tbl1 FROM db2, i.e., if a FROM database is specified, it just ignores the database specified in the table name instead of reporting an error.

We should investigate how other databases handle this case.

Good point!

BTW, what @dilipbiswal does here follows the previous behavior; do we want to break it?

It seems Hive doesn't allow specifying the database twice, regardless of whether the names are the same.

https://github.com/apache/hive/blob/21a0142f333fba231f2648db53a48dc41384ad72/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java#L2215

Thanks @viirya for checking Hive. Yeah, when we implemented the native SHOW COLUMNS command, we went through the above Hive code. We decided to improve upon that check and report an error only when the database names are different.

I don't have a strong opinion on this. From my point of view, MySQL's approach might confuse users if they don't notice that the database names are different.
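The three behaviors mentioned in this thread can be contrasted in a small sketch. The object and function names are hypothetical, the error strings are illustrative rather than the real MySQL, Hive, or Spark messages, and the MySQL and Hive variants only encode the behavior as described in the comments above.

```scala
// Hypothetical comparison of three ways to handle
// SHOW COLUMNS FROM db1.tbl FROM db2, as described in this thread.
// Each function returns Right(database to use) or Left(error message).
object ShowColumnsPolicies {
  type Resolver = (String, String) => Boolean

  // MySQL-like: an explicit FROM database wins; the table qualifier is ignored.
  def mysqlLike(tableDb: Option[String], fromDb: Option[String], current: String): Either[String, String] =
    Right(fromDb.orElse(tableDb).getOrElse(current))

  // Hive-like: giving the database in both places is always an error.
  def hiveLike(tableDb: Option[String], fromDb: Option[String], current: String): Either[String, String] =
    (tableDb, fromDb) match {
      case (Some(_), Some(_)) => Left("database specified twice")
      case _                  => Right(fromDb.orElse(tableDb).getOrElse(current))
    }

  // Approach in this PR: error only when the two names differ under the resolver.
  def sparkLike(tableDb: Option[String], fromDb: Option[String], current: String,
                resolver: Resolver): Either[String, String] =
    (tableDb, fromDb) match {
      case (Some(t), Some(f)) if !resolver(t, f) => Left(s"conflicting databases: '$f' != '$t'")
      case _                                     => Right(fromDb.orElse(tableDb).getOrElse(current))
    }
}
```

The MySQL-like variant never fails, which is the source of the confusion mentioned above: a mismatched qualifier is silently dropped.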


SparkQA · 2016-10-17T10:27:23Z

Test build #67060 has finished for PR 15423 at commit cb0691c.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

dilipbiswal · 2016-10-17T17:46:55Z

@viirya @cloud-fan I have incorporated the review comments. Could we please look at this again?

viirya · 2016-10-18T02:17:12Z

LGTM. Let's see if @cloud-fan has any more comments on this.

cloud-fan · 2016-10-18T03:50:22Z

sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala

  override def run(sparkSession: SparkSession): Seq[Row] = {
    val catalog = sparkSession.sessionState.catalog
-    val table = catalog.getTempViewOrPermanentTableMetadata(tableName)
+    val caseSensitive = sparkSession.sessionState.conf.caseSensitiveAnalysis


nit: we can simplify it to

val resolver = sparkSession.sessionState.conf.resolver
...
case Some(db) if tableName.database.exists(!resolver(_, db))
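The suggested shape of the check can be sketched in isolation as follows. This is a hypothetical, self-contained version: the local `TableIdentifier` case class and `resolveTable` helper stand in for Spark's own types, and it throws `IllegalArgumentException` where the real command would report an analysis error.

```scala
// Hypothetical stand-ins: this TableIdentifier is a local case class,
// not Spark's org.apache.spark.sql.catalyst.TableIdentifier.
object ShowColumnsCheck {
  type Resolver = (String, String) => Boolean
  final case class TableIdentifier(table: String, database: Option[String])

  // Returns the identifier to look up, or throws if the database from the
  // qualified table name conflicts with the FROM database under the resolver.
  def resolveTable(tableName: TableIdentifier, databaseName: Option[String],
                   resolver: Resolver): TableIdentifier =
    databaseName match {
      // Conflict only when the table name carries a database that does not match.
      case Some(db) if tableName.database.exists(!resolver(_, db)) =>
        throw new IllegalArgumentException(
          s"conflicting databases: '$db' != '${tableName.database.getOrElse("")}'")
      case Some(db) => TableIdentifier(tableName.table, Some(db))
      case None     => tableName
    }
}
```

The guard uses `Option.exists`, so a table name without a database qualifier never conflicts, and the resolver alone decides whether `showdb` and `SHOWDB` match.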

cloud-fan · 2016-10-18T03:54:09Z

sql/core/src/test/resources/sql-tests/inputs/show_columns.sql

+USE showdb;
+
+CREATE TABLE showcolumn1 (col1 int, `col 2` int);
+CREATE TABLE showcolumn2 (price int, qty int) partitioned by (year int, month int);


Can we also test temp views?

SparkQA · 2016-10-19T09:06:57Z

Test build #67177 has finished for PR 15423 at commit 586a6b4.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

dilipbiswal · 2016-10-20T01:19:01Z

@cloud-fan Hi Wenchen, I have added the test cases for temp views. Could we please look at this again? Thanks!

cloud-fan · 2016-10-20T02:38:04Z

sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala

+        withTable(tabName) {
+          sql(s"CREATE TABLE $tabName(col1 int, col2 string) USING parquet ")
+          val message = intercept[AnalysisException] {
+          sql(s"SHOW COLUMNS IN $db.showcolumn FROM ${db.toUpperCase}")


nit: wrong indent here.

cloud-fan · 2016-10-20T02:38:53Z

LGTM

SparkQA · 2016-10-20T05:07:32Z

Test build #67237 has started for PR 15423 at commit 15c568f.

viirya · 2016-10-20T08:46:30Z

The tests passed, but the results failed to post back to GitHub...

viirya · 2016-10-20T08:47:00Z

@cloud-fan Need to run tests again?

cloud-fan · 2016-10-20T11:39:53Z

It's fine, merging to master!

dilipbiswal · 2016-10-20T15:15:03Z

@cloud-fan @viirya Thank you very much !!

…ct case sensitivity configuration

## What changes were proposed in this pull request?

SHOW COLUMNS command validates the user supplied database name with database name from qualified table name to make sure both of them are consistent. This comparison should respect case sensitivity.

## How was this patch tested?

Added tests in DDLSuite and existing tests were moved to use new sql based test infrastructure.

Author: Dilip Biswal
Closes apache#15423 from dilipbiswal/dkb_show_column_fix.

viirya reviewed Oct 11, 2016


cloud-fan reviewed Oct 11, 2016


dilipbiswal added 2 commits October 16, 2016 23:06

SHOW COLUMN's database conflict check should respect case sensitivity…

0db083c

… configuration.

Review comments.

cb0691c

dilipbiswal force-pushed the dkb_show_column_fix branch from 3acd08f to cb0691c October 17, 2016 08:14

cloud-fan reviewed Oct 18, 2016


review comments

586a6b4

cloud-fan reviewed Oct 20, 2016


fix indent

15c568f

asfgit closed this in e895bc2 Oct 20, 2016
