[SPARK-17860][SQL] SHOW COLUMN's database conflict check should respect case sensitivity configuration #15423
Test build #66692 has finished for PR 15423 at commit
Why is ShowColumnsCommand marked as sorted?
@viirya I added this case because of the following line:
https://github.com/dilipbiswal/spark/blob/3acd08f9431d6cdfe4d043aa342d09fc0ebfa497/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala#L222
I didn't want the output of ShowColumnsCommand to be sorted before comparison.
Will it affect the comparison result? I think the expected result is generated, right?
@viirya It seemed odd to have the generated output files list the column names in sorted order, which doesn't reflect the column order of CREATE TABLE. In the test case, I created the table as follows:
CREATE TABLE showcolumn2 (price int, qty int) partitioned by (year int, month int)
It seemed odd to me for the generated output file to report the columns as month, price, qty, and year rather than price, qty, year, and month. Let me know if I am missing something here.
Personally, I don't think it is odd, because we just want to compare the results. Adding ShowColumnsCommand to the sorted operations looks odder to me. cc @cloud-fan
Does it break the test if we don't mark it as sorted?
@cloud-fan @viirya Actually, it does not break the test if we don't mark it as sorted. What happens is that when we generate the expected output file, the results appear sorted, as follows:
-- !query 7
SHOW COLUMNS IN showcolumn2 IN showdb
-- !query 7 schema
struct<col_name:string>
-- !query 7 output
month
price
qty
year
When I was going through the expected output file to make sure it was correct, I noticed that the above output is not what you would see if you copied the SQL snippets from the test file and ran them in the spark-sql shell.
If you think it's okay to have the output in sorted form in the expected file, then I will change it back.
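To see why the generated file lists `month, price, qty, year`, here is a minimal sketch of the normalization idea, assuming (as this thread suggests) that the test harness sorts the rows of any query that is not marked as sorted before writing or comparing them, and assuming SHOW COLUMNS returns the data columns followed by the partition columns. `normalizeOutput` and its `isSorted` flag are hypothetical names, not the SQLQueryTestSuite API.

```scala
object OutputNormalization {
  // Hypothetical helper: rows of commands that are not marked as "sorted"
  // are sorted lexicographically, so the comparison does not depend on row order.
  def normalizeOutput(rows: Seq[String], isSorted: Boolean): Seq[String] =
    if (isSorted) rows else rows.sorted

  def main(args: Array[String]): Unit = {
    // Assumed SHOW COLUMNS order for showcolumn2: data columns, then partition columns.
    val showColumnsRows = Seq("price", "qty", "year", "month")
    println(normalizeOutput(showColumnsRows, isSorted = false).mkString(", "))
    println(normalizeOutput(showColumnsRows, isSorted = true).mkString(", "))
  }
}
```

The first line prints `month, price, qty, year` (the order in the generated file), and the second keeps the declared order `price, qty, year, month`. The reviewers preferred accepting the first form over special-casing ShowColumnsCommand.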
Marking ShowColumnsCommand as sorted is weirder; I'd like to leave the result sorted.
+1, as mentioned in my previous comment.
@cloud-fan @viirya Thanks :-) I will change it.
Do we need to explicitly set the current database here? If the current database is not the default one, the test no longer makes sense.
@viirya OK, I agree. I will make the change.
FYI, MySQL treats SHOW COLUMNS FROM db1.tbl1 FROM db2 as SHOW COLUMNS FROM tbl1 FROM db2; i.e., if a FROM database is specified, it simply ignores the database specified in the table name instead of reporting an error.
We should investigate more databases to see how they handle this case.
Good point!
BTW, what @dilipbiswal does in this PR follows the previous behavior. Do we want to break it?
It seems Hive doesn't allow specifying the database twice, regardless of whether the two names are the same.
Thanks @viirya for checking Hive. Yes, when we implemented the native SHOW COLUMNS command, we went through the above Hive code. We decided to improve upon that check and report an error only when the database names are different.
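The three behaviors discussed in this thread can be contrasted in a small model. This is a simplified sketch, not code from MySQL, Hive, or Spark: `tableDb` stands for the database qualifier in the table name, `fromDb` for the FROM/IN database, and each function returns either an error message (`Left`) or the database to use (`Right`, where `None` means the current database). All names and the error texts are hypothetical, and the comparison uses plain string equality; case sensitivity is a separate concern.

```scala
object ShowColumnsDbPolicies {
  // Left = error message, Right = database to use (None = current database).
  type Result = Either[String, Option[String]]

  // MySQL-style, as described above: the FROM database wins and the
  // qualifier in the table name is silently ignored.
  def mysqlStyle(tableDb: Option[String], fromDb: Option[String]): Result =
    Right(fromDb.orElse(tableDb))

  // Hive-style, as described above: giving the database twice is an error,
  // even when both names are the same.
  def hiveStyle(tableDb: Option[String], fromDb: Option[String]): Result =
    (tableDb, fromDb) match {
      case (Some(_), Some(_)) => Left("database specified twice")
      case _                  => Right(fromDb.orElse(tableDb))
    }

  // Approach kept in this PR: error only when both are given and they differ.
  def sparkStyle(tableDb: Option[String], fromDb: Option[String]): Result =
    (tableDb, fromDb) match {
      case (Some(t), Some(f)) if t != f => Left(s"conflicting databases: '$f' != '$t'")
      case _                            => Right(fromDb.orElse(tableDb))
    }

  def main(args: Array[String]): Unit = {
    // SHOW COLUMNS FROM db1.tbl1 FROM db2
    val (tableDb, fromDb) = (Some("db1"), Some("db2"))
    println(s"MySQL-style: ${mysqlStyle(tableDb, fromDb)}")
    println(s"Hive-style:  ${hiveStyle(tableDb, fromDb)}")
    println(s"Spark-style: ${sparkStyle(tableDb, fromDb)}")
  }
}
```

The models only disagree when both databases are present: MySQL-style resolves to `db2`, Hive-style rejects even `db1.tbl1 FROM db1`, and the approach kept here accepts matching names and rejects differing ones.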
I have no strong opinion on this. From my point of view, MySQL's way might be a little confusing to users if they don't notice that the database names are different.
3acd08f to cb0691c
Test build #67060 has finished for PR 15423 at commit
@viirya @cloud-fan I have incorporated the review comments. Could you please take another look?
LGTM. Let's see whether @cloud-fan has more comments on this.
override def run(sparkSession: SparkSession): Seq[Row] = {
  val catalog = sparkSession.sessionState.catalog
  val table = catalog.getTempViewOrPermanentTableMetadata(tableName)
  val caseSensitive = sparkSession.sessionState.conf.caseSensitiveAnalysis
nit: we can simplify it to
val resolver = sparkSession.sessionState.conf.resolver
...
case Some(db) if tableName.database.exists(!resolver(_, db))
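A minimal, self-contained sketch of the reviewer's suggestion, assuming a resolver is simply a `(String, String) => Boolean` name-equality function picked from the case-sensitivity setting (the role `conf.resolver` plays in the nit). `ShowColumnsResolution`, `TableIdent`, `resolverFor`, and `lookupTable` are hypothetical stand-ins for Spark's classes, and the error text is illustrative; the guard reuses the `exists(!resolver(_, db))` expression from the nit.

```scala
object ShowColumnsResolution {
  // A resolver decides whether two identifiers refer to the same name.
  type Resolver = (String, String) => Boolean

  val caseSensitiveResolution: Resolver = (a, b) => a == b
  val caseInsensitiveResolution: Resolver = (a, b) => a.equalsIgnoreCase(b)

  // Hypothetical stand-in for conf.resolver: choose based on the config flag.
  def resolverFor(caseSensitive: Boolean): Resolver =
    if (caseSensitive) caseSensitiveResolution else caseInsensitiveResolution

  // Simplified table identifier: table name plus optional database qualifier.
  final case class TableIdent(table: String, database: Option[String])

  // Returns the identifier to look up, or an error when the FROM database
  // conflicts with the qualifier according to the given resolver.
  def lookupTable(
      databaseName: Option[String],
      tableName: TableIdent,
      resolver: Resolver): Either[String, TableIdent] =
    databaseName match {
      case None => Right(tableName)
      case Some(db) if tableName.database.exists(!resolver(_, db)) =>
        Left(s"conflicting databases: '$db' != '${tableName.database.get}'")
      case Some(db) => Right(TableIdent(tableName.table, Some(db)))
    }

  def main(args: Array[String]): Unit = {
    // SHOW COLUMNS IN showdb.showcolumn FROM SHOWDB
    val ident = TableIdent("showcolumn", Some("showdb"))
    println(lookupTable(Some("SHOWDB"), ident, resolverFor(caseSensitive = false)))
    println(lookupTable(Some("SHOWDB"), ident, resolverFor(caseSensitive = true)))
  }
}
```

With case-insensitive analysis, `showdb` and `SHOWDB` are treated as the same database and the lookup succeeds; with case sensitivity enabled, the same statement is reported as a conflict. Folding the flag into the resolver keeps the match to a single guard instead of branching on `caseSensitive` separately.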
USE showdb;

CREATE TABLE showcolumn1 (col1 int, `col 2` int);
CREATE TABLE showcolumn2 (price int, qty int) partitioned by (year int, month int);
Can we also test temp views?
Test build #67177 has finished for PR 15423 at commit
@cloud-fan Hi Wenchen, I have added test cases for temp views. Could you please take another look? Thanks!
withTable(tabName) {
  sql(s"CREATE TABLE $tabName(col1 int, col2 string) USING parquet ")
  val message = intercept[AnalysisException] {
    sql(s"SHOW COLUMNS IN $db.showcolumn FROM ${db.toUpperCase}")
nit: wrong indentation here.
LGTM
Test build #67237 has started for PR 15423 at commit
The tests passed, but the results failed to post back to GitHub...
@cloud-fan Do we need to run the tests again?
It's fine, merging to master!
@cloud-fan @viirya Thank you very much!!
…ct case sensitivity configuration ## What changes were proposed in this pull request? SHOW COLUMNS command validates the user supplied database name with database name from qualified table name to make sure both of them are consistent. This comparison should respect case sensitivity. ## How was this patch tested? Added tests in DDLSuite and existing tests were moved to use new SQL-based test infrastructure. Author: Dilip Biswal <[email protected]> Closes apache#15423 from dilipbiswal/dkb_show_column_fix.
What changes were proposed in this pull request?
The SHOW COLUMNS command validates the user-supplied database
name against the database name from the qualified table name to make
sure both of them are consistent. This comparison should respect
case sensitivity.
How was this patch tested?
Added tests in DDLSuite, and existing tests were moved to use the new SQL-based test infrastructure.