[SPARK-20841][SQL] Support table column aliases in FROM clause #18079
This fix is not related to this PR, though; it was not updated in 3c9eef3.
Test build #77270 has finished for PR 18079 at commit
Test build #77272 has finished for PR 18079 at commit
Test build #77280 has finished for PR 18079 at commit
Nit: keep it unchanged.
I found that this USING entry caused a failure in PlanParserSuite:
https://github.com/apache/spark/blob/master/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala#L336
- joins *** FAILED ***
  == FAIL: Plans do not match ===
   'Project [*]                                  'Project [*]
  !+- 'Join Inner                                +- 'Join UsingJoin(Inner,List(a, b))
     :- 'UnresolvedRelation `t`                     :- 'UnresolvedRelation `t`
  !  +- 'SubqueryAlias using                        +- 'UnresolvedRelation `u`
  !     +- 'UnresolvedRelation `u`, [a, b] (PlanTest.scala:97)
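To illustrate why the USING entry misparses, here is a toy Scala model of a table-reference rule shaped like `tableIdentifier (AS? identifier identifierList?)?`. Everything in it (`TableRef`, `Plain`, `Aliased`, `ToyFromParser`, `tokenize`) is hypothetical and is not Spark's ANTLR grammar or AstBuilder. When USING is treated as an ordinary (non-reserved) identifier, the alias rule greedily consumes `USING (a, b)` as a table alias plus column aliases, which mirrors the `'SubqueryAlias using` node over `` 'UnresolvedRelation `u`, [a, b] `` in the failing plan; once USING is reserved, those tokens are left for the join rule.

```scala
// Toy model only: hypothetical types and parser, NOT Spark's SqlBase.g4 or
// AstBuilder. They mimic the shape `tableIdentifier (AS? identifier identifierList?)?`.
sealed trait TableRef
final case class Plain(name: String) extends TableRef
final case class Aliased(name: String, alias: String, columnAliases: List[String]) extends TableRef

object ToyFromParser {
  // Splits a fragment such as "u USING (a, b)" into simple tokens.
  def tokenize(sql: String): List[String] =
    sql.replace("(", " ( ").replace(")", " ) ").replace(",", " , ")
      .trim.split("\\s+").toList

  private def isIdentifier(token: String): Boolean =
    token.headOption.exists(_.isLetter) && token.forall(c => c.isLetterOrDigit || c == '_')

  // Reads an optional "(c1, c2, ...)" list; returns the names and the leftover tokens.
  private def parseColumnList(tokens: List[String]): (List[String], List[String]) =
    tokens match {
      case "(" :: tail =>
        val (inside, afterParen) = tail.span(_ != ")")
        (inside.filter(_ != ","), afterParen.drop(1))
      case other => (Nil, other)
    }

  // Parses `name [AS] alias [(c1, ...)]`. A token is accepted as an alias only if
  // it is an identifier that is not in `reserved`; otherwise it is left unconsumed
  // for the caller (e.g. a join rule that expects USING).
  def parseTableRef(tokens: List[String], reserved: Set[String]): (TableRef, List[String]) =
    tokens match {
      case Nil => throw new IllegalArgumentException("expected a table name")
      case name :: rest =>
        val afterAs = rest match {
          case as :: tail if as.equalsIgnoreCase("AS") => tail
          case other => other
        }
        afterAs match {
          case alias :: tail if isIdentifier(alias) && !reserved.contains(alias.toUpperCase) =>
            val (columns, remaining) = parseColumnList(tail)
            (Aliased(name, alias, columns), remaining)
          case _ => (Plain(name), rest)
        }
    }
}
```

With `reserved = Set.empty`, `tokenize("u USING (a, b)")` parses as `Aliased("u", "USING", List("a", "b"))` with no tokens left for a join condition; with `Set("USING")` the result is `Plain("u")` and `USING ( a , b )` remains for the join rule.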
Nit: keep it unchanged.
ditto
Just change it to `tableIdentifier sample? tableAlias`?
Ah, ok.
Nit: UnresolvedRelation(t: TableIdentifier, _)
Nit: UnresolvedRelation(t: TableIdentifier, _)
Could you check the alias precedence in other databases? For example, in `select col1 as a, col2 as b from t1 as t(c, d);`, which aliases should be used for the output schema?
ok, I'll check.
Currently, we have the same behaviour:
@gatorsmile oh, you joined Databricks :))) congrats!
Test build #77322 has finished for PR 18079 at commit
It sounds like PostgreSQL supports it; see the docs at https://www.postgresql.org/docs/9.2/static/sql-select.html. Actually, we also need to support the other aliases in the FROM clause; see https://drill.apache.org/docs/from-clause/ and http://docs.aws.amazon.com/redshift/latest/dg/r_FROM_clause30.html:

    with_subquery_table_name [ [ AS ] alias [ ( column_alias [, ...] ) ] ]
    table_name [ [ AS ] alias [ ( column_alias [, ...] ) ] ]
    ( subquery ) [ AS ] alias [ ( column_alias [, ...] ) ]
Yea, I think so. I mean, in the name-conflict case you described above, PostgreSQL throws an error. We do not currently support aliases for subqueries. Should we include that support in this PR, or in a follow-up?
ping
Yeah. That should be a negative case. The PR title is not accurate; I think we should keep the original JIRA name. Yeah, these cases should be part of this JIRA. Please add sub-tasks under this JIRA. How about following what the Redshift docs describe and doing them one by one? http://docs.aws.amazon.com/redshift/latest/dg/r_FROM_clause30.html

    with_subquery_table_name [ [ AS ] alias [ ( column_alias [, ...] ) ] ]
    table_name [ * ] [ [ AS ] alias [ ( column_alias [, ...] ) ] ]
    ( subquery ) [ AS ] alias [ ( column_alias [, ...] ) ]
    table_reference [ NATURAL ] join_type table_reference [ ON join_condition | USING ( join_column [, ...] ) ]
First, according to http://developer.mimer.se/validator/sql-reserved-words.tml, USING has been a reserved word in the ANSI SQL standard since SQL-92.
Second, since 1.2, Hive has provided a flag, hive.support.sql11.reserved.keywords, for backward compatibility; it defaults to true.
Added In: Hive 1.2.0 with HIVE-6617: https://issues.apache.org/jira/browse/HIVE-6617
Whether to enable support for SQL2011 reserved keywords. When enabled, will support (part of) SQL2011 reserved keywords.
In 2.2, Hive removed this flag and no longer allows users to set it to false. That means users can no longer use these reserved words as identifiers unless they quote them. See: https://issues.apache.org/jira/browse/HIVE-14872
Thus, I think it is safe to remove USING from the non-reserved words.
BTW, how did we decide on these non-reserved words for Spark? It seems that many of Spark's non-reserved words (e.g., CUBE and GROUPING) are reserved words in the ANSI standard.
I added as many keywords to the non-reserved list as possible (without creating ambiguities). The reason is that many data sources (for instance twitter4j) unfortunately use reserved keywords as column names, and working with them was quite cumbersome. I took the pragmatic approach.
If we want to change this, then we need to do what Hive did and create a config flag. We could remove them for Spark 3.0...
Also, give a simple (SQL) example here to explain why we did this.
// Checks if the number of aliases equals the number of columns in the table.
Number of column aliases does not match number of columns. Table name:
${u.tableName}; number of column aliases: ${u.outputNames.size}; number of columns: ${outputAttrs.size}.
Please add @param for these two params.
ok
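The alias-count check, the suggested error message, and the `@param` documentation discussed in this thread can be sketched together. This assumes simplified, hypothetical stand-ins for `TableIdentifier` and `UnresolvedRelation` (Spark's real classes and analyzer rule differ), treats columns as plain names, and assumes the two documented parameters are the table identifier and the output names; `ColumnAliases.applyColumnAliases` is an illustrative helper, not a Spark API.

```scala
// Simplified, hypothetical stand-ins for Spark's catalyst classes; the real
// TableIdentifier, UnresolvedRelation, and analyzer rule differ.
final case class TableIdentifier(table: String, database: Option[String] = None) {
  // "db.table" when a database is given, otherwise just "table".
  def unquotedString: String = (database.toList :+ table).mkString(".")
}

/**
 * A table reference that has not been resolved against the catalog yet.
 *
 * @param tableIdentifier the (optionally database-qualified) table name
 * @param outputNames     column aliases from `FROM t AS x(c1, c2, ...)`;
 *                        empty when no column aliases were written
 */
final case class UnresolvedRelation(
    tableIdentifier: TableIdentifier,
    outputNames: Seq[String] = Nil) {
  def tableName: String = tableIdentifier.unquotedString
}

object ColumnAliases {
  /**
   * Returns the output column names of `u` after applying its column aliases,
   * or an error message if the number of aliases is wrong.
   *
   * @param u           the relation, possibly carrying column aliases
   * @param outputAttrs the column names of the underlying table
   */
  def applyColumnAliases(u: UnresolvedRelation, outputAttrs: Seq[String]): Either[String, Seq[String]] = {
    if (u.outputNames.isEmpty) {
      Right(outputAttrs) // no aliases: keep the table's own column names
    } else if (u.outputNames.size != outputAttrs.size) {
      // Checks if the number of aliases equals the number of columns in the table.
      Left(
        "Number of column aliases does not match number of columns. " +
          s"Table name: ${u.tableName}; number of column aliases: ${u.outputNames.size}; " +
          s"number of columns: ${outputAttrs.size}.")
    } else {
      Right(u.outputNames) // aliases replace the column names position by position
    }
  }
}
```

Returning `Either` keeps the sketch free of Spark's exception types; a real analyzer rule would more likely fail analysis with an error at that point.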
@gatorsmile ok, thanks for your suggestion. I'll check the docs and make sub-tasks there.
Test build #77485 has finished for PR 18079 at commit
This fix is not related to this PR, though; I changed it along with this fix: https://github.com/apache/spark/pull/18079/files#diff-57b3d87be744b7d79a9beacf8e5e5eb2R604
This fix is not related to this PR, though; I changed it along with this fix: https://github.com/apache/spark/pull/18079/files#diff-b4f9cbed8a042aeb12aeceb13b39d25aR50
Test build #77486 has finished for PR 18079 at commit
val tableWithAlias = Option(ctx.strictIdentifier).map(_.getText) match {
val tableId = visitTableIdentifier(ctx.tableIdentifier)
val table = Option(ctx.tableAlias.identifierList) match {
Just use if/else? It seems a bit heavyweight to wrap this in an Option...
...or something like this:
val outputNames = Option(ctx.tableAlias.identifierList).map(visitIdentifierList).getOrElse(Nil)
val table = UnresolvedRelation(visitTableIdentifier(ctx.tableIdentifier), outputNames)
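The two styles weighed in this thread (a plain if/else versus `Option(...).map(...).getOrElse(Nil)`) can be compared on hypothetical stand-ins for the ANTLR-generated contexts. `IdentifierListContext`, `TableAliasContext`, and `visitIdentifierList` below are illustrative, not Spark's generated `SqlBaseParser` classes; the only assumption carried over from ANTLR is that an optional child rule that did not match is exposed as `null`.

```scala
// Hypothetical stand-ins for ANTLR-generated parser contexts (not Spark's
// SqlBaseParser classes). As in ANTLR, an optional child rule that did not
// match is represented by null.
final class IdentifierListContext(val names: List[String])
final class TableAliasContext(val identifierList: IdentifierListContext) // may be null

object AliasVisitor {
  // Hypothetical visitor method: turns `(c1, c2, ...)` into a list of names.
  def visitIdentifierList(ctx: IdentifierListContext): Seq[String] = ctx.names

  // Style 1: wrap the nullable child in an Option and default to Nil.
  def outputNamesWithOption(ctx: TableAliasContext): Seq[String] =
    Option(ctx.identifierList).map(visitIdentifierList).getOrElse(Nil)

  // Style 2: a plain null check with if/else.
  def outputNamesWithIfElse(ctx: TableAliasContext): Seq[String] =
    if (ctx.identifierList != null) visitIdentifierList(ctx.identifierList) else Nil
}
```

Both variants return `Nil` when no column list was written and the visited names otherwise; the `Option` form reads as a single expression, while the if/else form avoids allocating an `Option`.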
-- Test data.
How about naming this file table-aliases.sql? That seems a little less confusing.
ok
hvanhovell left a comment
LGTM from my end. I'll leave the final sign-off to @gatorsmile
LGTM too! :)
Test build #77489 has finished for PR 18079 at commit
Thanks! Merging to master.
## What changes were proposed in this pull request? This pr added parsing rules to support subquery column aliases in FROM clause. This pr is a sub-task of #18079. ## How was this patch tested? Added tests in `PlanParserSuite` and `SQLQueryTestSuite`. Author: Takeshi Yamamuro <[email protected]> Closes #18185 from maropu/SPARK-20962.
…clause ## What changes were proposed in this pull request? This pr added parsing rules to support column aliases for join relations in FROM clause. This pr is a sub-task of apache#18079. ## How was this patch tested? Added tests in `AnalysisSuite`, `PlanParserSuite,` and `SQLQueryTestSuite`. Author: Takeshi Yamamuro <[email protected]> Closes apache#18772 from maropu/SPARK-20963-2.
What changes were proposed in this pull request?
This PR added parsing rules to support table column aliases in the FROM clause.
How was this patch tested?
Added tests in `PlanParserSuite`, `SQLQueryTestSuite`, and `PlanParserSuite`.