[SPARK-20841][SQL] Support table column aliases in FROM clause #18079

maropu · 2017-05-24T00:02:59Z

What changes were proposed in this pull request?

This pr added parsing rules to support table column aliases in FROM clause.

How was this patch tested?

Added tests in PlanParserSuite, SQLQueryTestSuite, and PlanParserSuite.

maropu · 2017-05-24T00:03:39Z

sql/core/src/test/resources/sql-tests/results/operators.sql.out

This fix is not related to this pr though, this is not updated in 3c9eef3

SparkQA · 2017-05-24T00:09:49Z

Test build #77270 has finished for PR 18079 at commit e2b50f3.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
case class UnresolvedRelation(

SparkQA · 2017-05-24T02:36:38Z

Test build #77272 has finished for PR 18079 at commit bb68f65.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
case class UnresolvedRelation(

SparkQA · 2017-05-24T05:11:23Z

Test build #77280 has finished for PR 18079 at commit 902d2a3.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
case class UnresolvedRelation(

gatorsmile · 2017-05-24T16:45:51Z

sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4

Nit: keep it unchanged.

I found this USING entry caused a failure in PlanParserSuite;
https://github.com/apache/spark/blob/master/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala#L336

- joins *** FAILED *** == FAIL: Plans do not match === 'Project [*] 'Project [*] !+- 'Join Inner +- 'Join UsingJoin(Inner,List(a, b)) :- 'UnresolvedRelation `t` :- 'UnresolvedRelation `t` ! +- 'SubqueryAlias using +- 'UnresolvedRelation `u` ! +- 'UnresolvedRelation `u`, [a, b] (PlanTest.scala:97)

gatorsmile · 2017-05-24T16:46:32Z

...atalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/TableIdentifierParserSuite.scala

Nit: keep it unchanged.

gatorsmile · 2017-05-24T17:08:36Z

sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4

Just change it to tableIdentifier sample? tableAlias?

gatorsmile · 2017-05-24T17:10:07Z

sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/TPCDSQueryBenchmark.scala

Nit: UnresolvedRelation(t: TableIdentifier, _)

gatorsmile · 2017-05-24T17:10:12Z

sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/TPCDSQueryBenchmark.scala

Nit: UnresolvedRelation(t: TableIdentifier, _)

gatorsmile · 2017-05-24T17:46:27Z

Could you check the alias precedence of the other database?

select col1 as a, col2 as b from t1 as t(c, d);

Which alias should be used as the output schema? (a, b) or (c, d)?

maropu · 2017-05-25T00:35:07Z

ok, I'll check.

maropu · 2017-05-25T01:07:03Z

Currently, we have the same behaviour;

-- PostgreSQL
postgres=# create table t1(col1 int, col2 int);
CREATE TABLE

postgres=# select col1 as a, col2 as b from t1 as t(c, d);
ERROR:  column "col1" does not exist at character 8
STATEMENT:  select col1 as a, col2 as b from t1 as t(c, d);
ERROR:  column "col1" does not exist
LINE 1: select col1 as a, col2 as b from t1 as t(c, d);
               ^
-- MySQL
mysql> create table t1(col1 int, col2 int);
Query OK, 0 rows affected (0.01 sec)

mysql> select col1 as a, col2 as b from t1 as t(c, d);
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '(c, d)' at line 1
mysql> 


-- Spark with this pr
scala> Seq((1, 2), (2, 3)).toDF("col1", "col2").createOrReplaceTempView("t")

scala> sql("select col1 as a, col2 as b from t as t(c, d)").show
org.apache.spark.sql.AnalysisException: cannot resolve '`col1`' given input columns: [c, d]; line 1 pos 7;
'Project ['col1 AS a#18, 'col2 AS b#19]
+- SubqueryAlias t
   +- Project [col1#5 AS c#20, col2#6 AS d#21]
      +- SubqueryAlias t
         +- Project [_1#2 AS col1#5, _2#3 AS col2#6]
            +- LocalRelation [_1#2, _2#3]

maropu · 2017-05-25T01:13:58Z

@gatorsmile oh, you joined Databricks :))) congrats!

SparkQA · 2017-05-25T03:52:33Z

Test build #77322 has finished for PR 18079 at commit b0e5805.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2017-05-25T06:54:42Z

It sounds like PostgresSQL supports it. See the docs in https://www.postgresql.org/docs/9.2/static/sql-select.html

Actually, we also need to support the other alias in the from clauses:

See the link: https://drill.apache.org/docs/from-clause/ and http://docs.aws.amazon.com/redshift/latest/dg/r_FROM_clause30.html

   with_subquery_table_name [ [ AS ] alias [ ( column_alias [, ...] ) ] ]
   table_name [ [ AS ] alias [ ( column_alias [, ...] ) ] ]
   ( subquery ) [ AS ] alias [ ( column_alias [, ...] ) ]

maropu · 2017-05-25T07:56:25Z

yea, I think so. I mean, in the name conflict case you described above, postgresql throws an error;

postgres=# select * from testTable;
 col1 | col2 
------+------
(0 rows)

postgres=# select * from testTable t(a, b);
 a | b 
---+---
(0 rows)

postgres=# select col1 a, col2 b from testTable t(c, d);
ERROR:  column "col1" does not exist at character 8
STATEMENT:  select col1 a, col2 b from testTable t(c, d);
ERROR:  column "col1" does not exist
LINE 1: select col1 a, col2 b from testTable t(c, d);

We do not currently support aliases for subquries. Should we include that support in this pr? Or, follow-up?

maropu · 2017-05-26T08:26:46Z

ping

gatorsmile · 2017-05-28T04:58:10Z

Yeah. That should be a negative case. The FROM clause should be resolved before SELECT clause.

The PR title is not accurate. I think we should keep the original JIRA name. Support table column aliases in FROM clause.

Yeah. These cases should be part of this JIRA. Please add the sub-tasks under this JIRA. Follow what Redshift documents and do them one by one?http://docs.aws.amazon.com/redshift/latest/dg/r_FROM_clause30.html

with_subquery_table_name [ [ AS ] alias [ ( column_alias [, ...] ) ] ]
table_name [ * ] [ [ AS ] alias [ ( column_alias [, ...] ) ] ]
( subquery ) [ AS ] alias [ ( column_alias [, ...] ) ]
table_reference [ NATURAL ] join_type table_reference [ ON join_condition | USING ( join_column [, ...] ) ]

gatorsmile · 2017-05-28T06:13:55Z

sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4

First, based on the link http://developer.mimer.se/validator/sql-reserved-words.tml, USING is a reserved word in the ANSI SQL standard since SQL-92.

Second, since 1.2, Hive introduces a flag hive.support.sql11.reserved.keywords for backward compatbility, which defaults to true.

Added In: Hive 1.2.0 with HIVE-6617: https://issues.apache.org/jira/browse/HIVE-6617
Whether to enable support for SQL2011 reserved keywords. When enabled, will support (part of) SQL2011 reserved keywords.

In 2.2, Hive removes this flag and does not allow users to change it to false. That means, users are unable to use these reserved words as identifiers anymore, unless using them as quoted identifiers. See: https://issues.apache.org/jira/browse/HIVE-14872

Thus, I think it is safe to remove USING from the non-reserved words.

cc @hvanhovell @cloud-fan

BTW, how did we decide these non-reserved words for Spark? It seems a lot of non-reserved words (e.g., CUBE and GROUPING) in Spark are the reserved ones in the ANSI standard

I have added as much to the non-reserved keyword list as possible (without creating ambiguities). The reason for this is that many datasources (for instance twitter4j) unfortunately use reserved keywords for column names, and working with these was quite cumbersome. I took the pragmatic approach.

If we want to change this, then we need to do the same Hive did and create a config flag. We remove them for Spark 3.0...

gatorsmile · 2017-05-28T06:19:46Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

Also gives a simple (SQL) example here to explain why we did this.

gatorsmile · 2017-05-28T06:20:39Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

// Checks if the number of the aliases equals to the number of columns in the table.

gatorsmile · 2017-05-28T06:24:49Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

Number of column aliases does not match number of columns. Table name: ${u.tableName}; number of column aliases: ${u.outputNames.size}; number of columns: ${outputAttrs.size}.

gatorsmile · 2017-05-28T06:26:17Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala

Please add @param for these two parms

maropu · 2017-05-28T07:28:52Z

@gatorsmile ok, thanks for your suggestion. I'll check the doc. and make sub-tasks there.

SparkQA · 2017-05-28T09:19:14Z

Test build #77485 has finished for PR 18079 at commit 25415ed.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

maropu · 2017-05-28T09:19:27Z

...lyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala

This fix is not related to this pr though, I modified along with this fix: https://github.com/apache/spark/pull/18079/files#diff-57b3d87be744b7d79a9beacf8e5e5eb2R604

maropu · 2017-05-28T09:20:26Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala

This fix is not related to this pr though, I modified along with this fix: https://github.com/apache/spark/pull/18079/files#diff-b4f9cbed8a042aeb12aeceb13b39d25aR50

SparkQA · 2017-05-28T11:54:40Z

Test build #77486 has finished for PR 18079 at commit 49186ac.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

hvanhovell · 2017-05-28T14:23:11Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala

-
-    val tableWithAlias = Option(ctx.strictIdentifier).map(_.getText) match {
+    val tableId = visitTableIdentifier(ctx.tableIdentifier)
+    val table = Option(ctx.tableAlias.identifierList) match {


Just if/else? Seems a bit heavy weight to wrap in an option...

...or something like this:

val outputNames = Option(ctx.tableAlias.identifierList).map(visitIdentifierList).getOrElse(Nil) val table = UnresolvedRelation(visitTableIdentifier(ctx.tableIdentifier), outputNames)

hvanhovell · 2017-05-28T14:42:56Z

sql/core/src/test/resources/sql-tests/inputs/aliases.sql

@@ -0,0 +1,17 @@
+-- Test data.


How about we name this file table-aliases.sql; that seems a little bit less confusing.

hvanhovell

LGTM from my end. I'll leave the final sign-off to @gatorsmile

gatorsmile · 2017-05-28T17:14:09Z

LGTM too! : )

SparkQA · 2017-05-28T18:52:04Z

Test build #77489 has finished for PR 18079 at commit 8b505c2.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2017-05-28T20:25:26Z

Thanks! Merging to master.

## What changes were proposed in this pull request? This pr added parsing rules to support subquery column aliases in FROM clause. This pr is a sub-task of #18079. ## How was this patch tested? Added tests in `PlanParserSuite` and `SQLQueryTestSuite`. Author: Takeshi Yamamuro <[email protected]> Closes #18185 from maropu/SPARK-20962.

…clause ## What changes were proposed in this pull request? This pr added parsing rules to support column aliases for join relations in FROM clause. This pr is a sub-task of apache#18079. ## How was this patch tested? Added tests in `AnalysisSuite`, `PlanParserSuite,` and `SQLQueryTestSuite`. Author: Takeshi Yamamuro <[email protected]> Closes apache#18772 from maropu/SPARK-20963-2.

maropu commented May 24, 2017

View reviewed changes

maropu force-pushed the SPARK-20841 branch from e2b50f3 to bb68f65 Compare May 24, 2017 00:38

maropu force-pushed the SPARK-20841 branch from bb68f65 to 902d2a3 Compare May 24, 2017 02:44

gatorsmile reviewed May 24, 2017

View reviewed changes

gatorsmile reviewed May 28, 2017

View reviewed changes

maropu changed the title ~~[SPARK-20841][SQL] Support column aliases for catalog tables~~ [SPARK-20841][SQL] Support table column aliases in FROM clause May 28, 2017

maropu added 2 commits May 28, 2017 16:33

Support column aliases for catalog tables

aff9b95

Apply xiao's review

51dc896

maropu force-pushed the SPARK-20841 branch from b0e5805 to 25415ed Compare May 28, 2017 09:15

maropu commented May 28, 2017

View reviewed changes

Apply more review comments

49186ac

maropu force-pushed the SPARK-20841 branch from 25415ed to 49186ac Compare May 28, 2017 09:22

hvanhovell reviewed May 28, 2017

View reviewed changes

hvanhovell approved these changes May 28, 2017

View reviewed changes

Apply Herman's reviews

8b505c2

asfgit closed this in 24d3428 May 28, 2017

maropu mentioned this pull request Jun 2, 2017

[SPARK-20962][SQL] Support subquery column aliases in FROM clause #18185

Closed

maropu mentioned this pull request Jul 29, 2017

[SPARK-20963][SQL] Support column aliases for join relations in FROM clause #18772

Closed

[SPARK-20841][SQL] Support table column aliases in FROM clause #18079

[SPARK-20841][SQL] Support table column aliases in FROM clause #18079

Uh oh!

Conversation

maropu commented May 24, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What changes were proposed in this pull request?

How was this patch tested?

Uh oh!

Choose a reason for hiding this comment

Uh oh!

SparkQA commented May 24, 2017

Uh oh!

SparkQA commented May 24, 2017

Uh oh!

SparkQA commented May 24, 2017

Uh oh!

gatorsmile May 24, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

gatorsmile commented May 24, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

maropu commented May 25, 2017

Uh oh!

maropu commented May 25, 2017

Uh oh!

maropu commented May 25, 2017

Uh oh!

SparkQA commented May 25, 2017

Uh oh!

gatorsmile commented May 25, 2017

Uh oh!

maropu commented May 25, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

maropu commented May 26, 2017

Uh oh!

gatorsmile commented May 28, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gatorsmile May 28, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

maropu May 28, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

gatorsmile May 28, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

maropu commented May 28, 2017

Uh oh!

maropu commented May 24, 2017 •

edited

Loading

gatorsmile May 24, 2017 •

edited

Loading

gatorsmile commented May 24, 2017 •

edited

Loading

maropu commented May 25, 2017 •

edited

Loading

gatorsmile commented May 28, 2017 •

edited

Loading

gatorsmile May 28, 2017 •

edited

Loading

maropu May 28, 2017 •

edited

Loading

gatorsmile May 28, 2017 •

edited

Loading

maropu May 28, 2017 •

edited

Loading