Conversation

@maropu
Member

@maropu maropu commented May 24, 2017

What changes were proposed in this pull request?

This PR adds parsing rules to support table column aliases in the FROM clause.

How was this patch tested?

Added tests in PlanParserSuite and SQLQueryTestSuite.
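For illustration, a minimal sketch of the syntax this change enables (the table and column names below are invented for the example; exact DDL may differ):

-- Alias table `testData` as `t` and rename its two columns to (x, y)
CREATE TABLE testData (col1 INT, col2 INT);
SELECT x, y FROM testData AS t(x, y);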

Member Author

This fix is not directly related to this PR; it just was not updated in 3c9eef3.

@SparkQA

SparkQA commented May 24, 2017

Test build #77270 has finished for PR 18079 at commit e2b50f3.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class UnresolvedRelation(

@SparkQA

SparkQA commented May 24, 2017

Test build #77272 has finished for PR 18079 at commit bb68f65.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class UnresolvedRelation(

@SparkQA

SparkQA commented May 24, 2017

Test build #77280 has finished for PR 18079 at commit 902d2a3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class UnresolvedRelation(

Member

@gatorsmile gatorsmile May 24, 2017

Nit: keep it unchanged.

Member Author

I found that this USING entry caused a failure in PlanParserSuite:
https://github.com/apache/spark/blob/master/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala#L336

- joins *** FAILED ***
  == FAIL: Plans do not match ===
   'Project [*]                               'Project [*]
  !+- 'Join Inner                             +- 'Join UsingJoin(Inner,List(a, b))
      :- 'UnresolvedRelation `t`                 :- 'UnresolvedRelation `t`
  !   +- 'SubqueryAlias using                    +- 'UnresolvedRelation `u`
  !      +- 'UnresolvedRelation `u`, [a, b] (PlanTest.scala:97)
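Roughly, the failing case is a join query of this shape (a hedged sketch; the exact test query may differ):

SELECT * FROM t JOIN u USING (a, b);
-- Intended parse: an inner join of t and u with a USING clause.
-- With USING kept as a non-reserved word, the new alias rule also lets the parser
-- read "u USING (a, b)" as table `u` aliased `using` with column aliases (a, b),
-- which is exactly the mismatched plan shown above.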

Member

Nit: keep it unchanged.

Member Author

ditto

Member

Just change it to tableIdentifier sample? tableAlias?

Member Author

Ah, ok.

Member

Nit: UnresolvedRelation(t: TableIdentifier, _)

Member

Nit: UnresolvedRelation(t: TableIdentifier, _)

@gatorsmile
Member

gatorsmile commented May 24, 2017

Could you check the alias precedence in other databases?

select col1 as a, col2 as b from t1 as t(c, d);

Which alias should be used as the output schema? (a, b) or (c, d)?

@maropu
Member Author

maropu commented May 25, 2017

ok, I'll check.

@maropu
Member Author

maropu commented May 25, 2017

Currently, we have the same behaviour as PostgreSQL and MySQL:

-- PostgreSQL
postgres=# create table t1(col1 int, col2 int);
CREATE TABLE

postgres=# select col1 as a, col2 as b from t1 as t(c, d);
ERROR:  column "col1" does not exist at character 8
STATEMENT:  select col1 as a, col2 as b from t1 as t(c, d);
ERROR:  column "col1" does not exist
LINE 1: select col1 as a, col2 as b from t1 as t(c, d);
               ^
-- MySQL
mysql> create table t1(col1 int, col2 int);
Query OK, 0 rows affected (0.01 sec)

mysql> select col1 as a, col2 as b from t1 as t(c, d);
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '(c, d)' at line 1
mysql> 


-- Spark with this pr
scala> Seq((1, 2), (2, 3)).toDF("col1", "col2").createOrReplaceTempView("t")

scala> sql("select col1 as a, col2 as b from t as t(c, d)").show
org.apache.spark.sql.AnalysisException: cannot resolve '`col1`' given input columns: [c, d]; line 1 pos 7;
'Project ['col1 AS a#18, 'col2 AS b#19]
+- SubqueryAlias t
   +- Project [col1#5 AS c#20, col2#6 AS d#21]
      +- SubqueryAlias t
         +- Project [_1#2 AS col1#5, _2#3 AS col2#6]
            +- LocalRelation [_1#2, _2#3]

@maropu
Member Author

maropu commented May 25, 2017

@gatorsmile oh, you joined Databricks :))) congrats!

@SparkQA

SparkQA commented May 25, 2017

Test build #77322 has finished for PR 18079 at commit b0e5805.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

It sounds like PostgreSQL supports it. See the docs at https://www.postgresql.org/docs/9.2/static/sql-select.html

Actually, we also need to support the other alias forms in the FROM clause.

See: https://drill.apache.org/docs/from-clause/ and http://docs.aws.amazon.com/redshift/latest/dg/r_FROM_clause30.html

   with_subquery_table_name [ [ AS ] alias [ ( column_alias [, ...] ) ] ]
   table_name [ [ AS ] alias [ ( column_alias [, ...] ) ] ]
   ( subquery ) [ AS ] alias [ ( column_alias [, ...] ) ]
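For concreteness, hedged SQL examples of the three alias forms above (all identifiers are invented for illustration):

-- with_subquery_table_name with table and column aliases
WITH w AS (SELECT 1 AS col1, 2 AS col2)
SELECT * FROM w AS a(c1, c2);

-- table_name with table and column aliases (the case this PR adds)
SELECT * FROM t1 AS t(c, d);

-- ( subquery ) with table and column aliases
SELECT * FROM (SELECT col1, col2 FROM t1) AS s(c, d);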

@maropu
Member Author

maropu commented May 25, 2017

Yea, I think so. I mean, in the name-conflict case you described above, PostgreSQL throws an error:

postgres=# select * from testTable;
 col1 | col2 
------+------
(0 rows)

postgres=# select * from testTable t(a, b);
 a | b 
---+---
(0 rows)

postgres=# select col1 a, col2 b from testTable t(c, d);
ERROR:  column "col1" does not exist at character 8
STATEMENT:  select col1 a, col2 b from testTable t(c, d);
ERROR:  column "col1" does not exist
LINE 1: select col1 a, col2 b from testTable t(c, d);

We do not currently support aliases for subqueries. Should we include that support in this PR, or in a follow-up?

@maropu
Member Author

maropu commented May 26, 2017

ping

@gatorsmile
Member

gatorsmile commented May 28, 2017

Yeah. That should be a negative case. The FROM clause should be resolved before the SELECT clause.

The PR title is not accurate. I think we should keep the original JIRA name: "Support table column aliases in FROM clause".

Yeah. These cases should be part of this JIRA. Please add the sub-tasks under this JIRA. Follow what the Redshift docs describe and do them one by one? http://docs.aws.amazon.com/redshift/latest/dg/r_FROM_clause30.html

with_subquery_table_name [ [ AS ] alias [ ( column_alias [, ...] ) ] ]
table_name [ * ] [ [ AS ] alias [ ( column_alias [, ...] ) ] ]
( subquery ) [ AS ] alias [ ( column_alias [, ...] ) ]
table_reference [ NATURAL ] join_type table_reference [ ON join_condition | USING ( join_column [, ...] ) ]

Member

@gatorsmile gatorsmile May 28, 2017

First, based on http://developer.mimer.se/validator/sql-reserved-words.tml, USING has been a reserved word in the ANSI SQL standard since SQL-92.

Second, since 1.2, Hive has had a flag hive.support.sql11.reserved.keywords for backward compatibility, which defaults to true.

Added In: Hive 1.2.0 with HIVE-6617: https://issues.apache.org/jira/browse/HIVE-6617
Whether to enable support for SQL2011 reserved keywords. When enabled, will support (part of) SQL2011 reserved keywords.

In 2.2, Hive removed this flag and no longer allows users to set it to false. That means users can no longer use these reserved words as identifiers, unless they quote them. See: https://issues.apache.org/jira/browse/HIVE-14872

Thus, I think it is safe to remove USING from the non-reserved words.

cc @hvanhovell @cloud-fan
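As a hedged illustration of the compatibility impact (exact quoting behaviour depends on the parser): once USING is reserved, it can only be used as an identifier when quoted, e.g.

-- `using` as a table alias now needs backticks
SELECT * FROM t1 AS `using`;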

Member Author

@maropu maropu May 28, 2017

BTW, how did we decide on these non-reserved words for Spark? It seems that a lot of the non-reserved words in Spark (e.g., CUBE and GROUPING) are reserved ones in the ANSI standard.

Contributor

I have added as much to the non-reserved keyword list as possible (without creating ambiguities). The reason for this is that many datasources (for instance twitter4j) unfortunately use reserved keywords for column names, and working with these was quite cumbersome. I took the pragmatic approach.

If we want to change this, then we need to do the same thing Hive did and create a config flag. We can then remove them for Spark 3.0...

Member

Also give a simple (SQL) example here to explain why we did this.

Member

// Checks if the number of aliases equals the number of columns in the table.

Member

@gatorsmile gatorsmile May 28, 2017

Number of column aliases does not match number of columns. Table name: ${u.tableName}; number of column aliases: ${u.outputNames.size}; number of columns: ${outputAttrs.size}.
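A hedged example of the mismatch this check and message cover (the table name is made up; t1 is assumed to have two columns):

-- only one column alias is supplied for a two-column table,
-- so the analyzer should fail with the message suggested above
SELECT * FROM t1 AS t(c);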

Member

Please add @param for these two params.

Member Author

ok

@maropu maropu changed the title [SPARK-20841][SQL] Support column aliases for catalog tables [SPARK-20841][SQL] Support table column aliases in FROM clause May 28, 2017
@maropu
Member Author

maropu commented May 28, 2017

@gatorsmile ok, thanks for your suggestion. I'll check the doc and make sub-tasks there.

@SparkQA

SparkQA commented May 28, 2017

Test build #77485 has finished for PR 18079 at commit 25415ed.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member Author

@maropu maropu May 28, 2017

This fix is not directly related to this PR; I modified it along with this fix: https://github.com/apache/spark/pull/18079/files#diff-57b3d87be744b7d79a9beacf8e5e5eb2R604

Member Author

This fix is not directly related to this PR; I modified it along with this fix: https://github.com/apache/spark/pull/18079/files#diff-b4f9cbed8a042aeb12aeceb13b39d25aR50

@SparkQA

SparkQA commented May 28, 2017

Test build #77486 has finished for PR 18079 at commit 49186ac.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


val tableWithAlias = Option(ctx.strictIdentifier).map(_.getText) match {
val tableId = visitTableIdentifier(ctx.tableIdentifier)
val table = Option(ctx.tableAlias.identifierList) match {
Contributor

Just if/else? Seems a bit heavyweight to wrap it in an Option...

Contributor

...or something like this:

val outputNames = Option(ctx.tableAlias.identifierList).map(visitIdentifierList).getOrElse(Nil)
val table = UnresolvedRelation(visitTableIdentifier(ctx.tableIdentifier), outputNames)

@@ -0,0 +1,17 @@
-- Test data.
Contributor

How about we name this file table-aliases.sql? That seems a little less confusing.

Member Author

ok

Contributor

@hvanhovell hvanhovell left a comment

LGTM from my end. I'll leave the final sign-off to @gatorsmile

@gatorsmile
Member

LGTM too! : )

@SparkQA

SparkQA commented May 28, 2017

Test build #77489 has finished for PR 18079 at commit 8b505c2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

Thanks! Merging to master.

@asfgit asfgit closed this in 24d3428 May 28, 2017
asfgit pushed a commit that referenced this pull request Jul 29, 2017
## What changes were proposed in this pull request?
This pr added parsing rules to support subquery column aliases in FROM clause.
This pr is a sub-task of #18079.

## How was this patch tested?
Added tests in `PlanParserSuite` and `SQLQueryTestSuite`.

Author: Takeshi Yamamuro <[email protected]>

Closes #18185 from maropu/SPARK-20962.
wangyum pushed a commit to 1haodian/spark that referenced this pull request Aug 6, 2017
…clause

## What changes were proposed in this pull request?
This pr added parsing rules to support column aliases for join relations in FROM clause.
This pr is a sub-task of apache#18079.

## How was this patch tested?
Added tests in `AnalysisSuite`, `PlanParserSuite,` and `SQLQueryTestSuite`.

Author: Takeshi Yamamuro <[email protected]>

Closes apache#18772 from maropu/SPARK-20963-2.