Conversation

@jiangxb1987 jiangxb1987 commented Jan 12, 2017

What changes were proposed in this pull request?

This PR is a follow-up to address the comments https://github.com/apache/spark/pull/16233/files#r95669988 and https://github.com/apache/spark/pull/16233/files#r95662299.

We try to wrap the child by:

  1. Generate the queryOutput by:
    1.1. If the query column names are defined, map the column names to attributes in the child output by name;
    1.2. Else set the child output attributes to queryOutput.
  2. Map the queryOutput to the view output by index; if the corresponding attributes don't match, try to up-cast and alias the attribute in queryOutput to the attribute in the view output.
  3. Add a Project over the child, with the new output generated by the previous steps.
    If the view output doesn't have the same number of columns as either the child output or the query column names, throw an AnalysisException.
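The three steps above can be sketched with simplified stand-in types (Attr here is a hypothetical substitute for Spark's Attribute, and the copy call models the up-cast-and-alias step; this is not the real analyzer code):

```scala
// Toy model of the wrapping steps: Attr stands in for Spark's Attribute.
case class Attr(name: String, dataType: String)

object ViewWrapSketch {
  // Step 1: generate queryOutput, by name if the query column names are
  // defined, otherwise fall back to the child output as-is.
  def queryOutput(queryColumnNames: Seq[String], childOutput: Seq[Attr]): Seq[Attr] =
    if (queryColumnNames.nonEmpty) {
      queryColumnNames.map { n =>
        childOutput.find(_.name == n).getOrElse(
          sys.error(s"AnalysisException: column $n not found in child output"))
      }
    } else {
      childOutput
    }

  // Steps 2-3: map the query output to the view output by index; where the
  // attributes differ, "cast and alias" (modeled here by copying name/type).
  def newOutput(viewOutput: Seq[Attr], qOutput: Seq[Attr]): Seq[Attr] = {
    require(viewOutput.length == qOutput.length,
      "AnalysisException: column counts do not match")
    viewOutput.zip(qOutput).map { case (viewAttr, queryAttr) =>
      if (viewAttr == queryAttr) queryAttr
      else queryAttr.copy(name = viewAttr.name, dataType = viewAttr.dataType)
    }
  }
}
```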

How was this patch tested?

Add new test cases in SQLViewSuite.

SparkQA commented Jan 12, 2017

Test build #71255 has finished for PR 16561 at commit 0e82340.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

}
val newOutput = output.zip(child.output).map {
case (attr, originAttr) =>
if (attr.dataType != originAttr.dataType) {
Contributor

can you check hive's behavior? maybe we can use UpCast here

Contributor Author

Seems that Hive supports UpCast between child output and view output, for example:

hive> create table testtable as select 1 a, 2L b;
hive> create view testview as select * from testtable;
hive> select * from testview;
OK
1	2
Time taken: 0.11 seconds, Fetched: 1 row(s)
hive> alter table testtable change column a a bigint;
hive> alter table testtable change column b b string;
hive> desc testtable;
OK
a                   	bigint              	                    
b                   	string              	                    
Time taken: 0.15 seconds, Fetched: 2 row(s)
hive> desc testview;
OK
a                   	int                 	                    
b                   	bigint              	                    
Time taken: 0.038 seconds, Fetched: 2 row(s)
hive> select * from testview;
OK
1	2
Time taken: 0.172 seconds, Fetched: 1 row(s)

Contributor Author

What should we set for the walkedTypePath here?

Member

It sounds like Hive just forcefully casts it.

Member

hive> explain extended select * from testview;
OK
ABSTRACT SYNTAX TREE:
  
TOK_QUERY
   TOK_FROM
      TOK_TABREF
         TOK_TABNAME
            testview
   TOK_INSERT
      TOK_DESTINATION
         TOK_DIR
            TOK_TMP_FILE
      TOK_SELECT
         TOK_SELEXPR
            TOK_ALLCOLREF


STAGE DEPENDENCIES:
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        TableScan
          alias: testtable
          Statistics: Num rows: 1 Data size: 10 Basic stats: COMPLETE Column stats: NONE
          GatherStats: false
          Select Operator
            expressions: a (type: bigint), b (type: tinyint)
            outputColumnNames: _col0, _col1
            Statistics: Num rows: 1 Data size: 10 Basic stats: COMPLETE Column stats: NONE
            ListSink

expressions: a (type: bigint), b (type: tinyint). I tried to alter the columns in the underlying tables to different types. I can see the types of the view columns are always cast to the same ones as the altered columns.

Member

cc @yhuai

@jiangxb1987 jiangxb1987 changed the title [SPARK-18209][SQL] Alias the view with its child by mapping the columns by index [SPARK-18209][SQL][FOLLOWUP] Alias the view with its child by mapping the columns by index Jan 12, 2017
@jiangxb1987 jiangxb1987 changed the title [SPARK-18209][SQL][FOLLOWUP] Alias the view with its child by mapping the columns by index [SPARK-18801][SQL][FOLLOWUP] Alias the view with its child Jan 13, 2017
SparkQA commented Jan 15, 2017

Test build #71400 has finished for PR 16561 at commit 7e19803.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


/**
* Return the output column names of the query that creates a view, the column names are used to
* resolve a view, should be None if the CatalogTable is not a View or created by older versions
Contributor

should be Nil

object CatalogTable {
val VIEW_DEFAULT_DATABASE = "view.default.database"
val VIEW_QUERY_OUTPUT_PREFIX = "view.query.out."
val VIEW_QUERY_OUTPUT_COLUMN_NUM = VIEW_QUERY_OUTPUT_PREFIX + "numCols"
Contributor

nit: xxx_NUM_COLUMNS

*/
def viewQueryColumnNames: Seq[String] = {
for {
numCols <- properties.get(VIEW_QUERY_OUTPUT_COLUMN_NUM).toSeq
Contributor

.toSeq is not needed

Contributor Author

It is needed to generate the correct output.

val queryColumnNames = desc.viewQueryColumnNames
// If the view output doesn't have the same number of columns either with the child output,
// or with the query column names, throw an AnalysisException.
if (output.length != child.output.length && output.length != queryColumnNames.length) {
Contributor

the comment says "or" but the code uses &&?

SparkQA commented Jan 16, 2017

Test build #71415 has finished for PR 16561 at commit d6537a5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jan 16, 2017

Test build #71416 has finished for PR 16561 at commit 16ec310.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

* child by:
* 1. Generate the `queryOutput` by:
* 1.1. If the query column names are defined, map the column names to attributes in the child
* output by name;
Contributor

should we mention that, this is mostly for SELECT * ...?

val queryColumnNames = desc.viewQueryColumnNames
// If the view output doesn't have the same number of columns with the child output and the
// query column names, throw an AnalysisException.
if (output.length != child.output.length && output.length != queryColumnNames.length) {
Contributor

This condition doesn't look very clear to me. How about if (queryColumnNames.nonEmpty && output.length != queryColumnNames.length)? When queryColumnNames is empty, it means this view is created prior to Spark 2.2, and we don't need to check anything.

}
// If the child output is the same with the view output, we don't need to generate the query
// output again.
val queryOutput = if (queryColumnNames.nonEmpty && output != child.output) {
Contributor

output != child.output will always be true right?

Contributor Author

@jiangxb1987 jiangxb1987 Jan 16, 2017

For a nested view, the inner view operator may have been resolved; in that case the output is the same as child.output.
I have changed the test case SQLViewSuite.test("correctly resolve a nested view") to cover this case.

Contributor

shall we put this condition after the case? e.g. case v @ View(desc, output, child) if child.resolved && output != child.output

Contributor Author

Oh I think that's better!

}
} else {
child.output
}
Contributor

how about

val queryOutput = if (queryColumnNames.nonEmpty) {
  if (output.length != queryColumnNames.length) throw ...
  desc.viewQueryColumnNames.map { colName =>
    findAttributeByName(colName, child.output, resolver)
  }
} else {
  // For view created before Spark 2.1, the view text is already fully qualified, the plan output is view output.
  child.output
}

* Return false iff we may truncate during casting `from` type to `to` type. e.g. long -> int,
* timestamp -> date.
*/
def canUpCast(from: DataType, to: DataType): Boolean = (from, to) match {
Contributor

how about def mayTruncate? canUpCast is not accurate; we may not be able to cast even if canUpCast returns true.
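The rename discussed here could look roughly like the following sketch, which inverts the canUpCast contract over a toy type encoding (ToyType and CastSketch are hypothetical names, not Spark's real Cast utilities):

```scala
// Toy DataType encoding; not Spark's real type system.
sealed trait ToyType
case object IntType extends ToyType
case object LongType extends ToyType
case object DateType extends ToyType
case object TimestampType extends ToyType

object CastSketch {
  // True when casting from -> to can lose information (the inverse of the
  // canUpCast contract quoted above).
  def mayTruncate(from: ToyType, to: ToyType): Boolean = (from, to) match {
    case (LongType, IntType)       => true // long -> int can drop high bits
    case (TimestampType, DateType) => true // timestamp -> date drops the time part
    case _                         => false
  }
}
```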

SparkQA commented Jan 16, 2017

Test build #71426 has started for PR 16561 at commit 21e63f8.

@cloud-fan
Contributor

LGTM

SparkQA commented Jan 16, 2017

Test build #71434 has finished for PR 16561 at commit c86ab48.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

LGTM

SparkQA commented Jan 16, 2017

Test build #71437 has finished for PR 16561 at commit 06e8855.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

thanks, merging to master!

@asfgit asfgit closed this in e635cbb Jan 16, 2017
child.output
}
// Map the attributes in the query output to the attributes in the view output by index.
val newOutput = output.zip(queryOutput).map {
Contributor

Seems we need to check the size of output and queryOutput.

Contributor Author

For views created by older versions of Spark, the view text is fully qualified, so the output is the same as the view output. Otherwise, we have already checked that the output has the same length as queryColumnNames. So we don't need to check the sizes of output and queryOutput here.

uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
Author: jiangxingbo <[email protected]>

Closes apache#16561 from jiangxb1987/alias-view.
cmonkey pushed a commit to cmonkey/spark that referenced this pull request Feb 15, 2017
@jiangxb1987 jiangxb1987 deleted the alias-view branch March 16, 2017 06:43
QQshu1 commented May 4, 2017

Hi, I have a question: why should we eliminate View at the beginning of the optimizer? Thank you. @jiangxb1987

@jiangxb1987
Contributor Author

@QQshu1 As we mentioned in the comment, the View operator is kept until the end of the analysis stage so that we can better understand the analyzed logical plan. At the beginning of the optimization stage, the operator is no longer needed, so we apply the EliminateView rule.

QQshu1 commented May 4, 2017

@jiangxb1987 Thanks. What happens if we don't eliminate View? I mean, does it affect the optimization of the tree (i.e. performance), or the correctness of the results?

@cloud-fan
Contributor

the View operator doesn't have a corresponding physical operator, so we have to remove it.
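A minimal illustration of what an EliminateView-style rule does, under the assumption of toy plan nodes rather than Spark's real LogicalPlan API, is to replace each view node with its already-resolved child:

```scala
// Toy plan nodes; ToyView stands in for Spark's View operator.
sealed trait Plan
case class Relation(table: String) extends Plan
case class ToyView(name: String, child: Plan) extends Plan
case class Project(columns: Seq[String], child: Plan) extends Plan

object EliminateViewSketch {
  // Replace every view node with its (already resolved) child, recursively,
  // so no View remains when physical planning starts.
  def eliminateView(plan: Plan): Plan = plan match {
    case ToyView(_, child)    => eliminateView(child)
    case Project(cols, child) => Project(cols, eliminateView(child))
    case other                => other
  }
}
```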
