Conversation

@gengliangwang
Member

What changes were proposed in this pull request?

With #19474, the children of insertion commands are missing in the UI.
To fix it:

  1. Create a new physical plan, DataWritingCommandExec, to execute DataWritingCommand with its children, so that the other commands are not affected.
  2. On creation of a DataWritingCommand, a new field allColumns must be specified, which is the output of the analyzed plan.
  3. In FileFormatWriter, the output schema uses allColumns instead of the output of the optimized plan.
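
The shape of the new trait and physical node can be sketched as a small, self-contained toy model. The stub types below stand in for the real Spark classes, and `outputColumns` is the name the `allColumns` field ended up with during review; this is an illustration, not the merged implementation:

```scala
// Toy, self-contained model of the design above; these stubs stand in
// for the real Spark classes (Attribute, LogicalPlan, Command, SparkPlan).
case class Attribute(name: String)
trait LogicalPlan { def children: Seq[LogicalPlan] = Nil }
trait Command extends LogicalPlan
trait SparkPlan { def children: Seq[SparkPlan] }

// A data-writing command keeps its input query as a child, so the query
// subtree stays visible in the UI (change 1).
trait DataWritingCommand extends Command {
  def query: LogicalPlan
  override def children: Seq[LogicalPlan] = query :: Nil
  // Output columns of the *analyzed* input plan (change 2); FileFormatWriter
  // derives the written schema from these rather than from the optimized
  // plan's output (change 3).
  def outputColumns: Seq[Attribute]
}

// Physical wrapper that executes the command against its planned child,
// leaving the other (non-writing) commands untouched.
case class DataWritingCommandExec(cmd: DataWritingCommand, child: SparkPlan)
    extends SparkPlan {
  override def children: Seq[SparkPlan] = child :: Nil
}
```

This mirrors the signatures reported by the build-bot summaries later in the thread; the real implementation lives in Spark's sql/core module.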

Before code changes:
(screenshot of the SQL UI before the fix)

After code changes:
(screenshot of the SQL UI after the fix)

How was this patch tested?

Unit test

@SparkQA

SparkQA commented Dec 19, 2017

Test build #85110 has finished for PR 20020 at commit d354895.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • trait DataWritingCommand extends LogicalPlan
  • case class DataWritingCommandExec(cmd: DataWritingCommand, children: Seq[SparkPlan])

@SparkQA

SparkQA commented Dec 19, 2017

Test build #85113 has finished for PR 20020 at commit 2d58187.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 20, 2017

Test build #85159 has finished for PR 20020 at commit e25a9eb.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • trait DataWritingCommand extends Command
  • case class DataWritingCommandExec(cmd: DataWritingCommand, children: Seq[SparkPlan])

@gengliangwang
Member Author

retest this please

@SparkQA

SparkQA commented Dec 20, 2017

Test build #85161 has finished for PR 20020 at commit e25a9eb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • trait DataWritingCommand extends Command
  • case class DataWritingCommandExec(cmd: DataWritingCommand, children: Seq[SparkPlan])

Contributor

shall we force all sub-classes to implement it?

Contributor

why do we need this?

Contributor

now shall we define query as a child here?

@cloud-fan
Contributor

Since we now let inserting commands have a child, it makes sense to wrap the child with AnalysisBarrier to avoid re-analyzing it. Also cc @viirya

@SparkQA

SparkQA commented Dec 20, 2017

Test build #85193 has finished for PR 20020 at commit 7ccfd90.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

retest this please

@SparkQA

SparkQA commented Dec 21, 2017

Test build #85225 has finished for PR 20020 at commit 7ccfd90.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member

viirya commented Dec 21, 2017

Since we now let inserting commands have a child, it makes sense to wrap the child with AnalysisBarrier to avoid re-analyzing it.

It makes sense to me too.

Member

Not a RunnableCommand now.

Member

DataWritingCommand is not a RunnableCommand.

Member

I think the metrics in RunnableCommand are mainly for use in DataWritingCommand. Now that DataWritingCommand is a separate class from RunnableCommand, do we still need metrics in RunnableCommand?

Member Author

Well, I prefer to keep it. It is used only in ExecutedCommandExec, and if in the future there is a new RunnableCommand with metrics, the new command can just override metrics.

Member

outputColumns?

Member

Add AnalysisBarrier around query here?

Member Author

I tried, but it causes a runtime exception: once the AnalysisBarrier is removed by the analyzer, the child of DataWritingCommand is no longer an AnalysisBarrier.

Member

ah, right. We should add the barrier when passing in the query.

@SparkQA

SparkQA commented Dec 21, 2017

Test build #85249 has finished for PR 20020 at commit 83e0fba.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gengliangwang
Member Author

retest this please

Contributor

is this comment valid?

Contributor

since we define children as query :: Nil, here we can just pass query: SparkPlan

Contributor

we need the logical plan output, not the physical plan output

@SparkQA

SparkQA commented Dec 21, 2017

Test build #85268 has finished for PR 20020 at commit 30535b6.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class DataWritingCommandExec(cmd: DataWritingCommand, child: SparkPlan)

@SparkQA

SparkQA commented Dec 21, 2017

Test build #85266 has finished for PR 20020 at commit 83e0fba.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

Do we need this?

Member Author

I was following ExecutedCommandExec, and I can see some slight difference with the explain command. I can remove it.

Member

Do we still need this?

Member

AnalysisBarrier is not needed any more.

Member

How about adding an AnalysisBarrier here?

Member Author

I think @cloud-fan wants AnalysisBarrier to be in trait DataWritingCommand, to make sure the query is analyzed.
Adding AnalysisBarrier here will not help; the query should already be analyzed at this point.

@gatorsmile
Member

How about the command SaveIntoDataSourceCommand?

@SparkQA

SparkQA commented Dec 21, 2017

Test build #85269 has finished for PR 20020 at commit 5818c32.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 21, 2017

Test build #85272 has finished for PR 20020 at commit f76bef7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gengliangwang
Member Author

@gatorsmile This PR focuses on the commands that use FileFormatWriter. I will investigate how to handle the other insertion commands and create a follow-up PR, is that OK?

@SparkQA

SparkQA commented Dec 22, 2017

Test build #85313 has finished for PR 20020 at commit b60f4ec.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 22, 2017

Test build #85314 has finished for PR 20020 at commit 787e677.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gengliangwang
Member Author

All these insertion commands come from postHocResolutionRules, and there are other batches after it; skipping the batches after postHocResolutionRules would cause analysis errors.
For correctness and robustness, I decided not to add AnalysisBarrier.

@SparkQA

SparkQA commented Dec 26, 2017

Test build #85396 has finished for PR 20020 at commit cd2bbf8.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gengliangwang
Member Author

retest this please

@SparkQA

SparkQA commented Dec 26, 2017

Test build #85397 has finished for PR 20020 at commit cd2bbf8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

move it to HiveCatalogedDDLSuite?

@SparkQA

SparkQA commented Dec 29, 2017

Test build #85493 has finished for PR 20020 at commit 18ec016.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

thanks, merging to master!

@asfgit asfgit closed this in d4f0b1d Dec 29, 2017
@gatorsmile
Member

A late LGTM!


override lazy val metrics: Map[String, SQLMetric] = {
// Output columns of the analyzed input query plan
def outputColumns: Seq[Attribute]
Member

outputColumns changed from the analyzed plan's output to the optimized plan's output. For example:

withTempDir { dir =>
  val path = dir.getCanonicalPath
  val cnt = 30
  val table1Path = s"$path/table1"
  val table3Path = s"$path/table3"
  spark.range(cnt).selectExpr("cast(id as bigint) as col1", "cast(id % 3 as bigint) as col2")
    .write.mode(SaveMode.Overwrite).parquet(table1Path)
  withTable("table1", "table3") {
    spark.sql(
      s"CREATE TABLE table1(col1 bigint, col2 bigint) using parquet location '$table1Path/'")
    spark.sql("CREATE TABLE table3(COL1 bigint, COL2 bigint) using parquet " +
      "PARTITIONED BY (COL2) " +
      s"CLUSTERED BY (COL1) INTO 2 BUCKETS location '$table3Path/'")

    withView("view1") {
      spark.sql("CREATE VIEW view1 as select col1, col2 from table1 where col1 > -20")
      spark.sql("INSERT OVERWRITE TABLE table3 select COL1, COL2 from view1 CLUSTER BY COL1")
      spark.table("table3").show
    }
  }
}
outputColumns: List(COL1#19L, COL2#20L)
outputColumns: List(COL1#19L, COL2#20L)
outputColumns: List(COL1#19L, COL2#20L)
outputColumns: List(COL1#19L, COL2#20L)
outputColumns: List(COL1#19L, COL2#20L)
outputColumns: List(COL1#19L, COL2#20L)
outputColumns: List(COL1#19L, COL2#20L)
outputColumns: List(COL1#19L, COL2#20L)
outputColumns: List(col1#16L, col2#17L)
outputColumns: List(col1#16L, col2#17L)
outputColumns: List(col1#16L, col2#17L)
