Conversation

@gengliangwang
Member

What changes were proposed in this pull request?

With #19474, the children of insertion commands are missing in the UI.
To fix it:

  1. Create a new physical plan, DataWritingCommandExec, to execute DataWritingCommand with its children, so that the other commands are not affected.
  2. On creation of a DataWritingCommand, a new field allColumns must be specified, which is the output of the analyzed plan.
  3. In FileFormatWriter, the output schema uses allColumns instead of the output of the optimized plan.
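
The shape of the new trait and physical node can be sketched as a small, self-contained toy model. The stub types below stand in for the real Spark classes, and `outputColumns` is the name the `allColumns` field ended up with during review; this is an illustration, not the merged implementation:

```scala
// Toy, self-contained model of the design above; these stubs stand in
// for the real Spark classes (Attribute, LogicalPlan, Command, SparkPlan).
case class Attribute(name: String)
trait LogicalPlan { def children: Seq[LogicalPlan] = Nil }
trait Command extends LogicalPlan
trait SparkPlan { def children: Seq[SparkPlan] }

// A data-writing command keeps its input query as a child, so the query
// subtree stays visible in the UI (change 1).
trait DataWritingCommand extends Command {
  def query: LogicalPlan
  override def children: Seq[LogicalPlan] = query :: Nil
  // Output columns of the *analyzed* input plan (change 2); FileFormatWriter
  // derives the written schema from these rather than from the optimized
  // plan's output (change 3).
  def outputColumns: Seq[Attribute]
}

// Physical wrapper that executes the command against its planned child,
// leaving the other (non-writing) commands untouched.
case class DataWritingCommandExec(cmd: DataWritingCommand, child: SparkPlan)
    extends SparkPlan {
  override def children: Seq[SparkPlan] = child :: Nil
}
```

This mirrors the signatures reported by the build-bot summaries later in the thread; the real implementation lives in Spark's sql/core module.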

Before code changes:
(screenshot of the SQL UI before the fix)

After code changes:
(screenshot of the SQL UI after the fix)

How was this patch tested?

Unit test

@SparkQA

SparkQA commented Dec 19, 2017

Test build #85110 has finished for PR 20020 at commit d354895.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • trait DataWritingCommand extends LogicalPlan
  • case class DataWritingCommandExec(cmd: DataWritingCommand, children: Seq[SparkPlan])

@SparkQA

SparkQA commented Dec 19, 2017

Test build #85113 has finished for PR 20020 at commit 2d58187.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 20, 2017

Test build #85159 has finished for PR 20020 at commit e25a9eb.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • trait DataWritingCommand extends Command
  • case class DataWritingCommandExec(cmd: DataWritingCommand, children: Seq[SparkPlan])

@gengliangwang
Member Author

retest this please

@SparkQA

SparkQA commented Dec 20, 2017

Test build #85161 has finished for PR 20020 at commit e25a9eb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • trait DataWritingCommand extends Command
  • case class DataWritingCommandExec(cmd: DataWritingCommand, children: Seq[SparkPlan])

Contributor

shall we force all sub-classes to implement it?

Contributor

why do we need this?

Contributor

now shall we define query as a child here?

@cloud-fan
Contributor

Since we now let inserting commands have a child, it makes sense to wrap the child with AnalysisBarrier to avoid re-analyzing it. Also cc @viirya

@SparkQA

SparkQA commented Dec 20, 2017

Test build #85193 has finished for PR 20020 at commit 7ccfd90.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

retest this please

@SparkQA

SparkQA commented Dec 21, 2017

Test build #85225 has finished for PR 20020 at commit 7ccfd90.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member

viirya commented Dec 21, 2017

Since we now let inserting commands have a child, it makes sense to wrap the child with AnalysisBarrier to avoid re-analyzing it.

It makes sense to me too.

Member

Not a RunnableCommand now.

Member

DataWritingCommand is not a RunnableCommand.

Member

I think the metrics in RunnableCommand are mainly for use in DataWritingCommand. Now that DataWritingCommand is a separate class from RunnableCommand, do we still need metrics in RunnableCommand?

Member Author

Well, I prefer to keep it. It is used only in ExecutedCommandExec, and if in the future there is a new RunnableCommand with metrics, the new command can just override metrics.

Member

outputColumns?

Member

Add AnalysisBarrier around query here?

Member Author

I tried, but it causes a runtime exception: once the AnalysisBarrier is removed by the analyzer, the child of DataWritingCommand is no longer an AnalysisBarrier.

Member

ah, right. We should add the barrier when passing in the query.

@SparkQA

SparkQA commented Dec 21, 2017

Test build #85249 has finished for PR 20020 at commit 83e0fba.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gengliangwang
Member Author

retest this please

Contributor

is this comment valid?

Contributor

since we define children as query :: Nil, here we can just pass query: SparkPlan

Contributor

we need the logical plan output, not the physical plan output

@SparkQA

SparkQA commented Dec 21, 2017

Test build #85268 has finished for PR 20020 at commit 30535b6.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class DataWritingCommandExec(cmd: DataWritingCommand, child: SparkPlan)

@SparkQA

SparkQA commented Dec 21, 2017

Test build #85266 has finished for PR 20020 at commit 83e0fba.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

Do we need this?

Member Author

I was following ExecutedCommandExec, and I can see some slight difference with the explain command. I can remove it.

Member

Do we still need this?

Member

AnalysisBarrier is not needed any more.

Member

How about adding an AnalysisBarrier here?

Member Author

I think @cloud-fan wants AnalysisBarrier to be in trait DataWritingCommand, to make sure the query is analyzed.
Adding AnalysisBarrier here will not help; the query should already be analyzed at this point.

@gatorsmile
Member

How about the command SaveIntoDataSourceCommand?

@SparkQA

SparkQA commented Dec 21, 2017

Test build #85269 has finished for PR 20020 at commit 5818c32.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 21, 2017

Test build #85272 has finished for PR 20020 at commit f76bef7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gengliangwang
Member Author

@gatorsmile This PR focuses on the commands that use FileFormatWriter. I will investigate how to handle the other insertion commands and create a follow-up PR, is that OK?

@SparkQA

SparkQA commented Dec 22, 2017

Test build #85313 has finished for PR 20020 at commit b60f4ec.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 22, 2017

Test build #85314 has finished for PR 20020 at commit 787e677.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gengliangwang
Member Author

All these insertion commands come from postHocResolutionRules, and there are other batches after it; skipping the batches after postHocResolutionRules would cause analysis errors.
For correctness and robustness, I decided not to add AnalysisBarrier.

@SparkQA

SparkQA commented Dec 26, 2017

Test build #85396 has finished for PR 20020 at commit cd2bbf8.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gengliangwang
Member Author

retest this please

@SparkQA

SparkQA commented Dec 26, 2017

Test build #85397 has finished for PR 20020 at commit cd2bbf8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

move it to HiveCatalogedDDLSuite?

@SparkQA

SparkQA commented Dec 29, 2017

Test build #85493 has finished for PR 20020 at commit 18ec016.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

thanks, merging to master!

@asfgit asfgit closed this in d4f0b1d Dec 29, 2017
@gatorsmile
Member

A late LGTM!


override lazy val metrics: Map[String, SQLMetric] = {
// Output columns of the analyzed input query plan
def outputColumns: Seq[Attribute]
Member

outputColumns changed from the analyzed plan's output to the optimized plan's output. For example:

withTempDir { dir =>
  val path = dir.getCanonicalPath
  val cnt = 30
  val table1Path = s"$path/table1"
  val table3Path = s"$path/table3"
  spark.range(cnt).selectExpr("cast(id as bigint) as col1", "cast(id % 3 as bigint) as col2")
    .write.mode(SaveMode.Overwrite).parquet(table1Path)
  withTable("table1", "table3") {
    spark.sql(
      s"CREATE TABLE table1(col1 bigint, col2 bigint) using parquet location '$table1Path/'")
    spark.sql("CREATE TABLE table3(COL1 bigint, COL2 bigint) using parquet " +
      "PARTITIONED BY (COL2) " +
      s"CLUSTERED BY (COL1) INTO 2 BUCKETS location '$table3Path/'")

    withView("view1") {
      spark.sql("CREATE VIEW view1 as select col1, col2 from table1 where col1 > -20")
      spark.sql("INSERT OVERWRITE TABLE table3 select COL1, COL2 from view1 CLUSTER BY COL1")
      spark.table("table3").show
    }
  }
}
outputColumns: List(COL1#19L, COL2#20L)
outputColumns: List(COL1#19L, COL2#20L)
outputColumns: List(COL1#19L, COL2#20L)
outputColumns: List(COL1#19L, COL2#20L)
outputColumns: List(COL1#19L, COL2#20L)
outputColumns: List(COL1#19L, COL2#20L)
outputColumns: List(COL1#19L, COL2#20L)
outputColumns: List(COL1#19L, COL2#20L)
outputColumns: List(col1#16L, col2#17L)
outputColumns: List(col1#16L, col2#17L)
outputColumns: List(col1#16L, col2#17L)
