[SPARK-22834][SQL] Make insertion commands have real children to fix UI issues #20020
Test build #85110 has finished for PR 20020 at commit
Test build #85113 has finished for PR 20020 at commit
2d58187 to e25a9eb
Test build #85159 has finished for PR 20020 at commit
retest this please
Test build #85161 has finished for PR 20020 at commit
shall we force all sub-classes to implement it?
why do we need this?
now shall we define query as a child here?
since now we let inserting commands have a child, it makes sense to wrap the child with
Test build #85193 has finished for PR 20020 at commit
retest this please
Test build #85225 has finished for PR 20020 at commit
It makes sense to me too.
Not a RunnableCommand now.
DataWritingCommand is not a RunnableCommand.
I think the metrics in RunnableCommand are mainly for use in DataWritingCommand. Now that DataWritingCommand is a separate class from RunnableCommand, do we still need metrics in RunnableCommand?
Well, I prefer to keep it. It is used only in ExecutedCommandExec, and in the future, if a new RunnableCommand needs metrics, it can just override metrics.
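To make the "just override metrics" point concrete, here is a minimal sketch. It assumes a toy model: `Metric`, `Command`, `PlainCommand`, `CountingCommand` and `CommandExec` are hypothetical stand-ins, not Spark's `SQLMetric`, `RunnableCommand` or `ExecutedCommandExec`.

```scala
// Toy stand-ins for illustration only; these are NOT Spark's classes.
final case class Metric(description: String)

trait Command {
  def name: String
  // Default: a command reports no metrics.
  def metrics: Map[String, Metric] = Map.empty
}

// A command that is happy with the empty default.
object PlainCommand extends Command {
  val name = "PlainCommand"
}

// A command that opts in by overriding the default.
// `lazy val` defers building the map until it is first read.
object CountingCommand extends Command {
  val name = "CountingCommand"
  override lazy val metrics: Map[String, Metric] =
    Map("numRows" -> Metric("number of rows"))
}

// The executing node simply forwards whatever the command exposes.
final case class CommandExec(cmd: Command) {
  def metrics: Map[String, Metric] = cmd.metrics
}
```

With a default in the trait, existing commands need no change, and only the command that wants metrics pays for them.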
outputColumns?
Add AnalysisBarrier around query here?
I tried, but it causes a runtime exception. Once the AnalysisBarrier is removed by the analyzer, the child of DataWritingCommand is no longer an AnalysisBarrier.
ah, right. We should add the barrier when passing in the query.
Test build #85249 has finished for PR 20020 at commit
retest this please
is this comment valid?
since we define children as query :: Nil, here we can just pass query: SparkPlan
we need logical plan output not physical plan output
Test build #85268 has finished for PR 20020 at commit
Test build #85266 has finished for PR 20020 at commit
Do we need this?
I was following ExecutedCommandExec, and I can see some slight differences with the explain command. I can remove it.
Do we still need this?
AnalysisBarrier is not needed any more.
How about adding an AnalysisBarrier here?
I think @cloud-fan wants AnalysisBarrier to be in trait DataWritingCommand to make sure the query is analyzed.
Adding AnalysisBarrier will not be helpful; the query should be analyzed here.
How about the command
Test build #85269 has finished for PR 20020 at commit
Test build #85272 has finished for PR 20020 at commit
@gatorsmile This PR focuses on the commands that use
Test build #85313 has finished for PR 20020 at commit
b60f4ec to 787e677
Test build #85314 has finished for PR 20020 at commit
All these insertion commands are from
Test build #85396 has finished for PR 20020 at commit
retest this please
Test build #85397 has finished for PR 20020 at commit
move it to HiveCatalogedDDLSuite?
This reverts commit 787e6775ec2ccbcfdcd88e9460f14e4d7f658a98.
cd2bbf8 to e60a86e
e60a86e to 18ec016
Test build #85493 has finished for PR 20020 at commit
thanks, merging to master!
A late LGTM!
override lazy val metrics: Map[String, SQLMetric] = {
// Output columns of the analyzed input query plan
def outputColumns: Seq[Attribute]
outputColumns changed from analyzed to optimized. For example:
    withTempDir { dir =>
      val path = dir.getCanonicalPath
      val cnt = 30
      val table1Path = s"$path/table1"
      val table3Path = s"$path/table3"
      spark.range(cnt).selectExpr("cast(id as bigint) as col1", "cast(id % 3 as bigint) as col2")
        .write.mode(SaveMode.Overwrite).parquet(table1Path)
      withTable("table1", "table3") {
        spark.sql(
          s"CREATE TABLE table1(col1 bigint, col2 bigint) using parquet location '$table1Path/'")
        spark.sql("CREATE TABLE table3(COL1 bigint, COL2 bigint) using parquet " +
          "PARTITIONED BY (COL2) " +
          s"CLUSTERED BY (COL1) INTO 2 BUCKETS location '$table3Path/'")
        withView("view1") {
          spark.sql("CREATE VIEW view1 as select col1, col2 from table1 where col1 > -20")
          spark.sql("INSERT OVERWRITE TABLE table3 select COL1, COL2 from view1 CLUSTER BY COL1")
          spark.table("table3").show
        }
      }
    }
outputColumns: List(COL1#19L, COL2#20L)
outputColumns: List(COL1#19L, COL2#20L)
outputColumns: List(COL1#19L, COL2#20L)
outputColumns: List(COL1#19L, COL2#20L)
outputColumns: List(COL1#19L, COL2#20L)
outputColumns: List(COL1#19L, COL2#20L)
outputColumns: List(COL1#19L, COL2#20L)
outputColumns: List(COL1#19L, COL2#20L)
outputColumns: List(col1#16L, col2#17L)
outputColumns: List(col1#16L, col2#17L)
outputColumns: List(col1#16L, col2#17L)
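For readers wondering how an optimizer rewrite can change the spelling of the output columns at all, here is a rough sketch under loud assumptions. `Attr`, `Node`, `Relation`, `Project`, `ToyOptimizer`, `ToyWrite` and `ToyPlanner` are all hypothetical toy types, and the single rewrite rule is made up for illustration. It is not meant to be the Spark rule responsible for the output above, and unlike that output it keeps the ids identical to stay small.

```scala
// Toy model only: these are NOT Spark's Attribute/LogicalPlan classes.
final case class Attr(name: String, id: Int)

sealed trait Node { def output: Seq[Attr] }
final case class Relation(output: Seq[Attr]) extends Node
final case class Project(output: Seq[Attr], child: Node) extends Node

object ToyOptimizer {
  // Hypothetical rule: drop a Project that selects exactly its child's
  // columns. Columns are compared by id, so a Project that only changes
  // the spelling of a name counts as redundant and disappears.
  def optimize(plan: Node): Node = plan match {
    case Project(out, child) if out.map(_.id) == child.output.map(_.id) =>
      optimize(child)
    case Project(out, child) => Project(out, optimize(child))
    case other => other
  }
}

// The write command keeps the analyzed output next to the optimized query.
final case class ToyWrite(query: Node, outputColumns: Seq[Attr])

object ToyPlanner {
  // Capture the output BEFORE optimizing, so the user-facing names survive.
  def plan(analyzed: Node): ToyWrite =
    ToyWrite(ToyOptimizer.optimize(analyzed), analyzed.output)
}

object AnalyzedVsOptimizedDemo {
  val table: Node = Relation(Seq(Attr("col1", 1), Attr("col2", 2)))
  val analyzed: Node = Project(Seq(Attr("COL1", 1), Attr("COL2", 2)), table)
  val write: ToyWrite = ToyPlanner.plan(analyzed)
}
```

In this toy, `write.query.output` carries the lower-case names of the underlying relation, while `write.outputColumns` still carries the names as written in the query. Which of the two a writer consults decides the schema that ends up on disk.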
What changes were proposed in this pull request?
With #19474, children of insertion commands are missing in UI.
To fix it:
- Add `DataWritingCommandExec` to exec `DataWritingCommand` with children, so that the other commands won't be affected.
- For `DataWritingCommand`, a new field `allColumns` must be specified, which is the output of the analyzed plan.
- In `FileFormatWriter`, the output schema will use `allColumns` instead of the output of the optimized plan.

Before code changes:

After code changes:

How was this patch tested?
Unit test