[SPARK-18120][SPARK-19557][SQL] Call QueryExecutionListener callback methods for DataFrameWriter methods #16962
Conversation
vanzin left a comment:
Looks good as far as I can tell. Thanks!
dataSource.write(mode, df)
runCommand(df.sparkSession, "save") {
  SaveIntoDataSourceCommand(
Does this also cover SPARK-19557? If so, might as well mention that in the PR (or close the bug as a duplicate or related or something).
val qe = session.sessionState.executePlan(command)
try {
  qe.executedPlan.foreach { plan =>
    plan.resetMetrics()
Just realized that in this code path we pass in a logical plan and will always get a new physical plan, so we don't need to reset metrics here. Let me remove it.
Test build #73009 has finished for PR 16962 at commit
Test build #73012 has finished for PR 16962 at commit
className = provider,
partitionColumns = partitionColumns,
options = options).write(mode, Dataset.ofRows(sparkSession, query))
Do we need to invalidate the cache to be consistent with InsertIntoDataSourceCommand?
sparkSession.sharedState.cacheManager.invalidateCache(query)
We don't have a LogicalRelation to use as the cache key here.
: )
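For readers skimming the thread: the point about the missing LogicalRelation matters because cached data is keyed by logical plan. Below is a rough sketch, paraphrased from memory rather than copied from this diff (the class name, the `overwrite: Boolean` field, and the elided insert logic are simplifications), of how the insert path can use its existing relation as the cache key:

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.execution.command.RunnableCommand
import org.apache.spark.sql.execution.datasources.LogicalRelation

// Sketch only: the insert command already holds the LogicalRelation of the target table,
// so after writing it can drop any cached data built on that relation. SaveIntoDataSourceCommand
// creates the relation as part of the write, so there is no pre-existing plan to use as a key,
// which is what the comment above refers to.
case class InsertIntoDataSourceCommandSketch(
    logicalRelation: LogicalRelation,
    query: LogicalPlan,
    overwrite: Boolean) extends RunnableCommand {

  override def run(sparkSession: SparkSession): Seq[Row] = {
    // ... insert the rows of `query` into the underlying relation here ...

    // Cached plans are keyed by their logical plan, so the existing relation is the key.
    sparkSession.sharedState.cacheManager.invalidateCache(logicalRelation)
    Seq.empty[Row]
  }
}
```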
format("csv").save(path)
}

private def runCommand(session: SparkSession, name: String)(command: LogicalPlan): Unit = {
Add a function description like?
/**
* Wrap a DataFrameWriter action to track the QueryExecution and time cost, then report to the
* user-registered callback functions.
 */

val commands = ArrayBuffer.empty[(String, LogicalPlan)]
val exceptions = ArrayBuffer.empty[(String, Exception)]
val listener = new QueryExecutionListener {
  // Only test successful case here, so no need to implement `onFailure`
invalid comment?
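The diff context above cuts off right after the listener declaration. A minimal sketch of how such a test listener could record writer commands (the method signatures follow the `QueryExecutionListener` trait; the surrounding test harness and the `spark` session are assumed):

```scala
import scala.collection.mutable.ArrayBuffer

import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.execution.QueryExecution
import org.apache.spark.sql.util.QueryExecutionListener

val commands = ArrayBuffer.empty[(String, LogicalPlan)]
val exceptions = ArrayBuffer.empty[(String, Exception)]

val listener = new QueryExecutionListener {
  // Record the writer action name ("save", "insertInto", ...) and its logical plan.
  override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit = {
    commands += funcName -> qe.logical
  }
  // The exceptions buffer suggests failures are recorded too, which is why the
  // "no need to implement onFailure" comment drew the question above.
  override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit = {
    exceptions += funcName -> exception
  }
}

spark.listenerManager.register(listener)
```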
LGTM except three minor comments.

LGTM
Test build #73023 has finished for PR 16962 at commit
thanks for the review, merging to master!
format("csv").save(path)
}

private def runCommand(session: SparkSession, name: String)(command: LogicalPlan): Unit = {
Why don't we use SQLExecution instead of this?
The problem is that some commands, like InsertIntoHiveTable, already use SQLExecution.newExecution internally, so we can't use it again to wrap these commands. In the future we should figure out a central place to put SQLExecution.newExecution.
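To make the discussion concrete, here is a minimal sketch of the kind of wrapper being discussed, as it would sit inside DataFrameWriter: it builds a QueryExecution for the command, times the execution, and reports to the registered listeners. This is an illustration rather than the exact merged code, and note that `listenerManager.onSuccess`/`onFailure` are package-private to `org.apache.spark.sql`, so a method like this only compiles inside that package.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

/**
 * Wrap a DataFrameWriter action to track the QueryExecution and time cost, then report to the
 * user-registered callback functions.
 */
private def runCommand(session: SparkSession, name: String)(command: LogicalPlan): Unit = {
  val qe = session.sessionState.executePlan(command)
  try {
    val start = System.nanoTime()
    // `toRdd` triggers execution of the command. SQLExecution is not used to wrap it here
    // because some commands (e.g. InsertIntoHiveTable) already start their own execution
    // internally, as discussed above.
    qe.toRdd
    val end = System.nanoTime()
    session.listenerManager.onSuccess(name, qe, end - start)
  } catch {
    case e: Exception =>
      session.listenerManager.onFailure(name, qe, e)
      throw e
  }
}
```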
Merged commit: [SPARK-18120][SPARK-19557][SQL] Call QueryExecutionListener callback methods for DataFrameWriter methods

We only notify `QueryExecutionListener` for several `Dataset` operations, e.g. collect, take, etc. We should also do the notification for `DataFrameWriter` operations.

new regression test

close apache#16664

Author: Wenchen Fan <[email protected]>

Closes apache#16962 from cloud-fan/insert.
What changes were proposed in this pull request?
We only notify `QueryExecutionListener` for several `Dataset` operations, e.g. collect, take, etc. We should also do the notification for `DataFrameWriter` operations.

How was this patch tested?
new regression test
close #16664
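For completeness, the user-visible effect described above can be sketched as follows (assuming a `SparkSession` named `spark` and an output directory `path`, both hypothetical here): before this patch only Dataset actions such as `collect` or `take` reached the listener; with it, a `DataFrameWriter` operation such as `save` does too.

```scala
import org.apache.spark.sql.execution.QueryExecution
import org.apache.spark.sql.util.QueryExecutionListener

spark.listenerManager.register(new QueryExecutionListener {
  override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit =
    println(s"$funcName finished in ${durationNs / 1e6} ms")  // e.g. funcName = "save"
  override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit =
    println(s"$funcName failed: ${exception.getMessage}")
})

// With this patch the listener above is notified for writer operations as well.
spark.range(10).write.format("csv").save(path)
```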