Conversation

@cloud-fan (Contributor) commented Feb 16, 2017

What changes were proposed in this pull request?

We currently only notify `QueryExecutionListener` for a few `Dataset` actions, e.g. `collect`, `take`, etc. We should also send these notifications for `DataFrameWriter` operations.

How was this patch tested?

A new regression test.

close #16664
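For context, the listener contract in question can be modeled roughly as follows. This is a hypothetical, self-contained sketch, not Spark's actual code: the real `QueryExecutionListener` (in `org.apache.spark.sql.util`) also passes the `QueryExecution` object to each callback, and listeners are registered through `session.listenerManager`; `WriteListener` and `ListenerRegistry` are illustrative names.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical, simplified model of the QueryExecutionListener contract.
// The real trait additionally receives the QueryExecution object.
trait WriteListener {
  def onSuccess(funcName: String, durationNs: Long): Unit
  def onFailure(funcName: String, exception: Exception): Unit
}

// Minimal stand-in for session.listenerManager: fans each notification
// out to every registered listener.
class ListenerRegistry {
  private val listeners = ArrayBuffer.empty[WriteListener]
  def register(l: WriteListener): Unit = listeners += l
  def onSuccess(name: String, ns: Long): Unit =
    listeners.foreach(_.onSuccess(name, ns))
  def onFailure(name: String, e: Exception): Unit =
    listeners.foreach(_.onFailure(name, e))
}
```

With this patch, writer entry points such as `save` funnel through a wrapper that fires these callbacks, as `collect` and `take` already did.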


@vanzin (Contributor) left a comment

Looks good as far as I can tell. Thanks!


```scala
dataSource.write(mode, df)
runCommand(df.sparkSession, "save") {
  SaveIntoDataSourceCommand(
```
A Contributor commented:

Does this also cover SPARK-19557? If so, might as well mention that in the PR (or close the bug as a duplicate or related or something).

@cloud-fan changed the title from "[SPARK-18120 ][SQL] Call QueryExecutionListener callback methods for DataFrameWriter methods" to "[SPARK-18120 ][SPARK-19557][SQL] Call QueryExecutionListener callback methods for DataFrameWriter methods" on Feb 16, 2017.
```scala
val qe = session.sessionState.executePlan(command)
try {
  qe.executedPlan.foreach { plan =>
    plan.resetMetrics()
```
@cloud-fan (Author) replied:

Just realized that in this code path we pass in a logical plan and always get a new physical plan, so we don't need to reset metrics here. Let me remove it.

@SparkQA commented Feb 16, 2017

Test build #73009 has finished for PR 16962 at commit ce0e126.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class SaveIntoDataSourceCommand(

@SparkQA commented Feb 16, 2017

Test build #73012 has finished for PR 16962 at commit c0dcd24.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

```scala
  className = provider,
  partitionColumns = partitionColumns,
  options = options).write(mode, Dataset.ofRows(sparkSession, query))
```

A Member commented:

Do we need to invalidate the cache to be consistent with `InsertIntoDataSourceCommand`?

```scala
sparkSession.sharedState.cacheManager.invalidateCache(query)
```

@cloud-fan (Author) replied:

We don't have a `LogicalRelation` to use as the cache key.

The Member replied:

: )

```scala
  format("csv").save(path)
}

private def runCommand(session: SparkSession, name: String)(command: LogicalPlan): Unit = {
```
@gatorsmile (Member) commented Feb 16, 2017:

How about adding a function description, like:

```scala
/**
 * Wrap a DataFrameWriter action to track the QueryExecution and time cost, then report to the
 * user-registered callback functions.
 */
```
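The behavior that description captures can be sketched in plain Scala. This is a hypothetical model under stated assumptions, not the merged implementation: `Notification` and the `notify` parameter stand in for Spark's listener manager, and the body placeholder stands in for executing the plan.

```scala
// Hypothetical model of the runCommand wrapper: execute the command body,
// track the time cost, then report success or failure to the callbacks.
final case class Notification(name: String, succeeded: Boolean, durationNs: Long)

def runCommand[T](name: String, notify: Notification => Unit)(body: => T): T = {
  val start = System.nanoTime()
  try {
    val result = body // in Spark this is where the physical plan executes
    notify(Notification(name, succeeded = true, System.nanoTime() - start))
    result
  } catch {
    case e: Exception =>
      notify(Notification(name, succeeded = false, System.nanoTime() - start))
      throw e // re-throw so callers still observe the failure
  }
}
```

The key design point is that the wrapper reports on both paths and re-throws, so registering a listener never changes the caller-visible behavior of a writer action.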

```scala
val commands = ArrayBuffer.empty[(String, LogicalPlan)]
val exceptions = ArrayBuffer.empty[(String, Exception)]
val listener = new QueryExecutionListener {
  // Only test successful case here, so no need to implement `onFailure`
```
A Member commented:

invalid comment?
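For reference, the recording-listener pattern the test uses can be modeled in isolation like this. It is a hedged sketch: `Listener` and `Recorder` are illustrative names, and the real test registers the listener on the session and then runs writer actions against it.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical sketch of the test's recording listener: both callbacks
// append to buffers that assertions inspect after each writer action.
trait Listener {
  def onSuccess(funcName: String, durationNs: Long): Unit
  def onFailure(funcName: String, exception: Exception): Unit
}

class Recorder extends Listener {
  val commands = ArrayBuffer.empty[String]
  val exceptions = ArrayBuffer.empty[(String, Exception)]
  override def onSuccess(funcName: String, durationNs: Long): Unit =
    commands += funcName
  override def onFailure(funcName: String, exception: Exception): Unit =
    exceptions += ((funcName, exception))
}
```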

@gatorsmile (Member):

LGTM except three minor comments.

@gatorsmile (Member):

LGTM

@SparkQA commented Feb 17, 2017

Test build #73023 has finished for PR 16962 at commit d35fac3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan changed the title from "[SPARK-18120 ][SPARK-19557][SQL] Call QueryExecutionListener callback methods for DataFrameWriter methods" to "[SPARK-18120][SPARK-19557][SQL] Call QueryExecutionListener callback methods for DataFrameWriter methods" on Feb 17, 2017.
@cloud-fan (Author):

thanks for the review, merging to master!

@asfgit asfgit closed this in 54d2359 Feb 17, 2017
```scala
  format("csv").save(path)
}

private def runCommand(session: SparkSession, name: String)(command: LogicalPlan): Unit = {
```
A Contributor commented:

Why don't we use `SQLExecution` instead of this?

@cloud-fan (Author) replied:

The problem is that some commands, like `InsertIntoHiveTable`, already use `SQLExecution.newExecution`, so we can't use it again to wrap these commands.

In the future we should figure out a central place to put `SQLExecution.newExecution`.
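The nesting problem can be illustrated with a small, hypothetical model: a thread-scoped flag stands in for the execution id that `SQLExecution` tracks, and the wrapper opens a fresh scope only when none is already active. All names here (`inExecution`, `withExecutionScope`, `scopesOpened`) are illustrative, not Spark APIs.

```scala
import scala.util.DynamicVariable

// Tracks whether we are already inside an execution scope, mirroring how
// Spark keeps the current execution id in a thread-local property.
val inExecution = new DynamicVariable[Boolean](false)
var scopesOpened = 0 // counts how many fresh scopes were actually created

// Open a new scope only when none is active; nested calls reuse the
// current scope instead of wrapping a second time.
def withExecutionScope[T](body: => T): T =
  if (inExecution.value) body
  else {
    scopesOpened += 1
    inExecution.withValue(true)(body)
  }
```

Under this model, a command that already opened a scope internally would not be double-wrapped by an outer caller, which is the hazard the comment above describes.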

jzhuge pushed a commit to jzhuge/spark that referenced this pull request Aug 20, 2018
…methods for DataFrameWriter methods

We only notify `QueryExecutionListener` for several `Dataset` operations, e.g. collect, take, etc. We should also do the notification for `DataFrameWriter` operations.

new regression test

close apache#16664

Author: Wenchen Fan <[email protected]>

Closes apache#16962 from cloud-fan/insert.