
Conversation

@rdblue
Contributor

@rdblue rdblue commented Apr 5, 2017

What changes were proposed in this pull request?

Wraps DataFrameWriter operations in SQLExecution.withNewExecutionId so that SparkListenerSQLExecutionStart and SparkListenerSQLExecutionEnd are sent and the query shows up in the SQL tab of the UI.

How was this patch tested?

Tested by hand that insertInto results in queries in the SQL tab.

@srowen
Member

srowen commented Apr 5, 2017

Looks closely related to #17535?

@rdblue
Contributor Author

rdblue commented Apr 5, 2017

@srowen, agreed. Closely related but not the same code paths. The question is: when should withNewExecutionId get called?

I'm running the test suite now, and this patch causes test failures when withNewExecutionId is called twice: once in DataFrameWriter and once in InsertIntoHadoopFsRelationCommand. It looks like the call has been littered about the codebase (e.g. in InsertIntoHadoopFsRelationCommand and other execution nodes) to fix this problem for certain operations, so we should decide where it should be used and fix tests around that.

The reason why I added it to DataFrameWriter is that it is called in Dataset actions, and it makes sense to call it once from where an action is started. I think it makes the most sense for action methods, like Dataset#collect or DataFrameWriter#insertInto to minimize the number of places we need to add it. I don't think this is a concern that should be addressed by the execution plan.
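A minimal sketch of the semantics under discussion, in plain Scala rather than Spark's actual SQLExecution (the object name and thread-local bookkeeping here are assumptions for illustration): the wrapper sets an execution id for the duration of an action, and a second call on the same thread is exactly the double-call case that broke tests.

```scala
import java.util.concurrent.atomic.AtomicLong

object ExecutionIdModel {
  private val nextId = new AtomicLong(0)
  private val currentId = new ThreadLocal[Option[Long]] {
    override def initialValue(): Option[Long] = None
  }

  // Runs `body` with a fresh execution id; refuses to nest, which models the
  // failure when both DataFrameWriter and InsertIntoHadoopFsRelationCommand
  // wrap the same query.
  def withNewExecutionId[T](body: => T): T = {
    require(currentId.get.isEmpty, "execution id is already set")
    currentId.set(Some(nextId.getAndIncrement()))
    try body finally currentId.set(None)
  }

  def current: Option[Long] = currentId.get
}
```

Calling this once at the action entry point (and nowhere deeper in the plan) is the policy rdblue is arguing for.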

@SparkQA

SparkQA commented Apr 5, 2017

Test build #75549 has finished for PR 17540 at commit f9342b5.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rdblue
Contributor Author

rdblue commented Apr 6, 2017

@cloud-fan, can you look at this? What do you think about the question above: when should withNewExecutionId get called?

@SparkQA

SparkQA commented Apr 6, 2017

Test build #75579 has finished for PR 17540 at commit a3296a2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rdblue
Contributor Author

rdblue commented Apr 6, 2017

The new test failures are caused by a check I inserted. Moving where withNewExecutionId gets called could result in missing SQL queries in the UI, so anywhere I've replaced withNewExecutionId with checkSQLExecutionId, a missing execution id will cause tests (and only tests) to fail. That way, we can catch all of the call stacks that should have it.

This caught problems in SQL command execution and I've added a patch to fix it.
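The described check can be sketched as follows (a hypothetical standalone version, not the actual patch; the real check would read the execution id from Spark's job properties): it never starts a new execution itself, only verifies one is already active, and fails hard only when testing.

```scala
// Hypothetical sketch of the check described above, not the actual patch.
def checkSQLExecutionId(activeId: Option[Long], isTesting: Boolean): Unit = {
  if (activeId.isEmpty) {
    if (isTesting) {
      // In tests, fail loudly so every call stack missing the id is caught.
      throw new IllegalStateException("no execution id set for this query")
    } else {
      // In production the query would simply not show up in the SQL tab.
      System.err.println("WARN: query will not appear in the SQL tab")
    }
  }
}
```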

@SparkQA

SparkQA commented Apr 6, 2017

Test build #75581 has finished for PR 17540 at commit 429edfb.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

The withNewExecutionId was added at rxin@1b0317f#diff-89b9796aae086e790ddd9351f0db8115R134.

The execution id is used to track all jobs that belong to the same query, so I think it makes sense to call withNewExecutionId at action methods like Dataset#collect or DataFrameWriter#insertInto.

Contributor

@cloud-fan cloud-fan Apr 8, 2017


how about LocalRelation(c.output, withAction("collect", queryExecution)(_.executeCollect()))
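The suggestion reads as a per-action wrapper around the query execution. A toy sketch of that shape (the recording list and object name are illustrative assumptions, not Dataset's private API):

```scala
object ActionModel {
  // Records which actions ran, standing in for SQLExecution.withNewExecutionId
  // plus the SparkListenerSQLExecutionStart/End events it would emit.
  private var started: List[String] = Nil

  // Wraps an action so execution tracking happens exactly once, at the entry
  // point, no matter what the physical plan underneath does.
  def withAction[A, T](name: String, queryExecution: A)(action: A => T): T = {
    started = name :: started
    action(queryExecution)
  }

  def startedActions: List[String] = started
}
```

With this shape, every Dataset action funnels through one wrapper, which is the "minimize the number of places we need to add it" goal from the earlier comment.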

Contributor


Actually, do we need to do this? Most Commands are just local operations (talking with the metastore).

Contributor Author


Yeah, the check I added to ensure we get the same results in the SQL tab has several hundred failures that go through this. Looks like the path is almost always spark.sql when the SQL statement is a command like CTAS.

I like your version and will update.

@cloud-fan
Contributor

LGTM. @rdblue, the failed tests are thrift server tests, which are hard to debug. You can run the Hive tests locally and see what failed (usually failing thrift server tests mean we have failing Hive tests).

@rdblue
Contributor Author

rdblue commented Apr 8, 2017

Thanks for the review! I'll get the thrift-server tests fixed up next week.

@SparkQA

SparkQA commented Apr 10, 2017

Test build #75675 has finished for PR 17540 at commit 7cb7d4e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 10, 2017

Test build #75676 has finished for PR 17540 at commit 68cc2b3.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 11, 2017

Test build #75677 has finished for PR 17540 at commit ce0dbe7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 11, 2017

Test build #75681 has finished for PR 17540 at commit 7910825.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 12, 2017

Test build #75748 has finished for PR 17540 at commit 2cee6b7.

  • This patch fails Scala style tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@rdblue rdblue force-pushed the SPARK-20213-fix-sql-tab branch from 2cee6b7 to 4f3a02b Compare April 12, 2017 22:21
@SparkQA

SparkQA commented Apr 13, 2017

Test build #75749 has finished for PR 17540 at commit 4f3a02b.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@rdblue rdblue force-pushed the SPARK-20213-fix-sql-tab branch 2 times, most recently from cfe4e2c to 36da73b Compare April 13, 2017 22:56
@SparkQA

SparkQA commented Apr 14, 2017

Test build #75780 has finished for PR 17540 at commit cfe4e2c.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 14, 2017

Test build #75781 has finished for PR 17540 at commit 36da73b.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 14, 2017

Test build #75786 has finished for PR 17540 at commit a822309.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 14, 2017

Test build #75813 has finished for PR 17540 at commit 30fa4fc.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@rdblue
Contributor Author

rdblue commented Apr 14, 2017

@cloud-fan, could you have another look at this?

There are a few new changes:

  • withNewExecutionId now warns instead of throwing an exception, but still throws if spark.testing is defined
  • SQLExecution.nested allows nested execution IDs without test failures or warnings. This is needed because several places will nest when withNewExecutionId is called at the high-level operations; CacheTableCommand is an example.

Over the last week, I've fixed nearly all of the tests. The remaining failure, SQLExecutionSuite.concurrent query execution (SPARK-10548), is fixed in maven, but fails in SBT. The problem is that exceptions are now only thrown if spark.testing is defined, and for some reason adding it to the test's SparkSession or SparkContext doesn't work on Jenkins. Because this test is reproducing a case that now will never happen for two reasons (the original multi-threading fix and throw only if spark.testing), I'd like to simply remove it. Let me know what you think about that.
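A compact model of the two behaviors just described, in plain Scala (the spark.testing switch is modeled as a boolean parameter and SQLExecution.nested as a thread-local flag; all names are assumptions for illustration, not the patch itself):

```scala
import java.util.concurrent.atomic.AtomicLong

object NestedExecutionModel {
  private val nextId = new AtomicLong(0)
  private val currentId = new ThreadLocal[Option[Long]] {
    override def initialValue(): Option[Long] = None
  }
  private val nestingAllowed = new ThreadLocal[Boolean] {
    override def initialValue(): Boolean = false
  }

  // Unexpected nesting warns in production but throws when `testing` is true,
  // mirroring the spark.testing behavior described above.
  def withNewExecutionId[T](testing: Boolean)(body: => T): T = {
    if (currentId.get.isDefined && !nestingAllowed.get) {
      if (testing) throw new IllegalStateException("unexpected nested execution id")
      else System.err.println("WARN: unexpected nested execution id")
    }
    val saved = currentId.get
    currentId.set(Some(nextId.getAndIncrement()))
    try body finally currentId.set(saved)
  }

  // Opts in to nesting, as commands like CacheTableCommand would need.
  def nested[T](body: => T): T = {
    val saved = nestingAllowed.get
    nestingAllowed.set(true)
    try body finally nestingAllowed.set(saved)
  }
}
```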

Other changes to look at:

  • SQLMetricsSuite.save metrics started failing because there is a nested execution ID. This is because there are two SQL physical plans. The first, ExecutedCommandExec, links in a logical plan that is turned into a second physical plan at runtime. This means that the inner plan can't report the metrics that will be collected when analyzing the outer plan, because the outer plan doesn't exist yet. The long-term solution is to fix ExecutedCommandExec, but for now this accepts any metrics created by the inner plan.
  • StreamExecution wasn't calling withNewExecutionId and was caught by the new assertion. I added the call around the entire execution so that there isn't a new SQL execution for every batch. This required creating a special queryExecution to pass in.
  • DataFrameCallbackSuite had to be updated to include commands that were previously not registered in the SQL tab. The new SQL executions are for dropping tables, so the result looks more correct than before.

Contributor


This is only called in FileFormatWriter; are there any other places we need to consider?

Contributor Author


To keep this PR from growing too big, I want to just use it where I've removed withNewExecutionId to check for regressions. I'll follow up with another PR with more checks.

@cloud-fan
Contributor

Yeah, let's remove that test.

@rdblue rdblue force-pushed the SPARK-20213-fix-sql-tab branch from 30fa4fc to 901cec8 Compare April 17, 2017 16:23
@rdblue
Contributor Author

rdblue commented Apr 17, 2017

Removed the failing SPARK-10548 test and rebased.

@rdblue rdblue force-pushed the SPARK-20213-fix-sql-tab branch from 1ce1a81 to bd324e6 Compare May 2, 2017 16:27
@SparkQA

SparkQA commented May 2, 2017

Test build #76388 has finished for PR 17540 at commit bd324e6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rdblue rdblue force-pushed the SPARK-20213-fix-sql-tab branch from bd324e6 to 7131c32 Compare May 2, 2017 21:34
@SparkQA

SparkQA commented May 2, 2017

Test build #76395 has finished for PR 17540 at commit 7131c32.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rdblue rdblue force-pushed the SPARK-20213-fix-sql-tab branch from 7131c32 to 69ed59e Compare May 2, 2017 23:23
@SparkQA

SparkQA commented May 3, 2017

Test build #76396 has finished for PR 17540 at commit 69ed59e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rdblue rdblue force-pushed the SPARK-20213-fix-sql-tab branch from 69ed59e to 4db4fc9 Compare May 3, 2017 16:47
@SparkQA

SparkQA commented May 3, 2017

Test build #76422 has finished for PR 17540 at commit 4db4fc9.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rdblue rdblue force-pushed the SPARK-20213-fix-sql-tab branch from 4db4fc9 to f63b773 Compare May 3, 2017 23:16
@zsxwing
Member

zsxwing commented May 3, 2017

@rdblue I just tested this PR and found that I could not see any SQL metrics on the Web UI. This is pretty important for many users to analyze their queries.

What's your plan to fix it? As far as I understand, you want to show the parameters of write actions, such as InsertIntoHadoopFsRelationCommand(file://..., Parquet, ...). However, that's not the running QueryExecution. Right now we just attach SQL metrics to the DAG on the UI. After your change, the DAG shown on the UI is not the running QueryExecution, and I don't know how to show SQL metrics on the wrong DAG. Is it even possible to fix this in a follow-up PR quickly?

@rdblue
Contributor Author

rdblue commented May 4, 2017

@zsxwing, there should be a fix for the metrics without waiting for all of the bad plans to be fixed (which is to basically eliminate the use of ExecutedCommandExec).

The metrics are missing because ExecutedCommandExec doesn't report them via metrics. So we need to update it to get metrics from the command that is run. That requires breaking the command into two phases, one to get a SparkPlan and one to run it. Right now, that all happens at once. This shouldn't be too difficult as a follow-up, but will be a substantial number of changes so I think this PR should be independent.
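The two-phase split described here could look roughly like this (all names are invented for illustration; per the discussion, the real RunnableCommand.run does both phases at once inside ExecutedCommandExec):

```scala
// Phase 1 output: a "plan" whose metric slots exist before anything runs, so
// the UI could register them up front.
case class PlannedCommand(
    description: String,
    metricNames: Seq[String],
    body: () => Map[String, Long])

// Phase 1: build the plan without executing it.
def planInsert(path: String, rows: Seq[Int]): PlannedCommand =
  PlannedCommand(s"InsertInto($path)", Seq("numOutputRows"),
    () => Map("numOutputRows" -> rows.length.toLong))

// Phase 2: run it; the produced values attach to the metric slots that were
// registered in phase 1.
def execute(cmd: PlannedCommand): Map[String, Long] = cmd.body()
```

The point of the split is that the metric names are known before execution, which is what the UI needs in order to display them while the query runs.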

@zsxwing
Member

zsxwing commented May 4, 2017

So we need to update it to get metrics from the command that is run. That requires breaking the command into two phases, one to get a SparkPlan and one to run it.

Yeah, but how do we show metrics from one plan on another plan's DAG, considering the two plans could be different?

@rdblue
Contributor Author

rdblue commented May 4, 2017

@zsxwing, I don't know. Sounds like we should fix the underlying problem that there are 2 physical plans.

@zsxwing
Member

zsxwing commented May 4, 2017

@zsxwing, I don't know. Sounds like we should fix the underlying problem that there are 2 physical plans.

SQL metrics won't work without fixing it. IMO, that's more serious than the problem you are fixing.

@rdblue
Contributor Author

rdblue commented May 4, 2017

@zsxwing, you don't think there's a way to fix metrics? I don't know exactly how to fix the UI to show two plans' worth of metrics, but it seems like it can be done. What about updating ExecutedCommandExec to also report the plan from its child command?

Having two physical plans is a pretty bad problem for a SQL engine to have. If the work-around is to ignore that some parts of the UI don't work, I don't think that's a good plan. Sure, this is going to be a short-term regression for metrics, but what is the alternative to fix the underlying problem?

@zsxwing
Member

zsxwing commented May 4, 2017

I was not saying there is no way to fix metrics, just asking your thoughts. If we don't have a concrete plan, it might become a long-term regression if we just merge this PR.

I just want to ensure that it's going to be fixed soon after merging this PR.

@rdblue
Contributor Author

rdblue commented May 4, 2017

I'm not an expert on the metrics path, but I think we should be able to join up the actual physical plans well enough to display everything. I doubt it will be a long-term regression, but I don't think the fix is small enough that we should include it here. I also think that it is important for this to go in so we can fix the problem. Otherwise, I think it is too easy to ignore it. Not having queries show up in the SQL tab is just as bad as the metrics issue, so I think we're trading up.

@SparkQA

SparkQA commented May 4, 2017

Test build #76431 has finished for PR 17540 at commit f63b773.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

How about we make InsertIntoHadoopFsRelation and InsertIntoHiveTable physical plans instead of RunnableCommands? Someone also asked for this refactor in https://issues.apache.org/jira/browse/SPARK-19256

@rdblue
Contributor Author

rdblue commented May 4, 2017

@cloud-fan, all of the RunnableCommand instances that are currently run through ExecutedCommandExec need to be fixed so that there is only one physical plan. But the scope of those changes is larger than what's needed here. And as @zsxwing notes, the next step is probably to fix metrics for those broken plans.

@rdblue
Contributor Author

rdblue commented May 5, 2017

@cloud-fan, @zsxwing, tests are passing now. Should we commit this so we can start fixing the metrics?

@zsxwing
Member

zsxwing commented May 5, 2017

I don't think we need to rush. As far as I can tell, this PR breaks two things:

  • SQL metrics on the Web UI are broken.
  • It doesn't display the batch queries inside a Structured Streaming query. Right now it always shows one SQL query.

What benefits do we gain from this PR that are worth breaking the above things?

@zsxwing
Member

zsxwing commented May 5, 2017

I suggest that you just fix them in this PR. If it has to be a large PR, I'm okay with that.

currentBatchId,
offsetSeqMetadata)

SQLExecution.withNewExecutionId(sparkSessionToRunBatches, genericStreamExecution) {
Contributor


I think we should have an execution id for each streaming batch.

@cloud-fan
Contributor

cloud-fan commented May 8, 2017

Hi @rdblue, I have sent you a PR (rdblue#1) to fix the missing metrics issue of ExecutedCommandExec. We also need to fix the streaming batch metrics issue, then we are ready to go.

@asfgit asfgit closed this in 10e526e May 31, 2017
jzhuge pushed a commit to jzhuge/spark that referenced this pull request Aug 20, 2018
Currently the `DataFrameWriter` operations have several problems:

1. non-file-format data source writing action doesn't show up in the SQL tab in Spark UI
2. file-format data source writing action shows a scan node in the SQL tab, without saying anything about writing. (Streaming also has this issue, but it is not fixed in this PR.)
3. Spark SQL CLI actions don't show up in the SQL tab.

This PR fixes all of them, by refactoring `ExecutedCommandExec` to make it have children.

Closes apache#17540

Existing tests.

Also test the UI manually. For a simple command: `Seq(1 -> "a").toDF("i", "j").write.parquet("/tmp/qwe")`

before this PR:
<img width="266" alt="qq20170523-035840 2x" src="https://cloud.githubusercontent.com/assets/3182036/26326050/24e18ba2-3f6c-11e7-8817-6dd275bf6ac5.png">
after this PR:
<img width="287" alt="qq20170523-035708 2x" src="https://cloud.githubusercontent.com/assets/3182036/26326054/2ad7f460-3f6c-11e7-8053-d68325beb28f.png">

Author: Wenchen Fan <[email protected]>

Closes apache#18064 from cloud-fan/execution.

This also includes the following commits:
0795c16 introduce SQLExecution.ignoreNestedExecutionId
cd6e3f0 address comments