[SPARK-20213][SQL][UI] Fix DataFrameWriter operations in SQL UI tab. #17540
Conversation
|
Looks closely related to #17535? |
|
@srowen, agreed. Closely related, but not the same code paths. The question is: when should …? I'm running the test suite now, and this patch causes test failures when … The reason why I added it to … |
|
Test build #75549 has finished for PR 17540 at commit
|
|
@cloud-fan, can you look at this? What do you think about the question above: when should …? |
|
Test build #75579 has finished for PR 17540 at commit
|
|
The new test failures are caused by a check I inserted. Moving where … This caught problems in SQL command execution, and I've added a patch to fix it. |
|
Test build #75581 has finished for PR 17540 at commit
|
|
The … The execution id is used to track all jobs that belong to the same query, so I think it makes sense to call … |
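For context, a minimal sketch of the mechanism being discussed, assuming Spark's `spark.sql.execution.id` local-property scheme; this is illustrative and simplified, not Spark's actual `SQLExecution` implementation:

```scala
import java.util.concurrent.atomic.AtomicLong
import org.apache.spark.SparkContext

// Simplified, illustrative version of withNewExecutionId: it sets a SparkContext
// local property so every job launched inside `body` carries the same execution id,
// which is how the SQL tab groups jobs under a single query. The property key is the
// one Spark actually uses; the rest is a sketch.
object ExecutionIdSketch {
  val EXECUTION_ID_KEY = "spark.sql.execution.id"
  private val nextExecutionId = new AtomicLong(0)

  def withNewExecutionId[T](sc: SparkContext)(body: => T): T = {
    val oldExecutionId = sc.getLocalProperty(EXECUTION_ID_KEY)
    if (oldExecutionId != null) {
      // The kind of nested-execution check discussed in this thread: a second id for
      // the same query usually means two physical plans are competing in the UI.
      throw new IllegalStateException(s"Execution id is already set: $oldExecutionId")
    }
    val executionId = nextExecutionId.getAndIncrement()
    sc.setLocalProperty(EXECUTION_ID_KEY, executionId.toString)
    try {
      body // jobs started here are tagged with executionId via the local property
    } finally {
      sc.setLocalProperty(EXECUTION_ID_KEY, oldExecutionId) // restore (null clears it)
    }
  }
}
```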
How about `LocalRelation(c.output, withAction("collect", queryExecution)(_.executeCollect()))`?
Actually, do we need to do this? Most Commands are just local operations (talking with the metastore).
Yeah, the check I added to ensure we get the same results in the SQL tab caught several hundred failures that go through this. It looks like the path is almost always `spark.sql` when the SQL statement is a command like CTAS.
I like your version and will update.
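To make that path concrete, here is a small, hedged example of the kind of eager command in question (the table and view names are made up):

```scala
import org.apache.spark.sql.SparkSession

// A CTAS issued through spark.sql() runs eagerly when the DataFrame is created, so
// where the execution id is attached decides what the SQL tab shows for it.
val spark = SparkSession.builder().master("local[*]").appName("ctas-example").getOrCreate()
spark.range(10).createOrReplaceTempView("src")
// Executes as a command (CTAS); ideally it appears as a single query in the SQL tab.
spark.sql("CREATE TABLE ctas_target USING parquet AS SELECT id FROM src")
```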
|
LGTM. @rdblue, the failed tests are thrift server tests, which are hard to debug. You can run the hive tests locally and see what failed (usually failed thrift server tests mean we have failed hive tests). |
|
Thanks for the review! I'll get the thrift-server tests fixed up next week. |
|
Test build #75675 has finished for PR 17540 at commit
|
|
Test build #75676 has finished for PR 17540 at commit
|
|
Test build #75677 has finished for PR 17540 at commit
|
|
Test build #75681 has finished for PR 17540 at commit
|
|
Test build #75748 has finished for PR 17540 at commit
|
Force-pushed 2cee6b7 to 4f3a02b
|
Test build #75749 has finished for PR 17540 at commit
|
Force-pushed cfe4e2c to 36da73b
|
Test build #75780 has finished for PR 17540 at commit
|
|
Test build #75781 has finished for PR 17540 at commit
|
|
Test build #75786 has finished for PR 17540 at commit
|
|
Test build #75813 has finished for PR 17540 at commit
|
|
@cloud-fan, could you have another look at this? There are a few new changes:
Over the last week, I've fixed nearly all of the tests. The remaining failure, SQLExecutionSuite's "concurrent query execution (SPARK-10548)", is fixed in Maven but fails in SBT. The problem is that exceptions are now only thrown if … Other changes to look at: … |
|
This is only called in FileFormatWriter; are there any other places we need to consider?
To keep this PR from growing too big, I want to use it only where I've removed `withNewExecutionId`, to check for regressions. I'll follow up with another PR with more checks.
|
Yeah, let's remove that test. |
Force-pushed 30fa4fc to 901cec8
|
Removed the failing SPARK-10548 test and rebased. |
Force-pushed 1ce1a81 to bd324e6
|
Test build #76388 has finished for PR 17540 at commit
|
Force-pushed bd324e6 to 7131c32
|
Test build #76395 has finished for PR 17540 at commit
|
Force-pushed 7131c32 to 69ed59e
|
Test build #76396 has finished for PR 17540 at commit
|
Force-pushed 69ed59e to 4db4fc9
|
Test build #76422 has finished for PR 17540 at commit
|
Force-pushed 4db4fc9 to f63b773
|
@rdblue, I just tested this PR and found that I could not see any SQL metrics on the Web UI. These are pretty important for many users analyzing their queries. What's your plan to fix it? As far as I understand, you want to show the parameters of … |
|
@zsxwing, there should be a fix for the metrics without waiting for all of the bad plans to be fixed (which is basically to eliminate the use of …). The metrics are missing because … |
Yeah, but how do we show metrics collected from one plan on another plan's DAG, considering these two plans could be different? |
|
@zsxwing, I don't know. Sounds like we should fix the underlying problem that there are 2 physical plans. |
SQL metrics won't work without fixing it. IMO, that's more serious than the problem you are fixing. |
|
@zsxwing, you don't think there's a way to fix metrics? I don't know exactly how to fix the UI to show two plans' worth of metrics, but it seems like it can be done. What about also updating …? Having two physical plans is a pretty bad problem for a SQL engine to have. If the work-around is to ignore that some parts of the UI don't work, I don't think that's a good plan. Sure, this is going to be a short-term regression for metrics, but what is the alternative for fixing the underlying problem? |
|
I was not saying there is no way to fix metrics, just asking for your thoughts. If we don't have a concrete plan, it might become a long-term regression if we just merge this PR. I just want to ensure that it's going to be fixed soon after merging this PR. |
|
I'm not an expert on the metrics path, but I think we should be able to join up the actual physical plans well enough to display everything. I doubt it will be a long-term regression, but I don't think the fix is small enough that we should include it here. I also think that it is important for this to go in so we can fix the problem. Otherwise, I think it is too easy to ignore it. Not having queries show up in the SQL tab is just as bad as the metrics issue, so I think we're trading up. |
|
Test build #76431 has finished for PR 17540 at commit
|
|
How about we make …? |
|
@cloud-fan, all of the … |
|
@cloud-fan, @zsxwing, tests are passing now. Should we commit this so we can start fixing the metrics? |
|
I don't think we need to rush. As far as I can tell, this PR breaks two things: …
What benefits do we gain from this PR that are worth breaking the above things? |
|
I suggest that you just fix them in this PR. If it has to be a large PR, I'm okay with that. |
currentBatchId,
offsetSeqMetadata)

SQLExecution.withNewExecutionId(sparkSessionToRunBatches, genericStreamExecution) {
I think we should have an execution id for each streaming batch
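A hedged sketch of that suggestion, assuming code that lives inside Spark's `sql` package so the internal, Spark 2.x two-argument `SQLExecution.withNewExecutionId` is visible; `runOneBatch` and its parameter names are placeholders, not Spark's actual members:

```scala
package org.apache.spark.sql.execution.streaming

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.{QueryExecution, SQLExecution}

// Placeholder helper (not StreamExecution's real method): wrap each micro-batch in
// its own execution id so every batch shows up as a separate query in the SQL tab.
object StreamingBatchExecutionSketch {
  def runOneBatch[T](
      sparkSessionToRunBatch: SparkSession,
      batchQueryExecution: QueryExecution)(runBatch: => T): T = {
    // Posts SparkListenerSQLExecutionStart/End around the batch and tags the jobs it
    // launches with a fresh execution id.
    SQLExecution.withNewExecutionId(sparkSessionToRunBatch, batchQueryExecution) {
      runBatch
    }
  }
}
```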
Currently the `DataFrameWriter` operations have several problems:
1. a non-file-format data source writing action doesn't show up in the SQL tab in the Spark UI
2. a file-format data source writing action shows a scan node in the SQL tab, without saying anything about writing (streaming also has this issue, but it is not fixed in this PR)
3. Spark SQL CLI actions don't show up in the SQL tab

This PR fixes all of them by refactoring `ExecuteCommandExec` to make it have children.

Closes apache#17540.

Tested with existing tests; also tested the UI manually. For a simple command: `Seq(1 -> "a").toDF("i", "j").write.parquet("/tmp/qwe")`

before this PR:
<img width="266" alt="qq20170523-035840 2x" src="https://cloud.githubusercontent.com/assets/3182036/26326050/24e18ba2-3f6c-11e7-8817-6dd275bf6ac5.png">

after this PR:
<img width="287" alt="qq20170523-035708 2x" src="https://cloud.githubusercontent.com/assets/3182036/26326054/2ad7f460-3f6c-11e7-8053-d68325beb28f.png">

Author: Wenchen Fan <[email protected]>

Closes apache#18064 from cloud-fan/execution.

This also includes the following commits:
0795c16 introduce SQLExecution.ignoreNestedExecutionId
cd6e3f0 address comments
What changes were proposed in this pull request?
Wraps `DataFrameWriter` operations in `SQLExecution.withNewExecutionId` so that `SparkListenerSQLExecutionStart` and `SparkListenerSQLExecutionEnd` are sent and the query shows up in the SQL tab of the UI.
How was this patch tested?
Tested by hand that `insertInto` results in queries in the SQL tab.
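As a rough illustration of the effect described above, a hedged, self-contained check; the object name, output path, and the sleep-based wait are placeholders (a real test would drain the listener bus instead of sleeping):

```scala
import java.util.concurrent.atomic.AtomicInteger

import org.apache.spark.scheduler.{SparkListener, SparkListenerEvent}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.ui.{SparkListenerSQLExecutionEnd, SparkListenerSQLExecutionStart}

object WriterUiCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("writer-ui-check").getOrCreate()
    import spark.implicits._

    val executionStarts = new AtomicInteger(0)
    spark.sparkContext.addSparkListener(new SparkListener {
      override def onOtherEvent(event: SparkListenerEvent): Unit = event match {
        case _: SparkListenerSQLExecutionStart => executionStarts.incrementAndGet()
        case _: SparkListenerSQLExecutionEnd   => () // the matching end event
        case _ => ()
      }
    })

    // With the change described above, this write should emit a start/end pair and
    // therefore appear as a query in the SQL tab.
    Seq(1 -> "a").toDF("i", "j").write.mode("overwrite").parquet("/tmp/writer-ui-check")

    // Listener events are delivered asynchronously; sleeping is only for this sketch.
    Thread.sleep(2000)
    println(s"SQL execution starts observed: ${executionStarts.get()}")
    spark.stop()
  }
}
```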