-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-11206] Support SQL UI on the history server #9297
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #44419 has finished for PR 9297 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This mach really needs `case other:String => throw new IOException(s"Unknown Event $other") message, so that when new events are added in future, the old replay code will at least display a meaningful message. Maybe that's for another pull req though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added that. Thanks.
|
Test build #44493 has finished for PR 9297 at commit
|
|
Test build #44499 has finished for PR 9297 at commit
|
|
Test build #44503 has finished for PR 9297 at commit
|
|
Hi @carsonwang, This is a really useful addition, but I'm afraid this approach is not really scalable. For example, you have the same problem with the Streaming tab in the UI, and trying to fix that with your approach would just make things worse. I actually think the underlying code needs some lower level changes before this can be implemented more cleanly. For example, a few things that I thought about in the past:
With those in place, the only thing left is a protocol so that the history server can feed events to backend-specific listeners; that could be done by having the history server expose a new trait that is loaded at runtime using something like What do you think? I really would like to avoid moving more stuff to the core; it feels like this would be a pretty good opportunity to work on some of the outstanding issues with this whole part of the code. |
|
@vanzin Thanks a lot for the comment. This sounds great and is very helpful. I agree it is not a good idea to move more stuff to the core. The underlying code change you mentioned sounds a lot of works to do, especially when we consider backwards compatibility. Can we apply part of them so that we can support SQL UI on the history server in this PR and also avoid moving the SQL classes to the core? I was thinking adding a I know this might still not be very scalable. But we don't make things worse once we are able to avoid moving sql classes to the core and also support SQL UI on the history server. Do you think this is doable for this ticket? |
I'm not a big fan of allowing events to write custom serialization code. I think a better approach would be to change Mostly that's for backwards compatibility, so that event that are currently written to the event log don't change format. We can then document that the old format is deprecated and will be switched to the new, Jackson-based one in a future release. |
|
@vanzin , your suggestions are very helpful. I have updated the code following the ideas. I use Jackson to serialize the SQL events and also introduce a trait to register the events and feed them to the SQL listener. The SQL implementation of the trait is loaded at runtime using |
|
Test build #45201 has finished for PR 9297 at commit
|
|
Test build #45203 has finished for PR 9297 at commit
|
|
Test build #45346 has finished for PR 9297 at commit
|
|
Test build #45470 has finished for PR 9297 at commit
|
|
Test build #46320 has finished for PR 9297 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super nit: get rid of this empty line.
|
LGTM. retest this please |
|
Test build #46373 has finished for PR 9297 at commit
|
|
retest this please |
|
Test build #46386 has finished for PR 9297 at commit
|
|
Test build #46387 has finished for PR 9297 at commit
|
|
retest this please |
|
Test build #46637 has finished for PR 9297 at commit
|
|
While I support this feature, people already see problems with scanning/loading app histories off the filesystem. This has the potential to increase it significantly. I think it helps justify having those fs logs save some metadata summary (started, ended, spark version, ...) alongside the logs, so at least the app listing operation can bypass the scan. Separate JIRA, obviously |
|
@steveloughran while that's true, this change shouldn't make things that much worse; things are already pretty bad for applications that generate large logs. I'm merging this to master. |
|
I just observed a SQLListenerMemoryLeakSuite memory leak test failure in Jenkins. Do you think that it could somehow be related to this patch? Here's the failing test: No need to revert or take drastic action yet, but let's keep an eye on that suite to see if the test fails in any more builds; you can monitor this test at https://spark-tests.appspot.com/tests/org.apache.spark.sql.execution.ui.SQLListenerMemoryLeakSuite/no%20memory%20leak, an experimental dashboard that I've been building to help investigate these types of flaky test issues. |
|
I can't really say, I'm not too familiar with the SQL backend here. @chenghao-intel took a look, maybe he has some insight. |
|
Seems failed again, we are debugging on it now, will keep you posted if anything find. |
|
I just submitted #9991 to fix the test failure. Details are described in the new PR. Thanks all! |
|
Just curious: are there any end-to-end tests for this feature? Would it be very hard to add some? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that it's still legal to have multiple active SQLContexts in an application (although this is not a good practice), isn't this a user-facing behavioral change? If a user creates multiple SQLContexts, then they will now only have one SQL tab in the UI and one SQLListener, which is different than the old behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, is the only purpose of sqlListener to prevent multiple listeners from being created at the same time? If so, I think it would be better to use an AtomicBoolean so that we don't create another strong reference to a SQLListener, which might have a lot of internal state that could lead to memory leaks.
|
I don't think so. Tests would have to be added to one of the In |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this method is going to be a no-op under certain circumstances then those circumstances should be documented here.
|
@vanzin, in that case, why can't those tests live in the SQL module? I think that we should be able to add a test where we create a SparkContext + SQLContext with event logging enabled, run some queries, stop the context, then programmatically instantiate a HistoryServer, reconstitute the UI from the event logs, and perform some assertions over the loaded UI (e.g. check the number of SQL tabs, check that clicking on the SQL tab returns a page with some expected strings, etc). We could easily define some test helper classes to make this easy to write. |
|
I'm not saying they can't live there, I'm just saying they have to be written (since they don't exist) and when they are, they'll have to live there. |
|
Because this PR broke the Master Maven tests, #9991 isn't read to merge, and there's not an easy way to ignore failing tests (since the problem is broader in scope than that single test), I'm going to revert this for now. Let's aim to resubmit a new PR which addresses the concerns being discussed in #9991. |
Resubmit apache#9297 and apache#9991 On the live web UI, there is a SQL tab which provides valuable information for the SQL query. But once the workload is finished, we won't see the SQL tab on the history server. It will be helpful if we support SQL UI on the history server so we can analyze it even after its execution. To support SQL UI on the history server: 1. I added an onOtherEvent method to the SparkListener trait and post all SQL related events to the same event bus. 2. Two SQL events SparkListenerSQLExecutionStart and SparkListenerSQLExecutionEnd are defined in the sql module. 3. The new SQL events are written to event log using Jackson. 4. A new trait SparkHistoryListenerFactory is added to allow the history server to feed events to the SQL history listener. The SQL implementation is loaded at runtime using java.util.ServiceLoader. Author: Carson Wang <[email protected]> Closes apache#10061 from carsonwang/SqlHistoryUI.
Resubmit apache#9297 and apache#9991 On the live web UI, there is a SQL tab which provides valuable information for the SQL query. But once the workload is finished, we won't see the SQL tab on the history server. It will be helpful if we support SQL UI on the history server so we can analyze it even after its execution. To support SQL UI on the history server: 1. I added an onOtherEvent method to the SparkListener trait and post all SQL related events to the same event bus. 2. Two SQL events SparkListenerSQLExecutionStart and SparkListenerSQLExecutionEnd are defined in the sql module. 3. The new SQL events are written to event log using Jackson. 4. A new trait SparkHistoryListenerFactory is added to allow the history server to feed events to the SQL history listener. The SQL implementation is loaded at runtime using java.util.ServiceLoader. Author: Carson Wang <[email protected]> Closes apache#10061 from carsonwang/SqlHistoryUI. # Conflicts: # sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala # sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala # sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala # sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLTab.scala # sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala Added more features: # More details to the SparkPlanInfo # Added the execution plan to action # Conflicts: # sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala # sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala # sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLTab.scala # sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SparkPlanGraph.scala
On the live web UI, there is a SQL tab which provides valuable information for the SQL query. But once the workload is finished, we won't see the SQL tab on the history server. It will be helpful if we support SQL UI on the history server so we can analyze it even after its execution.
To support SQL UI on the history server:
onOtherEventmethod to theSparkListenertrait and post all SQL related events to the same event bus.SparkListenerSQLExecutionStartandSparkListenerSQLExecutionEndare defined in the sql module.SparkHistoryListenerFactoryis added to allow the history server to feed events to the SQL history listener. The SQL implementation is loaded at runtime usingjava.util.ServiceLoader.