[SPARK-11206] (Followup) Fix SQLListenerMemoryLeakSuite test error #9991
This is not a public API. So the user cannot clear SQLContext.sqlListener? This will be a memory leak considering SQLListener usually stores a lot of data.
Previously, each SQLContext had its own sqlListener. Because SQL events are now posted to the event bus, all SQLContexts share a single sqlListener. I don't think a user needs to clear SQLContext.sqlListener. This is only used by the unit tests.
SPARK-11700 is a bit different. But my point is that we should not keep a big object in memory without providing a way to clean it up. In some use cases, Spark SQL may be just one of several ETL steps. When users finish their work in Spark SQL, they usually want to clean up all resources used by SparkContext/SQLContext.
I see. Is it enough to make SQLContext.clearSqlListener public here? So we provide a way to clear the reference for users who want the object to be GCed.
.. I can imagine Zeppelin wanting to purge these, or whatever Spark Kernel is named as.
I think we can add a SparkContext stop hook. When SparkContext is being stopped, clear the reference. The user doesn't have to call a method to clear the sqlListener reference. The sqlListener is added to SparkContext and will only be garbage collected when SparkContext is stopped.
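A minimal sketch of the stop-hook idea, for illustration only. FakeContext, FakeListener, SharedListenerHolder, and addStopHook are hypothetical stand-ins invented here and are not Spark's actual classes or API; the sketch only shows a shared reference being cleared when the owning context stops.

```scala
import java.util.concurrent.atomic.AtomicReference
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-ins; these are NOT Spark's real classes.
final class FakeListener(val name: String)

final class FakeContext {
  private val stopHooks = ArrayBuffer.empty[() => Unit]

  // Register a callback to run when the context is stopped.
  def addStopHook(hook: () => Unit): Unit = synchronized { stopHooks += hook; () }

  // Run every registered hook once, then forget them.
  def stop(): Unit = synchronized {
    stopHooks.foreach(h => h())
    stopHooks.clear()
  }
}

object SharedListenerHolder {
  // One listener shared by all SQL contexts; starts out null.
  val sqlListener = new AtomicReference[FakeListener]()

  def getOrCreate(ctx: FakeContext): FakeListener = {
    val candidate = new FakeListener("sql")
    if (sqlListener.compareAndSet(null, candidate)) {
      // Only the caller that installed the listener registers the cleanup hook.
      ctx.addStopHook(() => sqlListener.set(null))
      candidate
    } else {
      // May be null if the context is stopped concurrently; a real
      // implementation would need to handle that case.
      sqlListener.get()
    }
  }
}
```

With this shape, the user never calls a clear method: stopping the context drops the reference, and the listener becomes eligible for garbage collection.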
Over on the original PR, I commented to ask why SQLContext.sqlListener needs to be an AtomicReference[SQLListener] instead of an AtomicBoolean or some other sort of atomic primitive. As far as I can tell, we never access any methods or fields of the sqlListener that's stored here, so if we only need to set something for compare-and-swap purposes then I think we shouldn't use an AtomicReference, thereby avoiding the GC issues that it causes.
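To make the suggestion concrete, here is a rough sketch of a flag used only for compare-and-swap. ListenerRegistration, registerOnce, and reset are hypothetical names chosen for this example, not code from the PR.

```scala
import java.util.concurrent.atomic.AtomicBoolean

object ListenerRegistration {
  // Records only whether a listener has been registered; it holds no
  // reference to the listener, so it cannot keep the listener alive.
  private val registered = new AtomicBoolean(false)

  // Runs `register` at most once until reset() is called.
  // Returns true if this call performed the registration.
  def registerOnce(register: () => Unit): Boolean =
    if (registered.compareAndSet(false, true)) { register(); true } else false

  // Allows a later registration, for example after the context is stopped.
  def reset(): Unit = registered.set(false)
}
```

The trade-off is that a flag cannot hand the listener instance back, so any caller that needs the instance itself would have to obtain it some other way.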
Many unit tests use sqlContext.listener. Can you please suggest how to update the unit tests if we changed to use an AtomicBoolean?
Test build #46751 has finished for PR 9991 at commit
The original purpose of this PR is to fix the To prevent memory leak similar to SPARK-11700, I added a
Test build #46805 has finished for PR 9991 at commit
Test build #46807 has finished for PR 9991 at commit
retest this please
Test build #46828 has finished for PR 9991 at commit
retest this please
Test build #46834 has finished for PR 9991 at commit
I'm not a big fan of adding more hooks, but since SQLContext doesn't have a stop method... anyway, probably good to wrap calls to the hooks with Utils.tryLogNonFatalError.
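As a sketch of what wrapping the hook calls buys, the following uses a local stand-in with the same intent as the helper named above. HookRunner, tryLogNonFatal, and runAll are hypothetical names for this example and are not Spark's implementation; the point is only that one failing hook does not prevent the remaining hooks from running.

```scala
import scala.util.control.NonFatal

object HookRunner {
  // Local stand-in: run the block, log non-fatal errors, and carry on.
  def tryLogNonFatal(block: => Unit): Unit =
    try block catch {
      case NonFatal(e) => System.err.println(s"Hook failed: ${e.getMessage}")
    }

  // Runs every hook and returns how many completed without throwing.
  def runAll(hooks: Seq[() => Unit]): Int = {
    var ran = 0
    hooks.foreach { h => tryLogNonFatal { h(); ran += 1 } }
    ran
  }
}
```

Fatal errors such as OutOfMemoryError are deliberately not caught by the NonFatal extractor and still propagate.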
Would things work if Also, does the history server suffer from this problem now that it can instantiate
Haven't followed discussion in detail yet, but just wanted to flag this PR/discussion as a high priority item to get resolved soon, since the failing memory leak test is preventing the Maven builds from running certain subsequent suites. We should try to get this fixed before we start merging a bunch of patches on Monday.
@vanzin , I wrapped the calls to the hooks with
@zsxwing , do you have any further comments regarding how the
Test build #46862 has finished for PR 9991 at commit
Sorry, I don't follow.
It's a bit different because the location of Not related to this issue: I just noticed the location in
It's a bit different but not in the way @carsonwang explained; whether you use the hook or handle
Does this method (or the collection that it manipulates) need to be synchronized in order to be thread-safe?
Heads up: since discussion here is still ongoing and I think there's still more work to do, I'm going to revert #9297 in order to un-break the master Maven tests. Could you re-submit a new PR which contains both the code from the original PR and which addresses the state lifecycle issues being discussed here? Thanks!
Sorry I misunderstood it. Ok I will use SparkListenerApplicationEnd. The issue @zsxwing mentioned can probably be addressed in another PR. |
Close this and resubmit #10061 |
Resubmit apache#9297 and apache#9991

On the live web UI, there is a SQL tab which provides valuable information for the SQL query. But once the workload is finished, we won't see the SQL tab on the history server. It will be helpful if we support SQL UI on the history server so we can analyze it even after its execution.

To support SQL UI on the history server:
1. I added an onOtherEvent method to the SparkListener trait and post all SQL related events to the same event bus.
2. Two SQL events SparkListenerSQLExecutionStart and SparkListenerSQLExecutionEnd are defined in the sql module.
3. The new SQL events are written to event log using Jackson.
4. A new trait SparkHistoryListenerFactory is added to allow the history server to feed events to the SQL history listener. The SQL implementation is loaded at runtime using java.util.ServiceLoader.

Author: Carson Wang <[email protected]>

Closes apache#10061 from carsonwang/SqlHistoryUI.
Resubmit apache#9297 and apache#9991

On the live web UI, there is a SQL tab which provides valuable information for the SQL query. But once the workload is finished, we won't see the SQL tab on the history server. It will be helpful if we support SQL UI on the history server so we can analyze it even after its execution.

To support SQL UI on the history server:
1. I added an onOtherEvent method to the SparkListener trait and post all SQL related events to the same event bus.
2. Two SQL events SparkListenerSQLExecutionStart and SparkListenerSQLExecutionEnd are defined in the sql module.
3. The new SQL events are written to event log using Jackson.
4. A new trait SparkHistoryListenerFactory is added to allow the history server to feed events to the SQL history listener. The SQL implementation is loaded at runtime using java.util.ServiceLoader.

Author: Carson Wang <[email protected]>

Closes apache#10061 from carsonwang/SqlHistoryUI.

# Conflicts:
#   sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala
#   sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala
#   sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala
#   sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLTab.scala
#   sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala

Added more features:
# More details to the SparkPlanInfo
# Added the execution plan to action

# Conflicts:
#   sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala
#   sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala
#   sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLTab.scala
#   sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SparkPlanGraph.scala
A followup to #9297 that fixes the SQLListenerMemoryLeakSuite test error. The failure occurs because a sqlListener created by a previous test suite is not cleared in SQLListenerMemoryLeakSuite. In the failure case, the previous test suite, DateFunctionsSuite, has 91 completed executions, so the error message is: 91 was not less than or equal to 50.

For test suites that extend SharedSQLContext, the sqlListener is cleared in the method beforeAll. Since SQLListenerMemoryLeakSuite doesn't extend SharedSQLContext, the sqlListener needs to be cleared manually before creating the SQLContext.

/cc @vanzin @JoshRosen @chenghao-intel