
Conversation

@carsonwang
Contributor

A followup to #9297 that fixes the SQLListenerMemoryLeakSuite test error. The failure occurs because a sqlListener created by a previous test suite is not cleared in SQLListenerMemoryLeakSuite.
In the failing case, the previous test suite, DateFunctionsSuite, had 91 completed executions, so the error message was "91 was not less than or equal to 50".

For test suites that extend SharedSQLContext, the sqlListener is cleared in beforeAll. Since SQLListenerMemoryLeakSuite doesn't extend SharedSQLContext, the sqlListener needs to be cleared manually before creating the SQLContext.
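The clearing step can be illustrated with a self-contained sketch. The `Listener`/`Context` classes below are toy stand-ins for SQLListener/SQLContext, not Spark's real API:

```scala
import java.util.concurrent.atomic.AtomicReference

// Toy stand-ins for SQLListener / SQLContext (illustrative names only).
class Listener { var completedExecutions = 0 }

object Context {
  // One listener shared by every Context, as after #9297.
  val sqlListener = new AtomicReference[Listener]()

  // What clearing the listener amounts to: drop the shared reference
  // so the next Context starts fresh.
  def clearListener(): Unit = sqlListener.set(null)
}

class Context {
  // First Context to initialize wins; later ones reuse the same listener.
  Context.sqlListener.compareAndSet(null, new Listener)
  def listener: Listener = Context.sqlListener.get
}

// Without clearing, a new suite inherits the 91 executions of the
// previous one; clearing first gives it an empty listener.
val previous = new Context
previous.listener.completedExecutions = 91
Context.clearListener()
val fresh = new Context
println(fresh.listener.completedExecutions) // 0
```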

/cc @vanzin @JoshRosen @chenghao-intel

Member

This is not a public API, so the user cannot clear SQLContext.sqlListener? This will be a memory leak, considering that SQLListener usually stores a lot of data.

Contributor Author

Previously each SQLContext had its own sqlListener. Because the SQL events are now posted to the event bus, all SQLContexts share a single sqlListener. I don't think a user needs to clear SQLContext.sqlListener; it is only used by the unit tests.

Member

SPARK-11700 is a bit different. But my point is that we should not keep a big object in memory without providing a way to clean it up. In some use cases, Spark SQL may be just one of several ETL steps, and when the user finishes his/her work in Spark SQL, he/she usually wants to clean up all resources used by SparkContext/SQLContext.

Contributor Author

I see. Is it enough to make SQLContext.clearSqlListener public here? That would give users who want the object to be GCed a way to clear the reference.

Contributor

.. I can imagine Zeppelin wanting to purge these, or whatever Spark Kernel is called now.

Contributor Author

I think we can add a SparkContext stop hook: when the SparkContext is being stopped, clear the reference. The user doesn't have to call a method to clear the sqlListener reference. The sqlListener is also registered with the SparkContext, so it can only be garbage collected once the SparkContext is stopped.
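The proposal can be sketched with a toy context; the hook API below is hypothetical and only models the idea, not SparkContext's actual interface:

```scala
import java.util.concurrent.atomic.AtomicReference
import scala.collection.mutable.ArrayBuffer
import scala.util.control.NonFatal

// Hypothetical stop-hook mechanism modeling the proposal.
class ToyContext {
  private val stopHooks = ArrayBuffer.empty[() => Unit]

  def addStopHook(hook: () => Unit): Unit = synchronized { stopHooks += hook }

  def stop(): Unit = synchronized {
    // Run each hook, swallowing non-fatal errors (the role played by
    // Utils.tryLogNonFatalError in the actual patch).
    stopHooks.foreach { h =>
      try h() catch { case NonFatal(e) => println(s"stop hook failed: $e") }
    }
    stopHooks.clear()
  }
}

// The shared listener reference is dropped only when the context stops,
// so the listener becomes eligible for GC at shutdown.
val listenerRef = new AtomicReference[AnyRef](new Object)
val ctx = new ToyContext
ctx.addStopHook(() => listenerRef.set(null))
ctx.stop()
println(listenerRef.get == null) // true
```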

Contributor

Over on the original PR, I commented to ask why SQLContext.sqlListener needs to be an AtomicReference[SQLListener] instead of an AtomicBoolean or some other sort of atomic primitive. As far as I can tell, we never access any methods or fields of the sqlListener that's stored here, so if we only need to set something for compare-and-swap purposes then I think we shouldn't use an AtomicReference, thereby avoiding the GC issues that it causes.
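The distinction can be sketched without Spark at all. Both patterns below give the same "first initializer wins" guarantee, but only the second avoids keeping the big object reachable (toy class names, not Spark's):

```scala
import java.util.concurrent.atomic.{AtomicBoolean, AtomicReference}

class BigListener // stand-in for SQLListener, which can hold a lot of UI data

// Pattern A: AtomicReference used purely as an "already initialized?" flag.
// The listener is never read back, yet stays strongly reachable forever.
val ref = new AtomicReference[BigListener]()
def initWithRef(): Boolean = ref.compareAndSet(null, new BigListener)

// Pattern B: an AtomicBoolean gives the same compare-and-swap guarantee
// without retaining the listener at all.
val flag = new AtomicBoolean(false)
def initWithFlag(): Option[BigListener] =
  if (flag.compareAndSet(false, true)) Some(new BigListener) else None

println(initWithRef())            // true: first call wins
println(initWithRef())            // false: already initialized
println(initWithFlag().isDefined) // true
println(initWithFlag().isDefined) // false
```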

Contributor Author

Many unit tests use sqlContext.listener. Can you suggest how to update those tests if we switch to an AtomicBoolean?

@SparkQA

SparkQA commented Nov 26, 2015

Test build #46751 has finished for PR 9991 at commit 8ca3031.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@carsonwang
Contributor Author

The original purpose of this PR is to fix the SQLListenerMemoryLeakSuite test failure. This can be resolved by clearing SQLContext.sqlListener before the test.

To prevent a memory leak similar to SPARK-11700, I added a SparkContext stop hook to clear the sqlListener reference. I didn't make SQLContext.clearSqlListener a public API because it seems a little confusing for users to call it. Also, because the sqlListener is registered with the SparkContext, it would not be GCed immediately even if a user called SQLContext.clearSqlListener. Instead, we clear the reference when the SparkContext is being stopped so that the sqlListener can be GCed afterwards.

@SparkQA

SparkQA commented Nov 27, 2015

Test build #46805 has finished for PR 9991 at commit 4549f62.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 27, 2015

Test build #46807 has finished for PR 9991 at commit b694e27.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zsxwing
Member

zsxwing commented Nov 27, 2015

Test build #46807 has finished for PR 9991 at commit b694e27.

This patch fails from timeout after a configured wait of 250m.
This patch merges cleanly.
This patch adds no public classes.

This one should be fixed in #10011

@zsxwing
Member

zsxwing commented Nov 27, 2015

retest this please

@SparkQA

SparkQA commented Nov 27, 2015

Test build #46828 has finished for PR 9991 at commit b694e27.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@carsonwang
Contributor Author

retest this please

@SparkQA

SparkQA commented Nov 28, 2015

Test build #46834 has finished for PR 9991 at commit b694e27.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

I'm not a big fan of adding more hooks, but since SQLContext doesn't have a stop method... anyway, it's probably good to wrap calls to the hooks with Utils.tryLogNonFatalError.

@vanzin
Contributor

vanzin commented Nov 29, 2015

Would things work if SQLListener cleaned up after itself when SparkListenerApplicationEnd is received? That would avoid adding hooks in SparkContext.

Also, does the history server suffer from this problem now that it can instantiate SQLListener via SparkHistoryListenerFactory? If it does, handling SparkListenerApplicationEnd would probably solve that problem too.
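A self-cleaning listener along the lines suggested here could look roughly like this; the event and listener types are toy stand-ins, not Spark's SparkListener API:

```scala
// Toy event hierarchy standing in for Spark's listener events.
sealed trait Event
case object ApplicationEnd extends Event
case class ExecutionEnd(id: Long) extends Event

class SelfCleaningListener {
  private var executions = Map.empty[Long, String]

  def onEvent(event: Event): Unit = event match {
    case ExecutionEnd(id) => executions += id -> "finished"
    case ApplicationEnd   => executions = Map.empty // drop UI state at shutdown
  }

  def size: Int = executions.size
}

val listener = new SelfCleaningListener
listener.onEvent(ExecutionEnd(1))
listener.onEvent(ExecutionEnd(2))
println(listener.size)           // 2
listener.onEvent(ApplicationEnd) // delivered at shutdown or on history-server replay
println(listener.size)           // 0
```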

@JoshRosen
Contributor

Haven't followed the discussion in detail yet, but just wanted to flag this PR/discussion as a high-priority item to get resolved soon, since the failing memory leak test is preventing the Maven builds from running certain subsequent suites. We should try to get this fixed before we start merging a bunch of patches on Monday.

@carsonwang
Contributor Author

@vanzin, I wrapped the calls to the hooks with Utils.tryLogNonFatalError. I didn't clean up the SQLListener on the application end event because another SQLContext created later still wants to use the same SQLListener. I don't think the history server has the problem, because in that case there is no reference to the SQLListener from a companion object.

@carsonwang
Contributor Author

@zsxwing , do you have any further comments regarding how the SQLListener is cleaned up?

@SparkQA

SparkQA commented Nov 30, 2015

Test build #46862 has finished for PR 9991 at commit 00df329.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Contributor

vanzin commented Nov 30, 2015

I didn't clean up the SQLListener on the application end event because another SQLContext created later still wants to use the same SQLListener.

Sorry, I don't follow. SparkListenerApplicationEnd is posted by SparkContext.stop, which is the same place where you're adding the hook to clean up the listener. So it should behave exactly the same way, no?

@zsxwing
Member

zsxwing commented Nov 30, 2015

Sorry, I don't follow. SparkListenerApplicationEnd is posted by SparkContext.stop, which is the same place where you're adding the hook to clean up the listener. So it should behave exactly the same way, no?

It's a bit different because postApplicationEnd() is called at the beginning of stop?

Not related to this issue: I just noticed that postApplicationEnd() may not be in the correct place. There is a race condition: after postApplicationEnd(), some task/job events can still be put into the listener bus. Is it safe to move postApplicationEnd() to the bottom of stop? Then I think it would behave exactly the same way as adding a hook.

@vanzin
Contributor

vanzin commented Nov 30, 2015

It's a bit different because postApplicationEnd() is called at the beginning of stop?

It's a bit different but not in the way @carsonwang explained; whether you use the hook or handle SparkListenerApplicationEnd, the listener will be cleared when the context is shut down.

Contributor

Does this method (or the collection that it manipulates) need to be synchronized in order to be thread-safe?

@JoshRosen
Contributor

Heads up: since discussion here is still ongoing and I think there's still more work to do, I'm going to revert #9297 in order to un-break the master Maven tests. Could you re-submit a new PR that contains both the code from the original PR and addresses the state lifecycle issues being discussed here? Thanks!

asfgit pushed a commit that referenced this pull request Nov 30, 2015
This reverts commit cc243a0 / PR #9297

I'm reverting this because it broke SQLListenerMemoryLeakSuite in the master Maven builds.

See #9991 for a discussion of why this broke the tests.
@carsonwang
Contributor Author

It's a bit different but not in the way @carsonwang explained; whether you use the hook or handle SparkListenerApplicationEnd, the listener will be cleared when the context is shut down.

Sorry, I misunderstood it. OK, I will use SparkListenerApplicationEnd. The issue @zsxwing mentioned can probably be addressed in another PR.

@carsonwang
Contributor Author

Closing this and resubmitting as #10061.

@carsonwang carsonwang closed this Dec 1, 2015
ghost pushed a commit to dbtsai/spark that referenced this pull request Dec 4, 2015
Resubmit apache#9297 and apache#9991
On the live web UI, there is a SQL tab which provides valuable information for the SQL query. But once the workload is finished, we won't see the SQL tab on the history server. It will be helpful if we support SQL UI on the history server so we can analyze it even after its execution.

To support SQL UI on the history server:
1. I added an onOtherEvent method to the SparkListener trait and post all SQL related events to the same event bus.
2. Two SQL events SparkListenerSQLExecutionStart and SparkListenerSQLExecutionEnd are defined in the sql module.
3. The new SQL events are written to event log using Jackson.
4. A new trait SparkHistoryListenerFactory is added to allow the history server to feed events to the SQL history listener. The SQL implementation is loaded at runtime using java.util.ServiceLoader.

Author: Carson Wang <[email protected]>

Closes apache#10061 from carsonwang/SqlHistoryUI.
Parth-Brahmbhatt pushed a commit to Parth-Brahmbhatt/spark that referenced this pull request Jul 25, 2016
Resubmit apache#9297 and apache#9991
On the live web UI, there is a SQL tab which provides valuable information for the SQL query. But once the workload is finished, we won't see the SQL tab on the history server. It will be helpful if we support SQL UI on the history server so we can analyze it even after its execution.

To support SQL UI on the history server:
1. I added an onOtherEvent method to the SparkListener trait and post all SQL related events to the same event bus.
2. Two SQL events SparkListenerSQLExecutionStart and SparkListenerSQLExecutionEnd are defined in the sql module.
3. The new SQL events are written to event log using Jackson.
4. A new trait SparkHistoryListenerFactory is added to allow the history server to feed events to the SQL history listener. The SQL implementation is loaded at runtime using java.util.ServiceLoader.

Author: Carson Wang <[email protected]>

Closes apache#10061 from carsonwang/SqlHistoryUI.

# Conflicts:
#	sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala
#	sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala
#	sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala
#	sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLTab.scala
#	sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala

Added more features:
# More details to the SparkPlanInfo
# Added the execution plan to action

# Conflicts:
#	sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala
#	sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala
#	sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLTab.scala
#	sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SparkPlanGraph.scala


6 participants