[SPARK-11572] Exit AsynchronousListenerBus thread when stop() is called #9546
Test build #45302 has finished for PR 9546 at commit
|
|
@ted-yu I have found that the call to Would you mind adding something like that to your PR? |
|
I was unable to duplicate the issue I had with the If I encounter the issue again then I can create a PR to adjust the |
instead of doing this here, can you simplify the try block as follows?
    try {
      if (stopped.get()) {
        // Get out of the while loop and shutdown the daemon thread
        return
      }
      val event = eventQueue.poll()
      assert(event != null, "event queue was empty but the listener bus was not stopped")
      postToAll(event)
    }
I believe this will also fix the issue.
can you move this into the try? If stopped.get throws an exception we still want to set processingEvent to false
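For readers following the two review comments above, here is a minimal, self-contained sketch of a loop shaped that way. `SketchBus`, its `String` events, the `handle` callback, and the semaphore named `eventLock` are hypothetical stand-ins chosen for illustration, not Spark's actual `AsynchronousListenerBus`. The point it tries to show is that a `stopped.get()` check placed inside the `try` still passes through `finally`, so `processingEvent` is reset even on an early `return` or an exception.

```scala
import java.util.concurrent.{LinkedBlockingQueue, Semaphore}
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical stand-in for a listener bus; not Spark's real class.
class SketchBus(handle: String => Unit) {
  private val eventQueue = new LinkedBlockingQueue[String]()
  // Assumption: one permit per posted event, plus one extra permit released by stop()
  private val eventLock = new Semaphore(0)
  private val stopped = new AtomicBoolean(false)
  @volatile var processingEvent = false

  private val thread = new Thread("sketch-bus") {
    setDaemon(true)
    override def run(): Unit = {
      while (true) {
        eventLock.acquire()
        processingEvent = true
        try {
          if (stopped.get()) {
            // Leave the loop so the daemon thread can terminate
            return
          }
          val event = eventQueue.poll()
          assert(event != null, "event queue was empty but the bus was not stopped")
          handle(event)
        } finally {
          // Runs after normal completion, after the early return, and after an exception
          processingEvent = false
        }
      }
    }
  }

  def start(): Unit = thread.start()

  def post(event: String): Unit = {
    eventQueue.offer(event)
    eventLock.release()
  }

  def stop(): Unit = {
    if (stopped.compareAndSet(false, true)) {
      eventLock.release() // wake the thread so it can observe the flag
      thread.join()       // returns once the loop has exited
    }
  }

  def isAlive: Boolean = thread.isAlive
}
```

In this sketch `stop()` returns promptly even if other threads keep calling `post`, because the loop exits the first time it sees the flag. The flip side is that events still sitting in the queue at that moment are never handed to `handle`.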
|
Test build #45460 has finished for PR 9546 at commit
|
|
I see several errors in the following form (https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45460/consoleFull): Not related to the PR. |
|
Test build #45464 has finished for PR 9546 at commit
|
|
@andrewor14 |
As vonnagy reported in the following thread:
http://search-hadoop.com/m/q3RTtk982kvIow22
Attempts to join the thread in AsynchronousListenerBus resulted in lock up because AsynchronousListenerBus thread was still getting messages `SparkListenerExecutorMetricsUpdate` from the DAGScheduler

Author: tedyu <[email protected]>

Closes #9546 from ted-yu/master.

(cherry picked from commit 3e0a6cf)
Signed-off-by: Andrew Or <[email protected]>
|
I think that this has caused the "org.apache.spark.scheduler.EventLoggingListenerSuite.End-to-end event logging" test to become flaky in Jenkins. For example: https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/4014/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.3,label=spark-test/testReport/junit/org.apache.spark.scheduler/EventLoggingListenerSuite/End_to_end_event_logging/ I believe that this patch may have changed the behavior of the listener bus during shutdown. According to the It looks like this patch just changes things so that we halt immediately once the |
|
Look at the Master SBT build; there's definitely a regression: https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/4014/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.3,label=spark-test/testReport/junit/org.apache.spark.scheduler/EventLoggingListenerSuite/End_to_end_event_logging/history/ If you keep clicking on the "Older" link to page back through the test history, you'll find that this first started in https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.3,label=spark-test/3982/testReport/, whose changeset includes this patch: https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.3,label=spark-test/3982/changes |
|
I plan to send out a PR that fixes the regression by recording the number of queued events the first time the stop flag is seen.
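One possible reading of that plan, sketched under stated assumptions: when the bus thread first observes the stop flag, it takes a snapshot of the queue size and delivers exactly that many events before exiting. `DrainSketch` and `drainOnStop` are hypothetical names, and this is not the code of the follow-up PR. It only illustrates how a bounded drain could deliver already-queued events without letting late arrivals keep the thread alive.

```scala
import java.util.concurrent.ConcurrentLinkedQueue

object DrainSketch {
  // Hypothetical helper, meant to be called once the bus thread first sees the stop flag.
  // Delivers at most the number of events queued at that moment and returns how many
  // were actually delivered.
  def drainOnStop(queue: ConcurrentLinkedQueue[String], handle: String => Unit): Int = {
    val pending = queue.size() // snapshot taken when the stop flag is first observed
    var delivered = 0
    while (delivered < pending) {
      val event = queue.poll()
      if (event == null) {
        // Queue ran dry before the snapshot count was reached; nothing more to deliver
        return delivered
      }
      handle(event)
      delivered += 1
    }
    delivered
  }
}
```

Events posted after the snapshot stay in the queue undelivered, which is the trade-off that bounds the shutdown time. Whether that matches the intended semantics of `stop()` is a design decision for the actual fix.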
As vonnagy reported in the following thread:
http://search-hadoop.com/m/q3RTtk982kvIow22
Attempts to join the thread in AsynchronousListenerBus resulted in lock up because AsynchronousListenerBus thread was still getting messages `SparkListenerExecutorMetricsUpdate` from the DAGScheduler