[SPARK-12755][CORE] Stop the event logger before the DAG scheduler #10700
Conversation
Stop the event logger before the DAG scheduler to avoid a race condition where the standalone master attempts to build the app's history UI before the event log is stopped.
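The ordering problem described here can be illustrated with a small, self-contained sketch. It is not Spark code: `EventLog`, `Master`, `Scheduler`, and `shutdown` are hypothetical stand-ins, and the sketch assumes that stopping the scheduler is the step that lets the master learn the app has finished. The real race is timing-dependent; the sketch makes it deterministic to show why the order matters.

```python
class EventLog:
    """Hypothetical stand-in for an application's event log."""

    def __init__(self):
        self.in_progress = True

    def stop(self):
        # Finalizing the log marks it as complete.
        self.in_progress = False


class Master:
    """Hypothetical stand-in for the standalone master."""

    def __init__(self):
        self.messages = []

    def app_finished(self, log):
        # The master inspects the log as soon as it learns the app is done.
        if log.in_progress:
            self.messages.append("event log still in progress")
        else:
            self.messages.append("event log complete")


class Scheduler:
    """Hypothetical stand-in for the scheduler being shut down."""

    def __init__(self, master, log):
        self.master = master
        self.log = log

    def stop(self):
        # Assumption: stopping the scheduler is what notifies the master.
        self.master.app_finished(self.log)


def shutdown(stop_logger_first):
    log, master = EventLog(), Master()
    scheduler = Scheduler(master, log)
    if stop_logger_first:
        log.stop()
        scheduler.stop()
    else:
        scheduler.stop()
        log.stop()
    return master.messages


print(shutdown(stop_logger_first=False))
print(shutdown(stop_logger_first=True))
```

With the logger stopped first, the master in this sketch always sees a finalized log; with the opposite order it sees one that still looks in progress, even though the app terminated normally.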
|
Changed Jira ref from SPARK-6950 to SPARK-12755. SPARK-6950 is an older, defunct ticket. Oops. |
Maybe add a comment here to explain why this needs to be stopped before the DAGScheduler in order to prevent this change from being accidentally lost in the future?
|
Will this change still be relevant if we remove the Spark Master's embedded HistoryServer? In other words, does this race condition affect the standalone HistoryServer or only the Master history server? If it only affects the master then this isn't worth changing in master since we're going to remove the master's embedded history server for Spark 2.0. It may still be worthwhile for Spark 1.6, though there's a risk/benefit trade-off here. |
|
Hi Josh, Good questions. I may have submitted this PR incorrectly. Perhaps you can guide me in the right direction. I submitted this PR for merging into master because my understanding is that's how all PRs for the Spark project should be created, and patches against master may be backported to earlier releases. However, I originally created and tested this patch on branch-1.5 because that's what we're currently running. So while this patch may be irrelevant to master (or Spark 2.0), it's relevant to the Spark 1.5 branch and presumably 1.6 as well. Under these circumstances, should I have submitted a PR against master as I have done? The code contribution guidelines state that only in a special case would a PR be opened against another branch. Does a patch with no or lesser relevance to the master branch compared to an earlier release branch qualify as a "special case"? And if so, which branch should I have submitted the PR against? Thanks. |
|
I should also state that my original motivation in submitting this patch was to address the confusing log messages which I saw in the Spark master log for apps that actually terminated normally. Also, it's just come to mind that this bug may explain another behavior I've seen—sometimes an app's event log is corrupted if it was configured to be compressed. If the log is uncompressed then the ability for the history reader to decode an "in progress" event log allows it to be processed as expected. However, if the event log is being written through a compressed output stream and is not properly flushed before it is processed, then the processing may fail because the file compression was incomplete. (As this just occurred to me I haven't tested this hypothesis, but it does sound like a plausible explanation.) If this is the case, then this patch should correct the problem with corrupt compressed event logs. |
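The untested hypothesis above, that a compressed stream read before it is properly closed cannot be decoded, can be reproduced in isolation. The sketch below uses Python's standard `gzip` module rather than Spark's compression codecs, so it only illustrates the general failure mode; `write_log` and `is_readable` are hypothetical helper names.

```python
import gzip
import io


def write_log(events, close_stream):
    """Write events through a gzip stream and return the bytes a reader
    would see, taken either after or before the stream is closed."""
    sink = io.BytesIO()
    gz = gzip.GzipFile(fileobj=sink, mode="wb")
    for event in events:
        gz.write((event + "\n").encode("utf-8"))
    if close_stream:
        gz.close()  # flushes buffered data and writes the gzip trailer
    snapshot = sink.getvalue()  # what a concurrent reader would see now
    gz.close()  # safe to call again; keeps the sketch tidy
    return snapshot


def is_readable(raw):
    """Return True if the bytes decode as a complete gzip stream."""
    try:
        with gzip.GzipFile(fileobj=io.BytesIO(raw)) as f:
            f.read()
        return True
    except EOFError:
        # Raised when the stream ends before the end-of-stream marker.
        return False


events = ['{"Event":"start"}', '{"Event":"end"}']
print(is_readable(write_log(events, close_stream=True)))
print(is_readable(write_log(events, close_stream=False)))
```

An uncompressed log read at the same moment would simply be missing its last few lines, which matches the observation that uncompressed "in progress" logs can still be processed.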
|
Yes, we'd prefer to make changes in |
|
Test build #2395 has finished for PR 10700 at commit
|
|
Test build #2397 has finished for PR 10700 at commit
|
scheduler, take 3. This was my original intention, bungled twice :/
|
Sorry guys. I bungled the ordering of the |
|
Test build #2398 has finished for PR 10700 at commit
|
|
Here are my current thoughts. Josh says this functionality is going to be removed in Spark 2.0. The bug this PR is designed to address manifests itself in Spark 1.5 in three ways (I'm aware of):
The most problematic of these is unrecoverable event logs. I've been frustrated by this before and turned off event log compression as a workaround. Since deploying a build with this patch to one of our dev clusters I haven't seen this problem again. I don't see a simple way to write a test to support this PR. Overall, I feel we should close this PR but keep a reference to it from Jira with a comment that Spark 1.5 and 1.6 users can try this patch—at their own risk—to address the described symptoms if they wish to. It's going into our own Spark 1.x builds. I'll close this PR and the associated Jira issue within the next few days unless someone objects or wishes to continue discussion. Thanks. |
|
Are there downsides to merging this to master, even if the related functionality is about to be removed? It passes tests, seems to improve the shutdown ordering, and can be backported to fix an actual minor issue in previous releases. Tests would be cool, but you're correct that this one could be really hard to trigger. I see no reason to close this. |
[SPARK-12755][CORE] Stop the event logger before the DAG scheduler to avoid a race condition where the standalone master attempts to build the app's history UI before the event log is stopped. This contribution is my original work, and I license this work to the Spark project under the project's open source license. Author: Michael Allman <[email protected]> Closes #10700 from mallman/stop_event_logger_first. (cherry picked from commit 4ee8191) Signed-off-by: Sean Owen <[email protected]>
|
Merged to master/1.6/1.5 |
|
Thanks, @srowen. |