[SPARK-10193] [core] [wip] Eliminate Skipped Stages by reusing ShuffleMapStages #8427
|
Jenkins, retest this please |
|
This is going to increase memory pressure. The very early code never cleaned up the Stage-tracking data structures at all, which was clearly unacceptable for long-running Applications. What we have now cleans up as soon as possible, and thus has minimal memory pressure. What you have in this PR lands somewhere in between, and could cause problems if a lot of Stages stick around for a long time. |
|
Test build #41556 has finished for PR 8427 at commit
|
|
@markhamstra yup, no question this will increase memory usage. The question is, should we consider it anyway? Maybe you were implicitly answering "no", but I'm gonna make my case again in any case :) Clearly, if you have long-running jobs w/ lots of stages, and you never do anything to clean them up, then I think it's a pretty big usability improvement, so worth considering, but that is totally subjective. I realize this is a bit hand-wavy now -- I'll try to quantify the memory usage effect so we can make a more informed decision (if others are still somewhat interested). |
|
I wasn't intending to answer "no", but rather just wanting to make sure that we think through the implications of this change. It will increase memory pressure some, but I agree that it shouldn't be a lot because of the already present references via the MapOutputTracker. On balance, I'm inclined to agree with you that this is worth doing. |
|
Test build #41625 timed out for PR 8427 at commit |
|
Just a note about MapOutputTracker - it is fairly trivial to make it use a bare minimum amount of memory even if it does not get cleaned up for 'old' stages: use a disk-backed map (mapdb, for example) with LRU eviction. This is what we used to do for production jobs in some earlier projects. I am not sure what the impact of the current proposal is from a memory overhead pov - map output was (obviously) expensive enough to attempt this, and the effect was not pervasive/diffuse across the codebase for shuffle output tracking. |
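To make the disk-backed LRU idea above concrete, here is a minimal sketch in Python rather than Spark code. The class `SpillingLruMap`, its methods, and the use of the standard-library `shelve` module in place of mapdb are all assumptions made for illustration; nothing here reflects how MapOutputTracker is actually implemented.

```python
import os
import shelve
import tempfile
from collections import OrderedDict


class SpillingLruMap:
    """Hypothetical map that keeps at most `capacity` entries in memory
    and spills the least recently used ones to a disk-backed store."""

    def __init__(self, capacity, path):
        self.capacity = capacity
        self.hot = OrderedDict()       # recently used entries, held in memory
        self.cold = shelve.open(path)  # disk-backed store for evicted entries

    def put(self, key, value):
        self.hot[key] = value
        self.hot.move_to_end(key)      # mark as most recently used
        self.cold.pop(str(key), None)  # drop any stale on-disk copy
        while len(self.hot) > self.capacity:
            old_key, old_value = self.hot.popitem(last=False)
            self.cold[str(old_key)] = old_value  # spill the LRU entry to disk

    def get(self, key):
        if key in self.hot:
            self.hot.move_to_end(key)
            return self.hot[key]
        value = self.cold[str(key)]    # raises KeyError if never stored
        self.put(key, value)           # promote back into memory
        return value

    def in_memory(self):
        return list(self.hot)


store_dir = tempfile.mkdtemp()
outputs = SpillingLruMap(capacity=2, path=os.path.join(store_dir, "map_outputs"))
for shuffle_id in range(4):
    outputs.put(shuffle_id, ["status for shuffle %d" % shuffle_id])

resident = outputs.in_memory()  # only the two newest shuffles stay in memory
old = outputs.get(0)            # an old entry is read back from disk
```

Only `capacity` entries are ever held in memory, so entries for old stages cost disk space rather than heap. The price is a disk read when an old entry is needed again.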
|
Thanks for the pull request. I'm going through a list of pull requests to cut them down since the sheer number is breaking some of the tooling we have. Due to lack of activity on this pull request, I'm going to push a commit to close it. Feel free to reopen it or create a new one. We can also continue the discussion on the JIRA ticket. |
This eliminates "skipped" stages for jobs that share shuffle dependencies, and instead reuses the same stage. This is done by not removing ShuffleMapStages when a job finishes, but waiting until the shuffle is cleaned by the context cleaner. It does increase memory usage for long-running jobs with lots of stages (though it's the same order as before, since we already hold on to shuffle data in MapOutputTracker). The advantage is simplified code and a clearer experience for the end user -- jobs which reference an already completed stage link to the already completed stage, rather than referencing a new stage which gets "skipped", which is always confusing. (Perhaps it could still use a better UI treatment to make it clear that the stage had already completed as part of a previous job.)
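
The difference between the two cleanup policies can be shown with a toy model. This is a sketch in Python, not the DAGScheduler code: `ToyScheduler`, `run_job`, and `shuffle_cleaned` are hypothetical names, and a stage is reduced to an integer id keyed by its shuffle id.

```python
import itertools


class ToyScheduler:
    """Hypothetical model: maps each shuffle id to the stage that produces it."""

    def __init__(self, reuse_stages):
        self.reuse_stages = reuse_stages
        self.shuffle_to_stage = {}               # shuffle id -> stage id
        self._next_stage_id = itertools.count()

    def stage_for(self, shuffle_id):
        # Create a stage only if none is currently registered for this shuffle.
        if shuffle_id not in self.shuffle_to_stage:
            self.shuffle_to_stage[shuffle_id] = next(self._next_stage_id)
        return self.shuffle_to_stage[shuffle_id]

    def run_job(self, shuffle_ids):
        stages = [self.stage_for(s) for s in shuffle_ids]
        if not self.reuse_stages:
            # Eager policy: forget the stages as soon as the job finishes.
            for s in shuffle_ids:
                self.shuffle_to_stage.pop(s, None)
        return stages

    def shuffle_cleaned(self, shuffle_id):
        # Retaining policy: the stage is dropped only when its shuffle is cleaned.
        self.shuffle_to_stage.pop(shuffle_id, None)


eager = ToyScheduler(reuse_stages=False)
eager_first = eager.run_job([0])         # stage 0 computes shuffle 0
eager_second = eager.run_job([0, 1])     # shuffle 0 gets a fresh stage id

reusing = ToyScheduler(reuse_stages=True)
reuse_first = reusing.run_job([0])       # stage 0 computes shuffle 0
reuse_second = reusing.run_job([0, 1])   # the second job links back to stage 0
reusing.shuffle_cleaned(0)
reuse_third = reusing.run_job([0])       # after cleanup, a new stage is created
```

Under the eager policy the second job gets a new stage id for a shuffle that was already computed, which is what the UI would show as a "skipped" stage. Under the retaining policy the map grows until shuffles are cleaned, which is the memory cost discussed in the conversation.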