[SPARK-15176][Core] Add maxShares setting to Pools #12951
I believe this line exceeds 100 characters.
|
@HyukjinKwon I've run |
|
@njwhite I am not a committer, just one of the contributors. I believe most of the code in this part was written by @kayousterhout. |
|
Ping, anything more needed on this PR before merging? |
|
cc @kayousterhout for review |
|
I commented on the JIRA. |
|
Hi @njwhite, I'm not sure I see a strong need for this -- I posted a msg on jira (as Kay had earlier). We should keep discussion about the feature in general there, for archive / searchability. In any case I did look at the code, so a couple of comments about that, if we do decide we want this feature. Unless I'm missing something, it doesn't seem like Also to go along with that, we'd want new test cases demonstrating how |
As long as you're touching this, switch to string interpolation (e.g. s"Created default pool $DEFAULT_POOL_NAME, ..."). Also, since this is repeated a few times, you might just add a helper logPoolCreated(pool) or something.
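A minimal sketch of what that suggestion could look like, assuming a simplified stand-in for the pool type. `PoolInfo`, its fields, and `poolCreatedMessage` are hypothetical names used only for illustration, and `println` stands in for the scheduler's logging call; none of this is taken from the PR's actual code.

```scala
// Hypothetical stand-in for the scheduler's pool; the field names are assumptions.
final case class PoolInfo(name: String, schedulingMode: String, minShare: Int, weight: Int)

object PoolLogging {
  // Builds the message with s-interpolation rather than format strings or concatenation.
  def poolCreatedMessage(pool: PoolInfo): String =
    s"Created pool ${pool.name}, schedulingMode: ${pool.schedulingMode}, " +
      s"minShare: ${pool.minShare}, weight: ${pool.weight}"

  // A single helper replaces the repeated log statements; println stands in for logInfo.
  def logPoolCreated(pool: PoolInfo): Unit = println(poolCreatedMessage(pool))
}
```

Keeping the message construction separate from the logging call makes the text easy to check in a unit test without capturing log output.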
Force-pushed from b4b7624 to c4082c5.
|
Thanks for the review @squito - I've commented on the JIRA about why this feature would be useful. As for the implementation - maybe "maxShare" is the wrong word, as the change doesn't relate to the fair scheduler at all. Instead it limits the maximum number of tasks a |
|
ah, I completely overlooked The added tests verify that Calling it "maxShare" is pretty confusing -- with this implementation it should probably be called "maxRunningTasks" or something. It also seems pretty hard to configure, though; I wonder if users really do want maxShare. We should be sure that whatever we add is what we want long-term, so we're not stuck with complexity from a legacy setting. Honestly, I am still uncertain about adding the feature and need to think about it more -- I'm just giving my comments on the code here. A very clean, well-tested PR can help make your case, but OTOH can also turn into wasted effort ... |
|
Jenkins, ok to test |
|
Test build #59030 has finished for PR 12951 at commit
|
|
Thanks @squito; I've renamed the setting to |
|
Test build #59497 has finished for PR 12951 at commit
|
Force-pushed from e100683 to 0669b49.
|
Test build #59498 has finished for PR 12951 at commit
|
|
Test build #59501 has finished for PR 12951 at commit
|
|
Added my comments to the JIRA. In short, I think there is a legitimate use case for this, and there is a significant gap in our current fair-scheduling pool API. Implementing a maxShare property is actually something that has been on my todo list for a while. |
|
Thanks @markhamstra! The Jenkins build failed because a single test, |
The test case should include scheduling another taskset to a different pool which does not share the limitation, and making sure it can still schedule tasks even when the first task set gets limited.
|
@squito thanks - I've expanded the |
|
Test build #60036 has finished for PR 12951 at commit
|
|
@squito is this OK? |
Does it make sense to move this to Schedulable.scala? It looks like Pool and TaskSetManager both have the same implementation (assuming that Int.MAX_VALUE is the default).
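One way to read this suggestion, sketched with simplified stand-ins rather than Spark's real classes: the shared trait supplies the "no limit" default, and only a pool with a configured cap overrides it. The member name `maxRunningTasks` and the `limit` constructor parameter are assumptions for illustration. In Scala the default mentioned above is spelled `Int.MaxValue`.

```scala
// Hypothetical, simplified versions of the scheduler types; not Spark's actual code.
trait Schedulable {
  def name: String
  // Default shared by every implementation: effectively unlimited.
  def maxRunningTasks: Int = Int.MaxValue
}

// A pool overrides the default only when a limit was configured.
class Pool(val name: String, limit: Option[Int] = None) extends Schedulable {
  override def maxRunningTasks: Int = limit.getOrElse(super.maxRunningTasks)
}

// A task set manager simply inherits the unlimited default.
class TaskSetManager(val name: String) extends Schedulable
```

With the default in the trait, the duplicated implementation disappears from both classes, at the cost of every `Schedulable` silently inheriting "unlimited" unless it opts in.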
|
The naming on this PR is somewhat confusing, because it looks like maxShares is supposed to return the maximum number of remaining tasks that can be run, rather than the maximum number of tasks that can be running at a time. The current name implies the latter. Is it possible to use a more descriptive name for this? maxRemainingTasks? I don't have a great idea here but maybe others do? |
|
Also, once naming is settled on, this PR should include a documentation update to this page: https://spark.apache.org/docs/latest/job-scheduling.html to describe this. |
Help guarantee resource availability by (hierarchically) limiting the number of tasks a given pool can run. Also adds support for specifying the parent pool in the "spark.scheduler.allocation.file".
|
Test build #61724 has finished for PR 12951 at commit
|
|
Hi @kayousterhout - I've renamed all references to |
|
ping? |
|
@njwhite sorry to let this idle for so long. I just read through the comments here and on the JIRA and it looks like the consensus on the JIRA was that it would be better to implement maxShare instead of maxRunningTasks, because it's likely easier to configure, and also is less brittle to the cluster size. Can you implement that change? Alternately if you think this should remain maxRunningTasks, comment on the JIRA and we can continue the discussion there. |
|
@njwhite do you have time to work on this and implement maxShares? If not, can you close the PR? |
|
@kayousterhout minShares is a configuration parameter for the fair scheduler algorithm only - what would the semantics of a maxShares setting for the FIFO algorithm be? |
|
Actually, @kayousterhout - I'm not entirely sure what you expect for the semantics of maxShares in general. Maybe a worked example would help: say pool X has 5 running tasks from Taskset A and a maxShares of 7, and pool X is a child of pool Y, which has a maxShares of 8. I want to schedule another task from Taskset A, so should the scheduler allow it or not? Do you need to know how many executors are currently running (and so the maximum number of tasks that could be run)? |
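For comparison, here is a sketch of how the same example plays out if the limit is read as an absolute cap on concurrently running tasks (the maxRunningTasks interpretation discussed earlier in the thread), checked at every level of the hierarchy. `PoolNode`, `canLaunch`, and `tryLaunch` are hypothetical names; this is one possible reading, not the PR's implementation, and it deliberately ignores executor counts.

```scala
// Hypothetical pool node: `limit` caps concurrently running tasks in this subtree.
final class PoolNode(val name: String, val limit: Int, val parent: Option[PoolNode] = None) {
  private var running = 0

  def runningTasks: Int = running

  // A task may start only if this pool and every ancestor are below their limits.
  def canLaunch: Boolean = running < limit && parent.forall(_.canLaunch)

  // Records a launch here and in all ancestors; returns false if any limit blocks it.
  def tryLaunch(): Boolean = {
    if (!canLaunch) false
    else {
      var node: Option[PoolNode] = Some(this)
      while (node.isDefined) {
        node.get.running += 1
        node = node.get.parent
      }
      true
    }
  }
}
```

Under this reading, with Y capped at 8, X capped at 7, and only X's 5 tasks running, a sixth task from Taskset A would be allowed because both pools are below their caps. If a sibling pool under Y were already running 3 tasks, Y would be at 8 and the launch would be refused even though X itself has room. A share-based maxShares would need a different rule, which is exactly the ambiguity the question raises.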
|
Should we proceed with this PR or should we close it? @kayousterhout @njwhite |
|
Can one of the admins verify this patch? |
What changes were proposed in this pull request?
Help guarantee resource availability by (hierarchically) limiting the number of tasks a given pool can run. The maximum number of tasks for a given pool can be configured in the allocation XML file, and child pools are limited to at most the number of tasks of their parent.
How was this patch tested?
Unit tests run and new unit tests added for functionality.