[SPARK-13002][Mesos] Send initial request of executors for dyn allocation #11047
LGTM. |
|
Test build #50657 has finished for PR 11047 at commit
|
|
@skyluc just so I understand the issue is not that dynamic allocation doesn't work, but rather |
|
@vanzin isn't there already another place where we do this initial syncing? Does YARN have the same issue? |
docs/running-on-mesos.md
Outdated
I know it was this way before, but can you please s/coarse grain/coarse-grained/?
|
@skyluc This change LGTM by the way. I'm just hesitant on backporting it into 1.6 since (1) it's a small issue, and (2) it changes core behavior and so affects other cluster modes as well. In general we try to be conservative about what goes into a maintenance release unless it's a critical issue. By the way I submitted the standalone mode equivalent of this patch at #11054. The solution is similar; the main difference is that in standalone mode the Master keeps track of the executor limit for each application, whereas in Mesos each driver keeps track of its own limit. |
|
Once you address @mgummelt's comments I'll go ahead and merge this. |
|
For YARN, see |
|
@andrewor14 yes, dynamic allocation works fine, but |
|
Test build #50747 has finished for PR 11047 at commit
|
docs/running-on-mesos.md
Outdated
One thing to note, though. Marathon won't be able to launch `sbin/start-mesos-shuffle-service.sh` because the script immediately goes into the background and Marathon thinks it exited. It will keep re-launching it until the end of days.
What you need is to launch it via `spark-class`. For instance, I'm using `bin/spark-class org.apache.spark.deploy.mesos.MesosExternalShuffleService`. See this discussion on mesos-user.
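As a rough illustration of the workaround above, here is a minimal sketch that builds a Marathon app definition running the shuffle service in the foreground through `spark-class`. The app id, the `/opt/spark` install path, and the resource numbers are hypothetical placeholders, not values from this thread; only the `spark-class` command comes from the comment.

```python
import json


def shuffle_service_app(spark_home: str, instances: int) -> dict:
    """Build a Marathon app definition for the Mesos external shuffle service.

    The service is started through bin/spark-class so that it stays in the
    foreground; Marathon then sees a long-running process instead of a
    script that exits right away.
    """
    cmd = (
        f"cd {spark_home} && "
        "bin/spark-class org.apache.spark.deploy.mesos.MesosExternalShuffleService"
    )
    return {
        "id": "/spark-mesos-shuffle-service",  # hypothetical app id
        "cmd": cmd,
        "cpus": 0.5,   # placeholder resources, tune for your cluster
        "mem": 1024,
        "instances": instances,
    }


# /opt/spark is an assumed install location.
app = shuffle_service_app("/opt/spark", instances=1)
app_json = json.dumps(app, indent=2)
print(app_json)
```

The resulting JSON could then be submitted to Marathon however you normally deploy apps; the number of instances would typically need to match the agents that run executors.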
sorry what do you mean? Isn't that what this patch is fixing? |
Currently the Master would always set an application's initial executor limit to infinity. If the user specified `spark.dynamicAllocation.initialExecutors`, the config would not take effect. This is similar to #11047 but for standalone mode.
Author: Andrew Or
Closes #11054 from andrewor14/standalone-da-initial.
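For readers unfamiliar with the `spark.dynamicAllocation.initialExecutors` setting mentioned above, the following sketch assembles a `spark-submit` command line that sets it. The helper function, the master URL, and the application file name are hypothetical; the configuration keys are standard Spark properties, and enabling the external shuffle service alongside dynamic allocation is assumed here.

```python
from typing import Dict, List


def spark_submit_args(master: str, conf: Dict[str, str], app: str) -> List[str]:
    """Assemble a spark-submit argument list with one --conf flag per property."""
    args = ["spark-submit", "--master", master]
    for key in sorted(conf):  # sorted only to make the output deterministic
        args += ["--conf", f"{key}={conf[key]}"]
    args.append(app)
    return args


conf = {
    "spark.dynamicAllocation.enabled": "true",
    "spark.shuffle.service.enabled": "true",
    # The property that was previously ignored at start-up:
    "spark.dynamicAllocation.initialExecutors": "3",
}

# Hypothetical master URL and application file.
args = spark_submit_args("spark://master.example:7077", conf, "my_app.py")
print(" ".join(args))
```

With the fixes discussed in this thread, an application submitted this way would be expected to start with the requested initial number of executors rather than an unbounded or zero count.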
|
(you might need to resolve a small conflict from my standalone patch...) |
|
Test build #50820 has finished for PR 11047 at commit
|
|
Test build #50821 has finished for PR 11047 at commit
|
|
Merged into master. If there are more comments on the docs we can address them separately. |
Fix for SPARK-13002 about the initial number of executors when running with dynamic allocation on Mesos.
Instead of fixing it just for the Mesos case, made the change in `ExecutorAllocationManager`. It is already driving the number of executors running on Mesos, only not the initial value.
The `None` and `Some(0)` are internal details of the computation of resources to reserve in the Mesos backend scheduler. `executorLimitOption` has to be initialized correctly; otherwise the Mesos backend scheduler will either create too many executors at launch, or not create any executors and not be able to recover from this state.
Removed the 'special case' description in the doc. It was not totally accurate, and is not needed anymore.
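To make the effect of the initial `executorLimitOption` value concrete, here is a toy model in Python. It is not the Mesos backend scheduler code: the function name, the offer counts, and the reading that "no limit" leads to too many executors while a limit of zero leads to none are assumptions made for illustration.

```python
from typing import Optional


def executors_to_launch(offers: int, running: int,
                        executor_limit: Optional[int]) -> int:
    """Toy model: how many new executors to start for a batch of offers.

    executor_limit=None stands in for "no limit set" (Scala's None);
    an integer stands in for Some(n).
    """
    if executor_limit is None:
        # No limit known: every usable offer becomes an executor.
        return offers
    # Never exceed the limit and never return a negative count.
    return max(0, min(offers, executor_limit - running))


# Three hypothetical start-up states with 10 usable offers and nothing running.
unlimited = executors_to_launch(offers=10, running=0, executor_limit=None)
zero = executors_to_launch(offers=10, running=0, executor_limit=0)
initial = executors_to_launch(offers=10, running=0, executor_limit=3)

print(unlimited, zero, initial)
```

In this model, starting without a limit takes all ten offers, starting with a limit of zero launches nothing, and starting with the intended initial target launches exactly that many, which is the behavior the change aims for.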
This doesn't fix the same problem visible with Spark standalone. There is no straightforward way to send the initial value in standalone mode.
Someone familiar with this part of the YARN support should review this change.