
Conversation

@skyluc

@skyluc skyluc commented Feb 3, 2016

Fix for SPARK-13002 about the initial number of executors when running with dynamic allocation on Mesos.
Instead of fixing it just for the Mesos case, I made the change in ExecutorAllocationManager. It already drives the number of executors running on Mesos, only not the initial value.

The None and Some(0) values are internal details of the computation of resources to reserve in the Mesos backend scheduler. executorLimitOption has to be initialized correctly, otherwise the Mesos backend scheduler will either create too many executors at launch, or create no executors at all and be unable to recover from that state.
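
To make the distinction concrete, here is a minimal sketch of the intended initialization, assuming hypothetical names (`initialExecutorLimit`, `dynamicAllocationEnabled`, `initialExecutors` are illustrative, not Spark's actual fields): `None` means "no limit yet", `Some(0)` means "launch nothing", and with dynamic allocation the limit should start at the configured initial number of executors.

```scala
// Hypothetical sketch of the executor-limit initialization discussed above.
// None    => no limit has been set (static allocation)
// Some(n) => launch at most n executors
def initialExecutorLimit(dynamicAllocationEnabled: Boolean,
                         initialExecutors: Int): Option[Int] =
  if (dynamicAllocationEnabled) Some(initialExecutors) // start at the configured value
  else None                                            // static allocation: no limit

// The two regimes side by side:
val withDynamicAllocation = initialExecutorLimit(dynamicAllocationEnabled = true, initialExecutors = 3)
val withStaticAllocation  = initialExecutorLimit(dynamicAllocationEnabled = false, initialExecutors = 3)
```

Starting from `Some(initialExecutors)` instead of `None` or `Some(0)` avoids both failure modes: over-launching at startup and launching nothing with no way to recover.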

Removed the 'special case' description in the doc. It was not entirely accurate, and is no longer needed.

This doesn't fix the same problem as seen with Spark standalone. There is no straightforward way to send the initial value in standalone mode.

Somebody familiar with this part of the YARN support should review this change.

@dragos
Contributor

dragos commented Feb 3, 2016

LGTM.

@SparkQA

SparkQA commented Feb 3, 2016

Test build #50657 has finished for PR 11047 at commit 1c75940.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14
Contributor

@skyluc just so I understand: the issue is not that dynamic allocation doesn't work, but rather that spark.dynamicAllocation.initialExecutors doesn't take effect?

@andrewor14
Contributor

@vanzin isn't there already another place where we do this initial syncing? Does YARN have the same issue?


I know it was this way before, but can you please s/coarse grain/coarse-grained/.

@andrewor14
Contributor

@skyluc This change LGTM by the way. I'm just hesitant on backporting it into 1.6 since (1) it's a small issue, and (2) it changes core behavior and so affects other cluster modes as well. In general we try to be conservative about what goes into a maintenance release unless it's a critical issue.

By the way I submitted the standalone mode equivalent of this patch at #11054. The solution is similar; the main difference is that in standalone mode the Master keeps track of the executor limit for each application, whereas in Mesos each driver keeps track of its own limit.
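
The difference in bookkeeping described above can be sketched as follows (class and method names here are illustrative, not Spark's actual code): in standalone mode one central Master holds a limit per application, while on Mesos each driver's scheduler backend holds only its own limit.

```scala
// Hypothetical sketch of the two bookkeeping styles, not Spark's actual classes.

// Standalone: the Master tracks an executor limit per application.
class MasterState {
  private var executorLimits = Map.empty[String, Int] // appId -> limit
  def setLimit(appId: String, limit: Int): Unit =
    executorLimits += (appId -> limit)
  def limitFor(appId: String): Option[Int] = executorLimits.get(appId)
}

// Mesos: each driver's backend tracks only its own limit.
class DriverBackendState {
  private var executorLimit: Option[Int] = None
  def setLimit(limit: Int): Unit = executorLimit = Some(limit)
  def limit: Option[Int] = executorLimit
}
```

In both designs the fix is the same idea: seed the limit from the configured initial executor count instead of leaving it unset.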

@andrewor14
Contributor

Once you address @mgummelt's comments I'll go ahead and merge this.

@vanzin
Contributor

vanzin commented Feb 3, 2016

For YARN, see YarnSparkHadoopUtil.getInitialTargetExecutorNumber. I don't think this change will cause any issues with the YARN backend (it should just see a request to set the target number of executors to the same number it already is, so it will just ignore it).
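
For readers unfamiliar with the YARN side, the idea behind computing an initial target can be sketched like this (a hedged sketch modeled on the intent of YarnSparkHadoopUtil.getInitialTargetExecutorNumber; the function name, config keys' fallbacks, and defaults here are illustrative, not Spark's exact code):

```scala
// Sketch: derive the initial executor target from dynamic-allocation settings.
// When dynamic allocation is off, fall back to a fixed instance count.
def initialTargetExecutors(conf: Map[String, String],
                           numExecutorsDefault: Int = 2): Int = {
  val dynamic = conf.getOrElse("spark.dynamicAllocation.enabled", "false").toBoolean
  if (dynamic) {
    val min = conf.getOrElse("spark.dynamicAllocation.minExecutors", "0").toInt
    val max = conf.getOrElse("spark.dynamicAllocation.maxExecutors",
      Int.MaxValue.toString).toInt
    // initialExecutors falls back to minExecutors when unset
    val initial = conf.getOrElse("spark.dynamicAllocation.initialExecutors",
      min.toString).toInt
    require(min <= initial && initial <= max,
      s"initial executor count $initial must be between $min and $max")
    initial
  } else {
    conf.getOrElse("spark.executor.instances", numExecutorsDefault.toString).toInt
  }
}
```

Under this scheme, a redundant "set target to N" request from the allocation manager is harmless: the backend already holds that target and simply ignores it.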

@skyluc
Author

skyluc commented Feb 4, 2016

@andrewor14 yes, dynamic allocation works fine, but spark.dynamicAllocation.initialExecutors is not used at start-up.

@SparkQA

SparkQA commented Feb 4, 2016

Test build #50747 has finished for PR 11047 at commit 8dda6bb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


One thing to note, though. Marathon won't be able to launch sbin/start-mesos-shuffle-service.sh because the script immediately goes to the background and Marathon thinks it exited. It will keep re-launching it to the end of days.

What you need is to launch it via spark-class; for instance, I'm using bin/spark-class org.apache.spark.deploy.mesos.MesosExternalShuffleService. See this discussion on mesos-user.

@andrewor14
Contributor

> but spark.dynamicAllocation.initialExecutors is not used at start-up.

Sorry, what do you mean? Isn't that what this patch is fixing?

asfgit pushed a commit that referenced this pull request Feb 4, 2016
Currently the Master would always set an application's initial executor limit to infinity. If the user specified `spark.dynamicAllocation.initialExecutors`, the config would not take effect. This is similar to #11047 but for standalone mode.

Author: Andrew Or <[email protected]>

Closes #11054 from andrewor14/standalone-da-initial.
@andrewor14
Contributor

(you might need to resolve a small conflict from my standalone patch...)

@SparkQA

SparkQA commented Feb 5, 2016

Test build #50820 has finished for PR 11047 at commit 003e865.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 5, 2016

Test build #50821 has finished for PR 11047 at commit f5ab629.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14
Contributor

Merged into master. If there are more comments on the docs we can address them separately.

@asfgit asfgit closed this in 0bb5b73 Feb 5, 2016