@@ -87,10 +87,17 @@ private[spark] class CoarseMesosSchedulerBackend(
   val failuresBySlaveId: HashMap[String, Int] = new HashMap[String, Int]
 
   /**
-   * The total number of executors we aim to have. Undefined when not using dynamic allocation
-   * and before the ExecutorAllocatorManager calls [[doRequestTotalExecutors]].
+   * The total number of executors we aim to have. Undefined when not using dynamic allocation.
+   * Initially set to 0 when using dynamic allocation; the executor allocation manager will send
+   * the real initial limit later.
    */
-  private var executorLimitOption: Option[Int] = None
+  private var executorLimitOption: Option[Int] = {
+    if (Utils.isDynamicAllocationEnabled(conf)) {
+      Some(0)
+    } else {
+      None
+    }
+  }
 
   /**
    * Return the current executor limit, which may be [[Int.MaxValue]]
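A minimal, self-contained sketch of the initialization pattern this hunk introduces, not the actual Spark class: `Conf`, `Backend`, and `requestTotalExecutors` are stand-ins for `SparkConf`, the scheduler backend, and `doRequestTotalExecutors`. With dynamic allocation enabled the backend starts with a hard limit of zero executors until the allocation manager sends the real target; without it, the limit stays undefined, which reads as uncapped.

```scala
object ExecutorLimitSketch {
  // Stand-in for SparkConf; the real check reads spark.dynamicAllocation.enabled.
  final case class Conf(dynamicAllocationEnabled: Boolean)

  class Backend(conf: Conf) {
    // Mirrors the diff above: Some(0) as a placeholder limit under
    // dynamic allocation, None (no cap) otherwise.
    private var executorLimitOption: Option[Int] =
      if (conf.dynamicAllocationEnabled) Some(0) else None

    // An undefined limit means "no cap", hence Int.MaxValue.
    def executorLimit: Int = executorLimitOption.getOrElse(Int.MaxValue)

    // Stand-in for doRequestTotalExecutors: the allocation manager
    // later replaces the placeholder 0 with the real target.
    def requestTotalExecutors(total: Int): Unit = {
      executorLimitOption = Some(total)
    }
  }

  def main(args: Array[String]): Unit = {
    val dynamic = new Backend(Conf(dynamicAllocationEnabled = true))
    println(dynamic.executorLimit)   // 0: accept no resource offers yet
    dynamic.requestTotalExecutors(8)
    println(dynamic.executorLimit)   // 8

    val static = new Backend(Conf(dynamicAllocationEnabled = false))
    println(static.executorLimit)    // Int.MaxValue: uncapped
  }
}
```

The point of `Some(0)` rather than `None` is ordering: resource offers can arrive before the allocation manager's first target, and a zero limit stops the backend from launching executors on those early offers.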
docs/running-on-mesos.md (9 additions, 12 deletions)
@@ -246,18 +246,15 @@ In either case, HDFS runs separately from Hadoop MapReduce, without being schedu
 
 # Dynamic Resource Allocation with Mesos
 
-Mesos supports dynamic allocation only with coarse grain mode, which can resize the number of executors based on statistics
-of the application. While dynamic allocation supports both scaling up and scaling down the number of executors, the coarse grain scheduler only supports scaling down
-since it is already designed to run one executor per slave with the configured amount of resources. However, after scaling down the number of executors the coarse grain scheduler
-can scale back up to the same amount of executors when Spark signals more executors are needed.
-
-Users that like to utilize this feature should launch the Mesos Shuffle Service that
-provides shuffle data cleanup functionality on top of the Shuffle Service since Mesos doesn't yet support notifying another framework's
-termination. To launch/stop the Mesos Shuffle Service please use the provided sbin/start-mesos-shuffle-service.sh and sbin/stop-mesos-shuffle-service.sh
-scripts accordingly.
-
-The Shuffle Service is expected to be running on each slave node that will run Spark executors. One way to easily achieve this with Mesos
-is to launch the Shuffle Service with Marathon with a unique host constraint.
+Mesos supports dynamic allocation only with coarse-grain mode, which can resize the number of
+executors based on statistics of the application. For general information,
+see [Dynamic Resource Allocation](job-scheduling.html#dynamic-resource-allocation).
+
+The External Shuffle Service to use is the Mesos Shuffle Service. It provides shuffle data cleanup functionality
+on top of the Shuffle Service since Mesos doesn't yet support notifying another framework's
+termination. To launch it, run `$SPARK_HOME/sbin/start-mesos-shuffle-service.sh` on all slave nodes, with `spark.shuffle.service.enabled` set to `true`.
+
+This can also be achieved through Marathon, using a unique host constraint, and the following command: `bin/spark-class org.apache.spark.deploy.mesos.MesosExternalShuffleService`.
 
 # Configuration
 
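For context, a sketch of the application-side configuration the revised doc text describes, not taken from the PR: both dynamic allocation and the external shuffle service must be enabled in the application's `SparkConf`. The Mesos master URL and app name here are placeholders; the two `spark.*` keys are the real configuration properties.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical application enabling dynamic allocation on Mesos.
// The Mesos Shuffle Service started on each slave serves shuffle data
// for executors that have already been torn down.
object MesosDynamicAllocationExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("mesos-dynamic-allocation-example")
      .setMaster("mesos://host:5050") // placeholder Mesos master URL
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.shuffle.service.enabled", "true")

    val sc = new SparkContext(conf)
    try {
      // A trivial job; executors are added and removed around it
      // according to load, down to spark.dynamicAllocation.minExecutors.
      println(sc.parallelize(1 to 1000).sum())
    } finally {
      sc.stop()
    }
  }
}
```

With the backend change above, such an application starts with an executor limit of 0 and only begins accepting Mesos offers once its allocation manager requests a real target.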