
@holdenk holdenk commented Jun 13, 2020

This is WIP since it is built on top of SPARK-31197 (which is itself WIP on top of SPARK-20629) and should probably have more testing. We use SPARK-31197's behaviour of exiting the executor once decommissioning is finished, which lets us replace the use of killExecutor with executor decommissioning when it is enabled during dynamic allocation.

What changes were proposed in this pull request?

If graceful decommissioning is enabled, Spark's dynamic scaling uses this instead of directly killing executors.

Why are the changes needed?

When scaling down Spark we should avoid triggering recomputes as much as possible.

Does this PR introduce any user-facing change?

Hopefully users' jobs run faster. It also enables experimental shuffle-service-free decommissioning when graceful decommissioning is enabled.

How was this patch tested?

For now I've extended the ExecutorAllocationManagerSuite to cover this.
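
As a rough sketch of the intended control flow (not the merged implementation; the trait, class, and flag names below are made up for illustration), the scale-down path can prefer decommissioning over a hard kill when the feature is on:

```scala
// Minimal sketch, assuming a simplified client interface. In Spark the real
// methods live on ExecutorAllocationClient; the names and signatures here are
// illustrative only.
trait ScalingClient {
  def killExecutors(ids: Seq[String]): Seq[String]
  def decommissionExecutors(ids: Seq[String], adjustTargetNumExecutors: Boolean): Seq[String]
}

class ScaleDownPolicy(client: ScalingClient, decommissionEnabled: Boolean) {
  /** Remove executors, preferring graceful decommission when it is enabled. */
  def removeExecutors(ids: Seq[String]): Seq[String] = {
    if (decommissionEnabled) {
      // Executors migrate their blocks and exit on their own
      // (this PR relies on SPARK-31197 for the eventual exit).
      client.decommissionExecutors(ids, adjustTargetNumExecutors = true)
    } else {
      // Previous behaviour: a hard kill, which can trigger recomputes.
      client.killExecutors(ids)
    }
  }
}
```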

SparkQA commented Jun 13, 2020

Test build #123956 has finished for PR 28818 at commit 2ff94ec.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@agrawaldevesh agrawaldevesh left a comment

Hi @holdenk. Using proper decommissioning for dynamic allocation would be great and would help unify these codepaths. Thank you again for working on this.

My two comments below are mere nits, just to make sure I am following along. It reads fine as is.

SparkQA commented Jun 14, 2020

Test build #123995 has finished for PR 28818 at commit ef3f523.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@holdenk holdenk force-pushed the SPARK-31198-use-graceful-decommissioning-as-part-of-dynamic-scaling-on-top-of-SPARK-1197 branch from ef3f523 to 7691d2d on June 16, 2020 19:56
SparkQA commented Jun 16, 2020

Test build #124139 has finished for PR 28818 at commit 7691d2d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

* committed.
*/
event.blockUpdatedInfo.blockId match {
case ShuffleDataBlockId(shuffleId, _, _) => exec.addShuffle(shuffleId)
Member

Since we are touching ExecutorMonitor, where do we have the counter operation, exec.removeShuffle? In this PR it seems that executorsKilled is used. Is that enough?
cc @dbtsai

Contributor Author

Yeah, since we're only doing migrations during decommissioning, whatever shuffle files remain on the host will be cleaned up when it dies. I can't think of why we would need a delete operation here as well, but if it would be useful for your follow-on work I can add it?
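
For context, a minimal sketch of the bookkeeping under discussion (the class below is invented for the example; in the PR this logic lives in ExecutorMonitor):

```scala
import scala.collection.mutable

// Illustrative sketch only: track which shuffles each executor holds so the
// allocation manager does not treat an executor with shuffle data as idle.
// Only an "add" path is needed here because, per the discussion above, the
// files are cleaned up when the decommissioned host eventually goes away.
class ExecutorShuffleTracker {
  private val shufflesByExecutor = mutable.Map.empty[String, mutable.Set[Int]]

  def addShuffle(execId: String, shuffleId: Int): Unit = {
    shufflesByExecutor.getOrElseUpdate(execId, mutable.Set.empty) += shuffleId
  }

  /** An executor with tracked shuffle data should not be considered idle. */
  def hasShuffleData(execId: String): Boolean =
    shufflesByExecutor.get(execId).exists(_.nonEmpty)
}
```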

SparkQA commented Jun 17, 2020

Test build #124154 has finished for PR 28818 at commit 9bb0293.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@holdenk holdenk force-pushed the SPARK-31198-use-graceful-decommissioning-as-part-of-dynamic-scaling-on-top-of-SPARK-1197 branch from 9bb0293 to 81f9dbb on June 17, 2020 20:57
SparkQA commented Jun 17, 2020

Test build #124179 has finished for PR 28818 at commit 81f9dbb.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jun 18, 2020

Test build #124181 has finished for PR 28818 at commit 2674055.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

cc @Ngone51 @jiangxb1987

@HyukjinKwon
Member

cc @tgravescs too


holdenk commented Jun 18, 2020

To be clear, this is a WIP PR and not yet ready for review. I created it to give additional context after talking with @agrawaldevesh about our respective goals. I'll try to get a "ready for review" PR out next week.


holdenk commented Jun 24, 2020

Just as a follow-up: since @HyukjinKwon requested an SPIP for this work, I won't be moving this PR out of WIP this week as originally planned.

@agrawaldevesh agrawaldevesh left a comment

One question inline please.

I didn't quite follow why this PR is rebased on top of https://issues.apache.org/jira/browse/SPARK-31197 and https://issues.apache.org/jira/browse/SPARK-20629. They seem orthogonal to me. Can you please update the PR description to reflect the relationship between this PR and these JIRA tickets? Thanks!

*/
def decommissionExecutor(executorId: String): Boolean = {
val decommissionedExecutors = decommissionExecutors(Seq(executorId),
adjustTargetNumExecutors = true)


Why is adjustTargetNumExecutors defaulting to true here? This would mean that all schedulers would try to replenish the executor when asked to DecommissionExecutor(...) -- for example by the Master, or when an executor gets a SIGPWR.

I think it shouldn't be the default -- it should at least be configurable. It only makes sense to have adjustTargetNumExecutors=true when called from org.apache.spark.streaming.scheduler.ExecutorAllocationManager#killExecutor (i.e. when it is truly called from the dynamic allocation codepath and we have decided that we want to replenish the executor).

Contributor Author

If you look above there is a configurable call. This matches how killExecutor is implemented down on line 124.


Can you please point me to where the configurable call is? I don't see a config check in the code paths that call this method.

It's fine for killExecutor to unconditionally adjust the target number of executors because it is only called in the dynamic allocation codepath, but decommissionExecutor would be called from many other codepaths as well (for example when the driver gets a DecommissionExecutor message) -- and thus I think it shouldn't just assume that it should replenish the executor.

Contributor Author

Look on line 95 of this file. I think we should match the semantics of killExecutor as much as possible. If there's a place where we don't want that behaviour, we can use decommissionExecutors instead.


Hmm, should we rename decommissionExecutor (singular) to decommissionAndKillExecutor to reflect its purpose better? It would be too easy to confuse it with decommissionExecutors (on line 95 of this file, which allows the caller to not replenish the target number of executors).

Do you want to change the callers of decommissionExecutor in this PR and switch them to decommissionExecutors(Seq(executorId), false) instead? The ones I am most concerned about are:

  • The handling of message DecommissionExecutor (both sync and async variants) in CoarseGrainedSchedulerBackend
  • StandaloneSchedulerBackend.executorDecommissioned

In both of the above cases, I think we may not always want replenishing. For example, in the standalone case, when the Worker gets a SIGPWR, do we want to replenish the executors on the remaining workers (i.e. oversubscribe them)? Similarly, when an executor gets a SIGPWR, do we want to put that load on the remaining executors? I think the answer to both should be no unless we are doing dynamic allocation.

Personally I am fine with any choice of naming here as long as the semantics are not silently changed under the covers, as is currently the case.

Contributor Author

It's a new function, what are we changing?


ExecutorAllocationClient is a base class of CoarseGrainedSchedulerBackend. We moved decommissionExecutor from the latter class to the former and as such it is not a new function. Since CoarseGrainedSchedulerBackend no longer overrides decommissionExecutor, ExecutorAllocationClient.decommissionExecutor will be called when CoarseGrainedSchedulerBackend gets a DecommissionExecutor message -- and the semantics of that codepath have been changed to unconditionally impose adjustTargetNumExecutors=true.
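
To make the distinction concrete, here is a hedged sketch of the two entry points being debated (signatures simplified; not the exact code in the PR):

```scala
// Simplified sketch of the API shape under discussion; not the PR's exact code.
trait ExecutorAllocationClientSketch {
  /** Decommission several executors; the caller decides whether the target
   *  executor count should be adjusted so that replacements get requested. */
  def decommissionExecutors(ids: Seq[String], adjustTargetNumExecutors: Boolean): Seq[String]

  /** Convenience wrapper that mirrors killExecutor by always adjusting the
   *  target, i.e. the decommissioned executor will be replenished. */
  def decommissionExecutor(id: String): Boolean =
    decommissionExecutors(Seq(id), adjustTargetNumExecutors = true).nonEmpty

  // Callers like the DecommissionExecutor message handler or
  // StandaloneSchedulerBackend.executorDecommissioned would instead call
  // decommissionExecutors(Seq(id), adjustTargetNumExecutors = false)
  // so that a SIGPWR does not oversubscribe the remaining workers.
}
```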

Contributor Author

Cool, I'll update the previous calls to decommissionExecutor.

@holdenk holdenk force-pushed the SPARK-31198-use-graceful-decommissioning-as-part-of-dynamic-scaling-on-top-of-SPARK-1197 branch from 2674055 to 9565c40 on July 21, 2020 19:35

holdenk commented Jul 21, 2020

@agrawaldevesh So this PR depends on the behaviour of the VM eventually exiting (https://issues.apache.org/jira/browse/SPARK-31197) since it's replacing the usage of killExecutor during dynamic allocation.

/**
* Mark a given executor as decommissioned and stop making resource offers for it.
*/
private def decommissionExecutor(executorId: String): Boolean = {


@holdenk ... to answer your question: this block of code was moved to the base class ExecutorAllocationClient, so the code in ExecutorAllocationClient is not "new". Furthermore, the semantics of this code were changed as it was moved: it now unconditionally replenishes the executors.

Contributor Author

Makes sense.

SparkQA commented Jul 21, 2020

Test build #126273 has finished for PR 28818 at commit 9565c40.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jul 22, 2020

Test build #126281 has finished for PR 28818 at commit 3db60f2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@holdenk holdenk force-pushed the SPARK-31198-use-graceful-decommissioning-as-part-of-dynamic-scaling-on-top-of-SPARK-1197 branch from 3db60f2 to daf96dd on July 22, 2020 02:19
SparkQA commented Jul 22, 2020

Test build #126292 has finished for PR 28818 at commit daf96dd.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

holdenk added 4 commits July 23, 2020 11:22

  • Because the mock always says there is an RDD we may replicate more than once, and now that there are independent threads
  • …we don't scale down too low, update the streaming ExecutorAllocationManager to also delegate to decommission
  • Fix up executor add for resource profile
  • …eanup the locks we use in decommissioning and clarify some more bits.

@holdenk holdenk force-pushed the SPARK-31198-use-graceful-decommissioning-as-part-of-dynamic-scaling-on-top-of-SPARK-1197 branch from daf96dd to f921ddd on July 23, 2020 21:35
SparkQA commented Jul 23, 2020

Test build #126435 has finished for PR 28818 at commit f921ddd.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@agrawaldevesh

@holdenk can this PR be abandoned/closed now that this is finally in?

@holdenk holdenk closed this Aug 24, 2020