
Conversation

@venkata91 (Contributor) commented Apr 22, 2020

What changes were proposed in this pull request?

In this change, when dynamic allocation is enabled, instead of aborting immediately when a task set becomes unschedulable due to blacklisting, we pass a SparkListenerUnschedulableTaskSetAdded event, which is handled by ExecutorAllocationManager to request the additional executors needed to schedule the unschedulable blacklisted tasks. Once the event is sent, we start the abortTimer, similar to [SPARK-22148][SPARK-15815], to abort in the case where no new executors are launched, either because max executors has been reached or because the cluster manager is out of capacity.
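
To make the flow concrete, here is a minimal, self-contained sketch of the mechanism described above. The handler class and callback names are illustrative assumptions rather than Spark's actual internals; the event payloads mirror the case classes this patch adds.

```scala
import java.util.concurrent.{Executors, ScheduledFuture, TimeUnit}

// Events mirroring the listener events this patch adds (payload shapes assumed)
case class UnschedulableTaskSetAdded(stageId: Int, stageAttemptId: Int)
case class UnschedulableTaskSetRemoved(stageId: Int, stageAttemptId: Int)

// Hypothetical handler: on an unschedulable task set, ask the allocation
// manager for more executors and arm an abort timer; cancel the timer if the
// task set becomes schedulable again before it fires.
class UnschedulableTaskSetHandler(
    abortTimeoutSec: Long,
    requestExecutors: () => Unit,
    abortStage: Int => Unit) {
  private val timer = Executors.newSingleThreadScheduledExecutor()
  private var pendingAborts = Map.empty[(Int, Int), ScheduledFuture[_]]

  def onAdded(e: UnschedulableTaskSetAdded): Unit = synchronized {
    requestExecutors() // the ExecutorAllocationManager side of the event
    val abort = timer.schedule(new Runnable {
      // fires only if no new executor made the task set schedulable in time
      def run(): Unit = abortStage(e.stageId)
    }, abortTimeoutSec, TimeUnit.SECONDS)
    pendingAborts += ((e.stageId, e.stageAttemptId) -> abort)
  }

  def onRemoved(e: UnschedulableTaskSetRemoved): Unit = synchronized {
    pendingAborts.get((e.stageId, e.stageAttemptId)).foreach(_.cancel(false))
    pendingAborts -= ((e.stageId, e.stageAttemptId))
  }
}
```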

Why are the changes needed?

This is an improvement. When dynamic allocation is enabled, this requests more executors to schedule the unschedulable tasks instead of aborting the stage without even retrying up to spark.task.maxFailures times (in some cases without retrying at all). This is a potential issue with respect to Spark's fault tolerance.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added unit tests in both ExecutorAllocationManagerSuite and TaskSchedulerImplSuite.

@venkata91 (Contributor, Author)

Can you please review, @squito @tgravescs @mridulm?

@tgravescs (Contributor)

I took a quick look and I don't follow. You send a message that says everything is blacklisted, but all you do is call onSchedulerBacklogged; how does that add another executor if they are all blacklisted? That updates when it needs to add more, but it doesn't change the number it calculated. So, for instance, say I needed 3 executors and all of them got blacklisted: updating the time to get a new one won't change the fact that it thinks it needs 3.
If we change it to ask for one more, then my question is how you do the proper accounting for that. This is why, in the JIRA and the other PR, we said the allocation manager needs to be closely tied to the blacklist manager.

@venkata91 (Contributor, Author)

> I took a quick look and I don't follow. You send a message that says everything is blacklisted, but all you do is call onSchedulerBacklogged; how does that add another executor if they are all blacklisted? That updates when it needs to add more, but it doesn't change the number it calculated. So, for instance, say I needed 3 executors and all of them got blacklisted: updating the time to get a new one won't change the fact that it thinks it needs 3.
> If we change it to ask for one more, then my question is how you do the proper accounting for that. This is why, in the JIRA and the other PR, we said the allocation manager needs to be closely tied to the blacklist manager.

Ok. Got it, makes sense now. Let me think more about it.

@venkata91 (Contributor, Author)

@tgravescs I went through the discussions on #22288. Is there any other PR you're referring to with discussions around dynamic allocation and BlacklistManager? Also, can you please explain a bit more about the proper accounting? Let's say we request one executor at a time (possibly over multiple rounds) while ensuring we stay within the bound: won't dynamic allocation eventually bring the number of executors back to a consistent state, with idle executors getting removed periodically?

@tgravescs (Contributor)

It might have been in discussions not on a PR, or buried in comments; I don't have time to go looking. There are many different conditions to consider. The main one we were focusing on is the case where you already have as many executors as you need for the tasks you have left, which means the allocation manager is not going to ask for more. The problem is that some or all of those executors can get blacklisted. The only way for the dynamic allocation manager to know it needs to ask for more is for it to know that nodes are blacklisted, and then to ask for additional executors, internally incrementing its count of executors needed and asking YARN or another resource manager for more. At that point the number of executors the allocation manager wants differs from what its normal calculation would produce. Simplified: #executors = (#tasks * #cpus per task) / (#cores per executor). So you have to change the number of executors and keep taking that into account, because the allocation manager is always recalculating whether it needs more or fewer executors. You also have to notify it when executors become unblacklisted, or when the blacklisted ones hit the idle timeout, etc. The allocation manager has to know a lot more details about the blacklisting and take those into account when calculating the number of executors it needs.
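
As a worked instance of the simplified formula above (the function and parameter names are illustrative, not Spark's): with blacklisting ignored, the computed target never grows, even when every executor currently held is blacklisted for the remaining tasks.

```scala
// The simplified target from the comment above, which ignores blacklisting.
def targetExecutors(pendingTasks: Int, cpusPerTask: Int, coresPerExecutor: Int): Int =
  math.ceil(pendingTasks.toDouble * cpusPerTask / coresPerExecutor).toInt

// Example: 3 pending tasks, 1 cpu per task, 1 core per executor => target 3.
// If the 3 executors already held are all blacklisted for these tasks, the
// target stays 3, so no new executor is ever requested; that is the gap.
val target = targetExecutors(pendingTasks = 3, cpusPerTask = 1, coresPerExecutor = 1)
assert(target == 3)
```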

@venkata91 force-pushed the SPARK-31418 branch 3 times, most recently from 7e12d91 to bf0cf52 on June 20, 2020 at 17:38
@venkata91 (Contributor, Author)

@tgravescs After thinking about the problem and discussing with @mridulm, I have now handled it by keeping track of unschedulable task sets so that more executors can be requested when dynamic allocation is enabled. Once some task becomes schedulable, we clear this set, since either an executor got free or we just acquired a new executor and found a way to make progress. Let me know what you think about this change. Thanks for taking a look previously and giving the overall context.
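
A minimal sketch of the bookkeeping just described, assuming a (stageId, stageAttemptId) key and a simple bump of the executor target while anything is unschedulable; the actual patch's accounting may differ in detail.

```scala
import scala.collection.mutable

// Illustrative tracker: remember which task sets are currently unschedulable
// and clear them once any of their tasks can be scheduled again.
class UnschedulableTaskSetTracker {
  private val unschedulable = mutable.Set.empty[(Int, Int)] // (stageId, attempt)

  def onUnschedulable(stageId: Int, attempt: Int): Unit =
    unschedulable += ((stageId, attempt))

  def onSchedulable(stageId: Int, attempt: Int): Unit =
    unschedulable -= ((stageId, attempt))

  // With dynamic allocation on, request at least one executor beyond the
  // normal target while anything is unschedulable, so progress is possible.
  def adjustedTarget(normalTarget: Int): Int =
    if (unschedulable.nonEmpty) normalTarget + 1 else normalTarget
}
```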

@venkata91 (Contributor, Author)

Can someone help me understand why this Generate documents check is failing? I'm not sure I understand the issue here. Any pointers would be appreciated.

@tgravescs (Contributor)

Sorry, I haven't had a chance to look at your rework. I re-kicked the checks, as it might have been a transient issue.

@venkata91 (Contributor, Author)

No worries, @tgravescs, and thanks for taking a look. For some reason these checks keep failing, but the failures don't look related to my changes. Some cache issue, probably?

@tgravescs (Contributor)

Yes, it's possible; I saw some issues on a few other PRs, although with different parts. It may be early next week before I can review.

@venkata91 (Contributor, Author)

> Yes, it's possible; I saw some issues on a few other PRs, although with different parts. It may be early next week before I can review.

Thanks, that should be fine.

@tgravescs (Contributor) left a comment

At a high level this seems like an OK approach to request at least some executors: even if it won't fit all of them, it lets you make progress. It's unfortunate to add yet more tracking of the same thing in multiple places. I wish the allocation manager would move into the scheduler, but that is a much bigger change.

@tgravescs (Contributor)

ok to test

@SparkQA commented Jul 10, 2020

Test build #125613 has finished for PR 28287 at commit 0784dc3.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class UnschedulableTaskSetAdded(stageId: Int, stageAttemptId: Int)
  • case class UnschedulableTaskSetRemoved(stageId: Int, stageAttemptId: Int)
  • case class SparkListenerUnschedulableTaskSetAdded(
  • case class SparkListenerUnschedulableTaskSetRemoved(

@venkata91 (Contributor, Author)

> Test build #125613 has finished for PR 28287 at commit 0784dc3.
> • This patch fails Spark unit tests.

The failed tests seem unrelated; they look like transient issues. @tgravescs, should we rerun it?

@tgravescs (Contributor)

test this please

@SparkQA commented Jul 11, 2020

Test build #125637 has finished for PR 28287 at commit 0784dc3.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class UnschedulableTaskSetAdded(stageId: Int, stageAttemptId: Int)
  • case class UnschedulableTaskSetRemoved(stageId: Int, stageAttemptId: Int)
  • case class SparkListenerUnschedulableTaskSetAdded(
  • case class SparkListenerUnschedulableTaskSetRemoved(

@SparkQA commented Jul 12, 2020

Test build #125723 has finished for PR 28287 at commit b4c27fb.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class UnschedulableTaskSetAdded(stageId: Int, stageAttemptId: Int)
  • case class UnschedulableTaskSetRemoved(stageId: Int, stageAttemptId: Int)
  • case class SparkListenerUnschedulableTaskSetAdded(
  • case class SparkListenerUnschedulableTaskSetRemoved(

@venkata91 (Contributor, Author)

@tgravescs Can you please take another look now that the tests are passing? Thanks.

@tgravescs (Contributor) left a comment

A couple of minor things; otherwise looks good.

@venkata91 (Contributor, Author)

> A couple of minor things; otherwise looks good.

Awesome, thanks. I addressed the review comments; hopefully the tests pass on the first run this time.

venkata91 added 11 commits July 21, 2020 15:14
…location is enabled and a task becomes unschedulable due to spark's blacklisting feature.

In this change, when dynamic allocation is enabled, instead of aborting an unschedulable blacklisted task immediately, we use the SparkListener to pass an UnschedulableBlacklistTaskSubmitted event, which is handled by ExecutorAllocationManager to request more executors to schedule the unschedulable blacklisted task. Once the event is sent, we start the abortTimer, similar to [SPARK-22148][SPARK-15815].

Currently this has been manually tested in our clusters. I am also trying to figure out how to add unit tests.

@venkata91 (Contributor, Author)

@tgravescs It seems like a couple of tests are failing. Can you please kick this off again?

@SparkQA commented Jul 22, 2020

Test build #126282 has finished for PR 28287 at commit 2f019d5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jul 22, 2020

Test build #126283 has finished for PR 28287 at commit d9f473d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tgravescs (Contributor)

@venkata91 I assume your updates keep getting rebased? If possible, it would be great if you could just do up-merges, because rebasing makes it much harder to see the diffs between requested changes.

@venkata91 (Contributor, Author)

> @venkata91 I assume your updates keep getting rebased? If possible, it would be great if you could just do up-merges, because rebasing makes it much harder to see the diffs between requested changes.

Sure, makes sense. Will keep that in mind for the future.

@tgravescs (Contributor) left a comment

Pending Jenkins; looks good.

@SparkQA commented Jul 22, 2020

Test build #126352 has finished for PR 28287 at commit d6f1e73.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tgravescs (Contributor)

merged to master, thanks @venkata91

@asfgit closed this in e7fb67c on Jul 23, 2020
@venkata91 (Contributor, Author)

> merged to master, thanks @venkata91

Thanks @tgravescs for patiently doing multiple rounds of reviews. :)

otterc pushed a commit to linkedin/spark that referenced this pull request Mar 22, 2023
…location is enabled and a task becomes unschedulable due to spark's blacklisting feature

Ref: LIHADOOP:52649

In this change, when dynamic allocation is enabled, instead of aborting immediately when a task set becomes unschedulable due to blacklisting, we pass a `SparkListenerUnschedulableTaskSetAdded` event, which is handled by `ExecutorAllocationManager` to request the additional executors needed to schedule the unschedulable blacklisted tasks. Once the event is sent, we start the abortTimer, similar to [SPARK-22148][SPARK-15815], to abort in the case where no new executors are launched, either because max executors has been reached or because the cluster manager is out of capacity.

This is an improvement. When dynamic allocation is enabled, this requests more executors to schedule the unschedulable tasks instead of aborting the stage without even retrying up to spark.task.maxFailures times (in some cases without retrying at all). This is a potential issue with respect to Spark's fault tolerance.

Added unit tests in both ExecutorAllocationManagerSuite and TaskSchedulerImplSuite.

Closes apache#28287 from venkata91/SPARK-31418.

Authored-by: Venkata krishnan Sowrirajan <[email protected]>
Signed-off-by: Thomas Graves <[email protected]>

RB=2048861
BUG=LIHADOOP-52649
G=spark-reviewers
R=ekrogen,mshen
A=ekrogen