
Conversation

@timout timout commented Apr 12, 2017

What changes were proposed in this pull request?

MesosCoarseGrainedSchedulerBackend ignored the spark.blacklist.enabled configuration property and used a hard-coded MAX_SLAVE_FAILURES = 2. The purpose of this fix is to remove that hard-coded behaviour; BlacklistTracker is responsible for blacklist functionality.
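
For illustration, a minimal sketch of the behaviour being removed, assuming a simplified failure map in place of the backend's real slave bookkeeping (names below are illustrative, not the actual code):

// Sketch only: before this patch the Mesos backend kept its own per-slave failure count
// and, once it reached the hard-coded MAX_SLAVE_FAILURES = 2, refused all further offers
// from that slave, regardless of spark.blacklist.enabled.
object HardcodedSlaveBlacklistSketch {
  private val MAX_SLAVE_FAILURES = 2
  private val taskFailures = scala.collection.mutable.Map.empty[String, Int]

  def recordFailure(slaveId: String): Unit =
    taskFailures(slaveId) = taskFailures.getOrElse(slaveId, 0) + 1

  // Pre-patch check; after the patch this condition is gone and node health is left to
  // the scheduler-level BlacklistTracker.
  def offerAcceptable(slaveId: String): Boolean =
    taskFailures.getOrElse(slaveId, 0) < MAX_SLAVE_FAILURES
}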

@mgummelt
@jsoltren

How was this patch tested?

Unit tests, manual testing.
This patch is a clean-up; that functionality is tested by the BlacklistTracker tests.

Author: [email protected]

squito commented Dec 21, 2017

Sorry I am only just looking at this now --

I am not so sure this is doing what you think. I think the notion of a "task" in MesosCoarseGrainedSchedulerBackend might be something different; it's really an "executor" in Spark's terminology. Perhaps that code should have some additional comments explaining that. Tasks are still handled in Spark's TaskScheduler / TaskSetManager etc.

@mgummelt can you confirm my understanding?

timout commented Dec 23, 2017

It does exactly what it is supposed to do. And you are absolutely right, it relates to executors.
I am sorry if that was not clear from my previous explanations.
Let us say we have a Spark Streaming app, a very long running app: the driver, started by Marathon from a Docker image, schedules (in the Mesos sense) executors that also run from Docker images (net=HOST), so every executor is started from a Docker image on some Mesos agent.
If some recoverable error happens, for instance:
ExecutorLostFailure (executor 40 exited caused by one of the running tasks) Reason: Remote RPC client disassociated... (I do not know about others, but it is relatively frequent in my environment.)
As a result the executor is dead, and after 2 such failures the Mesos agent node is added to the MesosCoarseGrainedSchedulerBackend blacklist and the driver will never schedule (in the Mesos sense) an executor on it again. So the app will starve... and note that it will not die.
That is exactly what happened with my streaming apps before this patch.

This patch may be incompatible with master already, but I can fix it if needed.

squito commented Dec 27, 2017

ok I think I understand. This sounds like the equivalent of some of the existing blacklisting behavior which currently only exists on yarn -- when a request is made to yarn, the spark context tells yarn which nodes it has blacklisted:

https://github.com/apache/spark/blob/master/resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala#L128
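
Roughly, the yarn-side mechanism boils down to including the scheduler's current node blacklist in each executor request; a minimal sketch of that shape, using illustrative stand-in types rather than the actual YarnSchedulerBackend code:

// Simplified sketch only: when Spark asks the cluster manager for more executors, it passes
// along the nodes the task scheduler currently considers blacklisted, so the manager avoids
// even offering resources on those nodes. The types here are illustrative stand-ins.
case class ExecutorRequest(requestedTotal: Int, nodeBlacklist: Set[String])

class ExecutorRequestBuilder(currentBlacklist: () => Set[String]) {
  def build(requestedTotal: Int): ExecutorRequest =
    ExecutorRequest(requestedTotal, currentBlacklist())  // e.g. scheduler.nodeBlacklist()
}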

however, it still seems like there is a missing piece -- you have to tell mesos which nodes you don't want executors on, right?

I also don't understand why you'd get starvation in your app with this -- shouldn't mesos be requesting executors on other nodes?

anyway, I'm agreeing that something seems wrong with the mesos scheduling when there is a bad node, but I'm not certain this is the right fix, and I just don't know enough about the communication between mesos and spark to say exactly what should be done instead, sorry.
@mgummelt can you comment?

might actually be better to have this discussion on jira, since we're talking about general design, not specifics of this change

andreimaximov commented Dec 27, 2017

@squito not sure if this is still the case, but as of 4 months ago starvation could happen if enough failures occurred on each node that the entire cluster ended up blacklisted. Unlikely, but possible for a long-running app on a sufficiently small cluster.

squito commented Dec 27, 2017

@andreimaximov that is still sort of the case for all cluster managers. You shouldn't get starvation, you should see the app actively fail (SPARK-15865 was the main change, though there was some small follow-on stuff after that). What else can you do if it seems there is something wrong with every node in your cluster?

But if you're really seeing your app just hang in mesos in that situation -- yeah, it seems like something needs to be fixed in the spark-mesos interaction. Unfortunately I won't have a clear picture of what needs to change without spending more time understanding what is there now...

@hantuzun

Even though we only run normal Spark jobs, this PR is going to fix a case for us as well.

@squito squito left a comment

sorry for the long delay @timout and thanks for the ping on this @hantuzun. So I finally took a deeper look, and I understand what is going on here -- I think this is the right change, I just have one requested change for how to still take advantage of the node blacklist.

There are still some parts of mesos that I don't totally understand, which I'd like clarification on, @mgummelt:

  1. In yarn, spark has to actively make a request for more resources. In that request, spark tells yarn's resource manager its blacklist, so that the resource manager avoids even offering resources that the app has blacklisted.

     IIUC, in mesos the process is more passive: the mesos cluster manager will repeatedly call resourceOffers with whatever it's got, and the app decides whether it wants it or not. resourceOffers will get called continuously even if the app has already hit its max number of executors and has no interest in more resources.

  2. The app never updates the mesos cluster manager at all about its preferences -- blacklist, locality preferences, resource sizes, etc. I just want to make sure we shouldn't be changing something else to actively tell mesos the app's node blacklist.

and maybe less important for this PR, but just something which confused me:

  3. I'm surprised that buildMesosTasks doesn't call declineOffer for offers it doesn't use. If the offer has fewer resources than this spark app can use, shouldn't we tell mesos that other apps can have it? Then the same thing would go for offers we don't use because of blacklisting.

cpus + totalCoresAcquired <= maxCores &&
mem <= offerMem &&
numExecutors() < executorLimit &&
slaves.get(slaveId).map(_.taskFailures).getOrElse(0) < MAX_SLAVE_FAILURES &&

rather than just deleting this, we should replace it with a check against scheduler.nodeBlacklist(), like the YarnScheduler is doing here:
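
A minimal sketch of the kind of check being suggested (illustrative only, not the final patch; offerHostname and nodeBlacklist stand in for the hostname taken from the Mesos offer and the result of scheduler.nodeBlacklist()):

// Illustrative only: instead of the backend's own per-slave failure counter, consult the
// node blacklist maintained by the task scheduler / BlacklistTracker when deciding whether
// an offer's host is usable.
def hostUsable(offerHostname: String, nodeBlacklist: Set[String]): Boolean =
  !nodeBlacklist.contains(offerHostname)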

@skonto skonto Feb 20, 2018

@squito regarding 3: that part of the code aggressively launches tasks on matched offers.
The remainder of the offer is implicitly declined: https://mail-archives.apache.org/mod_mbox/mesos-user/201507.mbox/%[email protected]%3E
In addition, if an offer cannot be used to launch a task, it is declined later on here:
https://github.com/apache/spark/blob/83c008762af444eef73d835eb6f506ecf5aebc17/resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala#L440-#L444
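
For context, explicitly handing an unused offer back is a single call into the Mesos driver; a minimal sketch, assuming the standard Mesos SchedulerDriver API (the helper name and loop are illustrative):

import org.apache.mesos.{Protos, SchedulerDriver}

// Illustrative sketch: decline offers this framework is not going to use so that Mesos can
// offer those resources to other frameworks.
def declineUnused(driver: SchedulerDriver, unusedOffers: Seq[Protos.Offer]): Unit =
  unusedOffers.foreach(offer => driver.declineOffer(offer.getId))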

thanks @skonto, also I realized there were more explicit calls to declineOffer than I thought initially, after a closer read of the code. btw @IgorBerman has opened an updated version of this PR here: #20640 -- would appreciate a review over there

@IgorBerman

+1 here, we are running Spark core jobs but with a long-running driver on Mesos. Sometimes executors fail, which is normal (one of the reasons is a temporary port conflict). Over time, fewer and fewer executors are valid for the driver, so it creates a situation where the Mesos cluster has free resources but no one uses them.

squito commented Feb 20, 2018

for anyone watching this: @IgorBerman submitted an updated version of this here: #20640, which I plan to merge unless there are any objections.

@AmplabJenkins

Can one of the admins verify this patch?

@swevrywhere

Sorry, I'm new to GH. Does "closed" mean that it's been merged or that it was abandoned without checking in? If it wasn't checked in, what's the holdup? Was this addressed elsewhere?
