[SPARK-11163] Remove unnecessary addPendingTask calls. #9154
Conversation
@CodingCat @cmccabe you both also modified this code recently... does this change look reasonable to you?
That's not really accurate, is it?
Test build #43868 has finished for PR 9154 at commit
@vanzin I think the key point is at
and in handleFailedTask, we have already called addPendingTask.
My suggestion is that maybe we should remove
@CodingCat but
errr... do we distinguish running and pending in TaskInfo? ... I really need to update my knowledge about scheduler code... if no, the
what it does is we
the code block starting from https://github.com/apache/spark/pull/9154/files#diff-bad3987c83bd22d46416d3dd9d208e76R790 seems fishy to me... are we updating... nvm, no.
@vanzin my understanding is that addPendingTask looks through each preferred location for that task and adds the task to (1) the pending list for the executor corresponding to that location, (2) the pending list for the host corresponding to that location, and (3) the pending list for the rack corresponding to that location. For tasks that are not yet running, calling this in executorLost should have no effect: the only difference from the previous time addPendingTask was called (before the executor was lost) is that, back then, the task was also added to one additional list, the one for the now-lost executor. What do you mean about tasks being in the wrong list? I see that there will be an entry (with a list of pending tasks) in pendingTasksForExecutor corresponding to an executor that is dead, but calling addPendingTask never removes anything from any mappings, so that call doesn't fix that problem.
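For readers following along, here is a rough sketch of that bookkeeping (simplified and hypothetical, not the actual Spark source; the field names echo TaskSetManager, but locations are reduced to (executorId, host) pairs and rack lookup is injected as a function):

```scala
import scala.collection.mutable.{ArrayBuffer, HashMap}

// Simplified sketch of the pending-task bookkeeping described above.
class PendingTaskSketch(getRackForHost: String => Option[String]) {
  val pendingTasksForExecutor = new HashMap[String, ArrayBuffer[Int]]
  val pendingTasksForHost = new HashMap[String, ArrayBuffer[Int]]
  val pendingTasksForRack = new HashMap[String, ArrayBuffer[Int]]

  /** Add a pending task index to the list for each of its preferred locations. */
  def addPendingTask(index: Int, preferredLocations: Seq[(String, String)]): Unit = {
    for ((executorId, host) <- preferredLocations) {
      pendingTasksForExecutor.getOrElseUpdate(executorId, new ArrayBuffer[Int]) += index
      pendingTasksForHost.getOrElseUpdate(host, new ArrayBuffer[Int]) += index
      getRackForHost(host).foreach { rack =>
        pendingTasksForRack.getOrElseUpdate(rack, new ArrayBuffer[Int]) += index
      }
    }
  }
  // Note: nothing in this sketch ever removes entries when an executor is
  // lost; a stale per-executor list simply never receives resource offers again.
}
```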
Let's say you have a task whose locality preference includes "executor 1". Now "executor 1" dies. That task would be in the
What do you mean by "fixed"? As I said in my earlier comment, I don't see how addPendingTask fixes that, since addPendingTask doesn't remove anything from any mappings.
To give a specific example, suppose task t1 has preferred locations on executor e1 (on host h1), e2 (also on host h1), and e3 (on host h2). The data structures will look like: pendingTasksForExecutor: {e1: t1, e2: t1, e3: t1} and pendingTasksForHost: {h1: t1, h2: t1}. We've agreed that the addPendingTask call is irrelevant for tasks that are currently running (because addPendingTask is called for any running tasks in handleFailedTask), so let's say t1 hasn't been run yet. Now suppose executor e2 dies. We never remove any entries from pendingTasksForExecutor or pendingTasksForHost (not in addPendingTask, nor anyplace else, as far as I can tell; we still won't schedule things on the dead executor, because the TaskSetManager will never get a resource offer for it). addPendingTask will "re-add" entries for each of t1's preferred locations (for ExecutorCacheTaskLocations, we even re-add to the list for the lost executor; for HDFSCacheTaskLocations, we won't re-add it to the list for the lost executor, but in either case it doesn't matter, because the entry is already in the list anyway). Since all of these locations were already added above, this call has no effect. Which part of this reasoning do you think is incorrect? |
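To make that example concrete, here is a hedged sketch of the state as plain maps (hypothetical values; task t1 is represented by index 0, since the real lists hold task indices):

```scala
import scala.collection.mutable.{ArrayBuffer, HashMap}

object PendingStateExample {
  // Hypothetical state for the example: task index 0 stands in for t1, which
  // prefers executors e1 and e2 on host h1, and e3 on host h2.
  val pendingTasksForExecutor = HashMap(
    "e1" -> ArrayBuffer(0), "e2" -> ArrayBuffer(0), "e3" -> ArrayBuffer(0))
  val pendingTasksForHost = HashMap(
    "h1" -> ArrayBuffer(0), "h2" -> ArrayBuffer(0))

  // When e2 dies, nothing above is removed, and re-running addPendingTask for
  // t1 only appends index 0 to lists that already contain it, so scheduling
  // behaves exactly as before: the call has no observable effect.
}
```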
Ok, I see. I think I was under the impression that the code was actually removing tasks from old pending lists, but that doesn't seem to be the case. That also looks like a bug in itself - what's the point of keeping a pending list for a dead executor? I can see keeping a pending list for empty hosts because of dynamic allocation, but even that sounds fishy (the per-host pending list can be re-created if an executor is ever started again on that host). So you're right, after reading the code again a few more times it does seem addPendingTask is redundant.
Yeah you're right that there doesn't seem to be any point in keeping entries for dead executors. I'm guessing the assumption is that all this state is cleaned up after the stage finishes anyway, so it's not a big deal to have some extra state hanging around. But in any case, that's a separate bug. @markhamstra if you have time to look at this, it would be helpful to have one more set of eyes!
After this diff, nothing is using the misspelled readding parameter anymore, so we may as well drop that from addPendingTask.
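For reference, a minimal sketch of that cleanup (illustrative only, not the actual TaskSetManager code; it assumes, as the thread suggests, that readding is a Boolean flag defaulted to false):

```scala
// Illustrative sketch of the suggested signature cleanup; bodies elided.
class AddPendingTaskSketch {
  // Before: the flag existed solely so executorLost could mark its re-adds.
  def addPendingTaskOld(index: Int, readding: Boolean = false): Unit = {
    // ... add `index` to the per-executor / per-host / per-rack pending lists ...
  }

  // After: with the executorLost calls gone, no caller passes the flag,
  // so the parameter can simply be dropped.
  def addPendingTask(index: Int): Unit = {
    // ... same bookkeeping, minus the unused flag ...
  }
}
```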
Ah, now I see that @CodingCat had the same suggestion.
I haven't done enough code spelunking to know, but I'm wondering whether the pending list for a dead executor may become useful for "I'm not dead" executors. [An extended Monty Python quote is very tempting, and surprisingly relevant, at this point.] It's entirely possible for executors to miss heartbeats (usually because they have ingested some bad user code that is now consuming most of their CPU resources) and appear dead to the master, only to arise from the dead a short time later (and potentially promptly miss another heartbeat). If the before-you-were-dead pending task lists are, or should be, reattached to the Lazarus executors, then there is potentially a point to keeping them around. On the other hand, the "Ah, thank you very much" approach of clubbing the "not dead" is often what ends up needing to be done manually for these executors that refuse to die, so trying to complete their resurrection and make them useful again may be a fool's errand regardless.
This LGTM.
26aa899 to 4394f00
Test build #44079 has finished for PR 9154 at commit
Jenkins, retest this please
Test build #44093 has finished for PR 9154 at commit
This commit removes unnecessary calls to addPendingTask in
TaskSetManager.executorLost. These calls are unnecessary: for
tasks that are still pending and haven't been launched, they're
still in all of the correct pending lists, so calling addPendingTask
has no effect. For tasks that are currently running (which may still be
in the pending lists, depending on how they were scheduled), we call
addPendingTask in handleFailedTask, so the calls at the beginning
of executorLost are redundant.
I think these calls are left over from when we re-computed the locality
levels in addPendingTask; now that we call recomputeLocality separately,
I don't think these are necessary.
Now that those calls are removed, the readding parameter in addPendingTask
is no longer necessary, so this commit also removes that parameter.
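A minimal sketch of the shape of this change (not the actual diff; the names echo TaskSetManager, the bodies are stand-ins for the real scheduler logic, and the readding flag is assumed to be a Boolean defaulted to false, as discussed above):

```scala
import scala.collection.mutable.{ArrayBuffer, HashMap}

// Hedged sketch of the executorLost change; not the real TaskSetManager code.
object ExecutorLostSketch {
  val pendingTasksForExecutor = new HashMap[String, ArrayBuffer[Int]]

  def addPendingTask(index: Int, readding: Boolean = false): Unit = ()  // stub
  def handleFailedRunningTasks(execId: String): Unit = ()
  // stub: in the real code, handleFailedTask re-adds each running task that
  // is failed because its executor was lost.

  // Before: executorLost began by re-adding every pending task that listed
  // the lost executor among its preferred locations -- redundant, because
  // those tasks are already in all of the relevant pending lists.
  def executorLostBefore(execId: String): Unit = {
    for (index <- pendingTasksForExecutor.getOrElse(execId, ArrayBuffer.empty[Int])) {
      addPendingTask(index, readding = true)
    }
    handleFailedRunningTasks(execId)
  }

  // After: the loop is gone, and with it the only caller that passed
  // `readding`, which is why that parameter can be dropped as well.
  def executorLostAfter(execId: String): Unit = {
    handleFailedRunningTasks(execId)
  }
}
```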
@markhamstra can you take a look at this?
cc @vanzin