[SPARK-17623][CORE] Clarify type of TaskEndReason with a failed task. #15181

squito · 2016-09-21T16:01:33Z

What changes were proposed in this pull request?

In TaskResultGetter, enqueueFailedTask currently deserializes the result
as a TaskEndReason. But the type is actually more specific: it's a
TaskFailedReason. This just leads to more blind casting later on. It
would be clearer if the msg were cast to the right type immediately,
so method parameter types could be tightened.
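As a rough sketch of the change, assuming simplified stand-ins for Spark's `TaskEndReason`, `TaskFailedReason`, `Success`, and `ExceptionFailure` (the real classes have more cases and fields) and using hypothetical helper names (`decodeFailure`, `errorStringLoose`, `errorStringTight`), the difference is only where the cast lives:

```scala
object TightenedReasons {
  // Simplified stand-ins for Spark's reason hierarchy (illustrative only).
  sealed trait TaskEndReason
  case object Success extends TaskEndReason
  sealed trait TaskFailedReason extends TaskEndReason {
    def toErrorString: String
  }
  final case class ExceptionFailure(description: String) extends TaskFailedReason {
    def toErrorString: String = s"ExceptionFailure: $description"
  }

  // Before: the parameter is only a TaskEndReason, so the handler has to
  // blind-cast, far away from where the message was decoded.
  def errorStringLoose(reason: TaskEndReason): String =
    reason.asInstanceOf[TaskFailedReason].toErrorString

  // After: cast once, at the boundary where the failure message is decoded...
  def decodeFailure(msg: Any): TaskFailedReason =
    msg.asInstanceOf[TaskFailedReason]

  // ...so downstream parameter types can be tightened and need no cast.
  def errorStringTight(reason: TaskFailedReason): String =
    reason.toErrorString
}
```

In both versions a `Success` reaching the failure path throws a `ClassCastException`; tightening the type moves that single cast to the decode point, so later code no longer has to repeat it.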

How was this patch tested?

Existing unit tests via Jenkins. Note that the code was already blind-casting to TaskFailedReason, just in a different spot, so there shouldn't be any behavior change.


andrewor14

LGTM. I wanted to confirm one part, but the rest looks great.

andrewor14 · 2016-09-21T17:57:01Z

core/src/main/scala/org/apache/spark/scheduler/TaskResultGetter.scala

          try {
            if (serializedData != null && serializedData.limit() > 0) {
-              reason = serializer.get().deserialize[TaskEndReason](
+              reason = serializer.get().deserialize[TaskFailedReason](


Is this totally safe? Can you point me to the code where we serialize this?

Good question. It's sent as an executor status update in various failure-handling scenarios here, though unfortunately they're not grouped so nicely that it's obvious they always go together:

https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L357

I walked through the various cases and convinced myself they go together. I made one more minor improvement to make that clearer: a couple of .toTaskEndReason methods could be renamed to .toTaskFailedReason (d991a3c).
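For illustration only, a hypothetical exception type (not Spark's actual class; `CommitDenied`, `CommitDeniedError`, and `report` are made-up names) shows what the narrower return type buys: callers can reach `toErrorString` without casting.

```scala
object NarrowedConversion {
  // Simplified stand-ins for Spark's reason hierarchy (illustrative only).
  sealed trait TaskEndReason
  sealed trait TaskFailedReason extends TaskEndReason {
    def toErrorString: String
  }
  // Hypothetical failure reason carried back to the driver.
  final case class CommitDenied(jobId: Int, partition: Int) extends TaskFailedReason {
    def toErrorString: String = s"CommitDenied (job $jobId, partition $partition)"
  }

  // Hypothetical executor-side exception. Declaring the narrower return type
  // documents that this conversion can only ever produce a failure reason.
  final class CommitDeniedError(jobId: Int, partition: Int)
      extends Exception(s"commit denied for partition $partition") {
    def toTaskFailedReason: TaskFailedReason = CommitDenied(jobId, partition)
  }

  // The caller needs no cast to use failure-specific methods.
  def report(e: CommitDeniedError): String = e.toTaskFailedReason.toErrorString
}
```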

Really, though, there were two other things that convinced me this was OK:

1. The only other possible thing a TaskEndReason could be is Success. But notice that Success is never serialized as the msg: it's just implicit in TaskState.FINISHED, and the Success part is then filled in on the driver side here:

spark/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala, line 669 in 248922f:

sched.dagScheduler.taskEnded(tasks(index), Success, result.value(), result.accumUpdates, info)

2. The TaskEndReason was already being blind-cast anyway:

spark/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala, line 710 in 248922f:

reason.asInstanceOf[TaskFailedReason].toErrorString

This was already in the same thread, without any try/catch or other checks. Well, except for this:

spark/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala, line 701 in 248922f:

if (info.failed || info.killed) {

which I decided wasn't a concern, mostly because of reason (1) again.
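Putting reasons (1) and (2) together, the driver-side contract can be sketched roughly like this. It is a simplified model, not Spark's code: `TaskState` is reduced to an `Enumeration`, the payload is assumed to be already decoded, `endReason` is a hypothetical helper, and falling back to `UnknownReason` when no payload arrives is an assumption made for the sketch.

```scala
object StatusDispatch {
  // Simplified stand-in for Spark's TaskState (illustrative only).
  object TaskState extends Enumeration {
    val RUNNING, FINISHED, FAILED, KILLED, LOST = Value
  }

  // Simplified stand-ins for Spark's reason hierarchy (illustrative only).
  sealed trait TaskEndReason
  case object Success extends TaskEndReason
  sealed trait TaskFailedReason extends TaskEndReason {
    def toErrorString: String
  }
  case object UnknownReason extends TaskFailedReason {
    def toErrorString: String = "UnknownReason"
  }
  final case class ExceptionFailure(description: String) extends TaskFailedReason {
    def toErrorString: String = s"ExceptionFailure: $description"
  }

  // Hypothetical driver-side mapping. A FINISHED update never carries a
  // reason: Success is supplied here. Every other terminal state carries a
  // payload that, by the executor-side contract, is always a TaskFailedReason.
  def endReason(state: TaskState.Value, payload: Option[Any]): Option[TaskEndReason] =
    state match {
      case TaskState.FINISHED => Some(Success)
      case TaskState.FAILED | TaskState.KILLED | TaskState.LOST =>
        Some(payload.map(_.asInstanceOf[TaskFailedReason]).getOrElse(UnknownReason))
      case _ => None // not a terminal state
    }
}
```

Because Success only ever comes from the FINISHED branch, the cast in the failure branch can never see it, which is the same argument as reason (1).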

There's more cleanup we could do here:

TaskState.isFailed is unused; instead, there is a different hard-coded check here:

spark/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala, line 376 in 248922f:

} else if (Set(TaskState.FAILED, TaskState.KILLED, TaskState.LOST).contains(state)) {

We could fix the weak coupling between TaskState and TaskEndReason more directly by replacing ExecutorBackend.statusUpdate with more specific methods, which would make the coupling explicit.

And probably other related things, too. If you have a strong preference I could address those here, but I figured it was best to get in just this small cleanup, which I'm confident in, for now.
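A sketch of the two cleanups above, under stated assumptions: `isUnsuccessful` is a hypothetical name for a predicate that mirrors the hard-coded `Set(TaskState.FAILED, TaskState.KILLED, TaskState.LOST)` check (it is not the existing `TaskState.isFailed`), and `ExplicitExecutorBackend` with its `taskFinished`/`taskFailed` methods is an invented interface illustrating what replacing `ExecutorBackend.statusUpdate` might look like.

```scala
object ExplicitStatus {
  // Simplified stand-in for Spark's TaskState (illustrative only).
  object TaskState extends Enumeration {
    val LAUNCHING, RUNNING, FINISHED, FAILED, KILLED, LOST = Value

    // Hypothetical: one named predicate instead of repeating the set inline.
    private val unsuccessfulStates = Set(FAILED, KILLED, LOST)
    def isUnsuccessful(state: Value): Boolean = unsuccessfulStates.contains(state)
  }

  // Simplified stand-ins for Spark's failure reasons (illustrative only).
  sealed trait TaskFailedReason { def toErrorString: String }
  final case class ExceptionFailure(description: String) extends TaskFailedReason {
    def toErrorString: String = s"ExceptionFailure: $description"
  }

  // Hypothetical replacement for a generic statusUpdate(taskId, state, data):
  // each outcome gets its own method, so a failure can only be reported
  // together with a TaskFailedReason, and success never carries a reason.
  trait ExplicitExecutorBackend {
    def taskFinished(taskId: Long, result: Array[Byte]): Unit
    def taskFailed(taskId: Long, reason: TaskFailedReason): Unit
  }

  // Toy backend that just records calls, for demonstration.
  final class RecordingBackend extends ExplicitExecutorBackend {
    val events = scala.collection.mutable.ArrayBuffer.empty[String]
    def taskFinished(taskId: Long, result: Array[Byte]): Unit =
      events += s"$taskId finished (${result.length} bytes)"
    def taskFailed(taskId: Long, reason: TaskFailedReason): Unit =
      events += s"$taskId failed: ${reason.toErrorString}"
  }
}
```

With separate methods, the compiler rather than a comment enforces which payload goes with which outcome.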

Great. It's always better to be more explicit.

SparkQA · 2016-09-21T18:07:12Z

Test build #65723 has finished for PR 15181 at commit 0bd782b.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-09-21T20:55:07Z

Test build #65727 has finished for PR 15181 at commit d991a3c.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

andrewor14 · 2016-09-21T21:45:00Z

Thanks for the cleanup. I'm merging this into master. Because this patch touches multiple files in the critical scheduler code, I'm hesitant to backport it.

squito mentioned this pull request Sep 21, 2016

[SPARK-8425][CORE] Application Level Blacklisting #14079

Closed

andrewor14 approved these changes Sep 21, 2016


more cleanup of TaskEndReason -> TaskFailedReason

d991a3c

asfgit closed this in 9fcf1c5 Sep 21, 2016

squito deleted the SPARK-17623 branch September 22, 2016 15:16

