Conversation

@squito (Contributor) commented Sep 21, 2016

What changes were proposed in this pull request?

In TaskResultGetter, enqueueFailedTask currently deserializes the result as a TaskEndReason, but the type is actually more specific: it's a TaskFailedReason. This just leads to more blind casting later on; it would be clearer if the message were cast to the right type immediately, so that method parameter types could be tightened.
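
A minimal, self-contained sketch (toy types that mirror, but are not, Spark's actual classes) of why deserializing to the narrower type up front lets downstream signatures drop the blind cast:

    object CastSketch {
      // Toy stand-ins for Spark's TaskEndReason / TaskFailedReason hierarchy.
      sealed trait TaskEndReason
      case object Success extends TaskEndReason
      sealed trait TaskFailedReason extends TaskEndReason { def toErrorString: String }
      case object UnknownReason extends TaskFailedReason {
        def toErrorString: String = "UnknownReason"
      }

      // Before: the deserialized value is typed as the broad TaskEndReason,
      // so downstream code has to blind-cast it back down.
      def handleFailedTaskLoose(reason: TaskEndReason): String =
        reason.asInstanceOf[TaskFailedReason].toErrorString

      // After: deserialize straight to TaskFailedReason and tighten the
      // parameter type; the cast disappears and the compiler checks it.
      def handleFailedTaskTight(reason: TaskFailedReason): String =
        reason.toErrorString
    }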

How was this patch tested?

Existing unit tests via Jenkins. Note that the code was already performing a blind cast to TaskFailedReason before, just in a different spot, so there shouldn't be any behavior change.

@andrewor14 (Contributor) left a comment


LGTM. I wanted to confirm one part, but the rest looks great.

      try {
        if (serializedData != null && serializedData.limit() > 0) {
-         reason = serializer.get().deserialize[TaskEndReason](
+         reason = serializer.get().deserialize[TaskFailedReason](

Contributor:

Is this totally safe? Can you point me to the code where we serialize this?

Contributor Author (@squito):

Good question. It's sent as an executor status update in various failure-handling scenarios here, though it's not grouped so nicely that it's super obvious they always go together, unfortunately:

https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L357

I walked through the various cases and convinced myself they go together. I made one more minor improvement to make that clearer: a couple of .toTaskEndReason methods are renamed to .toTaskFailedReason (d991a3c).

Really, though, there were two other things that convinced me this was OK:

  1. The only other possible thing a TaskEndReason could be is Success. But you'll notice that Success is never serialized as the msg -- it's just implicit with TaskState.FINISHED, and then the Success part just gets dropped in on the driver side here (a rough sketch of this invariant follows this list):

    sched.dagScheduler.taskEnded(tasks(index), Success, result.value(), result.accumUpdates, info)

  2. The TaskEndReason was already getting blind-cast before anyway:

    reason.asInstanceOf[TaskFailedReason].toErrorString

    This was already in the same thread, without any try / catch or similar checks anyway. Well, except for this:

which I decided wasn't a concern, mostly because of reason (1) again.
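
A hedged, self-contained sketch (toy types and names, not the Executor.scala code linked above) of the invariant in point (1): a failed task state is always reported with a serialized TaskFailedReason, and Success is never serialized as the msg:

    import java.nio.ByteBuffer

    object StatusUpdateSketch {
      // Toy stand-ins for Spark's TaskState and TaskFailedReason.
      sealed trait TaskState
      case object FINISHED extends TaskState
      case object FAILED extends TaskState

      sealed trait TaskFailedReason { def toErrorString: String }
      final case class ExceptionFailure(description: String) extends TaskFailedReason {
        def toErrorString: String = s"ExceptionFailure: $description"
      }

      // Stand-in for ExecutorBackend.statusUpdate(taskId, state, serializedData).
      def statusUpdate(taskId: Long, state: TaskState, data: ByteBuffer): Unit =
        println(s"task $taskId -> $state (${data.limit()} bytes)")

      // The failure path only ever ships a TaskFailedReason, which is what lets
      // the driver deserialize the payload as that narrower type.
      def reportFailure(taskId: Long, reason: TaskFailedReason): Unit = {
        val payload = ByteBuffer.wrap(reason.toErrorString.getBytes("UTF-8"))
        statusUpdate(taskId, FAILED, payload)
      }

      def main(args: Array[String]): Unit =
        reportFailure(42L, ExceptionFailure("boom"))
    }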

There's more cleanup we could do here:

  • TaskState.isFailed is unused; instead, there is a different hard-coded check here:
    } else if (Set(TaskState.FAILED, TaskState.KILLED, TaskState.LOST).contains(state)) {
  • We could more directly fix the weak coupling between TaskState and TaskEndReason by replacing ExecutorBackend.statusUpdate with more specific methods, so that the coupling is more explicit (a hypothetical sketch follows below).

and probably other related things too. If you have a strong preference, I could address those here, but I figured it was best to just get in this small cleanup that I felt confident in for now.
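
Purely as an illustration of the second bullet (hypothetical method names, not an API that exists in Spark), splitting ExecutorBackend.statusUpdate into more specific methods might look roughly like this:

    import java.nio.ByteBuffer

    // Hypothetical sketch only: these methods do not exist in Spark's
    // ExecutorBackend; they show how more specific methods would make the
    // TaskState / TaskEndReason coupling explicit in the types.
    trait ExplicitExecutorBackend {
      // FINISHED tasks would carry a serialized task result...
      def taskFinished(taskId: Long, serializedResult: ByteBuffer): Unit
      // ...while failed/killed tasks would carry a serialized TaskFailedReason,
      // so neither side needs a blind cast.
      def taskFailed(taskId: Long, serializedReason: ByteBuffer): Unit
    }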

Contributor:

Great. It's always better to be more explicit.

@SparkQA commented Sep 21, 2016

Test build #65723 has finished for PR 15181 at commit 0bd782b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Sep 21, 2016

Test build #65727 has finished for PR 15181 at commit d991a3c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14 (Contributor):

Thanks for the cleanup. I'm merging this into master. Because this patch touches multiple files in critical scheduler code, I'm hesitant to backport it.

@asfgit asfgit closed this in 9fcf1c5 Sep 21, 2016
@squito squito deleted the SPARK-17623 branch September 22, 2016 15:16