[SPARK-17623][CORE] Clarify type of TaskEndReason with a failed task. #15181
In TaskResultGetter, enqueueFailedTask currently deserializes the result as a TaskEndReason. But the type is actually more specific: it's a TaskFailedReason. This just leads to more blind casting later on. It would be clearer if the msg were cast to the right type immediately, so that method parameter types could be tightened.
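A minimal sketch of the idea, for illustration only. The types below are simplified stand-ins rather than Spark's actual definitions, and `DemoFailure`, `describeOld`, and `describeNew` are hypothetical names that do not appear in this patch.

```scala
// Simplified stand-ins for the real hierarchy (assumption: only the shape matters here).
sealed trait TaskEndReason
case object Success extends TaskEndReason
sealed trait TaskFailedReason extends TaskEndReason {
  def toErrorString: String
}
// Hypothetical failure type used only for this example.
case class DemoFailure(description: String) extends TaskFailedReason {
  override def toErrorString: String = s"DemoFailure: $description"
}

object NarrowingSketch {
  // Wide parameter type: the callee has to cast blindly,
  // and a Success value only fails at runtime.
  def describeOld(reason: TaskEndReason): String =
    reason.asInstanceOf[TaskFailedReason].toErrorString

  // Narrow parameter type: no cast is needed, and passing
  // Success is rejected by the compiler instead.
  def describeNew(reason: TaskFailedReason): String =
    reason.toErrorString
}
```

For a real failure both methods return the same string, which is why tightening the type is not expected to change behavior; the difference is that a wrong argument to `describeNew` no longer compiles.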
andrewor14
left a comment
LGTM. I wanted to confirm one part, but the rest looks great.
  try {
    if (serializedData != null && serializedData.limit() > 0) {
-     reason = serializer.get().deserialize[TaskEndReason](
+     reason = serializer.get().deserialize[TaskFailedReason](
Is this totally safe? Can you point me to the code where we serialize this?
Good question. It's sent as an executor status update in various failure-handling scenarios, though unfortunately it's not grouped so nicely that it's obvious they always go together.
I walked through the various cases and convinced myself they went together. I made one more minor improvement to make that clearer: a couple of .toTaskEndReason methods could be renamed to .toTaskFailedReason (d991a3c).
Really, though, there were two other things that convinced me this was OK:
1. The only other possible thing a TaskEndReason could be is Success. But you'll notice that Success is never serialized as the msg; it's just implicit with TaskState.FINISHED, and then the Success part just gets dropped in on the driver side here:
   sched.dagScheduler.taskEnded(tasks(index), Success, result.value(), result.accumUpdates, info)
2. The TaskEndReason was already getting blind-cast before anyway:
   reason.asInstanceOf[TaskFailedReason].toErrorString
   This was already in the same thread, without any try/catch or other checks. Well, except for this:
   if (info.failed || info.killed) {
   which I decided wasn't a concern, mostly because of reason (1) again.
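The first point can be sketched roughly as follows. This is an illustration under stated assumptions, not Spark's code: the types are simplified stand-ins, plain Java serialization is swapped in for Spark's serializer, and `DemoFailure`, `DemoTaskState`, and `StatusUpdateSketch` are hypothetical names.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Simplified stand-ins, not Spark's real classes.
sealed trait TaskEndReason extends Serializable
case object Success extends TaskEndReason
sealed trait TaskFailedReason extends TaskEndReason { def toErrorString: String }
case class DemoFailure(description: String) extends TaskFailedReason {
  override def toErrorString: String = s"DemoFailure: $description"
}

// Hypothetical, reduced set of task states.
object DemoTaskState extends Enumeration {
  val FINISHED, FAILED, KILLED, LOST = Value
}

object StatusUpdateSketch {
  // Executor side: the signature only admits failure reasons,
  // so Success can never end up in the serialized message.
  def serialize(reason: TaskFailedReason): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(reason) finally out.close()
    buffer.toByteArray
  }

  // Generic helper; the caller picks the expected type.
  def deserialize[T](bytes: Array[Byte]): T = {
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  // Driver side: Success is implied by the FINISHED state and is never
  // read from the message; every other state carries a failure reason.
  def reasonOnDriver(state: DemoTaskState.Value, msg: Array[Byte]): TaskEndReason =
    state match {
      case DemoTaskState.FINISHED => Success
      case _                      => deserialize[TaskFailedReason](msg)
    }
}
```

In this sketch the only place a non-failure reason can arise is the FINISHED branch, which never touches the message bytes.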
There's more cleanup we could do here:
- TaskState.isFailed is unused; instead there is a different hard-coded check here:
  } else if (Set(TaskState.FAILED, TaskState.KILLED, TaskState.LOST).contains(state)) {
- We could more directly fix the weak coupling between TaskState and TaskEndReason by replacing ExecutorBackend.statusUpdate with more particular methods, so that the coupling is more explicit.
and probably other related things too. If you have a strong preference, I could address those here, but I figured it was best to just get in this small cleanup that I felt confident in for now.
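As a rough sketch of what that follow-up cleanup might look like (none of this is part of the patch): `DemoTaskState.isUnsuccessful`, `DemoBackend`, and `RecordingBackend` are hypothetical names, and the types are simplified stand-ins rather than Spark's definitions.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical state enumeration with one named predicate, so call sites
// do not need to repeat a hard-coded Set of states.
object DemoTaskState extends Enumeration {
  val LAUNCHING, RUNNING, FINISHED, FAILED, KILLED, LOST = Value
  private val unsuccessful = Set(FAILED, KILLED, LOST)
  def isUnsuccessful(state: Value): Boolean = unsuccessful.contains(state)
}

// Simplified stand-in for the failure side of the reason hierarchy.
sealed trait TaskFailedReason { def toErrorString: String }
case class DemoFailure(description: String) extends TaskFailedReason {
  override def toErrorString: String = s"DemoFailure: $description"
}

// Hypothetical alternative to a single status-update method: one method per
// outcome, so a failure always arrives together with a TaskFailedReason.
trait DemoBackend {
  def taskSucceeded(taskId: Long, result: Array[Byte]): Unit
  def taskFailed(taskId: Long, reason: TaskFailedReason): Unit
}

// Tiny in-memory implementation used only to exercise the trait.
class RecordingBackend extends DemoBackend {
  private val recorded = ListBuffer.empty[String]
  def events: List[String] = recorded.toList
  override def taskSucceeded(taskId: Long, result: Array[Byte]): Unit =
    recorded += s"task $taskId succeeded with ${result.length} bytes"
  override def taskFailed(taskId: Long, reason: TaskFailedReason): Unit =
    recorded += s"task $taskId failed: ${reason.toErrorString}"
}
```

With separate methods, the pairing of a failed state with a failure reason is enforced by the signatures rather than by convention.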
Great. It's always better to be more explicit.
Test build #65723 has finished for PR 15181.
Test build #65727 has finished for PR 15181.
Thanks for the cleanup. I'm merging this into master. Because this patch touches multiple files in the critical scheduler code, I'm hesitant to backport it.
What changes were proposed in this pull request?
In TaskResultGetter, enqueueFailedTask currently deserializes the result
as a TaskEndReason. But the type is actually more specific: it's a
TaskFailedReason. This just leads to more blind casting later on. It
would be clearer if the msg were cast to the right type immediately,
so that method parameter types could be tightened.
How was this patch tested?
Existing unit tests via Jenkins. Note that the code was already performing a blind cast to a TaskFailedReason before in any case, just in a different spot, so there shouldn't be any behavior change.