-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-8625] [Core] Propagate user exceptions in tasks back to driver #7014
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
OK to test |
|
Test build #961 has finished for PR 7014 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: multiline method defs should have each arg on a separate line, all indented 4 extra spaces. Also include the return type (I know it wasn't there before but as long as you're touching this ...)
private[scheduler] def handleTaskSetFailed(
taskSet: TaskSet,
reason: String,
exception: Option[Throwable] = None): Unit = {|
@tomwhite this looks great! thanks so much for working on this, I think it will be a really good addition. I left some minor style comments in addition to the ones the automatic checker found, but overall this seems close. |
|
@squito thanks for the review! I've addressed your comments in a new commit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: order alphabetically {File, NotSerializableException}
|
a couple of minor comments, aside from that just waiting to see if the tests pass |
|
Test build #965 has finished for PR 7014 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this break pattern matching on this class? Is it possible to add an unapply method to preserve matching for the old signature? I'm rusty on the details of backwards compatibility for case classes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pwendell You're right - it will break pattern matching on the class. My understanding is that an unapply method won't help since pattern matching won't use it (they are for user code).
Case classes don't play well with binary compatibility, it seems. To do this compatibly, we'd have to have another case class, called ExceptionFailureWithCause say, and a trait that both it and ExceptionFailure extend with the common fields. Then everywhere that handles ExceptionFailure would also have to handle ExceptionFailureWithCause.
Having said all that, this class is marked @DeveloperApi so it's within the contract to change it. The fullStackTrace field was added last November, for example. I can understand the general reluctance to change code even if it is marked as being for developers only, but it's not clear if the workaround here to preserve binary compatibility is worth the complexity it adds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we're definitely out of luck for binary compatibility, but I think @pwendell just wanted to preserve source compatibility (ie. maybe users will need to recompile, but they won't need to change their code at all).
However, I don't think that is possible either. You would need to have another method like def unapply(ef: ExceptionFailure): Option[(String, String, Array[StackTraceElement], String, Option[TaskMetrics])] -- ie., exactly the same as the built-in unapply, but without the final Option[Throwable] in the return type. But that isn't legal overloading -- it has the same set of arguments as the built-in unapply, just a different return type.
Is there another way around this I'm not seeing? I agree we shouldn't change things willy-nilly just b/c its @DeveloperApi, but IMO this change is worth it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay - if it's not possible to preserve matching, let's just change it. Adding fields will only break matching and shouldn't hut users who just access the fields. We might later on add some docs to suggest avoiding use of matching on developer API's for this reason (no need to block this though).
|
@pwendell are you OK with the change to |
|
ping @pwendell |
|
I think the compatibility is okay, but two other quick questions:
|
|
@pwendell I'm not seeing anything concerning in the DAGScheduler changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the wrong place for this doc on preserveCause. That should go on the constructor that has it. Here I'd just doc the exception (and add an explanation of which exception is corresponds to as Patrick asked)
|
I played with this locally a bit, and I think that actually defaulting ... except for the default on So I'm leaning towards just putting in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: space after "{" (or use parens instead, with no spaces around them) (Realize that this was there before, but might as well fix it now)
|
Scheduler changes LGTM, subject to @squito's suggestion |
|
Thanks for all the feedback @pwendell, @squito, and @kayousterhout. I've addressed all your points and updated this PR. @squito Regarding |
|
Test build #1082 has finished for PR 7014 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible that an exception does not throw a NotSerializableException but cannot be deserialized on the other side? This could be due to, say, class loader issues or other issues not caught during serialization time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(By the way, the particular exception I'm thinking of is Scala's MatchError, which actually includes the object which failed to match.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is definitely possible. I've written a test for this case, and implemented a fix which uses a wrapper for the exception that gracefully falls back if deserialization of the exception fails.
|
Test build #1238 has finished for PR 7014 at commit
|
|
Looks good from my end. |
|
@tomwhite there is a legit failure here, looks like you need to merge w/ master and fix a compile error |
a14f282 to
d531c93
Compare
|
I rebased this on master. Is there a way to get this to be retested? |
|
Jenkins, retest this please |
|
I triggered Jenkins. |
|
Test build #1401 has finished for PR 7014 at commit
|
|
@tomwhite still looks like a real compile error: |
This allows clients to retrieve the original exception from the cause field of the SparkException that is thrown by the driver. If the original exception is not in fact Serializable then it will not be returned, but the message and stacktrace will be. (All Java Throwables implement the Serializable interface, but this is no guarantee that a particular implementation can actually be serialized.)
last failure for a task, and add a test for this case. Remove the default of None for the failure exception. Address nits.
d531c93 to
4c884d0
Compare
|
Thanks. The error came from new code that was committed in SPARK-4352. I've rebased and fixed the offending line. |
|
Test build #1404 has finished for PR 7014 at commit
|
|
Test build #1405 has finished for PR 7014 at commit
|
This allows clients to retrieve the original exception from the cause field of the SparkException that is thrown by the driver. If the original exception is not in fact Serializable then it will not be returned, but the message and stacktrace will be. (All Java Throwables implement the Serializable interface, but this is no guarantee that a particular implementation can actually be serialized.) Author: Tom White <[email protected]> Closes #7014 from tomwhite/propagate-user-exceptions. (cherry picked from commit 2e68066) Signed-off-by: Imran Rashid <[email protected]>
|
merged to master & 1.5 thanks @tomwhite ! |
This allows clients to retrieve the original exception from the cause field of the SparkException that is thrown by the driver. If the original exception is not in fact Serializable then it will not be returned, but the message and stacktrace will be. (All Java Throwables implement the Serializable interface, but this is no guarantee that a particular implementation can actually be serialized.) Author: Tom White <[email protected]> Closes apache#7014 from tomwhite/propagate-user-exceptions.
This allows clients to retrieve the original exception from the
cause field of the SparkException that is thrown by the driver.
If the original exception is not in fact Serializable then it will
not be returned, but the message and stacktrace will be. (All Java
Throwables implement the Serializable interface, but this is no
guarantee that a particular implementation can actually be
serialized.)