-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-10248] [core] track exceptions in dagscheduler event loop in tests #8466
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this change is so I could trigger an exception (by passing in null), seems like a good change in any case.
|
Test build #41634 has finished for PR 8466 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this test used to just log an exception on cancel(jobId). I'm not sure what it was supposed to be testing before. I made the minimal change here, by capturing the exception and checking it. But maybe cancel(jobId) should not be creating an exception? Is the idea that if you submit a job with no partitions, it will immediately stop? That way, if you try to cancel it, you'd just hit this case with a harmless logDebug? That suggests we should change handleJobSubmitted to handle empty jobs, the same way we handle it in submitMissingTasks for stages with no partitions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not necessary, but was helpful for some debugging, figured it couldn't hurt.
|
Test build #41635 has finished for PR 8466 at commit
|
|
Hey, so is the conclusion that the DAGScheduler actually did pass the exception to JobListeners, but we weren't listening for it in our test suite? I thought the initial problem was that some exception got swallowed before ever being passed up. |
Conflicts: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala
|
Hi @mateiz sorry I totally missed your comment a while back. so I think poorly explained the issue in the first place. Exceptions are already making it to the joblisteners to the best of my knowledge (as part of The issue that I'm trying to address is just confusing behavior in the tests. Before this change, if there is an exception inside the event processing loop, then all that happens is some internal state is set. Furthermore, the event process loop stops the The net effect is that when a test doesn't behave the way you expect it to, its hard to unravel why. Unless tests include The only goal here is just to make those issues much more immediately obvious. For these tests, as soon as there is an exception in the event process loop, throw the exception and fail the test. |
|
Test build #43477 has finished for PR 8466 at commit
|
|
LGTM. |
|
retest this please |
|
LGTM pending tests |
|
Test build #47556 has finished for PR 8466 at commit
|
|
retest this please |
|
Test build #47684 has finished for PR 8466 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this a related change? If we add another TaskEndReason in the future we might forget to add it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If so, I think the compiler will report uncompleted matching as a warning and the build will fail. Right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, that's fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was needed just for the added test case -- I needed a way to intentionally throw an error inside the event loop, and with this change if event.reason is null, you get an exception.
And yeah, if you add another type but dont' include it in this match, you'll get a fatal warning in compilation since its a sealed trait.
|
I don't think that test failure is related at all, but I also dont' see why that test would be flaky :( |
|
Jenkins, retest this please |
|
Test build #47861 has finished for PR 8466 at commit
|
|
retest this please |
|
Test failure is unrelated. Merging into master 1.6. |
`DAGSchedulerEventLoop` normally only logs errors (so it can continue to process more events, from other jobs). However, this is not desirable in the tests -- the tests should be able to easily detect any exception, and also shouldn't silently succeed if there is an exception. This was suggested by mateiz on #7699. It may have already turned up an issue in "zero split job". Author: Imran Rashid <[email protected]> Closes #8466 from squito/SPARK-10248. (cherry picked from commit 38d9795) Signed-off-by: Andrew Or <[email protected]>
|
Test build #47883 has finished for PR 8466 at commit
|
DAGSchedulerEventLoopnormally only logs errors (so it can continue to process more events, from other jobs). However, this is not desirable in the tests -- the tests should be able to easily detect any exception, and also shouldn't silently succeed if there is an exception.This was suggested by @mateiz on #7699. It may have already turned up an issue in "zero split job".