[SPARK-18741][STREAMING] Reuse or clean-up SparkContext in streaming tests #16174
Conversation
Do you mean the recent failures? It was fixed by #16105.

```java
@Before
public void setUp() {
  SparkContext$.MODULE$.stopActiveContext();
```
Can't you just call SparkContext.stopActiveContext() from Java? I see static methods in the bytecode for things like jarOfObject.
The line is also indented incorrectly.
IntelliJ and SBT were both complaining, so I did this.
I'll try to rebuild and see what happens.
```java
  }
}
```
```scala
private[spark] def stopActiveContext(): Unit = {
```
I don't know. I'm not a big fan of the approach you're taking here: calling this method before running tests. That feels like a sledgehammer to fix flaky tests. I think it would be better for test code to be more careful about cleaning up after itself. Kinda like most tests in spark-core use LocalSparkContext to more or less automatically do that without the need for these methods.
The ReusableSparkContext trait you have is a step in that direction. If you make sure all needed streaming tests are using it, and keep this state within that class, I think it would be a better change.
+1. I don't like stopping SparkContext before running tests, either. It will hide the mistakes in other tests.
Shouldn't this be unnecessary with more carefully written tests that always close the context when they're done?
I have to admit that the approach is far from subtle.
It seems that #16105 fixes this (also on my branch). I am closing this for now. Thanks for the feedback.
Test build #69736 has finished for PR 16174 at commit
What changes were proposed in this pull request?
Tests in Spark Streaming currently create a `SparkContext` for each test, and sometimes do not clean up afterwards. This is resource intensive and it can lead to unneeded test failures (flakiness) when `spark.driver.allowMultipleContexts` is disabled (this happens when the order of tests changes).

This PR makes most tests re-use a `SparkContext`. For tests that have to create a new context (for instance `CheckpointSuite`) we make sure that no active `SparkContext` exists before the test, and that the created `SparkContext` is cleaned up afterwards. I have refactored `TestSuiteBase` into two classes, `TestSuiteBase` and a parent class `ReusableSparkContext`; this makes `SparkContext` management relatively straightforward for most tests.

I have done a simple, very unscientific benchmark (n=1): streaming tests with this patch took 212 seconds, and streaming tests without this patch took 252 seconds.
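The reuse-plus-cleanup pattern described above can be sketched in plain Java. This is only an illustration of the idea, not the PR's actual code: `FakeContext` and `ReusableContextBase` are hypothetical stand-ins for `SparkContext` and the `ReusableSparkContext` trait, since the real classes require a full Spark runtime.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for SparkContext: just tracks whether it was stopped.
class FakeContext {
    private boolean stopped = false;
    boolean isStopped() { return stopped; }
    void stop() { stopped = true; }
}

// Hypothetical stand-in for the ReusableSparkContext base class:
// tests share one active context, and tests that need a fresh one
// explicitly stop the active context first.
class ReusableContextBase {
    private static final AtomicReference<FakeContext> active = new AtomicReference<>();

    // Return the shared context, creating one only if none is active.
    static FakeContext getOrCreate() {
        FakeContext ctx = active.get();
        if (ctx == null || ctx.isStopped()) {
            ctx = new FakeContext();
            active.set(ctx);
        }
        return ctx;
    }

    // Stop and clear any active context (what stopActiveContext() would do).
    static void stopActiveContext() {
        FakeContext ctx = active.getAndSet(null);
        if (ctx != null && !ctx.isStopped()) {
            ctx.stop();
        }
    }
}

public class ReuseDemo {
    public static void main(String[] args) {
        FakeContext a = ReusableContextBase.getOrCreate();
        FakeContext b = ReusableContextBase.getOrCreate();
        System.out.println(a == b);        // the context is reused across calls
        ReusableContextBase.stopActiveContext();
        System.out.println(a.isStopped()); // and cleaned up on demand
    }
}
```

A test that must create its own context would call `stopActiveContext()` in its setup, mirroring the `@Before` hook discussed in the review above.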
How was this patch tested?
The patch only covers test code.