[SPARK-4183] Close transport-related resources between SparkContexts #3053
Conversation
This is the most significant leak, we'd leak one event loop per SparkContext
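To illustrate the kind of cleanup this PR is concerned with, here is a minimal sketch (not the PR's actual code; the class name is made up) of a transport-layer object that owns a Netty event loop group and releases it in `close()`. Without such a hook, each SparkContext created in the same JVM leaves one event loop group, and its threads, behind:

```scala
import io.netty.channel.nio.NioEventLoopGroup

// Hypothetical stand-in for a transport client/server factory that a
// SparkContext creates and must tear down when it stops.
class IllustrativeTransportFactory extends AutoCloseable {
  // One event loop group per factory; if close() is never called, its
  // threads outlive the SparkContext that created it.
  private val workerGroup = new NioEventLoopGroup()

  override def close(): Unit = {
    // Shut the group down so its threads can exit. Without this, repeatedly
    // creating and stopping SparkContexts in one JVM (as the test suites do)
    // accumulates idle event-loop threads.
    workerGroup.shutdownGracefully()
  }
}
```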
Test build #22706 has started for PR 3053 at commit
Test build #22707 has started for PR 3053 at commit
Test FAILed.
Test build #22708 has started for PR 3053 at commit
Test build #22706 has finished for PR 3053 at commit
Test PASSed.
Test build #22708 has finished for PR 3053 at commit
Test FAILed.
Test build #22723 has started for PR 3053 at commit
Test build #22726 has started for PR 3053 at commit
Test build #22723 has finished for PR 3053 at commit
Test FAILed.
Test build #22730 has started for PR 3053 at commit
Test FAILed.
Test build #22726 has finished for PR 3053 at commit
Test FAILed.
Test build #22735 has started for PR 3053 at commit
Test build #22730 has finished for PR 3053 at commit
Test FAILed.
why print instead of logging?
This is all for my current Jenkins debugging, where I can see the printlns but not the logs.
If you have SSH access to AMPLab Jenkins, you can download per-test log dumps (see the email that I sent to our Spark list for instructions).
No idea if I have SSH access; this seemed easier. Man, I caught a lot of leaky tests!
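For reference, a minimal sketch of the two options being weighed here, assuming a helper that can mix in the Spark 1.x-era `org.apache.spark.Logging` trait (the class and method names are hypothetical):

```scala
import org.apache.spark.Logging

// Hypothetical test-diagnostics helper showing the trade-off discussed above.
class SuiteDiagnostics extends Logging {
  def report(msg: String): Unit = {
    // println goes straight to the test process's stdout, which the Jenkins
    // console output captures directly.
    println(s"DIAGNOSTIC: $msg")
    // logInfo goes through the configured log4j appenders, which end up in the
    // per-test log dumps rather than the console.
    logInfo(msg)
  }
}
```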
Test build #22735 timed out for PR 3053 at commit
Test FAILed.
Test build #22749 has started for PR 3053 at commit
Test build #22750 has started for PR 3053 at commit
Test build #22755 has started for PR 3053 at commit
Test build #22749 has finished for PR 3053 at commit
@pwendell maybe take a look at the changes here? No one in particular is responsible for this test, I think.
LGTM
Since the shutdown code isn't in a try-finally block or after() function, I guess it's still possible that it won't be called when tests fail. So, one test failure still might trigger spurious failures of later tests.
Maybe this is less of a concern, though, since I guess we're more worried about leaked resources from a passing test causing a subsequent test to fail.
Yeah, either would be better, but since each test creates a different set of things, and sometimes calling stop() throws an exception, I decided to just do it in the non-failing case for this PR.
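A minimal sketch of the `after()`-style cleanup discussed above, assuming ScalaTest's `BeforeAndAfterEach`; the suite and the resource it manages are hypothetical stand-ins for whatever a given test actually creates:

```scala
import org.scalatest.{BeforeAndAfterEach, FunSuite}

class ExampleCleanupSuite extends FunSuite with BeforeAndAfterEach {
  // Hypothetical per-test resource; in the suites touched by this PR it would
  // be a SparkContext, block manager, or transport service created by the test.
  private var resource: AutoCloseable = _

  override def afterEach(): Unit = {
    // Runs whether the test passed or failed, so a failing test cannot leak
    // its resources into later tests.
    try {
      if (resource != null) {
        // stop()/close() can itself throw (as noted above), so swallow the
        // error rather than masking the original test failure.
        try resource.close() catch { case _: Exception => }
        resource = null
      }
    } finally {
      super.afterEach()
    }
  }

  test("example") {
    resource = new AutoCloseable { override def close(): Unit = () }
    // ... exercise the resource ...
  }
}
```

Running the cleanup in `afterEach` (and tolerating exceptions from `close()`) would cover the failing-test case without masking the original failure, at the cost of tracking each test's resources in suite-level fields.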
Test build #22758 has started for PR 3053 at commit
Test build #22758 has finished for PR 3053 at commit
Test PASSed.
Test build #22771 has started for PR 3053 at commit
@tdas I have reverted the changes, though you can see from the fact that we have to do this that the API is not clean.
Test build #22772 has started for PR 3053 at commit
LGTM. Feel free to merge after tests pass.
Test build #22771 has finished for PR 3053 at commit
Test FAILed.
Jenkins, retest this please.
Test build #22775 has started for PR 3053 at commit
Test build #22772 has finished for PR 3053 at commit
Test PASSed.
Test build #22775 has finished for PR 3053 at commit
Test PASSed.
Okay great - let's try this again. I'll merge it!
…fter calling stop()

In Spark 1.0.0+, calling `stop()` on a StreamingContext that has not been started is a no-op with no side effects. This allows users to call `stop()` on a fresh StreamingContext followed by `start()`. I believe that this almost always indicates an error and is not behavior that we should support. Since we don't allow `start() stop() start()`, I don't think it makes sense to allow `stop() start()`.

The current behavior can lead to resource leaks when StreamingContext constructs its own SparkContext: if I call `stop(stopSparkContext=True)`, then I expect StreamingContext's underlying SparkContext to be stopped irrespective of whether the StreamingContext has been started. This is useful when writing unit test fixtures.

Prior discussions:
- #3053 (diff)
- #3121 (comment)

Author: Josh Rosen <[email protected]>

Closes #3160 from JoshRosen/SPARK-4301 and squashes the following commits:

dbcc929 [Josh Rosen] Address more review comments
bdbe5da [Josh Rosen] Stop SparkContext after stopping scheduler, not before.
03e9c40 [Josh Rosen] Always stop SparkContext, even if stop(false) has already been called.
832a7f4 [Josh Rosen] Address review comment
5142517 [Josh Rosen] Add tests; improve Scaladoc.
813e471 [Josh Rosen] Revert workaround added in https://github.com/apache/spark/pull/3053/files#diff-e144dbee130ed84f9465853ddce65f8eR49
5558e70 [Josh Rosen] StreamingContext.stop() should stop SparkContext even if StreamingContext has not been started yet.

(cherry picked from commit 7b41b17)
Signed-off-by: Tathagata Das <[email protected]>
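As a rough illustration of the fixture pattern described in the commit message above (a sketch, not code from the PR; the object and app names are made up), the SPARK-4301 behavior lets a test fixture unconditionally call `stop(stopSparkContext = true)` even when `start()` was never reached:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StopFixtureExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("stop-fixture-example")
    // StreamingContext creates its own SparkContext here.
    val ssc = new StreamingContext(conf, Seconds(1))
    try {
      // ... a test body might build DStreams here; this fixture never calls start() ...
    } finally {
      // After SPARK-4301, stop(stopSparkContext = true) stops the underlying
      // SparkContext even though start() was never called, so the fixture
      // cannot leak a SparkContext between tests.
      ssc.stop(stopSparkContext = true)
    }
  }
}
```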
A leak of event loops may be causing test failures.