[SPARK-4183] Close transport-related resources between SparkContexts #3053
Conversation
This is the most significant leak, we'd leak one event loop per SparkContext
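To illustrate the kind of cleanup this PR is concerned with, here is a minimal sketch (not the PR's actual code; the class name is made up) of a transport-layer object that owns a Netty event loop group and releases it in `close()`. Without such a hook, each SparkContext created in the same JVM leaves one event loop group, and its threads, behind:

```scala
import io.netty.channel.nio.NioEventLoopGroup

// Hypothetical stand-in for a transport client/server factory that a
// SparkContext creates and must tear down when it stops.
class IllustrativeTransportFactory extends AutoCloseable {
  // One event loop group per factory; if close() is never called, its
  // threads outlive the SparkContext that created it.
  private val workerGroup = new NioEventLoopGroup()

  override def close(): Unit = {
    // Shut the group down so its threads can exit. Without this, repeatedly
    // creating and stopping SparkContexts in one JVM (as the test suites do)
    // accumulates idle event-loop threads.
    workerGroup.shutdownGracefully()
  }
}
```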
Test build #22706 has started for PR 3053 at commit
Test build #22707 has started for PR 3053 at commit
Test FAILed.
Test build #22708 has started for PR 3053 at commit
Test build #22706 has finished for PR 3053 at commit
Test PASSed.
Test build #22708 has finished for PR 3053 at commit
Test FAILed.
Test build #22723 has started for PR 3053 at commit
Test build #22726 has started for PR 3053 at commit
Test build #22723 has finished for PR 3053 at commit
Test FAILed.
Test build #22730 has started for PR 3053 at commit
Test FAILed.
Test build #22726 has finished for PR 3053 at commit
Test FAILed.
Test build #22735 has started for PR 3053 at commit
Test build #22730 has finished for PR 3053 at commit
Test FAILed.
why print instead of logging?
This is all for my current Jenkins debugging, where I can see the printlns but not the logs.
If you have SSH access to AMPLab Jenkins, you can download per-test log dumps (see the email that I sent to our Spark list for instructions).
No idea if I have SSH access; this seemed easier. Man, I caught a lot of leaky tests!
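For reference, a minimal sketch of the two options being weighed here, assuming a helper that can mix in the Spark 1.x-era `org.apache.spark.Logging` trait (the class and method names are hypothetical):

```scala
import org.apache.spark.Logging

// Hypothetical test-diagnostics helper showing the trade-off discussed above.
class SuiteDiagnostics extends Logging {
  def report(msg: String): Unit = {
    // println goes straight to the test process's stdout, which the Jenkins
    // console output captures directly.
    println(s"DIAGNOSTIC: $msg")
    // logInfo goes through the configured log4j appenders, which end up in the
    // per-test log dumps rather than the console.
    logInfo(msg)
  }
}
```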
Test build #22735 timed out for PR 3053 at commit
Test FAILed.
Test build #22749 has started for PR 3053 at commit
Test build #22750 has started for PR 3053 at commit
Test build #22755 has started for PR 3053 at commit
Test build #22749 has finished for PR 3053 at commit
@pwendell maybe take a look at the changes here? No one in particular is responsible for this test, I think.
LGTM
Since the shutdown code isn't in a try-finally block or after() function, I guess it's still possible that it won't be called when tests fail. So, one test failure still might trigger spurious failures of later tests.
Maybe this is less of a concern, though, since I guess we're more worried about leaked resources from a passing test causing a subsequent test to fail.
Yeah, either would be better, but since each test creates a different set of things, and sometimes calling stop() throws an exception, I decided to just do it in the non-failing case for this PR.
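A minimal sketch of the `after()`-style cleanup discussed above, assuming ScalaTest's `BeforeAndAfterEach`; the suite and the resource it manages are hypothetical stand-ins for whatever a given test actually creates:

```scala
import org.scalatest.{BeforeAndAfterEach, FunSuite}

class ExampleCleanupSuite extends FunSuite with BeforeAndAfterEach {
  // Hypothetical per-test resource; in the suites touched by this PR it would
  // be a SparkContext, block manager, or transport service created by the test.
  private var resource: AutoCloseable = _

  override def afterEach(): Unit = {
    // Runs whether the test passed or failed, so a failing test cannot leak
    // its resources into later tests.
    try {
      if (resource != null) {
        // stop()/close() can itself throw (as noted above), so swallow the
        // error rather than masking the original test failure.
        try resource.close() catch { case _: Exception => }
        resource = null
      }
    } finally {
      super.afterEach()
    }
  }

  test("example") {
    resource = new AutoCloseable { override def close(): Unit = () }
    // ... exercise the resource ...
  }
}
```

Running the cleanup in `afterEach` (and tolerating exceptions from `close()`) would cover the failing-test case without masking the original failure, at the cost of tracking each test's resources in suite-level fields.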
Test build #22758 has started for PR 3053 at commit
Test build #22758 has finished for PR 3053 at commit
Test PASSed.
Test build #22771 has started for PR 3053 at commit
@tdas I have reverted the changes, though you can see from the fact that we have to do this that the API is not clean.
Test build #22772 has started for PR 3053 at commit
LGTM. Feel free to merge after tests pass.
Test build #22771 has finished for PR 3053 at commit
Test FAILed.
Jenkins, retest this please.
Test build #22775 has started for PR 3053 at commit
Test build #22772 has finished for PR 3053 at commit
Test PASSed.
Test build #22775 has finished for PR 3053 at commit
Test PASSed.
Okay great - let's try this again. I'll merge it!
…fter calling stop()

In Spark 1.0.0+, calling `stop()` on a StreamingContext that has not been started is a no-op with no side effects. This allows users to call `stop()` on a fresh StreamingContext followed by `start()`. I believe that this almost always indicates an error and is not behavior that we should support. Since we don't allow `start() stop() start()`, I don't think it makes sense to allow `stop() start()`.

The current behavior can lead to resource leaks when StreamingContext constructs its own SparkContext: if I call `stop(stopSparkContext=True)`, then I expect StreamingContext's underlying SparkContext to be stopped irrespective of whether the StreamingContext has been started. This is useful when writing unit test fixtures.

Prior discussions:
- #3053 (diff)
- #3121 (comment)

Author: Josh Rosen <[email protected]>

Closes #3160 from JoshRosen/SPARK-4301 and squashes the following commits:

dbcc929 [Josh Rosen] Address more review comments
bdbe5da [Josh Rosen] Stop SparkContext after stopping scheduler, not before.
03e9c40 [Josh Rosen] Always stop SparkContext, even if stop(false) has already been called.
832a7f4 [Josh Rosen] Address review comment
5142517 [Josh Rosen] Add tests; improve Scaladoc.
813e471 [Josh Rosen] Revert workaround added in https://github.com/apache/spark/pull/3053/files#diff-e144dbee130ed84f9465853ddce65f8eR49
5558e70 [Josh Rosen] StreamingContext.stop() should stop SparkContext even if StreamingContext has not been started yet.

(cherry picked from commit 7b41b17)
Signed-off-by: Tathagata Das <[email protected]>
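As a rough illustration of the fixture pattern described in the commit message above (a sketch, not code from the PR; the object and app names are made up), the SPARK-4301 behavior lets a test fixture unconditionally call `stop(stopSparkContext = true)` even when `start()` was never reached:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StopFixtureExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("stop-fixture-example")
    // StreamingContext creates its own SparkContext here.
    val ssc = new StreamingContext(conf, Seconds(1))
    try {
      // ... a test body might build DStreams here; this fixture never calls start() ...
    } finally {
      // After SPARK-4301, stop(stopSparkContext = true) stops the underlying
      // SparkContext even though start() was never called, so the fixture
      // cannot leak a SparkContext between tests.
      ssc.stop(stopSparkContext = true)
    }
  }
}
```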
A leak of event loops may be causing test failures.