[SPARK-4301] StreamingContext should not allow start() to be called after calling stop() #3160
Conversation
This reverts a workaround that Aaron added for this issue in https://github.com/apache/spark/pull/3053/files#diff-e144dbee130ed84f9465853ddce65f8eR49
/cc @tdas for review
Add "underlying SparkContext"
Test build #23061 has started for PR 3160 at commit
Just one minor comment, otherwise looks good to me.
Test build #23063 has started for PR 3160 at commit
Test build #23061 has finished for PR 3160 at commit
Test FAILed.
Test build #23063 has finished for PR 3160 at commit
Test FAILed.
Jenkins, test this again.
It's probably not a spurious failure. |
This strengthens the invariant that calling stop(true) _always_ stops the underlying SparkContext, no matter what sequence of calls may have preceded it.
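That invariant can be illustrated with a toy lifecycle model. This is plain Python written for this discussion, not Spark's actual implementation; the class and method names are invented:

```python
# Toy model of the StreamingContext lifecycle (illustration only; these
# classes are invented for this sketch and are not Spark's actual API).

class FakeSparkContext:
    """Stand-in for SparkContext; only records whether stop() was called."""
    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True


class FakeStreamingContext:
    """Models the semantics this PR argues for."""
    def __init__(self, spark_context):
        self.sc = spark_context
        self.state = "initialized"  # initialized -> started -> stopped

    def start(self):
        if self.state != "initialized":
            # start() after stop() is an error, not a silent no-op.
            raise RuntimeError("StreamingContext cannot be (re)started after stop()")
        self.state = "started"

    def stop(self, stop_spark_context=True):
        # The invariant: stop(True) always stops the underlying context,
        # even if this context was never started or stop(False) was
        # already called earlier.
        self.state = "stopped"
        if stop_spark_context:
            self.sc.stop()
```

Under this model, `stop(False)` followed by `stop(True)` still stops the underlying context, and `stop(True)` on a never-started context does too, which is exactly the strengthened guarantee.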
Spotted the problem: my change had swapped the order of
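The ordering bug being described here (fixed in commit bdbe5da: stop the SparkContext after stopping the scheduler, not before) can be sketched with two toy classes. These are invented names for illustration, not Spark code:

```python
# Sketch of why shutdown order matters: the scheduler may still use its
# context while shutting down, so the context must be stopped after the
# scheduler, not before. (Toy classes invented for this illustration.)

shutdown_log = []

class ToyContext:
    def __init__(self):
        self.alive = True

    def stop(self):
        self.alive = False
        shutdown_log.append("context stopped")

class ToyScheduler:
    def __init__(self, context):
        self.context = context

    def stop(self):
        # A real scheduler might flush pending work through the context
        # here, so the context must still be alive at this point.
        assert self.context.alive, "scheduler stopped after its context"
        shutdown_log.append("scheduler stopped")

context = ToyContext()
scheduler = ToyScheduler(context)

# Correct order: scheduler first, then the context it depends on.
scheduler.stop()
context.stop()
```

Swapping the last two calls would trip the assertion inside `ToyScheduler.stop()`, which is the shape of the failure the swapped order caused.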
Test build #23071 has started for PR 3160 at commit
Minor nit: this unit test should logically come after the "stop only streaming context" test, since that one exercises the stopContext = false case.
Good catches; I've addressed both comments.
Test build #23073 has started for PR 3160 at commit
Test build #23071 has finished for PR 3160 at commit
Test FAILed.
Test build #23073 has finished for PR 3160 at commit
Test PASSed.
merging this. thanks josh for this PR!
…fter calling stop()

In Spark 1.0.0+, calling `stop()` on a StreamingContext that has not been started is a no-op which has no side-effects. This allows users to call `stop()` on a fresh StreamingContext followed by `start()`. I believe that this almost always indicates an error and is not behavior that we should support. Since we don't allow `start() stop() start()` then I don't think it makes sense to allow `stop() start()`.

The current behavior can lead to resource leaks when StreamingContext constructs its own SparkContext: if I call `stop(stopSparkContext=True)`, then I expect StreamingContext's underlying SparkContext to be stopped irrespective of whether the StreamingContext has been started. This is useful when writing unit test fixtures.

Prior discussions:
- #3053 (diff)
- #3121 (comment)

Author: Josh Rosen <[email protected]>

Closes #3160 from JoshRosen/SPARK-4301 and squashes the following commits:

dbcc929 [Josh Rosen] Address more review comments
bdbe5da [Josh Rosen] Stop SparkContext after stopping scheduler, not before.
03e9c40 [Josh Rosen] Always stop SparkContext, even if stop(false) has already been called.
832a7f4 [Josh Rosen] Address review comment
5142517 [Josh Rosen] Add tests; improve Scaladoc.
813e471 [Josh Rosen] Revert workaround added in https://github.com/apache/spark/pull/3053/files#diff-e144dbee130ed84f9465853ddce65f8eR49
5558e70 [Josh Rosen] StreamingContext.stop() should stop SparkContext even if StreamingContext has not been started yet.

(cherry picked from commit 7b41b17)
Signed-off-by: Tathagata Das <[email protected]>
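The lifecycle rules argued for in this commit message can be summarized as an explicit transition table. This is a toy model written for this summary, not Spark's implementation:

```python
# Allowed StreamingContext lifecycle transitions after this change
# (toy model for illustration only, not Spark code):
ALLOWED = {
    ("initialized", "start"): "started",
    ("initialized", "stop"): "stopped",   # stop() before start() is allowed and meaningful
    ("started", "stop"): "stopped",
    # ("stopped", "start") is deliberately absent: restarting is an error.
}

def transition(state, call):
    """Return the next state, or raise if `call` is illegal in `state`."""
    try:
        return ALLOWED[(state, call)]
    except KeyError:
        raise RuntimeError(f"illegal call {call}() in state {state!r}") from None
```

The absent `("stopped", "start")` entry encodes the PR's thesis: since `start() stop() start()` is already disallowed, `stop() start()` should be an error too rather than a silent no-op.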