Conversation

@JoshRosen
Contributor

In Spark 1.0.0+, calling `stop()` on a StreamingContext that has not been started is a no-op with no side-effects. This allows users to call `stop()` on a fresh StreamingContext followed by `start()`. I believe this almost always indicates an error and is not behavior that we should support. Since we don't allow `start() stop() start()`, I don't think it makes sense to allow `stop() start()`.

The current behavior can also lead to resource leaks when StreamingContext constructs its own SparkContext: if I call `stop(stopSparkContext=True)`, then I expect the StreamingContext's underlying SparkContext to be stopped regardless of whether the StreamingContext has been started. This is useful when writing unit test fixtures.
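
The intended lifecycle rules can be sketched as a small state machine. This is a simplified illustrative model, not Spark's actual StreamingContext code; the names `DemoStreamingContext` and `State` are made up for this sketch.

```scala
// Simplified model of the lifecycle rules proposed in this PR.
// DemoStreamingContext is an illustrative stand-in, not Spark's real class.
sealed trait State
case object Initialized extends State
case object Started extends State
case object Stopped extends State

class DemoStreamingContext {
  private var state: State = Initialized
  private var sparkContextStopped = false

  def start(): Unit = state match {
    case Initialized => state = Started
    case Started     => throw new IllegalStateException("already started")
    // The change here: start() after stop() fails even if start() was never called.
    case Stopped     => throw new IllegalStateException("context has been stopped")
  }

  def stop(stopSparkContext: Boolean): Unit = {
    // Safe to call in any state; stop(true) always stops the underlying
    // SparkContext, even on a context that was never started.
    if (stopSparkContext) sparkContextStopped = true
    state = Stopped
  }

  def isSparkContextStopped: Boolean = sparkContextStopped
}
```

Under this model, `stop(stopSparkContext = true)` on a never-started context still tears down the SparkContext, and any subsequent `start()` throws rather than silently reviving the context.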

Prior discussions:

- #3053 (diff)
- #3121 (comment)

@JoshRosen
Contributor Author

/cc @tdas for review

Contributor

Add "underlying SparkContext"

@SparkQA

SparkQA commented Nov 7, 2014

Test build #23061 has started for PR 3160 at commit 5142517.

  • This patch merges cleanly.

@tdas
Contributor

tdas commented Nov 7, 2014

Just one minor comment, otherwise looks good to me.

@SparkQA

SparkQA commented Nov 7, 2014

Test build #23063 has started for PR 3160 at commit 832a7f4.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Nov 7, 2014

Test build #23061 has finished for PR 3160 at commit 5142517.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23061/
Test FAILed.

@SparkQA

SparkQA commented Nov 7, 2014

Test build #23063 has finished for PR 3160 at commit 832a7f4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23063/
Test FAILed.

@tdas
Contributor

tdas commented Nov 7, 2014

Jenkins, test this again.

@JoshRosen
Contributor Author

It's probably not a spurious failure.

This strengthens the invariant that calling `stop(true)` _always_ stops the underlying SparkContext, no matter what sequence of calls may have preceded it.

@JoshRosen
Contributor Author

Spotted the problem: my change had swapped the order of `sc.stop()` and `scheduler.stop()`; I've fixed this. I also added a test for the case where we call `stop(false)` followed by `stop(true)`: in this case, the SparkContext should still be cleaned up.
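
The ordering fix and the `stop(false)` / `stop(true)` case can be sketched as follows. `SparkContextStub`, `SchedulerStub`, and `StreamingContextStub` are illustrative stand-ins for Spark's internals, not the real classes.

```scala
// Sketch of the corrected stop() ordering (stand-ins, not real Spark classes).
class SparkContextStub {
  var stopped = false
  def stop(): Unit = stopped = true
}

class SchedulerStub(sc: SparkContextStub) {
  var stopped = false
  def stop(): Unit = {
    // The bug being fixed: with the order swapped, the scheduler would be
    // shutting down against an already-stopped SparkContext.
    require(!sc.stopped, "scheduler must stop before the SparkContext")
    stopped = true
  }
}

class StreamingContextStub(val sc: SparkContextStub) {
  private val scheduler = new SchedulerStub(sc)
  private var stopped = false

  def stop(stopSparkContext: Boolean): Unit = {
    if (!stopped) {
      scheduler.stop() // stop the scheduler first...
      stopped = true
    }
    // ...then the SparkContext. This check sits outside the `if`, so a later
    // stop(true) still stops the SparkContext after an earlier stop(false).
    if (stopSparkContext) sc.stop()
  }
}
```

With this shape, `stop(false)` followed by `stop(true)` leaves the SparkContext stopped, matching the test described above.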

@SparkQA

SparkQA commented Nov 7, 2014

Test build #23071 has started for PR 3160 at commit bdbe5da.

  • This patch merges cleanly.

Contributor

Minor nit: this unit test should come logically after the "stop only streaming context" test, since that one exercises `stopSparkContext = false`.

@JoshRosen
Contributor Author

Good catches; I've addressed both comments.

@SparkQA

SparkQA commented Nov 7, 2014

Test build #23073 has started for PR 3160 at commit dbcc929.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Nov 7, 2014

Test build #23071 has finished for PR 3160 at commit bdbe5da.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23071/
Test FAILed.

@SparkQA

SparkQA commented Nov 7, 2014

Test build #23073 has finished for PR 3160 at commit dbcc929.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23073/
Test PASSed.

@tdas
Contributor

tdas commented Nov 9, 2014

merging this. thanks josh for this PR!

asfgit pushed a commit that referenced this pull request Nov 9, 2014
…fter calling stop()

In Spark 1.0.0+, calling `stop()` on a StreamingContext that has not been started is a no-op which has no side-effects. This allows users to call `stop()` on a fresh StreamingContext followed by `start()`. I believe that this almost always indicates an error and is not behavior that we should support. Since we don't allow `start() stop() start()` then I don't think it makes sense to allow `stop() start()`.

The current behavior can lead to resource leaks when StreamingContext constructs its own SparkContext: if I call `stop(stopSparkContext=True)`, then I expect StreamingContext's underlying SparkContext to be stopped irrespective of whether the StreamingContext has been started. This is useful when writing unit test fixtures.

Prior discussions:
- #3053 (diff)
- #3121 (comment)

Author: Josh Rosen <[email protected]>

Closes #3160 from JoshRosen/SPARK-4301 and squashes the following commits:

dbcc929 [Josh Rosen] Address more review comments
bdbe5da [Josh Rosen] Stop SparkContext after stopping scheduler, not before.
03e9c40 [Josh Rosen] Always stop SparkContext, even if stop(false) has already been called.
832a7f4 [Josh Rosen] Address review comment
5142517 [Josh Rosen] Add tests; improve Scaladoc.
813e471 [Josh Rosen] Revert workaround added in https://github.com/apache/spark/pull/3053/files#diff-e144dbee130ed84f9465853ddce65f8eR49
5558e70 [Josh Rosen] StreamingContext.stop() should stop SparkContext even if StreamingContext has not been started yet.

(cherry picked from commit 7b41b17)
Signed-off-by: Tathagata Das <[email protected]>
@asfgit asfgit closed this in 7b41b17 Nov 9, 2014