Skip to content

Conversation

@tdas
Copy link
Contributor

@tdas tdas commented May 12, 2015

In a REPL/notebook environment, its very easy to lose a reference to a StreamingContext by overriding the variable name. So if you happen to execute the following commands

val ssc = new StreamingContext(...) // cmd 1
ssc.start() // cmd 2
...
val ssc = new StreamingContext(...) // accidentally run cmd 1 again

The value of ssc will be overwritten. Now you can neither start the new context (as only one context can be started), nor stop the previous context (as the reference is lost).
Hence its best to maintain a singleton reference to the active context, so that we never loose reference for the active context.
Since this problem occurs useful in REPL environments, its best to add this as an Experimental support in the Scala API only so that it can be used in Scala REPLs and notebooks.

@tdas
Copy link
Contributor Author

tdas commented May 12, 2015

@pwendell

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@SparkQA
Copy link

SparkQA commented May 12, 2015

Test build #32453 has started for PR 6070 at commit d37a846.

@SparkQA
Copy link

SparkQA commented May 12, 2015

Test build #32453 has finished for PR 6070 at commit d37a846.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Merged build finished. Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32453/
Test FAILed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you renamed this to getActiveOrCreate

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@pwendell
Copy link
Contributor

The API design all looks good to me. I'm a bit confused about the locking (seems strange to have both a guarding lock and the atomic reference) and I was wondering if maybe there is an opportunity for racing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: nonEmpty -> isEmpty

@tdas
Copy link
Contributor Author

tdas commented May 12, 2015

@pwendell @zsxwing I should have mentioned in the title that this is WIP as #6060 will cause merge conflicts and is needed the full picture regarding locks.

@tdas tdas changed the title [SPARK-7553] [STREAMING] Added methods to maintain a singleton StreamingContext [SPARK-7553] [STREAMING][WIP] Added methods to maintain a singleton StreamingContext May 12, 2015
@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@SparkQA
Copy link

SparkQA commented May 12, 2015

Test build #32470 has started for PR 6070 at commit 3884a25.

@SparkQA
Copy link

SparkQA commented May 12, 2015

Test build #32470 has finished for PR 6070 at commit 3884a25.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Merged build finished. Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32470/
Test FAILed.

@pwendell
Copy link
Contributor

Okay, LGTM

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@SparkQA
Copy link

SparkQA commented May 12, 2015

Test build #32519 has started for PR 6070 at commit 64706c9.

@zsxwing
Copy link
Member

zsxwing commented May 12, 2015

LGTM

@AmplabJenkins
Copy link

Merged build finished. Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32519/
Test PASSed.

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@SparkQA
Copy link

SparkQA commented May 12, 2015

Test build #32531 has started for PR 6070 at commit a797171.

@SparkQA
Copy link

SparkQA commented May 12, 2015

Test build #32531 has finished for PR 6070 at commit a797171.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Merged build finished. Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32531/
Test FAILed.

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@SparkQA
Copy link

SparkQA commented May 12, 2015

Test build #32533 has started for PR 6070 at commit 731c9a1.

@tdas tdas changed the title [SPARK-7553] [STREAMING][WIP] Added methods to maintain a singleton StreamingContext [SPARK-7553] [STREAMING] Added methods to maintain a singleton StreamingContext May 12, 2015
@harishreedharan
Copy link
Contributor

If I create a new StreamingContext using the constructor it will not be considered as an active one which can be retrieved from getActiveOrCreate. This should be clearly documented, as this might cause leaks like the one this PR is trying to prevent. Also, how would this work when we are recovering from a checkpoint? How does one tell that the one from checkpoint should be the active one?

@SparkQA
Copy link

SparkQA commented May 12, 2015

Test build #32533 has finished for PR 6070 at commit 731c9a1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Merged build finished. Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32533/
Test PASSed.

@tdas
Copy link
Contributor Author

tdas commented May 12, 2015

Is the current documentation on getActiveOrCreate insufficient? It states that either get the active one which is started but not stopped, or get from checkpoint / create new. And getActiveOrCreate also takes a checkpoint path. And as the docs says, getActiveOrCreate(checkpointPath) will either

  1. get the active context, or
  2. recover from checkpoint, or
  3. create a new context
    In 1, the context will be already active. In the last 2, the contexts will be in INITIALIZED state and wil have to be started to start anything. BTW, start() as also been made idempotent, or in other words, no-op in case the cntext is already started. So in all the three above cases, you can call start().

So the following lines become idempotent

StreamingContext.getActiveOrCreate(checkpointPath, creatingFunc).start()

@harishreedharan
Copy link
Contributor

The current docs don't clearly say that if a new StreamingContext is created directly using the constructor and not one of the getActiveOrCreate methods, it is not tracked. It should probably be made clear that an app that uses getActiveOrCreate should not directly call the constructor.

Sorry, I didn't notice the getActiveOrCreate(checkpointPath) - so you can ignore that comment.

@tdas
Copy link
Contributor Author

tdas commented May 12, 2015

Actually, it is tracked! A new StreamingContext created directly is also tracked. So the semantics is what is intuitive, and so I dont think it needs an extra documentation.

@harishreedharan
Copy link
Contributor

OK, then we are good :)

@tdas
Copy link
Contributor Author

tdas commented May 12, 2015

Alright! Merging this then. Thank you all for reviewing.

On Tue, May 12, 2015 at 4:10 PM, Hari Shreedharan [email protected]
wrote:

OK, then we are good :)


Reply to this email directly or view it on GitHub
#6070 (comment).

asfgit pushed a commit that referenced this pull request May 12, 2015
…ingContext

In a REPL/notebook environment, its very easy to lose a reference to a StreamingContext by overriding the variable name. So if you happen to execute the following commands
```
val ssc = new StreamingContext(...) // cmd 1
ssc.start() // cmd 2
...
val ssc = new StreamingContext(...) // accidentally run cmd 1 again
```
The value of ssc will be overwritten. Now you can neither start the new context (as only one context can be started), nor stop the previous context (as the reference is lost).
Hence its best to maintain a singleton reference to the active context, so that we never loose reference for the active context.
Since this problem occurs useful in REPL environments, its best to add this as an Experimental support in the Scala API only so that it can be used in Scala REPLs and notebooks.

Author: Tathagata Das <[email protected]>

Closes #6070 from tdas/SPARK-7553 and squashes the following commits:

731c9a1 [Tathagata Das] Fixed style
a797171 [Tathagata Das] Added more unit tests
19fc70b [Tathagata Das] Added :: Experimental :: in docs
64706c9 [Tathagata Das] Fixed test
634db5d [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into SPARK-7553
3884a25 [Tathagata Das] Fixing test bug
d37a846 [Tathagata Das] Added getActive and getActiveOrCreate

(cherry picked from commit 00e7b09)
Signed-off-by: Tathagata Das <[email protected]>
@asfgit asfgit closed this in 00e7b09 May 12, 2015
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request May 28, 2015
…ingContext

In a REPL/notebook environment, its very easy to lose a reference to a StreamingContext by overriding the variable name. So if you happen to execute the following commands
```
val ssc = new StreamingContext(...) // cmd 1
ssc.start() // cmd 2
...
val ssc = new StreamingContext(...) // accidentally run cmd 1 again
```
The value of ssc will be overwritten. Now you can neither start the new context (as only one context can be started), nor stop the previous context (as the reference is lost).
Hence its best to maintain a singleton reference to the active context, so that we never loose reference for the active context.
Since this problem occurs useful in REPL environments, its best to add this as an Experimental support in the Scala API only so that it can be used in Scala REPLs and notebooks.

Author: Tathagata Das <[email protected]>

Closes apache#6070 from tdas/SPARK-7553 and squashes the following commits:

731c9a1 [Tathagata Das] Fixed style
a797171 [Tathagata Das] Added more unit tests
19fc70b [Tathagata Das] Added :: Experimental :: in docs
64706c9 [Tathagata Das] Fixed test
634db5d [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into SPARK-7553
3884a25 [Tathagata Das] Fixing test bug
d37a846 [Tathagata Das] Added getActive and getActiveOrCreate
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request Jun 12, 2015
…ingContext

In a REPL/notebook environment, its very easy to lose a reference to a StreamingContext by overriding the variable name. So if you happen to execute the following commands
```
val ssc = new StreamingContext(...) // cmd 1
ssc.start() // cmd 2
...
val ssc = new StreamingContext(...) // accidentally run cmd 1 again
```
The value of ssc will be overwritten. Now you can neither start the new context (as only one context can be started), nor stop the previous context (as the reference is lost).
Hence its best to maintain a singleton reference to the active context, so that we never loose reference for the active context.
Since this problem occurs useful in REPL environments, its best to add this as an Experimental support in the Scala API only so that it can be used in Scala REPLs and notebooks.

Author: Tathagata Das <[email protected]>

Closes apache#6070 from tdas/SPARK-7553 and squashes the following commits:

731c9a1 [Tathagata Das] Fixed style
a797171 [Tathagata Das] Added more unit tests
19fc70b [Tathagata Das] Added :: Experimental :: in docs
64706c9 [Tathagata Das] Fixed test
634db5d [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into SPARK-7553
3884a25 [Tathagata Das] Fixing test bug
d37a846 [Tathagata Das] Added getActive and getActiveOrCreate
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015
…ingContext

In a REPL/notebook environment, its very easy to lose a reference to a StreamingContext by overriding the variable name. So if you happen to execute the following commands
```
val ssc = new StreamingContext(...) // cmd 1
ssc.start() // cmd 2
...
val ssc = new StreamingContext(...) // accidentally run cmd 1 again
```
The value of ssc will be overwritten. Now you can neither start the new context (as only one context can be started), nor stop the previous context (as the reference is lost).
Hence its best to maintain a singleton reference to the active context, so that we never loose reference for the active context.
Since this problem occurs useful in REPL environments, its best to add this as an Experimental support in the Scala API only so that it can be used in Scala REPLs and notebooks.

Author: Tathagata Das <[email protected]>

Closes apache#6070 from tdas/SPARK-7553 and squashes the following commits:

731c9a1 [Tathagata Das] Fixed style
a797171 [Tathagata Das] Added more unit tests
19fc70b [Tathagata Das] Added :: Experimental :: in docs
64706c9 [Tathagata Das] Fixed test
634db5d [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into SPARK-7553
3884a25 [Tathagata Das] Fixing test bug
d37a846 [Tathagata Das] Added getActive and getActiveOrCreate
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants