
Conversation

@tdas
Contributor

tdas commented May 12, 2015

This is a revision of the earlier version (see #5773), which passed the active SparkContext explicitly through a new set of Java and Scala APIs. The drawbacks of that approach were:

  • Hard to implement in Python.
  • It introduced new API, which is even more confusing since we are introducing getActiveOrCreate in SPARK-7553.

Furthermore, there is now a direct way to get an existing active SparkContext or create a new one - SparkContext.getOrCreate(conf). It's better to use this to get the SparkContext than to add a new API for explicitly passing the context.

So in this PR I have:

  • Removed the new versions of StreamingContext.getOrCreate() that took a SparkContext.
  • Added the ability to pick up an existing SparkContext when the StreamingContext tries to create a SparkContext.
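The resulting usage can be sketched roughly as follows. This is an illustrative sketch, not code from the patch; RecoverableApp, checkpointDir, and the batch interval are placeholder names and values:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object RecoverableApp {
      def main(args: Array[String]): Unit = {
        // Placeholder checkpoint path for this sketch
        val checkpointDir = "/tmp/streaming-checkpoint"

        // The application may already hold an active SparkContext ...
        val sc = SparkContext.getOrCreate(
          new SparkConf().setAppName("RecoverableApp").setMaster("local[2]"))

        // ... and the creating function no longer needs a special overload:
        // when the StreamingContext needs a SparkContext, it picks up the
        // active one via SparkContext.getOrCreate().
        def creatingFunc(): StreamingContext = {
          val ssc = new StreamingContext(SparkContext.getOrCreate(), Seconds(1))
          ssc.checkpoint(checkpointDir)
          // ... set up DStreams here ...
          ssc
        }

        // Recovers from the checkpoint if it exists, else calls creatingFunc
        val ssc = StreamingContext.getOrCreate(checkpointDir, creatingFunc _)
        ssc.start()
        ssc.awaitTermination()
      }
    }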

Contributor Author

Here, the whole unit test for the deleted API has been replaced by a sub-test that verifies the older API's ability to use an existing SparkContext.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@SparkQA

SparkQA commented May 12, 2015

Test build #32535 has started for PR 6096 at commit f024b77.

@tdas
Contributor Author

tdas commented May 12, 2015

@zsxwing

@zsxwing
Member

zsxwing commented May 12, 2015

LGTM

@SparkQA

SparkQA commented May 12, 2015

Test build #32535 has finished for PR 6096 at commit f024b77.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • decisionTreeCode = '''class DecisionTreeParams(Params):
    • class DecisionTreeParams(Params):
    • class LinearRegression(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol, HasMaxIter,
    • class LinearRegressionModel(JavaModel):
    • class TreeRegressorParams(object):
    • class RandomForestParams(object):
    • class GBTParams(object):
    • class DecisionTreeRegressor(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol,
    • class DecisionTreeRegressionModel(JavaModel):
    • class RandomForestRegressor(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol, HasSeed,
    • class RandomForestRegressionModel(JavaModel):
    • class GBTRegressor(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol, HasMaxIter,
    • class GBTRegressionModel(JavaModel):

@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32535/
Test PASSed.

Conflicts:
	streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala
@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@tdas
Contributor Author

tdas commented May 12, 2015

@JoshRosen Since you took a look at the original PR #5428, could you take a look at this?
Also @harishreedharan.

@SparkQA

SparkQA commented May 12, 2015

Test build #32550 has started for PR 6096 at commit 53f4b2d.

@SparkQA

SparkQA commented May 13, 2015

Test build #32550 has finished for PR 6096 at commit 53f4b2d.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Merged build finished. Test FAILed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32550/
Test FAILed.

@tdas
Contributor Author

tdas commented May 13, 2015

test this please.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@SparkQA

SparkQA commented May 13, 2015

Test build #32554 has started for PR 6096 at commit 53f4b2d.

@SparkQA

SparkQA commented May 13, 2015

Test build #32554 has finished for PR 6096 at commit 53f4b2d.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • final class OneVsRest extends Estimator[OneVsRestModel] with OneVsRestParams

@AmplabJenkins

Merged build finished. Test FAILed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32554/
Test FAILed.

@tdas
Contributor Author

tdas commented May 13, 2015

test this please

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@SparkQA

SparkQA commented May 13, 2015

Test build #32556 has started for PR 6096 at commit 53f4b2d.

@SparkQA

SparkQA commented May 13, 2015

Test build #32556 has finished for PR 6096 at commit 53f4b2d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32556/
Test PASSed.

@JoshRosen
Contributor

This seems fine to me. My only comment would be whether we need to document this behavior anywhere or whether this behavior change might cause confusion for users. I don't think it should cause issues, though, since the old code would have just thrown a confusing "multiple active SparkContexts in the same JVM" message in these scenarios.

@tdas
Contributor Author

tdas commented May 14, 2015

Yeah, this would not have been a possible code path before, due to that error, so I don't think this changes anything for anyone. Alright, I am merging this. Thanks all!


asfgit pushed a commit that referenced this pull request May 14, 2015
…ated from checkpoint and existing SparkContext


Author: Tathagata Das <[email protected]>

Closes #6096 from tdas/SPARK-6752 and squashes the following commits:

53f4b2d [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into SPARK-6752
f024b77 [Tathagata Das] Removed extra API and used SparkContext.getOrCreate

(cherry picked from commit bce00da)
Signed-off-by: Tathagata Das <[email protected]>
@asfgit asfgit closed this in bce00da May 14, 2015
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request May 28, 2015
…ated from checkpoint and existing SparkContext


Author: Tathagata Das <[email protected]>

Closes apache#6096 from tdas/SPARK-6752 and squashes the following commits:

53f4b2d [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into SPARK-6752
f024b77 [Tathagata Das] Removed extra API and used SparkContext.getOrCreate
@danielwegener

Just curious: how are we supposed to create a StreamingContext from a given SparkContext and a directory when we don't know whether the directory exists (which is exactly the case the dropped, admittedly confusing, getOrCreate overload handled)?

@tdas
Contributor Author

tdas commented Jun 12, 2015

@danielwegener In the creating function that you pass to getOrCreate, you can do:

def creatingFunc(): StreamingContext = {
  // Either pick up the active SparkContext (creating one if needed) ...
  val ssc = new StreamingContext(SparkContext.getOrCreate(), batchInterval)
  // ... or pass a reference to an existing SparkContext directly:
  // val ssc = new StreamingContext(referenceToExistingSparkContext, batchInterval)
  ssc
}

jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request Jun 12, 2015
…ated from checkpoint and existing SparkContext


Author: Tathagata Das <[email protected]>

Closes apache#6096 from tdas/SPARK-6752 and squashes the following commits:

53f4b2d [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into SPARK-6752
f024b77 [Tathagata Das] Removed extra API and used SparkContext.getOrCreate
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015
…ated from checkpoint and existing SparkContext


Author: Tathagata Das <[email protected]>

Closes apache#6096 from tdas/SPARK-6752 and squashes the following commits:

53f4b2d [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into SPARK-6752
f024b77 [Tathagata Das] Removed extra API and used SparkContext.getOrCreate