Conversation

zjffdu commented May 18, 2016

What changes were proposed in this pull request?

Override the existing SparkContext if the provided SparkConf is different. The PySpark part hasn't been fixed yet; I will do that after the first round of review to make sure this is the correct approach.
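
For context, a minimal sketch of the symptom this targets (a hypothetical spark-shell-style session; the config key is just an example):

import org.apache.spark.sql.SparkSession

// A SparkContext already exists (e.g. the one spark-shell creates), so the
// builder reuses it; before this change, the session config below could be
// silently ignored.
val spark = SparkSession.builder()
  .config("spark.sql.shuffle.partitions", "10")
  .getOrCreate()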

How was this patch tested?

Manually verified it in spark-shell.

@rxin Please help review it; I think this is a very critical issue for Spark 2.0.


SparkQA commented May 18, 2016

Test build #58734 has finished for PR 13160 at commit 7db5981.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


rxin commented May 18, 2016

hm I don't think it is safe to create two contexts and stop the existing one. Maybe we can go ahead and change the sql conf setting, but not the spark context setting?
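
A sketch of that direction (assuming the Spark 2.0 runtime conf API and the spark-shell's predefined spark session; the key is illustrative):

// Session-level SQL settings stay mutable through the runtime conf...
spark.conf.set("spark.sql.shuffle.partitions", "10")
// ...while core context settings (e.g. spark.master) are fixed once the
// SparkContext has started.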


zjffdu commented May 18, 2016

Changing the SQL conf setting seems a better approach. Besides that, some other settings in SparkSession#Builder also won't take effect if there's already an existing SparkContext: e.g. if there's a local-mode SparkContext, the user cannot create a SparkSession in yarn-cluster mode. At least we need to add a warning message for that case.
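
A sketch of that scenario (hypothetical setup; the master values are illustrative):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SparkSession

// An existing local-mode context...
val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("demo"))

// ...means the builder reuses it, so master("yarn-cluster") below silently
// has no effect; hence the suggestion to at least log a warning.
val spark = SparkSession.builder().master("yarn-cluster").getOrCreate()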


rxin commented May 18, 2016

logging a warning seems like a good idea!


zjffdu commented May 18, 2016

It seems changing SQLConf won't work, because SharedState is closely coupled with SparkContext while SparkContext.conf is immutable. That means if the SparkContext is created manually before the SparkSession, then none of these settings can be changed. It looks like fixing this would involve more changes.

/**
 * A class that holds all state shared across sessions in a given [[SQLContext]].
 */
private[sql] class SharedState(val sparkContext: SparkContext) {

  /**
   * Class for caching query results reused in future executions.
   */
  val cacheManager: CacheManager = new CacheManager

  /**
   * A listener for SQL-specific [[org.apache.spark.scheduler.SparkListenerEvent]]s.
   */
  val listener: SQLListener = SQLContext.createListenerAndUI(sparkContext)
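
  // ... (remaining members of SharedState elided)
}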


SparkQA commented May 18, 2016

Test build #58755 has finished for PR 13160 at commit 09c2646.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class Builder extends Logging


rxin commented May 19, 2016

Can you explain more about why it is a problem? I think it is OK to not change the underlying SparkContext and only change the SparkSession's configs, since those are supposed to be mutable.


zjffdu commented May 19, 2016

SharedState is constructed from the SparkContext, so if there's already an existing SparkContext, I cannot pass additional session conf to SharedState, since the conf of the SparkContext is immutable. Please correct me if I misunderstand.


rxin commented May 19, 2016

I've done it here: #13200


rxin commented May 20, 2016

@zjffdu did #13200 solve the problem?


zjffdu commented May 23, 2016

@rxin It seems it didn't resolve the issue described in the JIRA; I don't have time to check the changes today, but may take a look tomorrow. Another issue I found is that hive-site.xml is not picked up by Spark: I see the following log even though I put hive-site.xml on the classpath (it should connect to the Hive metastore service, and the warehouse path should be an HDFS path):

16/05/23 20:45:00 INFO HiveSharedState: Setting Hive metastore warehouse path to '/Users/jzhang/github/spark/spark-warehouse'


SparkQA commented May 24, 2016

Test build #59199 has finished for PR 13160 at commit 4f11366.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class Builder extends Logging


SparkQA commented May 24, 2016

Test build #59200 has finished for PR 13160 at commit d89bd06.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


zjffdu commented May 24, 2016

test it again.


SparkQA commented May 24, 2016

Test build #59201 has finished for PR 13160 at commit a4f160e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


rxin commented May 24, 2016

cc @andrewor14

if (activeContext.get() == null) {
  setActiveContext(new SparkContext(config), allowMultipleContexts = false)
} else {
  logWarning("Use an existing SparkContext, some configuration may not take effect.")
}

maybe log this only if there are non-empty configs? same thing for all the warning messages later.
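
A sketch of that suggestion applied to the snippet above (SparkConf.getAll returns the explicitly set entries):

if (activeContext.get() == null) {
  setActiveContext(new SparkContext(config), allowMultipleContexts = false)
} else if (config.getAll.nonEmpty) {
  // Only warn when the caller actually passed settings that will be ignored.
  logWarning("Use an existing SparkContext, some configuration may not take effect.")
}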


zjffdu commented May 25, 2016

Thanks @rxin, updated the PR.


rxin commented May 25, 2016

LGTM pending Jenkins.


SparkQA commented May 25, 2016

Test build #59243 has finished for PR 13160 at commit 071839f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14

LGTM2. Merging into master and 2.0.

asfgit pushed a commit that referenced this pull request May 25, 2016
…hen this already an existing SparkContext

## What changes were proposed in this pull request?

Override the existing SparkContext if the provided SparkConf is different. The PySpark part hasn't been fixed yet; I will do that after the first round of review to make sure this is the correct approach.

## How was this patch tested?

Manually verified it in spark-shell.

rxin Please help review it; I think this is a very critical issue for Spark 2.0.

Author: Jeff Zhang <[email protected]>

Closes #13160 from zjffdu/SPARK-15345.

(cherry picked from commit 01e7b9c)
Signed-off-by: Andrew Or <[email protected]>
asfgit closed this in 01e7b9c May 25, 2016
options.foreach { case (k, v) => sparkConf.set(k, v) }
val sc = SparkContext.getOrCreate(sparkConf)
// maybe this is an existing SparkContext, update its SparkConf which may be
// used by SparkSession
options.foreach { case (k, v) => sc.conf.set(k, v) }

We can check whether the sc was pre-existing, right?

In that case the foreach below is not needed.
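
A sketch of the reviewer's idea, assuming sparkConf and options from the snippet above; contextAlreadyActive is a hypothetical helper (Spark's public API does not directly expose whether getOrCreate reused a context):

// Hypothetical helper: inside SparkContext's companion this could consult
// the activeContext reference shown earlier; stubbed here.
def contextAlreadyActive(): Boolean = ???

val preExisting = contextAlreadyActive()
val sc = SparkContext.getOrCreate(sparkConf)
if (preExisting) {
  // Only a reused context needs the session options copied onto its conf.
  options.foreach { case (k, v) => sc.conf.set(k, v) }
}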
