[SPARK-15345][SQL][PYSPARK] SparkSession's conf doesn't take effect when there is already an existing SparkContext #13160
Conversation
Test build #58734 has finished for PR 13160 at commit
Hm, I don't think it is safe to create two contexts and stop the existing one. Maybe we can go ahead and change the SQL conf setting, but not the SparkContext setting?
Changing the SQL conf setting seems a better approach. Besides that, some other settings in SparkSession#Builder also won't take effect if there's already an existing SparkContext: for example, if there's a local-mode SparkContext, the user cannot create a SparkSession in yarn-cluster mode. At least we need to add a warning message for that case.
Logging a warning seems like a good idea!
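For context, a minimal sketch of the scenario being discussed (the master and memory values are illustrative, not taken from the PR): when a SparkContext already exists, the builder reuses it, so builder-level settings such as the master or executor memory never reach the underlying context, which is exactly what the proposed warning would surface.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SparkSession

object ExistingContextSketch {
  def main(args: Array[String]): Unit = {
    // A SparkContext created first, e.g. by some other component, in local mode.
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("existing"))

    // The builder reuses the existing context, so master("yarn") and the executor
    // memory setting below do not change the context that is actually used.
    val spark = SparkSession.builder()
      .master("yarn")
      .config("spark.executor.memory", "4g")
      .getOrCreate()

    println(spark.sparkContext.master)  // prints "local[2]", not "yarn"
    sc.stop()
  }
}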
It seems changing the SQL conf won't work, because SharedState is closely coupled with SparkContext while SparkContext.conf is immutable. That means if the SparkContext is created manually before the SparkSession, then none of the settings can be changed. It looks like it would involve more changes to fix.
Test build #58755 has finished for PR 13160 at commit
Can you explain more why it is a problem? I think it is OK to not change the underlying SparkContext and only change the SparkSession's configs, since those are supposed to be mutable.
SharedState is constructed from the SparkContext, so if there's already an existing SparkContext, I cannot pass additional session conf to SharedState, since the conf of the SparkContext is immutable. Please correct me if I'm not understanding it correctly.
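To illustrate the distinction being drawn here, a rough sketch using illustrative values (not code from the PR): SQL settings exposed through SparkSession.conf are mutable per session, while the SparkConf of an already-running SparkContext keeps the values it started with.

import org.apache.spark.sql.SparkSession

object SessionVsContextConf {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("conf-sketch")
      .getOrCreate()

    // Session-level SQL conf: can be changed at any time and is visible to this session.
    spark.conf.set("spark.sql.shuffle.partitions", "8")
    println(spark.conf.get("spark.sql.shuffle.partitions"))  // "8"

    // Context-level conf: reading it back shows what the context was started with;
    // there is no supported way to mutate it once the context is up.
    println(spark.sparkContext.getConf.get("spark.master"))  // "local[2]"

    spark.stop()
  }
}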
I've done it here: #13200
@rxin It seems this didn't resolve the issue described in the JIRA. I don't have time to check the changes now; I may take a look at it tomorrow. Another issue I found is that hive-site.xml is not picked up by Spark: I see the following log even when I put hive-site.xml on the classpath (it should connect to the Hive metastore service, and the warehouse path should be an HDFS path).
…when there is already an existing SparkContext
Test build #59199 has finished for PR 13160 at commit
Test build #59200 has finished for PR 13160 at commit
test it again. |
Test build #59201 has finished for PR 13160 at commit
cc @andrewor14
if (activeContext.get() == null) {
  setActiveContext(new SparkContext(config), allowMultipleContexts = false)
}
logWarning("Use an existing SparkContext, some configuration may not take effect.")
Maybe log this only if there are non-empty configs? Same thing for all the warning messages later.
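A minimal, self-contained sketch of that suggestion (the names and the plain println are illustrative, not the PR's actual code): only emit the warning when the caller passed builder options that an already-running SparkContext would silently ignore.

// Sketch only: `options` stands in for the builder's accumulated settings, and
// `reusedExistingContext` for whatever check the builder would use.
def warnIfOptionsDropped(options: Map[String, String], reusedExistingContext: Boolean): Unit = {
  if (reusedExistingContext && options.nonEmpty) {
    // The real builder would go through logWarning; println keeps the sketch standalone.
    println("WARN: Use an existing SparkContext, some configuration may not take effect.")
  }
}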
Thanks @rxin, updated the PR.
LGTM pending Jenkins.
Test build #59243 has finished for PR 13160 at commit
LGTM2. Merging into master and 2.0.
…when there is already an existing SparkContext

What changes were proposed in this pull request?
Override the existing SparkContext if the provided SparkConf is different. The PySpark part hasn't been fixed yet; I will do that after the first round of review to ensure this is the correct approach.

How was this patch tested?
Manually verified it in spark-shell.

rxin Please help review it; I think this is a very critical issue for Spark 2.0.

Author: Jeff Zhang <[email protected]>
Closes #13160 from zjffdu/SPARK-15345.
(cherry picked from commit 01e7b9c)
Signed-off-by: Andrew Or <[email protected]>
options.foreach { case (k, v) => sparkConf.set(k, v) }
SparkContext.getOrCreate(sparkConf)
val sc = SparkContext.getOrCreate(sparkConf)
// maybe this is an existing SparkContext, update its SparkConf which maybe used
We can check whether the sc was pre-existing, right? In that case the foreach below is not needed.
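For reference, a small self-contained illustration of why the pre-existing case is the one that needs special handling (the app names and the extra setting are illustrative): SparkContext.getOrCreate returns the already-running context and silently ignores the conf passed to it, so options applied only to that conf are lost on this path.

import org.apache.spark.{SparkConf, SparkContext}

object GetOrCreateReuse {
  def main(args: Array[String]): Unit = {
    // The first context wins.
    val first = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("first"))

    val laterConf = new SparkConf()
      .setMaster("local[2]")
      .setAppName("later")
      .set("spark.ui.showConsoleProgress", "false")

    // getOrCreate hands back the existing context; laterConf is not applied to it.
    val second = SparkContext.getOrCreate(laterConf)
    println(second eq first)                       // true: the same context is reused
    println(second.getConf.get("spark.app.name"))  // still "first"

    first.stop()
  }
}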
What changes were proposed in this pull request?
Override the existing SparkContext if the provided SparkConf is different. The PySpark part hasn't been fixed yet; I will do that after the first round of review to ensure this is the correct approach.
How was this patch tested?
Manually verified it in spark-shell.
@rxin Please help review it; I think this is a very critical issue for Spark 2.0.
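A hedged sketch of what such a manual check in spark-shell might look like (the config key and value are illustrative; spark-shell already holds a running SparkContext, so the builder takes the existing-context path this PR is concerned with):

// Pasted into spark-shell, which already has a SparkContext and SparkSession:
val session = org.apache.spark.sql.SparkSession.builder()
  .config("spark.sql.shuffle.partitions", "7")
  .getOrCreate()

// The check: does the builder option show up at the session level even though the
// SparkContext pre-existed? Before this change it could be silently dropped.
println(session.conf.get("spark.sql.shuffle.partitions"))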