[SPARK-31532][SQL] Builder should not propagate static sql configs to the existing active or default SparkSession #28316
Conversation
cc: @cloud-fan @dongjoon-hyun @maropu @HyukjinKwon, many thanks.
Test build #121688 has finished for PR 28316 at commit
```diff
  var session = activeThreadSession.get()
  if ((session ne null) && !session.sparkContext.isStopped) {
-   options.foreach { case (k, v) => session.sessionState.conf.setConfString(k, v) }
+   for ((k, v) <- options if !SQLConf.staticConfKeys.contains(k)) {
```
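For reference, a quick illustration of the registry the new filter consults (a sketch assuming a Spark 3.x classpath; the exact key set varies by version):

```scala
import org.apache.spark.sql.internal.SQLConf

// Static SQL configs are session-immutable and registered in staticConfKeys,
// so the filtered loop above skips them when reusing an existing session.
assert(SQLConf.staticConfKeys.contains("spark.sql.warehouse.dir"))
// Ordinary runtime SQL configs are not in the static set and still propagate.
assert(!SQLConf.staticConfKeys.contains("spark.sql.shuffle.partitions"))
```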
Could we have a clearer warning message than "configuration may not take effect" if `options` contains any static config?
+1
+1 too. We can warn about static SQL configs separately, but do we still need a general warning message for core configurations? e.g.
```scala
for ((k, v) <- options) {
  if (SQLConf.staticConfKeys.contains(k)) {
    logWarning(s"Using an existing SparkSession, the static configuration $k is ignored.")
  } else {
    session.sessionState.conf.setConfString(k, v)
  }
}
if ((options -- SQLConf.staticConfKeys.asScala).nonEmpty) {
  logWarning("Using an existing SparkSession; some spark core configurations may not take" +
    " effect.")
}
```
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 too. we can warn static SQL separately. But do we still need a general warning messages for core configurations?
e.g.
for ((k, v) <- options) {
if (SQLConf.staticConfKeys.contains(k)) {
logWarning(s"Using an existing SparkSession, the static configuration $k is ignored.")
} else {
session.sessionState.conf.setConfString(k, v)
}
}
if ((options -- SQLConf.staticConfKeys.asScala).nonEmpty) {
logWarning("Using an existing SparkSession; some spark core configurations may not take" +
" effect.")
}| assert(!session.conf.get(WAREHOUSE_PATH).contains("SPARK-31532-db")) | ||
| assert(session.conf.get(WAREHOUSE_PATH) === session2.conf.get(WAREHOUSE_PATH)) | ||
| assert(session2.conf.get(GLOBAL_TEMP_DATABASE) === "globaltempdb-spark-31532") | ||
|
|
nit: remove this blank
Nice catch! Looks fine.
| test("SPARK-31532: should not propagate static sql configs to the existing" + | ||
| " active/default SparkSession") { | ||
| val session = SparkSession.builder() |
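A hedged sketch of how this truncated test body might continue, reconstructed from the assertions quoted earlier in this review (string keys used for self-containment; the suite itself uses the `WAREHOUSE_PATH` and `GLOBAL_TEMP_DATABASE` config entries):

```scala
import org.apache.spark.sql.SparkSession

// The first session pins the static configs at creation time.
val session = SparkSession.builder()
  .master("local")
  .config("spark.sql.globalTempDatabase", "globaltempdb-spark-31532")
  .getOrCreate()

// A second builder tries to override a static config; it should get the
// same session back, with the static configs left untouched.
val session2 = SparkSession.builder()
  .config("spark.sql.warehouse.dir", "SPARK-31532-db")
  .getOrCreate()

assert(session eq session2)
assert(!session.conf.get("spark.sql.warehouse.dir").contains("SPARK-31532-db"))
assert(session2.conf.get("spark.sql.globalTempDatabase") == "globaltempdb-spark-31532")
```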
Just in case, could we add a test for the case where static configs should be set on the SparkSession (i.e., when no active/default SparkSession exists)? Or do we already have such a test?
I added a new test for such a case: if a SparkContext instance exists but no SparkSession exists, the static configs remain changeable until the SparkSession is finally created. See the sketch below.
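A minimal sketch of that scenario (app name and values illustrative): a SparkContext is already running, but since no SparkSession exists yet, the builder's static config still applies to the session it creates.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SparkSession

// A SparkContext exists, but no SparkSession has been created yet.
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local").setAppName("SPARK-31532-sketch"))

// With no active/default session to reuse, the static config set on the
// builder takes effect in the newly created session.
val session = SparkSession.builder()
  .config("spark.sql.globalTempDatabase", "globaltempdb-spark-31532-2")
  .getOrCreate()
assert(session.conf.get("spark.sql.globalTempDatabase") == "globaltempdb-spark-31532-2")
```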
```scala
private def applyModifiableSettings(session: SparkSession): Unit = {
  for ((k, v) <- options) {
```
How about (warning texts sketched from the messages discussed above):
```scala
val (staticConfs, otherConfs) =
  options.partition(kv => SQLConf.staticConfKeys.contains(kv._1))
if (staticConfs.nonEmpty) logWarning("Using an existing SparkSession; the static sql configurations will not take effect.")
if (otherConfs.nonEmpty) logWarning("Using an existing SparkSession; some spark core configurations may not take effect.")
```
Test build #121714 has finished for PR 28316 at commit
Test build #121712 has finished for PR 28316 at commit
retest this please
Test build #121730 has finished for PR 28316 at commit
retest this please
Test build #121761 has finished for PR 28316 at commit
[SPARK-31532][SQL] Builder should not propagate static sql configs to the existing active or default SparkSession
### What changes were proposed in this pull request?
`SparkSessionBuilder` should not propagate static SQL configurations to the existing active/default SparkSession.
This seems to be a long-standing bug:
```scala
scala> spark.sql("set spark.sql.warehouse.dir").show
+--------------------+--------------------+
| key| value|
+--------------------+--------------------+
|spark.sql.warehou...|file:/Users/kenty...|
+--------------------+--------------------+
scala> spark.sql("set spark.sql.warehouse.dir=2");
org.apache.spark.sql.AnalysisException: Cannot modify the value of a static config: spark.sql.warehouse.dir;
at org.apache.spark.sql.RuntimeConfig.requireNonStaticConf(RuntimeConfig.scala:154)
at org.apache.spark.sql.RuntimeConfig.set(RuntimeConfig.scala:42)
at org.apache.spark.sql.execution.command.SetCommand.$anonfun$x$7$6(SetCommand.scala:100)
at org.apache.spark.sql.execution.command.SetCommand.run(SetCommand.scala:156)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:229)
at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3644)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3642)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:229)
at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:100)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97)
at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:607)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:602)
... 47 elided
scala> import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.SparkSession
scala> SparkSession.builder.config("spark.sql.warehouse.dir", "xyz").get
getClass getOrCreate
scala> SparkSession.builder.config("spark.sql.warehouse.dir", "xyz").getOrCreate
20/04/23 23:49:13 WARN SparkSession$Builder: Using an existing SparkSession; some configuration may not take effect.
res7: org.apache.spark.sql.SparkSession = org.apache.spark.sql.SparkSession6403d574
scala> spark.sql("set spark.sql.warehouse.dir").show
+--------------------+-----+
| key|value|
+--------------------+-----+
|spark.sql.warehou...| xyz|
+--------------------+-----+
scala>
```
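For contrast, a sketch of the expected behavior after this fix (transcript illustrative; the exact warning wording follows the patch discussed in the review above):

```scala
scala> SparkSession.builder.config("spark.sql.warehouse.dir", "xyz").getOrCreate
WARN SparkSession$Builder: Using an existing SparkSession; the static sql configurations will not take effect.
res0: org.apache.spark.sql.SparkSession = org.apache.spark.sql.SparkSession@...

scala> spark.sql("set spark.sql.warehouse.dir").show
// still prints the original warehouse path, not xyz
```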
### Why are the changes needed?
Bugfix, as shown in the previous section.
### Does this PR introduce any user-facing change?
Yes. Static SQL configurations set via `SparkSession.builder.config` no longer propagate to the existing active/default SparkSession.
### How was this patch tested?
Added new unit tests.
Closes #28316 from yaooqinn/SPARK-31532.
Authored-by: Kent Yao <[email protected]>
Signed-off-by: Takeshi Yamamuro <[email protected]>
(cherry picked from commit 8424f55)
Signed-off-by: Takeshi Yamamuro <[email protected]>
Thanks, @yaooqinn @cloud-fan! Merged to master/3.0/2.4 according to SPARK-31532.
Thank you for pinging me, @maropu.
We need SPARK-28179 (#24979) to fix the failure, but it will be a behavior change in branch-2.4. cc @gatorsmile, @HyukjinKwon, @srowen
Oops, I forgot Xiao's comment: #28262 (comment) ...
In
I took a look. It looks okay to change the conf itself.
Could you make a follow-up PR with the following?
Yea, ok. I'll do it now.
Technically, the difference is the stored value in
Ping the related people, including me, on the follow-up PR. Thanks!
Ah.. There is an old test case using
Please update them together. And, let's see the test result.
…BAL_TEMP_DATABASE config in SparkSessionBuilderSuite

### What changes were proposed in this pull request?
This PR intends to fix the test code to use lowercase values for the `GLOBAL_TEMP_DATABASE` config in `SparkSessionBuilderSuite`. The handling of this config differs between branch-3.0+ and branch-2.4: in branch-3.0+, Spark always lowercases the value, so the test had better always use lowercase values for it. This comes from the dongjoon-hyun comment: #28316 (comment)

### Why are the changes needed?
To fix the test failure in branch-2.4.

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
Fixed the test.

Closes #28339 from maropu/SPARK-31532.
Authored-by: Takeshi Yamamuro <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
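A small sketch of the branch-3.0+ behavior described above (values illustrative): the stored value of this config is lowercased, so test assertions should compare against the lowercase form.

```scala
import org.apache.spark.sql.SparkSession

val session = SparkSession.builder()
  .master("local")
  .config("spark.sql.globalTempDatabase", "globalTempDB-SPARK-31532")
  .getOrCreate()

// Branch-3.0+ lowercases the stored value, so mixed-case input reads back
// lowercased; branch-2.4 preserved the original casing, hence the test fix.
assert(session.conf.get("spark.sql.globalTempDatabase") == "globaltempdb-spark-31532")
```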