Conversation

@dongjoon-hyun
Member

dongjoon-hyun commented Jan 13, 2017

What changes were proposed in this pull request?

This PR aims to fix the following two things.

  1. sql("SET -v").collect() or sql("SET -v").show() raises the following exceptions for String configuration with default value, null. For the test, please see Jenkins result and 60953bf in [WIP] Fix SET -v not to raise exceptions for configs with default value null #16624 .
sbt.ForkMain$ForkError: java.lang.RuntimeException: Error while decoding: java.lang.NullPointerException
createexternalrow(input[0, string, false].toString, input[1, string, false].toString, input[2, string, false].toString, StructField(key,StringType,false), StructField(value,StringType,false), StructField(meaning,StringType,false))
:- input[0, string, false].toString
:  +- input[0, string, false]
:- input[1, string, false].toString
:  +- input[1, string, false]
+- input[2, string, false].toString
   +- input[2, string, false]
  2. Currently, the SET and SET -v commands show unsorted results.
    We had better show a sorted result for UX. This is also compatible with Hive.

BEFORE

scala> sql("set").show(false)
...
|spark.driver.host              |10.22.16.140                                                                                                                                 |
|spark.driver.port              |63893                                                                                                                                        |
|spark.repl.class.uri           |spark://10.22.16.140:63893/classes                                                                                                           |
...
|spark.app.name                 |Spark shell                                                                                                                                  |
|spark.driver.memory            |4G                                                                                                                                           |
|spark.executor.id              |driver                                                                                                                                       |
|spark.submit.deployMode        |client                                                                                                                                       |
|spark.master                   |local[*]                                                                                                                                     |
|spark.home                     |/Users/dhyun/spark                                                                                                                           |
|spark.sql.catalogImplementation|hive                                                                                                                                         |
|spark.app.id                   |local-1484333618945                                                                                                                          |

AFTER

scala> sql("set").show(false)
...
|spark.app.id                   |local-1484333925649                                                                                                                          |
|spark.app.name                 |Spark shell                                                                                                                                  |
|spark.driver.host              |10.22.16.140                                                                                                                                 |
|spark.driver.memory            |4G                                                                                                                                           |
|spark.driver.port              |64994                                                                                                                                        |
|spark.executor.id              |driver                                                                                                                                       |
|spark.jars                     |                                                                                                                                             |
|spark.master                   |local[*]                                                                                                                                     |
|spark.repl.class.uri           |spark://10.22.16.140:64994/classes                                                                                                           |
|spark.sql.catalogImplementation|hive                                                                                                                                         |
|spark.submit.deployMode        |client                                                                                                                                       |

How was this patch tested?

Jenkins with a new test case.
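For readers skimming the thread, here is a minimal standalone sketch of the two changes, condensed from the review snippets further down. The helper names and plain collections are illustrative stand-ins, not the actual SetCommand code:

```scala
import org.apache.spark.sql.Row

// Sketch only: the Map and Seq parameters stand in for
// sparkSession.conf.getAll and sessionState.conf.getAllDefinedConfs.

// 1) SET: sort the (key, value) pairs before building Rows.
def sortedSetOutput(conf: Map[String, String]): Seq[Row] =
  conf.toSeq.sorted.map { case (k, v) => Row(k, v) }

// 2) SET -v: sort the defined configs and substitute "<undefined>" for a
//    null default value, so decoding the non-nullable string column no
//    longer hits a NullPointerException.
def sortedSetVOutput(defined: Seq[(String, String, String)]): Seq[Row] =
  defined.sorted.map { case (key, defaultValue, doc) =>
    Row(key, Option(defaultValue).getOrElse("<undefined>"), doc)
  }
```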

Member

Is it simpler to sort earlier? sparkSession.conf.getAll.toSeq.sorted.map { case (k, v) => Row(k, v) }
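A plain-Scala illustration of that suggestion (the config values here are hypothetical): Tuple2 ordering compares the first element first, so sorting the pairs orders the output by configuration key.

```scala
val conf = Map("spark.master" -> "local[*]", "spark.app.name" -> "Spark shell")
conf.toSeq.sorted.foreach { case (k, v) => println(s"$k=$v") }
// spark.app.name=Spark shell
// spark.master=local[*]
```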

Member Author

Thank you for review, @srowen . Sure, I'll update the PR like that.

Member

Have you found the reason?

Member Author

Oh, @gatorsmile, I missed your comment. The return value from sql("set -v") does not seem to be safe; I think there may be a synchronization issue here. I'll create a separate PR for that.

Member

srowen left a comment

It seems OK to me, if it's just a minor cosmetic improvement to the 'help' output

@dongjoon-hyun
Member Author

Thank you for approval, @srowen !

@gatorsmile
Member

LGTM pending test

@dongjoon-hyun
Member Author

@gatorsmile Thank you for review and approval, too!

@SparkQA

SparkQA commented Jan 13, 2017

Test build #71344 has finished for PR 16579 at commit 332935b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 13, 2017

Test build #71342 has finished for PR 16579 at commit 052a2f4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 13, 2017

Test build #71348 has finished for PR 16579 at commit 9576d51.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member Author

Interesting. The only failure was a new test case.

[info] - SET commands should return a list sorted by key *** FAILED *** (18 milliseconds)
[info]   java.lang.RuntimeException: Error while decoding: java.lang.NullPointerException
[info] createexternalrow(input[0, string, false].toString, input[1, string, false].toString, input[2, string, false].toString, StructField(key,StringType,false), StructField(value,StringType,false), StructField(meaning,StringType,false))
[info] :- input[0, string, false].toString
[info] :  +- input[0, string, false]
[info] :- input[1, string, false].toString
[info] :  +- input[1, string, false]
[info] +- input[2, string, false].toString
[info]    +- input[2, string, false]
[info]   at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.fromRow(ExpressionEncoder.scala:303)

@dongjoon-hyun
Member Author

Retest this please

@SparkQA

SparkQA commented Jan 13, 2017

Test build #71352 has finished for PR 16579 at commit 9576d51.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member Author

dongjoon-hyun commented Jan 14, 2017

The failure does not happen on my local machine, but it always happens on Jenkins.
The decoding error seems to depend on the configuration values. I'll revert to the first commit, which sorts by key only; it passed in Test build #71342.

@SparkQA

SparkQA commented Jan 14, 2017

Test build #71363 has finished for PR 16579 at commit 5237174.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

dongjoon-hyun changed the title from "[SPARK-19218][SQL] SET command should show a result sorted by key" to "[WIP][SPARK-19218][SQL] SET command should show a result sorted by key" on Jan 14, 2017
@SparkQA

SparkQA commented Jan 14, 2017

Test build #71381 has finished for PR 16579 at commit 337d02d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 16, 2017

Test build #3535 has finished for PR 16579 at commit 337d02d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

dongjoon-hyun changed the title from "[WIP][SPARK-19218][SQL] SET command should show a result sorted by key" to "[SPARK-19218][SQL] SET command should show a result sorted by key" on Jan 17, 2017
@dongjoon-hyun
Member Author

Retest this please

@SparkQA

SparkQA commented Jan 17, 2017

Test build #71517 has finished for PR 16579 at commit 337d02d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

dongjoon-hyun changed the title from "[SPARK-19218][SQL] SET command should show a result sorted by key" to "[WIP][SPARK-19218][SQL] SET command should show a result sorted by key" on Jan 17, 2017
@SparkQA

SparkQA commented Jan 17, 2017

Test build #71525 has finished for PR 16579 at commit f767e11.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member Author

The root cause is not in the newly added code.

org.apache.spark.sql.SQLQuerySuite.SET -v test	43 ms	1
org.apache.spark.sql.SQLQuerySuite.`SET -v` commands should return a list sorted by key

According to the error message, the current SET -v implementation seems to have an issue. I'll make another PR to clarify that.

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 480.0 failed 1 times, most recent failure: Lost task 0.0 in stage 480.0 (TID 1470, localhost, executor driver): java.lang.NullPointerException
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
  at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
  at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:231)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:225)
  at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:826)
  at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:826)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:88)
  at org.apache.spark.scheduler.Task.run(Task.scala:114)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:313)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)
Driver stacktrace:

@srowen
Member

srowen commented Jan 21, 2017

@dongjoon-hyun what do you think the status is here -- is it possible to fix the tests or still an unknown problem?

@dongjoon-hyun
Member Author

Hi, sorry for the delay. As you can see in #16624, it's still an unknown issue with the existing SET -v.

In fact, that is orthogonal to this PR. If we remove the following, this PR will pass. I'll try to fix that today over there. BTW, if you don't mind, I will remove those test cases for now.

 +  test("SET -v test") {
 +    sql("SET -v").map(_.getString(0)).collect()
 +  }
 +
 +  test("`SET -v` commands should return a list sorted by key") {
 +    val result = sql("SET -v").map(_.getString(0)).collect()
 +    assert(result === result.sorted)
 +  }

@dongjoon-hyun
Member Author

dongjoon-hyun commented Jan 22, 2017

Yes. BTW, to do that, we need to add the test case to the SET -v unit test because it's about a decoding error. Is that okay?

@gatorsmile
Member

Please try it and then we can check whether it makes sense to do it in the same test case or not.

@dongjoon-hyun
Member Author

Yep!

@dongjoon-hyun
Member Author

@gatorsmile, I updated with <undefined> and added the test case.
If you revert the change in SetCommand.scala, this test case will fail.

@SparkQA

SparkQA commented Jan 22, 2017

Test build #71809 has finished for PR 16579 at commit 0214fad.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Row(key, defaultValue, doc)
sparkSession.sessionState.conf.getAllDefinedConfs.sorted.map {
  case (key, defaultValue, doc) =>
    Row(key, if (defaultValue == null) "<undefined>" else defaultValue, doc)
Member

Let us do it in a more idiomatic Scala way:

            Row(key, Option(defaultValue).getOrElse("<undefined>"), doc)
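A quick note on why this works: Option(x) yields None when x is null, so getOrElse substitutes the placeholder only for null defaults. The values below are hypothetical:

```scala
// Option(null) == None, so only null defaults get the placeholder.
assert(Option("4g").getOrElse("<undefined>") == "4g")
assert(Option(null: String).getOrElse("<undefined>") == "<undefined>")
```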

Member Author

Yep. It looks much better.


val result2 = sql("SET -v").collect()
assert(result2 === result2.sortBy(_.getString(0)))
spark.sessionState.conf.clear()
Member

gatorsmile Jan 23, 2017

This will not drop the spark.test entry. We need to introduce a function into SQLConf for testing only:

  // For testing only
  private[sql] def unregister(entry: ConfigEntry[_]): Unit = sqlConfEntries.synchronized {
    sqlConfEntries.remove(entry.key)
  }

Create a separate test case; then call the following code in a finally block: SQLConf.unregister(confEntry)

Will be back after a few hours.
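A hedged sketch of the suggested test shape, for reference. SQLConfigBuilder and the key name are assumptions based on the Spark internals of that era, not code from this PR; unregister is the testing-only helper proposed above:

```scala
// Assumed builder API and key name; the try/finally shape is the point.
test("`SET -v` should not fail with a null default value") {
  val confEntry = SQLConfigBuilder("spark.test.nullDefault")
    .doc("doc").stringConf.createWithDefault(null)
  try {
    val result = sql("SET -v").collect()
    assert(result === result.sortBy(_.getString(0)))
  } finally {
    SQLConf.unregister(confEntry) // cleanup runs even if the assertion fails
  }
}
```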

Member Author

+1. Thank you, @gatorsmile .

@SparkQA

SparkQA commented Jan 23, 2017

Test build #71815 has finished for PR 16579 at commit 528b0fd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 23, 2017

Test build #71816 has finished for PR 16579 at commit 387ab59.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val result = sql("SET -v").collect()
assert(result === result.sortBy(_.getString(0)))
spark.sessionState.conf.clear()
} finally {
Member

nit: try ... finally seems redundant.

@dongjoon-hyun
Member Author

Thank you, @viirya .
I noticed that spark.sessionState.conf.clear() is useless. I removed that.


try {
  val result = sql("SET -v").collect()
  assert(result === result.sortBy(_.getString(0)))
Member

viirya Jan 23, 2017

Oh, I meant that you actually don't need a try {...} finally {...} here. You don't catch anything.

Member Author

Ah, I see what you meant. Previously, SET -v raised exceptions, so this case used try and catch. But, as you mentioned, it doesn't now.

Member Author

dongjoon-hyun Jan 23, 2017

However, IMO, we still need to clean up spark.test in case some regression occurs here in the future. As for exceptions in that case, we need the regression to cause test case failures, so catch is not used here.

Member

But you don't actually catch anything? So if there is a regression in the future, is it any different with a try or not? You still see an exception.

Member Author

dongjoon-hyun Jan 23, 2017

Yes, but we need to clean up spark.test so as not to interfere with the other test cases here.

Member

It failed, didn't it?

Member Author

:) The point is that the other test cases keep running.

Member Author

dongjoon-hyun Jan 23, 2017

Maybe we are confused about terms.

  • You meant the other test statements.
  • I meant the other test cases

Member

Oh, I meant that the final Jenkins test result is a failure. Never mind; I think it is still useful, since we can better infer which test caused the failure if we don't interfere with other tests.

Member Author

Oh, I understand. Thanks. :)
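For the record, a self-contained illustration of the point settled in this thread: a finally block runs even when the body throws, so the cleanup survives a failing assertion without any catch clause, and later test cases see a clean state.

```scala
var cleaned = false
try {
  try {
    assert(false, "failing assertion") // the test body fails...
  } finally {
    cleaned = true // ...but the cleanup still runs
  }
} catch {
  case _: AssertionError => // swallowed here only for this demo
}
assert(cleaned) // the temporary state was cleaned up despite the failure
```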

@viirya
Member

viirya commented Jan 23, 2017

LGTM

@gatorsmile
Member

LGTM pending test

@SparkQA

SparkQA commented Jan 23, 2017

Test build #71822 has finished for PR 16579 at commit 7879201.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member Author

Retest this please.

@dongjoon-hyun
Member Author

The only failure is irrelevant to this PR.

[info] - set spark.sql.warehouse.dir *** FAILED *** (5 minutes, 0 seconds)
[info]   Timeout of './bin/spark-submit' '--class' 'org.apache.spark.sql.hive.SetWarehouseLocationTest' '--name' 'SetSparkWarehouseLocationTest' '--master' 'local-cluster[2,1,1024]' '--conf' 

@SparkQA

SparkQA commented Jan 23, 2017

Test build #71821 has finished for PR 16579 at commit 7061cd9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 23, 2017

Test build #71824 has finished for PR 16579 at commit 7879201.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

Thanks! Merging to master.

asfgit closed this in c4a6519 Jan 23, 2017
@dongjoon-hyun
Member Author

Thank you, @gatorsmile , @srowen , and @viirya .

uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
…a sorted order

cmonkey pushed a commit to cmonkey/spark that referenced this pull request Feb 15, 2017
…a sorted order

dongjoon-hyun deleted the SPARK-19218 branch January 7, 2019 07:03