[SPARK-19218][SQL] Fix SET command to show a result correctly and in a sorted order #16579
Conversation
Is it simpler to sort earlier? sparkSession.conf.getAll.toSeq.sorted.map { case (k, v) => Row(k, v) }
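The suggestion above can be sketched with plain Scala collections (a simple case class stands in for Spark's `Row`, since only the sorting behavior matters here):

```scala
// Stand-in for org.apache.spark.sql.Row; only the ordering is illustrated.
case class Row(key: String, value: String)

// An unsorted config map, as sparkSession.conf.getAll would return.
val conf = Map(
  "spark.master" -> "local[*]",
  "spark.app.name" -> "Spark shell",
  "spark.executor.id" -> "driver"
)

// Sort the (key, value) pairs first, then map each pair to a Row.
// Tuples sort lexicographically, so this orders by key.
val rows = conf.toSeq.sorted.map { case (k, v) => Row(k, v) }

// Keys now come out in alphabetical order.
assert(rows.map(_.key) == Seq("spark.app.name", "spark.executor.id", "spark.master"))
```

Sorting before the `map` keeps the `Row` construction untouched, which is why it is simpler than sorting the resulting rows afterwards.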
Thank you for review, @srowen . Sure, I'll update the PR like that.
Have you found the reason?
Oh, @gatorsmile . I missed your comment. The return value from `sql("set -v")` does not seem to be safe; I think there may be a synchronization issue here. I'll create a separate PR for that.
srowen left a comment
It seems OK to me, if it's just a minor cosmetic improvement to the 'help' output
Thank you for approval, @srowen !

LGTM pending test

@gatorsmile Thank you for review and approval, too!
Test build #71344 has finished for PR 16579 at commit

Test build #71342 has finished for PR 16579 at commit

Test build #71348 has finished for PR 16579 at commit
Interesting. The only failure was a new test case.

Retest this please

Test build #71352 has finished for PR 16579 at commit

The failure does not happen on my local machine, but it always happens on Jenkins.

Test build #71363 has finished for PR 16579 at commit
Test build #71381 has finished for PR 16579 at commit

Test build #3535 has finished for PR 16579 at commit

Retest this please

Test build #71517 has finished for PR 16579 at commit

Test build #71525 has finished for PR 16579 at commit
The root cause is not inside the newly added code. The current

@dongjoon-hyun what do you think the status is here -- is it possible to fix the tests, or is it still an unknown problem?
Hi, sorry for the delays. As you can see in #16624 , it's still an unknown issue for the existing. In fact, that is orthogonal to this PR. If we remove the following, this PR will pass. I'll try to fix that today again over there. BTW, if you don't mind, I will remove those test cases for now.

Yes. BTW, to do that, we need to add the test case in

Please try it and then we can check whether it makes sense to do it in the same test case or not.

Yep!

@gatorsmile . I updated with

Test build #71809 has finished for PR 16579 at commit
```scala
// previously: Row(key, defaultValue, doc)
sparkSession.sessionState.conf.getAllDefinedConfs.sorted.map {
  case (key, defaultValue, doc) =>
    Row(key, if (defaultValue == null) "<undefined>" else defaultValue, doc)
```
Let us do it in a more Scala-idiomatic way: `Row(key, Option(defaultValue).getOrElse("<undefined>"), doc)`
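As a quick sanity check, `Option(...)` wraps a possibly-null value and `getOrElse` supplies the fallback, so both forms behave identically (plain Scala, no Spark needed):

```scala
// Option-based rendering of a config default, as suggested above.
def render(defaultValue: String): String =
  Option(defaultValue).getOrElse("<undefined>")

// Equivalent to the if/else null check it replaces:
assert(render(null) == "<undefined>")   // Option(null) is None, so the fallback is used
assert(render("true") == "true")        // Option("true") is Some("true"), passed through
```

`Option.apply` maps `null` to `None` and any other value to `Some`, which is exactly the null-check the if/else was doing by hand.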
Yep. It looks much better.
```scala
val result2 = sql("SET -v").collect()
assert(result2 === result2.sortBy(_.getString(0)))
spark.sessionState.conf.clear()
```
This will not drop the `spark.test` entry. We need to introduce a function into SQLConf for testing only:

```scala
// For testing only
private[sql] def unregister(entry: ConfigEntry[_]): Unit = sqlConfEntries.synchronized {
  sqlConfEntries.remove(entry.key)
}
```

Create a separate test case; then call `SQLConf.unregister(confEntry)` in a finally block.
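The register/exercise/unregister pattern being asked for can be sketched with a stand-in registry (`FakeSQLConf` below is just a mutable-map mock of Spark's internal config registry, not the real API):

```scala
import scala.collection.mutable

// Stand-in for Spark's global SQLConf entry registry (hypothetical, for illustration).
object FakeSQLConf {
  val sqlConfEntries = mutable.Map.empty[String, String]

  def register(key: String, default: String): Unit = sqlConfEntries.synchronized {
    sqlConfEntries(key) = default
  }

  // For testing only: remove a registered entry so it cannot leak into other tests.
  def unregister(key: String): Unit = sqlConfEntries.synchronized {
    sqlConfEntries.remove(key)
  }
}

// Test-case shape: register a spark.test entry, exercise it,
// and always unregister in a finally block.
FakeSQLConf.register("spark.test.mykey", "v1")
try {
  assert(FakeSQLConf.sqlConfEntries.contains("spark.test.mykey"))
} finally {
  FakeSQLConf.unregister("spark.test.mykey")
}

// After the test, the registry is clean regardless of how the body exited.
assert(!FakeSQLConf.sqlConfEntries.contains("spark.test.mykey"))
```

The `finally` block guarantees the temporary entry is removed even when an assertion inside the body fails.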
Will be back after a few hours.
+1. Thank you, @gatorsmile .
Test build #71815 has finished for PR 16579 at commit
Test build #71816 has finished for PR 16579 at commit
```scala
  val result = sql("SET -v").collect()
  assert(result === result.sortBy(_.getString(0)))
  spark.sessionState.conf.clear()
} finally {
```
nit: try ... finally seems redundant.
Thank you, @viirya .
```scala
try {
  val result = sql("SET -v").collect()
  assert(result === result.sortBy(_.getString(0)))
```
oh, i meant that you actually don't need a try {...} finally {...} here. you don't catch anything.
Ah, I see what you meant. Actually, `SET -v` previously raised exceptions, so this case used try and catch. But, as you mentioned, now it doesn't.
However, IMO, we still need to clean up `spark.test` if some regression occurs for this case in the future. As for exceptions in that case, we need the regression to cause test case failures, so catch is not used here.
but you don't catch anything actually? so if any regression in the future, is it different with a try or not? you still see an exception.
Yes, but we need to clean up `spark.test` in order not to interfere with the other test cases here.
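The point under discussion, a `try { ... } finally { ... }` with no `catch`, can be demonstrated in a few lines: the cleanup in `finally` always runs, yet the exception still propagates and fails the test (plain Scala sketch; the outer `catch` below only exists to observe the propagated exception):

```scala
import scala.collection.mutable

// Stand-in for session conf holding a temporary spark.test entry.
val conf = mutable.Map("spark.test.key" -> "v")

var cleaned = false
val outcome = try {
  try {
    // A regression here would throw...
    sys.error("simulated regression")
  } finally {
    // ...but the spark.test entry is still removed, so other tests are unaffected.
    conf.remove("spark.test.key")
    cleaned = true
  }
  "passed"
} catch {
  case e: RuntimeException => s"failed: ${e.getMessage}"
}

// The exception still surfaced (so the test case fails), yet cleanup ran.
assert(outcome == "failed: simulated regression")
assert(cleaned && !conf.contains("spark.test.key"))
```

So `finally` without `catch` does not hide the failure; it only guarantees the temporary config cannot leak into subsequent test cases.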
It still failed, didn't it?
:) The point is the other test cases are still running.
Maybe we are confused about terms.
- You meant the other test statements.
- I meant the other test cases.
Oh, I meant the final Jenkins test result is a failure. Never mind, I think it is still useful: we can better infer which test causes the failure if we don't interfere with the other tests.
Oh, I understand. Thanks. :)
LGTM

LGTM pending test

Test build #71822 has finished for PR 16579 at commit

Retest this please.

The only failure is irrelevant to this PR.

Test build #71821 has finished for PR 16579 at commit

Test build #71824 has finished for PR 16579 at commit

Thanks! Merging to master.

Thank you, @gatorsmile , @srowen , and @viirya .
[SPARK-19218][SQL] Fix SET command to show a result correctly and in a sorted order
## What changes were proposed in this pull request?
This PR aims to fix the following two things.
1. `sql("SET -v").collect()` or `sql("SET -v").show()` raises the following exception for a String configuration whose default value is `null`. For the test, please see [Jenkins result](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71539/testReport/) and apache@60953bf in apache#16624 .
```
sbt.ForkMain$ForkError: java.lang.RuntimeException: Error while decoding: java.lang.NullPointerException
createexternalrow(input[0, string, false].toString, input[1, string, false].toString, input[2, string, false].toString, StructField(key,StringType,false), StructField(value,StringType,false), StructField(meaning,StringType,false))
:- input[0, string, false].toString
: +- input[0, string, false]
:- input[1, string, false].toString
: +- input[1, string, false]
+- input[2, string, false].toString
+- input[2, string, false]
```
2. Currently, the `SET` and `SET -v` commands show unsorted results.
Showing a sorted result is better for UX, and it is also consistent with Hive.
**BEFORE**
```
scala> sql("set").show(false)
...
|spark.driver.host |10.22.16.140 |
|spark.driver.port |63893 |
|spark.repl.class.uri |spark://10.22.16.140:63893/classes |
...
|spark.app.name |Spark shell |
|spark.driver.memory |4G |
|spark.executor.id |driver |
|spark.submit.deployMode |client |
|spark.master |local[*] |
|spark.home |/Users/dhyun/spark |
|spark.sql.catalogImplementation|hive |
|spark.app.id |local-1484333618945 |
```
**AFTER**
```
scala> sql("set").show(false)
...
|spark.app.id |local-1484333925649 |
|spark.app.name |Spark shell |
|spark.driver.host |10.22.16.140 |
|spark.driver.memory |4G |
|spark.driver.port |64994 |
|spark.executor.id |driver |
|spark.jars | |
|spark.master |local[*] |
|spark.repl.class.uri |spark://10.22.16.140:64994/classes |
|spark.sql.catalogImplementation|hive |
|spark.submit.deployMode |client |
```
## How was this patch tested?
Jenkins with a new test case.
Author: Dongjoon Hyun <[email protected]>
Closes apache#16579 from dongjoon-hyun/SPARK-19218.