Conversation

@dongjoon-hyun
Member

dongjoon-hyun commented Jan 13, 2017

What changes were proposed in this pull request?

This PR aims to fix the following two things.

  1. sql("SET -v").collect() or sql("SET -v").show() raises the following exceptions for String configuration with default value, null. For the test, please see Jenkins result and 60953bf in [WIP] Fix SET -v not to raise exceptions for configs with default value null #16624 .
sbt.ForkMain$ForkError: java.lang.RuntimeException: Error while decoding: java.lang.NullPointerException
createexternalrow(input[0, string, false].toString, input[1, string, false].toString, input[2, string, false].toString, StructField(key,StringType,false), StructField(value,StringType,false), StructField(meaning,StringType,false))
:- input[0, string, false].toString
:  +- input[0, string, false]
:- input[1, string, false].toString
:  +- input[1, string, false]
+- input[2, string, false].toString
   +- input[2, string, false]
  2. Currently, the SET and SET -v commands show unsorted results.
    We had better show a sorted result for UX. This is also compatible with Hive.

BEFORE

scala> sql("set").show(false)
...
|spark.driver.host              |10.22.16.140                                                                                                                                 |
|spark.driver.port              |63893                                                                                                                                        |
|spark.repl.class.uri           |spark://10.22.16.140:63893/classes                                                                                                           |
...
|spark.app.name                 |Spark shell                                                                                                                                  |
|spark.driver.memory            |4G                                                                                                                                           |
|spark.executor.id              |driver                                                                                                                                       |
|spark.submit.deployMode        |client                                                                                                                                       |
|spark.master                   |local[*]                                                                                                                                     |
|spark.home                     |/Users/dhyun/spark                                                                                                                           |
|spark.sql.catalogImplementation|hive                                                                                                                                         |
|spark.app.id                   |local-1484333618945                                                                                                                          |

AFTER

scala> sql("set").show(false)
...
|spark.app.id                   |local-1484333925649                                                                                                                          |
|spark.app.name                 |Spark shell                                                                                                                                  |
|spark.driver.host              |10.22.16.140                                                                                                                                 |
|spark.driver.memory            |4G                                                                                                                                           |
|spark.driver.port              |64994                                                                                                                                        |
|spark.executor.id              |driver                                                                                                                                       |
|spark.jars                     |                                                                                                                                             |
|spark.master                   |local[*]                                                                                                                                     |
|spark.repl.class.uri           |spark://10.22.16.140:64994/classes                                                                                                           |
|spark.sql.catalogImplementation|hive                                                                                                                                         |
|spark.submit.deployMode        |client                                                                                                                                       |

How was this patch tested?

Jenkins with a new test case.
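For readers skimming the thread, here is a minimal standalone sketch of the two changes, condensed from the review snippets further down. The helper names and plain collections are illustrative stand-ins, not the actual SetCommand code:

```scala
import org.apache.spark.sql.Row

// Sketch only: the Map and Seq parameters stand in for
// sparkSession.conf.getAll and sessionState.conf.getAllDefinedConfs.

// 1) SET: sort the (key, value) pairs before building Rows.
def sortedSetOutput(conf: Map[String, String]): Seq[Row] =
  conf.toSeq.sorted.map { case (k, v) => Row(k, v) }

// 2) SET -v: sort the defined configs and substitute "<undefined>" for a
//    null default value, so decoding the non-nullable string column no
//    longer hits a NullPointerException.
def sortedSetVOutput(defined: Seq[(String, String, String)]): Seq[Row] =
  defined.sorted.map { case (key, defaultValue, doc) =>
    Row(key, Option(defaultValue).getOrElse("<undefined>"), doc)
  }
```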

Member

Is it simpler to sort earlier? sparkSession.conf.getAll.toSeq.sorted.map { case (k, v) => Row(k, v) }
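A plain-Scala illustration of that suggestion (the config values here are hypothetical): Tuple2 ordering compares the first element first, so sorting the pairs orders the output by configuration key.

```scala
val conf = Map("spark.master" -> "local[*]", "spark.app.name" -> "Spark shell")
conf.toSeq.sorted.foreach { case (k, v) => println(s"$k=$v") }
// spark.app.name=Spark shell
// spark.master=local[*]
```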

Member Author

Thank you for review, @srowen . Sure, I'll update the PR like that.

Member

Have you found the reason?

Member Author

Oh, @gatorsmile, I missed your comment. The return value from sql("set -v") does not seem to be safe; I think there may be a synchronization issue here. I'll create a separate PR for that.

Member

srowen left a comment

It seems OK to me, if it's just a minor cosmetic improvement to the 'help' output

@dongjoon-hyun
Member Author

Thank you for approval, @srowen !

@gatorsmile
Member

LGTM pending test

@dongjoon-hyun
Member Author

@gatorsmile Thank you for review and approval, too!

@SparkQA

SparkQA commented Jan 13, 2017

Test build #71344 has finished for PR 16579 at commit 332935b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 13, 2017

Test build #71342 has finished for PR 16579 at commit 052a2f4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 13, 2017

Test build #71348 has finished for PR 16579 at commit 9576d51.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member Author

Interesting. The only failure was a new test case.

[info] - SET commands should return a list sorted by key *** FAILED *** (18 milliseconds)
[info]   java.lang.RuntimeException: Error while decoding: java.lang.NullPointerException
[info] createexternalrow(input[0, string, false].toString, input[1, string, false].toString, input[2, string, false].toString, StructField(key,StringType,false), StructField(value,StringType,false), StructField(meaning,StringType,false))
[info] :- input[0, string, false].toString
[info] :  +- input[0, string, false]
[info] :- input[1, string, false].toString
[info] :  +- input[1, string, false]
[info] +- input[2, string, false].toString
[info]    +- input[2, string, false]
[info]   at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.fromRow(ExpressionEncoder.scala:303)

@dongjoon-hyun
Member Author

Retest this please

@SparkQA

SparkQA commented Jan 13, 2017

Test build #71352 has finished for PR 16579 at commit 9576d51.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member Author

dongjoon-hyun commented Jan 14, 2017

The failure does not happen on my local machine, but it always happens on Jenkins.
The decoding error seems to depend on the configuration values. I'll revert to the first commit, which sorts by key only; it passed in Test build #71342.

@SparkQA

SparkQA commented Jan 14, 2017

Test build #71363 has finished for PR 16579 at commit 5237174.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

dongjoon-hyun changed the title from "[SPARK-19218][SQL] SET command should show a result sorted by key" to "[WIP][SPARK-19218][SQL] SET command should show a result sorted by key" on Jan 14, 2017
@SparkQA

SparkQA commented Jan 14, 2017

Test build #71381 has finished for PR 16579 at commit 337d02d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 16, 2017

Test build #3535 has finished for PR 16579 at commit 337d02d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

dongjoon-hyun changed the title from "[WIP][SPARK-19218][SQL] SET command should show a result sorted by key" to "[SPARK-19218][SQL] SET command should show a result sorted by key" on Jan 17, 2017
@dongjoon-hyun
Member Author

Retest this please

@SparkQA

SparkQA commented Jan 17, 2017

Test build #71517 has finished for PR 16579 at commit 337d02d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

dongjoon-hyun changed the title from "[SPARK-19218][SQL] SET command should show a result sorted by key" to "[WIP][SPARK-19218][SQL] SET command should show a result sorted by key" on Jan 17, 2017
@SparkQA

SparkQA commented Jan 17, 2017

Test build #71525 has finished for PR 16579 at commit f767e11.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member Author

The root cause is not in the newly added code.

org.apache.spark.sql.SQLQuerySuite.SET -v test	43 ms	1
org.apache.spark.sql.SQLQuerySuite.`SET -v` commands should return a list sorted by key

According to the error message, the current SET -v implementation seems to have an issue. I'll make another PR to clarify that.

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 480.0 failed 1 times, most recent failure: Lost task 0.0 in stage 480.0 (TID 1470, localhost, executor driver): java.lang.NullPointerException
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
  at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
  at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:231)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:225)
  at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:826)
  at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:826)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:88)
  at org.apache.spark.scheduler.Task.run(Task.scala:114)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:313)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)
Driver stacktrace:

@srowen
Member

srowen commented Jan 21, 2017

@dongjoon-hyun what do you think the status is here -- is it possible to fix the tests or still an unknown problem?

@dongjoon-hyun
Member Author

Hi, sorry for the delay. As you can see in #16624, it's still an unknown issue with the existing SET -v.

In fact, that is orthogonal to this PR. If we remove the following, this PR will pass. I'll try to fix that today over there. BTW, if you don't mind, I will remove those test cases for now.

 +  test("SET -v test") {
 +    sql("SET -v").map(_.getString(0)).collect()
 +  }
 +
 +  test("`SET -v` commands should return a list sorted by key") {
 +    val result = sql("SET -v").map(_.getString(0)).collect()
 +    assert(result === result.sorted)
 +  }

@dongjoon-hyun
Member Author

dongjoon-hyun commented Jan 22, 2017

Yes. BTW, to do that, we need to add the test case to the SET -v unit test because it's about a decoding error. Is that okay?

@gatorsmile
Member

Please try it and then we can check whether it makes sense to do it in the same test case or not.

@dongjoon-hyun
Member Author

Yep!

@dongjoon-hyun
Member Author

@gatorsmile, I updated with <undefined> and added the test case.
If you revert the change in SetCommand.scala, this test case will fail.

@SparkQA

SparkQA commented Jan 22, 2017

Test build #71809 has finished for PR 16579 at commit 0214fad.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Row(key, defaultValue, doc)
sparkSession.sessionState.conf.getAllDefinedConfs.sorted.map {
  case (key, defaultValue, doc) =>
    Row(key, if (defaultValue == null) "<undefined>" else defaultValue, doc)
Member

Let us do it in a more idiomatic Scala way:

            Row(key, Option(defaultValue).getOrElse("<undefined>"), doc)
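A quick note on why this works: Option(x) yields None when x is null, so getOrElse substitutes the placeholder only for null defaults. The values below are hypothetical:

```scala
// Option(null) == None, so only null defaults get the placeholder.
assert(Option("4g").getOrElse("<undefined>") == "4g")
assert(Option(null: String).getOrElse("<undefined>") == "<undefined>")
```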

Member Author

Yep. It looks much better.


val result2 = sql("SET -v").collect()
assert(result2 === result2.sortBy(_.getString(0)))
spark.sessionState.conf.clear()
Member

gatorsmile Jan 23, 2017

This will not drop the spark.test entry. We need to introduce a function into SQLConf for testing only:

  // For testing only
  private[sql] def unregister(entry: ConfigEntry[_]): Unit = sqlConfEntries.synchronized {
    sqlConfEntries.remove(entry.key)
  }

Create a separate test case; then call the following code in a finally block: SQLConf.unregister(confEntry)

Will be back after a few hours.
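A hedged sketch of the suggested test shape, for reference. SQLConfigBuilder and the key name are assumptions based on the Spark internals of that era, not code from this PR; unregister is the testing-only helper proposed above:

```scala
// Assumed builder API and key name; the try/finally shape is the point.
test("`SET -v` should not fail with a null default value") {
  val confEntry = SQLConfigBuilder("spark.test.nullDefault")
    .doc("doc").stringConf.createWithDefault(null)
  try {
    val result = sql("SET -v").collect()
    assert(result === result.sortBy(_.getString(0)))
  } finally {
    SQLConf.unregister(confEntry) // cleanup runs even if the assertion fails
  }
}
```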

Member Author

+1. Thank you, @gatorsmile .

@SparkQA

SparkQA commented Jan 23, 2017

Test build #71815 has finished for PR 16579 at commit 528b0fd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 23, 2017

Test build #71816 has finished for PR 16579 at commit 387ab59.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val result = sql("SET -v").collect()
assert(result === result.sortBy(_.getString(0)))
spark.sessionState.conf.clear()
} finally {
Member

nit: try ... finally seems redundant.

@dongjoon-hyun
Member Author

Thank you, @viirya .
I noticed that spark.sessionState.conf.clear() is useless. I removed that.


try {
  val result = sql("SET -v").collect()
  assert(result === result.sortBy(_.getString(0)))
Member

viirya Jan 23, 2017

Oh, I meant that you actually don't need a try {...} finally {...} here. You don't catch anything.

Member Author

Ah, I see what you meant. Previously, SET -v raised exceptions, so this case used try and catch. But, as you mentioned, it doesn't now.

Member Author

dongjoon-hyun Jan 23, 2017

However, IMO, we still need to clean up spark.test in case some regression occurs here in the future. As for exceptions in that case, we need the regression to cause test case failures, so catch is not used here.

Member

But you don't actually catch anything? So if there is a regression in the future, is it any different with a try or not? You still see an exception.

Member Author

dongjoon-hyun Jan 23, 2017

Yes, but we need to clean up spark.test so as not to interfere with the other test cases here.

Member

It failed, didn't it?

Member Author

:) The point is that the other test cases keep running.

Member Author

dongjoon-hyun Jan 23, 2017

Maybe we are confused about terms.

  • You meant the other test statements.
  • I meant the other test cases

Member

Oh, I meant that the final Jenkins test result is a failure. Never mind; I think it is still useful, since we can better infer which test caused the failure if we don't interfere with other tests.

Member Author

Oh, I understand. Thanks. :)
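For the record, a self-contained illustration of the point settled in this thread: a finally block runs even when the body throws, so the cleanup survives a failing assertion without any catch clause, and later test cases see a clean state.

```scala
var cleaned = false
try {
  try {
    assert(false, "failing assertion") // the test body fails...
  } finally {
    cleaned = true // ...but the cleanup still runs
  }
} catch {
  case _: AssertionError => // swallowed here only for this demo
}
assert(cleaned) // the temporary state was cleaned up despite the failure
```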

@viirya
Member

viirya commented Jan 23, 2017

LGTM

@gatorsmile
Member

LGTM pending test

@SparkQA

SparkQA commented Jan 23, 2017

Test build #71822 has finished for PR 16579 at commit 7879201.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member Author

Retest this please.

@dongjoon-hyun
Member Author

The only failure is irrelevant to this PR.

[info] - set spark.sql.warehouse.dir *** FAILED *** (5 minutes, 0 seconds)
[info]   Timeout of './bin/spark-submit' '--class' 'org.apache.spark.sql.hive.SetWarehouseLocationTest' '--name' 'SetSparkWarehouseLocationTest' '--master' 'local-cluster[2,1,1024]' '--conf' 

@SparkQA

SparkQA commented Jan 23, 2017

Test build #71821 has finished for PR 16579 at commit 7061cd9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 23, 2017

Test build #71824 has finished for PR 16579 at commit 7879201.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

Thanks! Merging to master.

asfgit closed this in c4a6519 Jan 23, 2017
@dongjoon-hyun
Member Author

Thank you, @gatorsmile , @srowen , and @viirya .

uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
…a sorted order

cmonkey pushed a commit to cmonkey/spark that referenced this pull request Feb 15, 2017
…a sorted order

dongjoon-hyun deleted the SPARK-19218 branch January 7, 2019 07:03