@brkyvz brkyvz commented Feb 10, 2017

What changes were proposed in this pull request?

Using from_json on a column with an empty string results in: java.util.NoSuchElementException: head of empty list.

This is because parser.parse(input) may return Nil when input.trim.isEmpty
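
To make the failure mode concrete, here is a minimal sketch in plain Scala that does not use Spark. The `parse` function is a hypothetical stand-in for `parser.parse`, assumed only to return an empty list for blank input; `evalOld` and `evalNew` are illustrative names for the behaviour before and after the patch.

```scala
import scala.util.Try

object HeadOfEmptyListSketch {
  // Hypothetical stand-in for parser.parse: yields no rows for blank input.
  def parse(input: String): List[String] =
    if (input.trim.isEmpty) Nil else List(input)

  // Behaviour before the patch: .head throws NoSuchElementException on Nil.
  def evalOld(input: String): String = parse(input).head

  // Behaviour after the patch: headOption.orNull turns Nil into null.
  def evalNew(input: String): String = parse(input).headOption.orNull

  def main(args: Array[String]): Unit = {
    println(Try(evalOld("")).isFailure) // the old path fails on blank input
    println(evalNew("") == null)        // the new path yields null instead
    println(evalNew("""{"a": 1}"""))    // non-empty input is unaffected
  }
}
```

The point of `headOption.orNull` is that an empty parse result becomes a null value for the row rather than an exception that aborts the query.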

How was this patch tested?

Regression test in JsonExpressionsSuite

brkyvz commented Feb 10, 2017

cc @hvanhovell mind taking a look? You looked at the last PR that touched this code


   override def nullSafeEval(json: Any): Any = {
-    try parser.parse(json.toString).head catch {
+    try parser.parse(json.toString).headOption.orNull catch {

(Not for this PR, but maybe loosely related, I guess.) I was thinking it is a bit odd that we read only a single row when the input is a JSON array. For example:

import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._
val schema = StructType(StructField("a", IntegerType) :: Nil)
Seq(("""[{"a": 1}, {"a": 2}]""")).toDF("struct").select(from_json(col("struct"), schema)).show()
+--------------------+
|jsontostruct(struct)|
+--------------------+
|                 [1]|
+--------------------+

I think maybe we should not support this in that function or it should work like a generator expression.

@HyukjinKwon that seems fair. Feel free to work on this.

Thank you for confirming.

SparkQA commented Feb 10, 2017

Test build #72681 has finished for PR 16881 at commit 6ef0f45.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hvanhovell

LGTM - merging to master/2.1. Thanks!

asfgit pushed a commit that referenced this pull request Feb 10, 2017
Author: Burak Yavuz <[email protected]>

Closes #16881 from brkyvz/json-fix.

(cherry picked from commit d5593f7)
Signed-off-by: Herman van Hovell <[email protected]>
@asfgit asfgit closed this in d5593f7 Feb 10, 2017
cmonkey pushed a commit to cmonkey/spark that referenced this pull request Feb 15, 2017

yeikel commented Jan 3, 2019

Is there any way to use this fix without re-installing spark? Or is there any workaround?

In my organization they are running a version of Spark where this error is happening but there are no plans to upgrade anytime soon.

@HyukjinKwon

You can manually replace the empty strings with, for instance, an empty object {} using the DataFrame APIs before calling from_json.

yeikel commented Jan 3, 2019

@HyukjinKwon I tried your suggestion with the following attempt:

val empty = df.na.fill("{}",Seq("phone"))
val json_columns = df.withColumn("phone", from_json($"phone", schema))
json_columns.show

But it still gave me the same error. Any suggestions?

Thank you

@HyukjinKwon

Can you show your df? By the way, please ask questions like this on the mailing list next time. This thread is meant for discussing the change itself.

yeikel commented Jan 3, 2019

@HyukjinKwon I fixed it. The problem was the string '[]'; replacing it with '{}' fixed the issue. I am not sure whether the function should be able to parse '[]', though.

I commented here because I thought the devs could give better input, but I will post to the mailing list next time.

Thank you
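
For readers on an unpatched Spark version, the workaround discussed above could be sketched roughly as follows. This is an illustration, not code from the PR: `FromJsonWorkaroundSketch`, `sanitize`, and `parsePhone` are hypothetical names, the `phone` column comes from the attempt earlier in the thread, and treating exactly `""` and `"[]"` as the problematic inputs is an assumption based only on what this thread reports. Note that `df.na.fill` replaces nulls rather than empty strings, so the sketch uses a conditional column expression instead.

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.{col, from_json, lit, trim, when}
import org.apache.spark.sql.types.StructType

object FromJsonWorkaroundSketch {
  // Assumption: blank strings and "[]" are the inputs that trigger the error.
  // Both are rewritten to "{}"; every other value (including null) passes through.
  def sanitize(c: Column): Column =
    when(trim(c).isin("", "[]"), lit("{}")).otherwise(c)

  // Parse the hypothetical "phone" column after sanitizing it.
  def parsePhone(df: DataFrame, schema: StructType): DataFrame =
    df.withColumn("phone", from_json(sanitize(col("phone")), schema))
}
```

With this in scope, `FromJsonWorkaroundSketch.parsePhone(df, schema).show()` would take the place of the `na.fill` attempt above. An empty object parses to a struct whose fields are all null, which may or may not be the representation you want for missing data.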

@brkyvz brkyvz deleted the json-fix branch February 3, 2019 20:58