[SPARK-19543] from_json fails when the input row is empty #16881
Conversation
cc @hvanhovell, mind taking a look? You looked at the last PR that touched this code.
override def nullSafeEval(json: Any): Any = {
-  try parser.parse(json.toString).head catch {
+  try parser.parse(json.toString).headOption.orNull catch {
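The one-line fix above can be illustrated with plain Scala collections, independent of Spark: `head` on an empty list throws, while `headOption.orNull` degrades gracefully to `null`.

```scala
// parser.parse returns Nil for a blank input; model that with an empty list.
val rows: List[String] = Nil

// Before the fix: rows.head would throw
// java.util.NoSuchElementException: head of empty list

// After the fix: headOption.orNull returns null instead of throwing.
val result: String = rows.headOption.orNull
assert(result == null)
```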
(Not for this PR, but maybe loosely related.) I was thinking it is a bit odd that we read only the first row when the input is a JSON array. For example:
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

val schema = StructType(StructField("a", IntegerType) :: Nil)
Seq("""[{"a": 1}, {"a": 2}]""").toDF("struct").select(from_json(col("struct"), schema)).show()

+--------------------+
|jsontostruct(struct)|
+--------------------+
|                 [1]|
+--------------------+

I think maybe we should not support this in that function, or it should work like a generator expression.
@HyukjinKwon that seems fair. Feel free to work on this.
Thank you for confirming.
Test build #72681 has finished for PR 16881 at commit
LGTM - merging to master/2.1. Thanks!
## What changes were proposed in this pull request?
Using `from_json` on a column with an empty string results in `java.util.NoSuchElementException: head of empty list`. This is because `parser.parse(input)` may return `Nil` when `input.trim.isEmpty`.
## How was this patch tested?
Regression test in `JsonExpressionsSuite`.

Author: Burak Yavuz <[email protected]>
Closes #16881 from brkyvz/json-fix.
(cherry picked from commit d5593f7)
Signed-off-by: Herman van Hovell <[email protected]>
Is there any way to use this fix without re-installing Spark? Or is there any workaround? My organization is running a version of Spark where this error happens, but there are no plans to upgrade anytime soon.
You can manually replace the empty strings with, for instance, an empty object.
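A minimal sketch of that workaround, assuming a spark-shell style session; the DataFrame and its column name `json` are illustrative, not from the thread. Blank strings are rewritten to `{}` before `from_json`, so the parser never returns an empty list.

```scala
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._
import spark.implicits._  // assumes a spark-shell / SparkSession context

val schema = StructType(StructField("a", IntegerType) :: Nil)
val df = Seq("""{"a": 1}""", "").toDF("json")

// Rewrite blank inputs to an empty JSON object before parsing.
val cleaned = when(trim(col("json")) === "", "{}").otherwise(col("json"))
df.select(from_json(cleaned, schema).as("parsed")).show()
```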
@HyukjinKwon I've tried your suggestion with the following attempt, but it still gave me the same error. Any suggestions? Thank you.
Can you show your df? BTW, let's ask questions on the mailing list next time; this thread is usually meant for discussing the change itself.
@HyukjinKwon I fixed it. The problem was the string. I commented here because I thought the devs could give better input, but I will post to the mailing list next time. Thank you.
What changes were proposed in this pull request?
Using `from_json` on a column with an empty string results in `java.util.NoSuchElementException: head of empty list`.
This is because `parser.parse(input)` may return `Nil` when `input.trim.isEmpty`.
How was this patch tested?
Regression test in `JsonExpressionsSuite`.
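On an unpatched build, the failure described above can be reproduced with a sketch like the following (spark-shell style; the column name `json` is illustrative). After this fix, the empty-string row yields `null` instead of throwing.

```scala
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._
import spark.implicits._  // assumes a spark-shell / SparkSession context

val schema = StructType(StructField("a", IntegerType) :: Nil)

// The second row is an empty string; before this patch, parser.parse
// returned Nil for it and .head threw NoSuchElementException.
Seq("""{"a": 1}""", "").toDF("json")
  .select(from_json(col("json"), schema))
  .show()
```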