-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-25243][SQL] Use FailureSafeParser in from_json #22237
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
c2e7078
fe2baa4
cecc8f5
75bdb03
7a8804d
01b63f1
b1894d2
b76b8d3
104ee44
a87785a
c3091b3
55be20b
ce49b24
57eb59f
20b7522
63b8b66
a5489f5
9904903
bda3a4e
fa20fd2
2663696
b84b343
e22b974
4157141
54be09c
3f04f7f
b2988c7
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||
|---|---|---|---|---|
|
|
@@ -554,54 +554,44 @@ case class JsonToStructs( | |||
| @transient | ||||
| lazy val converter = nullableSchema match { | ||||
| case _: StructType => | ||||
| (rows: Seq[InternalRow]) => if (rows.length == 1) rows.head else null | ||||
| (rows: Iterator[InternalRow]) => if (rows.hasNext) rows.next() else null | ||||
|
||||
| case null => Nil |
We can throw BadRecordException instead of Nil but this will change behavior of JSON/CSV datasources, so they will return rows with nulls for empty lines (in PERMISSIVE mode).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't have to do it in this PR, but it would be great to document when this expression will return null, in the class doc of JsonToStructs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... and with some tests to verify it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't have to do it in this PR, but it would be great to document when this expression will return null ...
We already state in the docs for from_json(): Returns null, in the case of an unparseable string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now the case is more concrete, we return null if Jackson parser doesn't find any token in the input. Not sure, this detailed info about underlying problem can help users much more.
Outdated
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean by introducing new val failFast to JSONOptions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea, instead of the "mode" option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The JSONOptions is shared among build-in json functions like from_json and JSON datasource. And the formal one use 3 modes - FAILFAST, DROPMALFORMED and PERMISSIVE. I am not sure how the mode mode can be replaced. The approach that I could image is to inherit from JSONOptions and add new val. The mode itself cannot be removed because it is used in FailureSafeParser for example, in particular DropMalformedMode is handled explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah i see. If the mode option already exist, let's keep it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the reason we made this change?
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean this particular line or in general?
This line was changed because in the
PERMISSIVEmode we usually return aRowwith null fields that we wasn't able to parse instead of justnullfor whole row.In general, to support the
PERMISSIVEandFAILFASTmodes as for JSON datasource. Before the changesfrom_jsondidn't support any modes and thecolumnNameOfCorruptRecordoption in particular.