[SPARK-11745][SQL] Enable more JSON parsing options #9724
this is now unused.
Add samplingRatio?
I think we skipped it in the past because it had very little impact on performance, so in most cases it is better to just use 1.0... Maybe we should even deprecate that option.
Test build #2061 has finished for PR 9724 at commit
Test build #45972 has finished for PR 9724 at commit
Alright I've updated it.
LGTM pending jenkins.
Test build #45981 has finished for PR 9724 at commit
Thanks - I'm merging this in.
This patch adds the following options to the JSON data source, for dealing with non-standard JSON files:

* `allowComments` (default `false`): ignores Java/C++ style comments in JSON records
* `allowUnquotedFieldNames` (default `false`): allows unquoted JSON field names
* `allowSingleQuotes` (default `true`): allows single quotes in addition to double quotes
* `allowNumericLeadingZeros` (default `false`): allows leading zeros in numbers (e.g. 00012)

To avoid passing a lot of options throughout the json package, I introduced a new JSONOptions case class to define all JSON config options.

Also updated documentation to explain these options.

Author: Reynold Xin

Closes #9724 from rxin/SPARK-11745.

(cherry picked from commit 42de525)
Signed-off-by: Reynold Xin
This patch adds the following options to the JSON data source, for dealing with non-standard JSON files:
* `allowComments` (default `false`): ignores Java/C++ style comments in JSON records
* `allowUnquotedFieldNames` (default `false`): allows unquoted JSON field names
* `allowSingleQuotes` (default `true`): allows single quotes in addition to double quotes
* `allowNumericLeadingZeros` (default `false`): allows leading zeros in numbers (e.g. 00012)

To avoid passing a lot of options throughout the json package, I introduced a new JSONOptions case class to define all JSON config options.
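For readers who want a feel for the options-bundle idea, here is a minimal Python sketch. It is not the Scala `JSONOptions` case class from this patch: the class name `JsonOptionsSketch`, the `from_parameters` helper, and the string-to-string input map are assumptions made purely for illustration. Only the four option names and their defaults are taken from the description above.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class JsonOptionsSketch:
    """Hypothetical bundle of the four parser flags listed in this PR."""

    allow_comments: bool = False
    allow_unquoted_field_names: bool = False
    allow_single_quotes: bool = True
    allow_numeric_leading_zeros: bool = False

    @classmethod
    def from_parameters(cls, parameters: Mapping[str, str]) -> "JsonOptionsSketch":
        """Build the bundle from string key/value pairs, using defaults for missing keys."""

        def flag(name: str, default: bool) -> bool:
            raw = parameters.get(name)
            # Assumption: only the text "true" (any case) switches a flag on.
            return default if raw is None else raw.strip().lower() == "true"

        return cls(
            allow_comments=flag("allowComments", False),
            allow_unquoted_field_names=flag("allowUnquotedFieldNames", False),
            allow_single_quotes=flag("allowSingleQuotes", True),
            allow_numeric_leading_zeros=flag("allowNumericLeadingZeros", False),
        )


# One object now carries every flag, instead of four separate arguments.
opts = JsonOptionsSketch.from_parameters({"allowComments": "true"})
print(opts)
```

The point of the design is that code deeper in a parsing pipeline can accept a single immutable options object, so adding a fifth flag later touches one class rather than every function signature.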
Also updated documentation to explain these options.
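To make the four options concrete, the sketch below feeds one sample record per option to Python's standard `json` module, which follows the strict JSON grammar and should reject each of them. The sample records and the `is_strict_json` helper are made up for illustration, and nothing here exercises Spark's JSON data source itself.

```python
import json

# One made-up record per option, each showing the non-standard syntax it targets.
SAMPLES = {
    "allowComments": '{"a": 1 /* inline comment */}',
    "allowUnquotedFieldNames": "{a: 1}",
    "allowSingleQuotes": "{'a': 1}",
    "allowNumericLeadingZeros": '{"a": 00012}',
}


def is_strict_json(text: str) -> bool:
    """Return True if the text parses under Python's strict JSON parser."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


for option, record in SAMPLES.items():
    print(f"{option}: {record!r} strict-valid={is_strict_json(record)}")
```

In Spark, data source options like these are generally passed as string key/value pairs on the reader, for example `.option("allowComments", "true")` before the `.json(...)` call. Treat that as a usage hint rather than something taken from this thread.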