@rxin rxin commented Nov 16, 2015

This patch adds the following options to the JSON data source, for dealing with non-standard JSON files:

  • allowComments (default false): ignores Java/C++ style comments in JSON records
  • allowUnquotedFieldNames (default false): allows unquoted JSON field names
  • allowSingleQuotes (default true): allows single quotes in addition to double quotes
  • allowNumericLeadingZeros (default false): allows leading zeros in numbers (e.g. 00012)
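As a rough illustration of how these options might be set from Python, here is a minimal sketch. It assumes `reader` behaves like a PySpark `DataFrameReader` (for example `sqlContext.read`), exposing `options(**kwargs)` and `json(path)`. The names `LENIENT_JSON_OPTIONS`, `SAMPLE_RECORDS`, `is_strict_json`, and `read_lenient_json` are made up for this example and are not part of the patch. The sample records show the kind of input each option is meant to tolerate; Python's strict `json` module rejects all of them.

```python
import json

# Hypothetical option set: turns on every leniency flag listed above.
# Values are given as strings, the form in which data source options are
# commonly passed.
LENIENT_JSON_OPTIONS = {
    "allowComments": "true",
    "allowUnquotedFieldNames": "true",
    "allowSingleQuotes": "true",
    "allowNumericLeadingZeros": "true",
}

# One non-standard record per option (illustrative inputs, not from the patch).
SAMPLE_RECORDS = {
    "allowComments": '{"name": /* inline comment */ "Reynold"}',
    "allowUnquotedFieldNames": '{name: "Reynold"}',
    "allowSingleQuotes": "{'name': 'Reynold'}",
    "allowNumericLeadingZeros": '{"age": 00012}',
}


def is_strict_json(text):
    """Return True if Python's strict json module accepts the text."""
    try:
        json.loads(text)
        return True
    except ValueError:
        return False


def read_lenient_json(reader, path, options=None):
    """Apply the leniency options to a reader and load JSON from path.

    `reader` is assumed to behave like a PySpark DataFrameReader, i.e. to
    expose options(**kwargs) returning a reader, and json(path).
    """
    opts = dict(LENIENT_JSON_OPTIONS if options is None else options)
    return reader.options(**opts).json(path)
```

With a real `SQLContext`, this could be called as `read_lenient_json(sqlContext.read, "path/to/records.json")`, where the path is a placeholder.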

To avoid passing a lot of options throughout the json package, I introduced a new JSONOptions case class to define all JSON config options.
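The idea of bundling the options into one value can be sketched in Python as follows. This is a hypothetical analogue, not the Scala `JSONOptions` case class from the patch (its actual fields are not shown in this conversation). The field names, the `from_parameters` constructor, and the `_to_bool` helper are assumptions made for illustration; only the option names and defaults come from the list above.

```python
from dataclasses import dataclass


def _to_bool(value):
    """Interpret an option value given as a bool or a string like 'true'."""
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() == "true"


@dataclass(frozen=True)
class JSONOptions:
    """Hypothetical Python analogue of a single JSON options holder.

    Defaults follow the option list in the PR description.
    """
    allow_comments: bool = False
    allow_unquoted_field_names: bool = False
    allow_single_quotes: bool = True
    allow_numeric_leading_zeros: bool = False

    @classmethod
    def from_parameters(cls, parameters):
        """Build options from a string map, falling back to the defaults."""
        base = cls()

        def flag(key, default):
            return _to_bool(parameters.get(key, default))

        return cls(
            allow_comments=flag("allowComments", base.allow_comments),
            allow_unquoted_field_names=flag(
                "allowUnquotedFieldNames", base.allow_unquoted_field_names),
            allow_single_quotes=flag(
                "allowSingleQuotes", base.allow_single_quotes),
            allow_numeric_leading_zeros=flag(
                "allowNumericLeadingZeros", base.allow_numeric_leading_zeros),
        )
```

Code that needs the settings can then take one `JSONOptions` argument instead of four separate flags, for example `JSONOptions.from_parameters({"allowComments": "true"})`.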

Also updated documentation to explain these options.

Scala: [screenshot, 2015-11-15 at 6:12:12 pm]

Python: [screenshot, 2015-11-15 at 6:11:28 pm]

Contributor Author

this is now unused.

Contributor

Add samplingRatio?

Contributor Author

I think we skipped samplingRatio in the past because it had very little impact on performance, so in most cases it is better to just use 1.0. Maybe we should even deprecate that option.

SparkQA commented Nov 16, 2015

Test build #2061 has finished for PR 9724 at commit 00cfc19.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Nov 16, 2015

Test build #45972 has finished for PR 9724 at commit 00cfc19.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

rxin commented Nov 16, 2015

Alright I've updated it.

yhuai commented Nov 16, 2015

LGTM pending jenkins.

SparkQA commented Nov 16, 2015

Test build #45981 has finished for PR 9724 at commit d8ca56d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      • case class JSONOptions(

rxin commented Nov 16, 2015

Thanks - I'm merging this in.

asfgit pushed a commit that referenced this pull request Nov 16, 2015
This patch adds the following options to the JSON data source, for dealing with non-standard JSON files:
* `allowComments` (default `false`): ignores Java/C++ style comments in JSON records
* `allowUnquotedFieldNames` (default `false`): allows unquoted JSON field names
* `allowSingleQuotes` (default `true`): allows single quotes in addition to double quotes
* `allowNumericLeadingZeros` (default `false`): allows leading zeros in numbers (e.g. 00012)

To avoid passing a lot of options throughout the json package, I introduced a new JSONOptions case class to define all JSON config options.

Also updated documentation to explain these options.

Scala

![screen shot 2015-11-15 at 6 12 12 pm](https://cloud.githubusercontent.com/assets/323388/11172965/e3ace6ec-8bc4-11e5-805e-2d78f80d0ed6.png)

Python

![screen shot 2015-11-15 at 6 11 28 pm](https://cloud.githubusercontent.com/assets/323388/11172964/e23ed6ee-8bc4-11e5-8216-312f5983acd5.png)

Author: Reynold Xin

Closes #9724 from rxin/SPARK-11745.

(cherry picked from commit 42de525)
Signed-off-by: Reynold Xin
asfgit closed this in 42de525 on Nov 16, 2015