[SPARK-26163][SQL] Parsing decimals from JSON using locale #23132
Conversation
…g-locale

# Conflicts:
#	sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/JsonExpressionsSuite.scala
Test build #99232 has finished for PR 23132 at commit
Test build #99318 has finished for PR 23132 at commit
@dongjoon-hyun @cloud-fan Please take a look at the PR.
docs/sql-migration-guide-upgrade.md (review comment on an outdated diff)

> ## Upgrading From Spark SQL 2.4 to 3.0
>
> - In Spark version 2.4 and earlier, the accepted format of decimals parsed from JSON is an optional sign ('+' or '-'), followed by a sequence of zero or more decimal digits, optionally followed by a fraction, optionally followed by an exponent. Any commas were removed from the input before parsing. Since Spark 3.0, the format varies and depends on the locale, which can be set via the JSON option `locale`. The default locale is `en-US`. To switch back to the previous behavior, set `spark.sql.legacy.decimalParsing.enabled` to `true`.
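For readers unfamiliar with locale-sensitive number parsing, here is a minimal plain-Java sketch of the JDK mechanism behind this change. The class and method names are illustrative only, not Spark's actual code; the point is that `DecimalFormat` with a locale's `DecimalFormatSymbols` interprets separators differently per locale:

```java
import java.math.BigDecimal;
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.ParsePosition;
import java.util.Locale;

// Illustrative helper, not Spark's implementation: parse a decimal string
// according to the given locale's decimal/grouping separators.
public class LocaleDecimalParser {
    public static BigDecimal parseDecimal(String s, Locale locale) {
        DecimalFormat format = new DecimalFormat("", new DecimalFormatSymbols(locale));
        format.setParseBigDecimal(true); // parse() returns BigDecimal, not Double
        ParsePosition pos = new ParsePosition(0);
        Number parsed = format.parse(s, pos);
        // Reject partial matches, e.g. input with a separator the locale rejects.
        if (parsed == null || pos.getIndex() != s.length()) {
            throw new IllegalArgumentException("Cannot parse '" + s + "' in " + locale);
        }
        return (BigDecimal) parsed;
    }

    public static void main(String[] args) {
        // de-DE uses ',' as the decimal separator, en-US uses '.'
        System.out.println(parseDecimal("1000,001", Locale.GERMANY)); // 1000.001
        System.out.println(parseDecimal("1000.001", Locale.US));      // 1000.001
    }
}
```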
Is there any performance regression with DecimalFormat?
I have the same question. Do we need the DecimalFormat when locale is en-US?
> Is there any performance regression with DecimalFormat?
I haven't benchmarked the changes. Looking at JSONBenchmarks, we don't even check decimals there.
> Do we need the DecimalFormat when locale is en-US?
No, we don't need it.
Since the default value is en-US, can we skip DecimalFormat when the locale is en-US? Then nothing changes by default, and we don't even need a legacy config.
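The suggestion above can be sketched as a small factory: keep plain `BigDecimal` parsing (with commas stripped, matching the legacy behavior) for the default en-US locale, and pay the `DecimalFormat` cost only for other locales. This is a hypothetical shape, not the PR's actual code; names are assumptions:

```java
import java.math.BigDecimal;
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.ParseException;
import java.util.Locale;
import java.util.function.Function;

// Hypothetical factory illustrating the suggested en-US fast path.
public class DecimalParserFactory {
    public static Function<String, BigDecimal> forLocale(Locale locale) {
        if (Locale.US.equals(locale)) {
            // Default path: legacy behavior (strip commas), so nothing
            // changes by default and no legacy config is needed.
            return s -> new BigDecimal(s.replace(",", ""));
        }
        DecimalFormat format = new DecimalFormat("", new DecimalFormatSymbols(locale));
        format.setParseBigDecimal(true);
        return s -> {
            try {
                return (BigDecimal) format.parse(s);
            } catch (ParseException e) {
                throw new IllegalArgumentException("Cannot parse '" + s + "'", e);
            }
        };
    }
}
```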
I removed the SQL config and the record in the migration guide. Also, I applied DecimalFormat only for non-en-US locales.
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (review comment on an outdated diff; resolved)
Test build #99396 has finished for PR 23132 at commit
jenkins, retest this, please
LGTM, does CSV need to do the same?
Test build #99417 has finished for PR 23132 at commit
@cloud-fan I did the same for CSV: #22979
thanks, merging to master!
@MaxGekk, mind fixing the PR description accordingly?
@HyukjinKwon fixed |
## What changes were proposed in this pull request?

In the PR, I propose using the locale option to parse (and infer) decimals from JSON input. After the changes, `JacksonParser` converts the input string to `BigDecimal` and then to Spark's Decimal by using `java.text.DecimalFormat`. The new behaviour can be switched off via the SQL config `spark.sql.legacy.decimalParsing.enabled`.

## How was this patch tested?

Added 2 tests to `JsonExpressionsSuite` for the `en-US`, `ko-KR`, `ru-RU`, `de-DE` locales:
- Inferring decimal type using locale from JSON field values
- Converting JSON field values to specified decimal type using the locales.

Closes apache#23132 from MaxGekk/json-decimal-parsing-locale.

Lead-authored-by: Maxim Gekk <[email protected]>
Co-authored-by: Maxim Gekk <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
What changes were proposed in this pull request?

In the PR, I propose using the locale option to parse (and infer) decimals from JSON input. After the changes, `JacksonParser` converts the input string to `BigDecimal` and then to Spark's Decimal by using `java.text.DecimalFormat`. The new behavior is enabled for locales different from `en-US`.

How was this patch tested?

Added 2 tests to `JsonExpressionsSuite` for the `en-US`, `ko-KR`, `ru-RU`, `de-DE` locales:
- Inferring decimal type using locale from JSON field values
- Converting JSON field values to specified decimal type using the locales.
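The round-trip idea behind such locale tests can be sketched as follows: format a reference decimal with each locale's symbols, then parse the localized text back and compare. This is a standalone Java sketch, not the actual `JsonExpressionsSuite` code:

```java
import java.math.BigDecimal;
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.ParsePosition;
import java.util.Locale;

// Standalone sketch of the format-then-parse round trip used to exercise
// locale-dependent decimal handling; not Spark's actual test code.
public class DecimalRoundTrip {
    public static boolean roundTrips(String languageTag, BigDecimal value) {
        DecimalFormat format =
            new DecimalFormat("", new DecimalFormatSymbols(Locale.forLanguageTag(languageTag)));
        format.setParseBigDecimal(true);
        String localized = format.format(value); // locale-specific text, e.g. decimal comma in de-DE
        BigDecimal parsed = (BigDecimal) format.parse(localized, new ParsePosition(0));
        return parsed != null && parsed.compareTo(value) == 0;
    }

    public static void main(String[] args) {
        BigDecimal expected = new BigDecimal("1000.001");
        for (String tag : new String[] {"en-US", "ko-KR", "ru-RU", "de-DE"}) {
            if (!roundTrips(tag, expected)) {
                throw new AssertionError("Round trip failed for " + tag);
            }
        }
    }
}
```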