[SPARK-27512][SQL] Avoid to replace ',' in CSV's decimal type inference for backward compatibility #24437
Conversation
Honestly, I think it's also okay to drop this weird previous behaviour ... but I decided to follow the intention of keeping backward compatibility since it's simple, for now. I think we should drop this.
cc @MaxGekk and @cloud-fan
What about supporting it via the locale option, as JSON does?
spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala, lines 54 to 55 in ef2d63b
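For readers without the linked revision at hand, those lines define JSON's locale option; a minimal self-contained sketch, with the exact shape of the option lookup assumed:

```scala
import java.util.Locale

// Sketch of how JSONOptions derives its locale from the datasource option map
// (assumed from memory of that revision; `localeOf` is a hypothetical wrapper).
def localeOf(parameters: Map[String, String]): Locale =
  parameters.get("locale").map(Locale.forLanguageTag).getOrElse(Locale.US)
```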
Makes sense, but I think that doesn't fully address the current issue - some people could still complain about changing this behaviour. IIUC, let's consider that option when we drop this behaviour later.
From the test changes under review:

```scala
Seq("en-US", "ko-KR", "ru-RU", "de-DE").foreach(checkDecimalInfer(_, DecimalType(7, 0)))
// input like '1,0' is inferred as strings for backward compatibility.
```
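The `checkDecimalInfer` helper is not shown in this excerpt; a plausible reconstruction, in which the `CSVOptions` constructor arguments and the `inferField` signature are assumptions based on that era of the codebase:

```scala
import java.text.{DecimalFormat, DecimalFormatSymbols}
import java.util.Locale

import org.apache.spark.sql.catalyst.csv.{CSVInferSchema, CSVOptions}
import org.apache.spark.sql.types._

// Hypothetical sketch of the helper used by the test line above.
def checkDecimalInfer(langTag: String, expectedType: DataType): Unit = {
  // '|' as the separator keeps locale-formatted numbers like "1.000.001"
  // (de-DE) from being split into multiple columns.
  val options = new CSVOptions(
    Map("locale" -> langTag, "sep" -> "|"),
    columnPruning = false,
    defaultTimeZoneId = "UTC")
  val inferSchema = new CSVInferSchema(options)

  // Format a 7-digit value in the target locale; DecimalType(7, 0) should be inferred.
  val fmt = new DecimalFormat("", new DecimalFormatSymbols(Locale.forLanguageTag(langTag)))
  val input = fmt.format(1000001L)
  assert(inferSchema.inferField(NullType, input) == expectedType)
}
```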
Sorry, I may not know all the context, but doesn't `1,0` mean two int columns?
The field separator here is `|`, not `,`, so `1,0` cannot be considered separate columns.
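In other words, a spark-shell sketch (outputs reconstructed from the discussion, not copied from the PR):

```scala
// With the default ',' delimiter, "1,0" is indeed two integer columns:
spark.read.option("inferSchema", "true").csv(Seq("1,0").toDS).printSchema()
// root
//  |-- _c0: integer (nullable = true)
//  |-- _c1: integer (nullable = true)

// With '|' as the delimiter, "1,0" is a single field, and the question becomes
// whether that field is a grouped US-locale decimal or a plain string:
spark.read.option("delimiter", "|").option("inferSchema", "true")
  .csv(Seq("1,0").toDS).printSchema()
// branch-2.4 and after this fix: _c0: string
// master before this fix:        _c0: decimal(2,0)
```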
Test build #104804 has finished for PR 24437 at commit
The corresponding change in CSVInferSchema.scala:

```diff
     options.locale)

-  private val decimalParser = {
+  private val decimalParser = if (options.locale == Locale.US) {
```
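Filled out, the new definition plausibly reads as follows; the else branch is inferred from the discussion below and should be treated as an assumption:

```scala
// Inside CSVInferSchema (sketch):
private val decimalParser = if (options.locale == Locale.US) {
  // Special-case the default locale for backward compatibility: plain
  // java.math.BigDecimal parsing does not strip ',', so "1,0" fails to
  // parse as a decimal and the field falls back to string inference.
  (s: String) => new java.math.BigDecimal(s)
} else {
  // Other locales use the shared locale-aware parser.
  ExprUtils.getDecimalParser(options.locale)
}
```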
It looks slightly ugly that we handle the special case of Locale.US for JSON inside of ExprUtils.getDecimalParser but for CSV outside of it. Maybe we should unify the implementation and handle the special cases outside of the generic ExprUtils.getDecimalParser (or process both special cases inside of it)?
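For context, ExprUtils.getDecimalParser handles the US special case internally, roughly like this (a sketch reconstructed from memory of that revision; details are approximate):

```scala
import java.math.BigDecimal
import java.text.{DecimalFormat, DecimalFormatSymbols, ParsePosition}
import java.util.Locale

def getDecimalParser(locale: Locale): String => BigDecimal = {
  if (locale == Locale.US) {
    // Special handling of the default locale for backward compatibility:
    // strip ',' (the US grouping separator) before parsing.
    (s: String) => new BigDecimal(s.replaceAll(",", ""))
  } else {
    val decimalFormat = new DecimalFormat("", new DecimalFormatSymbols(locale))
    decimalFormat.setParseBigDecimal(true)
    (s: String) => {
      val position = new ParsePosition(0)
      val result = decimalFormat.parse(s, position).asInstanceOf[BigDecimal]
      // Reject partial matches so e.g. "1,0garbage" does not parse.
      if (result == null || position.getIndex != s.length) {
        throw new IllegalArgumentException(s"Cannot parse '$s' as a decimal")
      }
      result
    }
  }
}
```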
Yes, we've got to do that, I think. Let's do it later when we drop the replacement thing entirely. I realised that the CSV inference path alone handles it without replacement.
spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala Line 157 in ef2d63b
Nope; for parsing itself, the replacement was being done before as well, and it looks like it's handled fine:
spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExprUtils.scala Line 92 in 7a83d71
Let me get this in if you guys don't mind. This literally just restores the previous behaviour as it was.
Thanks, @MaxGekk and @cloud-fan. Merged to master.
What changes were proposed in this pull request?
The code below is currently inferred as decimal, but previously it was inferred as string.
In branch-2.4, the type inference path for decimals and the data parsing path are different:
spark/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala
Line 153 in 2a83431
spark/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala
Line 125 in c284c4e
So the code below produced string as its type:
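The original snippet did not survive extraction; a reproducer consistent with the review discussion above (the `|` delimiter and the `1,0` input come from the comments):

```scala
// spark-shell on branch-2.4 (spark.implicits._ is in scope in the shell)
spark.read.option("delimiter", "|").option("inferSchema", "true")
  .csv(Seq("1,0").toDS).printSchema()
// root
//  |-- _c0: string (nullable = true)
```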
In the current master, it now infers decimal as below:
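A reconstructed schema output; the exact precision and scale are an assumption (stripping ',' turns "1,0" into 10, i.e. decimal(2,0)):

```
root
 |-- _c0: decimal(2,0) (nullable = true)
```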
This happened after #22979 because, after that PR, we only have one way to parse decimals:
spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExprUtils.scala
Line 92 in 7a83d71
After the fix:
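The same reproducer once again infers string (reconstructed output):

```
root
 |-- _c0: string (nullable = true)
```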
This PR proposes to restore the previous behaviour in CSVInferSchema.

How was this patch tested?
Manually tested and unit tests were added.