[SPARK-40129][SQL] Fix Decimal multiply can produce the wrong answer #41156
Conversation
cc @gengliangwang FYI
revans2 left a comment
This fixes the problem. I personally thought that we would just use a different math context for times in Decimal, but this does work.
cc @cloud-fan
Force-pushed from f009445 to d7c34a0
Can we spend more time explaining what the correct result is? Spark follows SQL semantics, not Java semantics. It would be helpful to check the results in other databases.
MySQL: All are …

cc @cloud-fan
Is it better to avoid the double round-up? e.g. we can pass a …

also cc @beliefer, does your new decimal implementation have the same bug?
The double round-up can't be avoided entirely, because the first round-up happens inside Java BigDecimal; we pass a wider …

Yep, we can implement it in the Decimal arithmetic operations, which would be more suitable than the current approach. For now I just copied the method in the arithmetic operations to …
It seems like the overflow checks are tied to the precision change?
override def decimalMethod(mathContextValue: GlobalValue, value1: String, value2: String): String =
  s"Decimal.apply($value1.toJavaBigDecimal()" +
    s".multiply($value2.toJavaBigDecimal(), $mathContextValue))"
> Is it better to avoid double round-up?
@cloud-fan In fact, the current solution is very similar to what you mentioned: we pass in a wider MathContext and then do the round-up in toPrecision.
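A minimal sketch of that idea in plain `java.math.BigDecimal` terms (not the actual PR code; the helper name and parameters are made up): multiply under a MathContext one digit wider than the result precision, then round once to the result scale.

```scala
import java.math.{BigDecimal => JBigDecimal, MathContext, RoundingMode}

// Hypothetical helper illustrating the approach described above.
def multiplyWithWiderContext(
    a: JBigDecimal,
    b: JBigDecimal,
    resultPrecision: Int,
    resultScale: Int): JBigDecimal = {
  // One extra digit of precision for the intermediate product, as the PR does
  // (precision 39 for a precision-38 result).
  val wider = new MathContext(resultPrecision + 1, RoundingMode.HALF_UP)
  val product = a.multiply(b, wider)
  // The single user-visible rounding to the result scale (what toPrecision does in Spark).
  product.setScale(resultScale, RoundingMode.HALF_UP)
}
```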
revans2 left a comment
Looks good to me
Thank you for the ping. The new decimal implementation also overflows on multiply.
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

cc @cloud-fan It seems #43678 is similar to this one.

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.






What changes were proposed in this pull request?
The example here is multiplying Decimal(38, 10) by another Decimal(38, 10), but I think it can be reproduced with other number combinations, and possibly with divide too.
This produces an answer in Spark of `-110083130231976019291714061058.575920`, but if I do the same calculation with regular Java `BigDecimal` I get `-110083130231976019291714061058.575919`. Spark does essentially all of the same operations, but it uses `Decimal` instead of Java's `BigDecimal` directly. Spark, by way of `Decimal`, sets a `MathContext` for the multiply operation with a maximum precision of 38 and half-up rounding. That means the result of the multiply operation in Spark is `-110083130231976019291714061058.57591950`, while for the Java `BigDecimal` code the result is `-110083130231976019291714061058.57591949560000000000`. Spark then calls `toPrecision`, which rounds again, so Spark rounds the result twice.

This PR changes the code-gen and `nullSafeEval` of `Arithmetic` so that the multiply method uses a custom `MathContext` with a precision of 39 (one extra digit). With the extra digit, the second round-up no longer affects the result.
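To make the double-rounding effect concrete, here is a small standalone sketch with plain `java.math.BigDecimal`. The operands and precisions are made up for illustration (they are not the operands from the example above); the point is only that rounding an exact product twice can give a different answer than rounding it once.

```scala
import java.math.{BigDecimal => JBigDecimal, MathContext, RoundingMode}

val a = new JBigDecimal("0.9999999")
val b = new JBigDecimal("1.015")
// Exact product: 1.0149998985

// Round during the multiply (as Decimal did, via a bounded MathContext),
// then round again to the result scale: 1.015 -> 1.02
val roundedTwice = a
  .multiply(b, new MathContext(4, RoundingMode.HALF_UP))
  .setScale(2, RoundingMode.HALF_UP)

// Keep the exact product and round only once: 1.0149998985 -> 1.01
val roundedOnce = a
  .multiply(b)
  .setScale(2, RoundingMode.HALF_UP)

println(s"rounded twice: $roundedTwice, rounded once: $roundedOnce")
// rounded twice: 1.02, rounded once: 1.01
```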
Why are the changes needed?
Fix a bug where Decimal multiply can produce a wrong answer.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Added a new test.