
Conversation

@Hisoka-X
Member

@Hisoka-X Hisoka-X commented May 12, 2023

What changes were proposed in this pull request?

The example here is multiplying Decimal(38, 10) by another Decimal(38, 10), but I think it can be reproduced with other number combinations, and possibly with divide too.

Seq("9173594185998001607642838421.5479932913").toDF.selectExpr("CAST(value as DECIMAL(38,10)) as a").selectExpr("a * CAST(-12 as DECIMAL(38,10))").show(truncate=false)

This produces an answer in Spark of -110083130231976019291714061058.575920, but if I do the calculation with regular Java BigDecimal I get -110083130231976019291714061058.575919:

import java.math.BigDecimal;
import java.math.RoundingMode;

BigDecimal l = new BigDecimal("9173594185998001607642838421.5479932913");
BigDecimal r = new BigDecimal("-12.0000000000");
BigDecimal prod = l.multiply(r);                                   // full-precision product
BigDecimal rounded_prod = prod.setScale(6, RoundingMode.HALF_UP);  // single rounding to scale 6

Spark does essentially all of the same operations, but it uses Decimal instead of Java's BigDecimal directly. Spark, by way of Decimal, sets a MathContext for the multiply operation with a max precision of 38 and HALF_UP rounding. That means the result of the multiply operation in Spark is -110083130231976019291714061058.57591950, whereas the plain Java BigDecimal result is -110083130231976019291714061058.57591949560000000000. Spark then calls toPrecision to round again, so Spark rounds the result twice.
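
Here is a minimal standalone Java sketch (plain java.math.BigDecimal, not Spark code) of the difference, using the same values as above; the class name is just for illustration:

import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

public class DoubleRoundingDemo {
    public static void main(String[] args) {
        BigDecimal l = new BigDecimal("9173594185998001607642838421.5479932913");
        BigDecimal r = new BigDecimal("-12.0000000000");

        // Plain BigDecimal: multiply at full precision, then round once to scale 6.
        BigDecimal exact = l.multiply(r);
        System.out.println(exact.setScale(6, RoundingMode.HALF_UP));   // ...058.575919

        // What Spark's Decimal effectively does: multiply under a precision-38
        // HALF_UP MathContext (first rounding), then round again to scale 6.
        BigDecimal once = l.multiply(r, new MathContext(38, RoundingMode.HALF_UP));
        System.out.println(once.setScale(6, RoundingMode.HALF_UP));    // ...058.575920
    }
}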

This PR changes the code-gen and nullSafeEval of the arithmetic expressions so that the multiply uses a custom MathContext with a precision of 39 (one extra digit). With the wider intermediate precision, rounding twice no longer affects the result.
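
For illustration only (the actual change is in the code-gen and nullSafeEval paths), the same values with a one-digit-wider intermediate MathContext, using the same imports as the sketch above:

BigDecimal l = new BigDecimal("9173594185998001607642838421.5479932913");
BigDecimal r = new BigDecimal("-12.0000000000");

// Precision 39 keeps ...058.575919496 as the intermediate result,
// so the final rounding to scale 6 still yields ...058.575919.
BigDecimal wide = l.multiply(r, new MathContext(39, RoundingMode.HALF_UP));
System.out.println(wide.setScale(6, RoundingMode.HALF_UP));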

Why are the changes needed?

Fix a bug where Decimal multiplication produces a wrong answer.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Add new test.

@github-actions github-actions bot added the SQL label May 12, 2023
@HyukjinKwon
Member

cc @gengliangwang FYI

Contributor

@revans2 revans2 left a comment


This fixes the problem. I personally thought that we would just use a different math context for times in Decimal, but this does work.

@Hisoka-X
Member Author

cc @cloud-fan

@Hisoka-X Hisoka-X force-pushed the SPARK-40129_Decimal_multiply branch from f009445 to d7c34a0 on July 5, 2023 01:37
@Hisoka-X Hisoka-X requested a review from MaxGekk July 11, 2023 10:09
@cloud-fan
Contributor

Can we spend more time explaining what the correct result is? Spark follows SQL semantics, not Java semantics. It would be helpful to check the results in other databases.

@Hisoka-X
Member Author

Hisoka-X commented Aug 1, 2023

MySQL: [screenshot]
MySQL round 6: [screenshot]
Oracle: [screenshot]
Oracle round 6: [screenshot]
Postgres: [screenshot]
Postgres round 6: [screenshot]

All are -110083130231976019291714061058.575919, not -110083130231976019291714061058.575920.

cc @cloud-fan

@cloud-fan
Contributor

cloud-fan commented Aug 1, 2023

Is it better to avoid double round-up? e.g. we can pass a MathContext of the result decimal type (with wider precision to avoid overflow) to all decimal arithmetic operations. toPrecision only checks overflow.

also cc @beliefer , does your new decimal implementation have the same bug?

@Hisoka-X
Member Author

Hisoka-X commented Aug 1, 2023

Is it better to avoid double round-up? e.g. we can pass a MathContext of the result decimal type (with wider precision to avoid overflow) to all decimal arithmetic operations. toPrecision only checks overflow.

also cc @beliefer , does your new decimal implementation have the same bug?

Sounds reasonable. Let me implement it.

The double round-up cannot be avoided: the first round-up happens inside Java BigDecimal, and passing a wider MathContext just makes sure it does not affect the result (as if no round-up happened). The second round-up, in toPrecision, is what matches the precision we need.

to all decimal arithmetic operations

Yep, we can implement it in the decimal arithmetic operations; that would be more suitable than the current approach. For now I just copied the method from the arithmetic operations into decimalMethod.

toPrecision only checks overflow.

It seems like the overflow check is bound together with the precision change?
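
To make the two stages concrete, a simplified sketch in plain java.math.BigDecimal; the helper names and the exact amount of widening are illustrative assumptions, not Spark's actual Decimal/toPrecision API:

import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

class TwoStageRounding {
    // Stage 1: the multiply itself, under a MathContext wide enough that its
    // HALF_UP rounding does not disturb the digits the final result will keep.
    static BigDecimal multiplyWide(BigDecimal a, BigDecimal b, int resultPrecision) {
        return a.multiply(b, new MathContext(resultPrecision + 1, RoundingMode.HALF_UP));
    }

    // Stage 2: the toPrecision-style step, which rounds to the result type's scale;
    // in Spark this step also checks the integral part for overflow.
    static BigDecimal toResultScale(BigDecimal v, int resultScale) {
        return v.setScale(resultScale, RoundingMode.HALF_UP);
    }
}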

Comment on lines 610 to 613

override def decimalMethod(mathContextValue: GlobalValue, value1: String, value2: String):
String = s"Decimal.apply($value1.toJavaBigDecimal()" +
s".multiply($value2.toJavaBigDecimal(), $mathContextValue))"
Member Author


Is it better to avoid double round-up?

@cloud-fan In fact, the current solution is very similar to what you mentioned: we pass in a wider MathContext, and then do the round-up in toPrecision.

Contributor

@revans2 revans2 left a comment


Looks good to me

@beliefer
Contributor

beliefer commented Aug 2, 2023

also cc @beliefer , does your new decimal implementation have the same bug?

Thank you for the ping. The new decimal implementation overflows on this multiplication.

@github-actions

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Nov 11, 2023
@github-actions github-actions bot closed this Nov 12, 2023
@cloud-fan cloud-fan reopened this Nov 12, 2023
@cloud-fan cloud-fan removed the Stale label Nov 12, 2023
@beliefer
Contributor

cc @cloud-fan It seems #43678 is similar to this one.

@github-actions

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Feb 22, 2024
@github-actions github-actions bot closed this Feb 23, 2024