
Conversation

@Hisoka-X
Member

@Hisoka-X Hisoka-X commented May 12, 2023

What changes were proposed in this pull request?

The example here is multiplying Decimal(38, 10) by another Decimal(38, 10), but I think it can be reproduced with other number combinations, and possibly with divide too.

Seq("9173594185998001607642838421.5479932913").toDF.selectExpr("CAST(value as DECIMAL(38,10)) as a").selectExpr("a * CAST(-12 as DECIMAL(38,10))").show(truncate=false)

This produces an answer in Spark of -110083130231976019291714061058.575920, but if I do the calculation with regular Java BigDecimal I get -110083130231976019291714061058.575919:

import java.math.BigDecimal;
import java.math.RoundingMode;

BigDecimal l = new BigDecimal("9173594185998001607642838421.5479932913");
BigDecimal r = new BigDecimal("-12.0000000000");
BigDecimal prod = l.multiply(r);                                   // full-precision product
BigDecimal rounded_prod = prod.setScale(6, RoundingMode.HALF_UP);  // single rounding to scale 6

Spark does essentially all of the same operations, but it uses Decimal instead of Java's BigDecimal directly. Spark, by way of Decimal, sets a MathContext for the multiply operation with a max precision of 38 and HALF_UP rounding. That means the result of the multiply operation in Spark is -110083130231976019291714061058.57591950, whereas the plain Java BigDecimal result is -110083130231976019291714061058.57591949560000000000. Spark then calls toPrecision to round again, so Spark rounds the result twice.
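
Here is a minimal standalone Java sketch (plain java.math.BigDecimal, not Spark code) of the difference, using the same values as above; the class name is just for illustration:

import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

public class DoubleRoundingDemo {
    public static void main(String[] args) {
        BigDecimal l = new BigDecimal("9173594185998001607642838421.5479932913");
        BigDecimal r = new BigDecimal("-12.0000000000");

        // Plain BigDecimal: multiply at full precision, then round once to scale 6.
        BigDecimal exact = l.multiply(r);
        System.out.println(exact.setScale(6, RoundingMode.HALF_UP));   // ...058.575919

        // What Spark's Decimal effectively does: multiply under a precision-38
        // HALF_UP MathContext (first rounding), then round again to scale 6.
        BigDecimal once = l.multiply(r, new MathContext(38, RoundingMode.HALF_UP));
        System.out.println(once.setScale(6, RoundingMode.HALF_UP));    // ...058.575920
    }
}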

This PR changes the code-gen and nullSafeEval of the arithmetic expressions so that the multiply uses a custom MathContext with a precision of 39 (one extra digit). With the wider intermediate precision, rounding twice no longer affects the result.
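
For illustration only (the actual change is in the code-gen and nullSafeEval paths), the same values with a one-digit-wider intermediate MathContext, using the same imports as the sketch above:

BigDecimal l = new BigDecimal("9173594185998001607642838421.5479932913");
BigDecimal r = new BigDecimal("-12.0000000000");

// Precision 39 keeps ...058.575919496 as the intermediate result,
// so the final rounding to scale 6 still yields ...058.575919.
BigDecimal wide = l.multiply(r, new MathContext(39, RoundingMode.HALF_UP));
System.out.println(wide.setScale(6, RoundingMode.HALF_UP));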

Why are the changes needed?

Fix a bug where Decimal multiplication produces a wrong answer.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Add new test.

@github-actions github-actions bot added the SQL label May 12, 2023
@HyukjinKwon
Member

cc @gengliangwang FYI

Contributor

@revans2 revans2 left a comment


This fixes the problem. I personally thought that we would just use a different math context for times in Decimal, but this does work.

@Hisoka-X
Member Author

cc @cloud-fan

@Hisoka-X Hisoka-X force-pushed the SPARK-40129_Decimal_multiply branch from f009445 to d7c34a0 on July 5, 2023 01:37
@Hisoka-X Hisoka-X requested a review from MaxGekk July 11, 2023 10:09
@cloud-fan
Contributor

Can we spend more time explaining what the correct result is? Spark follows SQL semantics, not Java semantics. It would be helpful to check the results in other databases.

@Hisoka-X
Member Author

Hisoka-X commented Aug 1, 2023

MySQL: [screenshot]
MySQL round 6: [screenshot]
Oracle: [screenshot]
Oracle round 6: [screenshot]
Postgres: [screenshot]
Postgres round 6: [screenshot]

All are -110083130231976019291714061058.575919, not -110083130231976019291714061058.575920.

cc @cloud-fan

@cloud-fan
Contributor

cloud-fan commented Aug 1, 2023

Is it better to avoid double round-up? e.g. we can pass a MathContext of the result decimal type (with wider precision to avoid overflow) to all decimal arithmetic operations. toPrecision only checks overflow.

also cc @beliefer , does your new decimal implementation have the same bug?

@Hisoka-X
Member Author

Hisoka-X commented Aug 1, 2023

Is it better to avoid double round-up? e.g. we can pass a MathContext of the result decimal type (with wider precision to avoid overflow) to all decimal arithmetic operations. toPrecision only checks overflow.

also cc @beliefer , does your new decimal implementation have the same bug?

Sounds reasonable. Let me implement it.

The double round-up cannot be avoided: the first round-up happens inside Java BigDecimal, and passing a wider MathContext just makes sure it does not affect the result (as if no round-up happened). The second round-up, in toPrecision, is what matches the precision we need.

to all decimal arithmetic operations

Yep, we can implement it in the decimal arithmetic operations; that would be more suitable than the current approach. For now I just copied the method from the arithmetic operations into decimalMethod.

toPrecision only checks overflow.

It seems like the overflow check is bound together with the precision change?
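
To make the two stages concrete, a simplified sketch in plain java.math.BigDecimal; the helper names and the exact amount of widening are illustrative assumptions, not Spark's actual Decimal/toPrecision API:

import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

class TwoStageRounding {
    // Stage 1: the multiply itself, under a MathContext wide enough that its
    // HALF_UP rounding does not disturb the digits the final result will keep.
    static BigDecimal multiplyWide(BigDecimal a, BigDecimal b, int resultPrecision) {
        return a.multiply(b, new MathContext(resultPrecision + 1, RoundingMode.HALF_UP));
    }

    // Stage 2: the toPrecision-style step, which rounds to the result type's scale;
    // in Spark this step also checks the integral part for overflow.
    static BigDecimal toResultScale(BigDecimal v, int resultScale) {
        return v.setScale(resultScale, RoundingMode.HALF_UP);
    }
}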

Comment on lines 610 to 613

override def decimalMethod(mathContextValue: GlobalValue, value1: String, value2: String):
String = s"Decimal.apply($value1.toJavaBigDecimal()" +
s".multiply($value2.toJavaBigDecimal(), $mathContextValue))"
Member Author


Is it better to avoid double round-up?

@cloud-fan In fact, the current solution is very similar to what you mentioned: we pass in a wider MathContext, and then do the round-up in toPrecision.

Contributor

@revans2 revans2 left a comment


Looks good to me

@beliefer
Contributor

beliefer commented Aug 2, 2023

also cc @beliefer , does your new decimal implementation have the same bug?

Thank you for the ping. The new decimal implementation overflows on this multiplication.

@github-actions

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Nov 11, 2023
@github-actions github-actions bot closed this Nov 12, 2023
@cloud-fan cloud-fan reopened this Nov 12, 2023
@cloud-fan cloud-fan removed the Stale label Nov 12, 2023
@beliefer
Contributor

cc @cloud-fan It seems #43678 is similar to this one.

@github-actions

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Feb 22, 2024
@github-actions github-actions bot closed this Feb 23, 2024