Conversation

@kazuyukitanimura
Contributor

What changes were proposed in this pull request?

This PR fixes inaccurate Decimal multiplication and division results.

Why are the changes needed?

Decimal multiplication and division results may be inaccurate due to rounding issues.

Multiplication:

```
scala> sql("select -14120025096157587712113961295153.858047 * -0.4652").show(truncate=false)
+----------------------------------------------------+
|(-14120025096157587712113961295153.858047 * -0.4652)|
+----------------------------------------------------+
|6568635674732509803675414794505.574764              |
+----------------------------------------------------+
```

The correct answer is `6568635674732509803675414794505.574763`

Please note that the last digit is `3` instead of `4`, as

```
scala> java.math.BigDecimal("-14120025096157587712113961295153.858047").multiply(java.math.BigDecimal("-0.4652"))
val res21: java.math.BigDecimal = 6568635674732509803675414794505.5747634644
```

Since the fractional part `.574763` is followed by `4644`, it should not be rounded up.
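The multiplication case can be reproduced in plain Java. This is a minimal sketch (the `MultiplyCheck` class and its method names are made up for illustration): `BigDecimal.multiply` without a `MathContext` is exact, and a single half-up rounding at the result scale of 6 keeps the last digit `3`, because the discarded digits `4644` begin with `4`:

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class MultiplyCheck {
    // Exact product: BigDecimal.multiply without a MathContext never rounds.
    static BigDecimal exactProduct(String x, String y) {
        return new BigDecimal(x).multiply(new BigDecimal(y));
    }

    // Round the exact product once, at the result scale of 6.
    static BigDecimal productAtScale6(String x, String y) {
        return exactProduct(x, y).setScale(6, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        String a = "-14120025096157587712113961295153.858047";
        String b = "-0.4652";
        System.out.println(exactProduct(a, b));    // ends in .5747634644
        // The discarded digits start with 4, so the last kept digit stays 3.
        System.out.println(productAtScale6(a, b)); // ends in .574763
    }
}
```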

Division:

```
scala> sql("select -0.172787979 / 533704665545018957788294905796.5").show(truncate=false)
+-------------------------------------------------+
|(-0.172787979 / 533704665545018957788294905796.5)|
+-------------------------------------------------+
|-3.237521E-31                                    |
+-------------------------------------------------+
```

The correct answer is `-3.237520E-31`

Please note that the last digit is `0` instead of `1`, as

```
scala> java.math.BigDecimal("-0.172787979").divide(java.math.BigDecimal("533704665545018957788294905796.5"), 100, java.math.RoundingMode.DOWN)
val res22: java.math.BigDecimal = -3.237520489418037889998826491401059986665344697406144511563561222578738E-31
```

Since the fractional part `.237520` is followed by `4894...`, it should not be rounded up.
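The division case can likewise be checked in plain Java (a minimal sketch; the `DivideCheck` class and its method name are made up for illustration). Rounding once, directly to 7 significant digits, gives the expected `-3.237520E-31` because the digits after `3.237520` begin with `489...`:

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

public class DivideCheck {
    // Divide with a single rounding to 7 significant digits.
    static BigDecimal quotientAt7Digits(String x, String y) {
        return new BigDecimal(x).divide(new BigDecimal(y),
                new MathContext(7, RoundingMode.HALF_UP));
    }

    public static void main(String[] args) {
        // The digits after 3.237520 start with 489..., so nothing rounds up.
        System.out.println(quotientAt7Digits(
                "-0.172787979", "533704665545018957788294905796.5"));
    }
}
```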

Does this PR introduce any user-facing change?

Yes, users will see correct Decimal multiplication and division results.
Directly multiplying and dividing with `org.apache.spark.sql.types.Decimal()` (not via SQL) will now return up to 39 digits instead of up to 38, and round down instead of rounding half-up.

How was this patch tested?

Test added

Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#43678 from kazuyukitanimura/SPARK-45786.

Authored-by: Kazuyuki Tanimura <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
@github-actions github-actions bot added the SQL label Nov 7, 2023
@dongjoon-hyun
Member

Thank you, @kazuyukitanimura!

@jcdang

jcdang commented Nov 7, 2023

Should type coercion consider rounding? I think there might be confusion about what truncate=false means; it is about trimming the displayed result, not a toggle that rounds when set to false.

@kazuyukitanimura would you get the result you want if you forced the result with an explicit cast to the default DecimalType decimal(38, 18)?

@kazuyukitanimura
Contributor Author

kazuyukitanimura commented Nov 7, 2023

Thanks @jcdang

@kazuyukitanimura would you get the result you want if you forced the result with an explicit cast to the default DecimalType decimal(38, 18)?

Did you mean something like `Cast(x * y as Decimal(38, 18))`? It may not give a correct result depending on the data. The underlying issue is that there are two places that do rounding: `MathContext` and `TypeCoercion`. Let's say
`x * y = 7890123456789012345.678901234567890123456` (40 digits).
First, `x.multiply(y, new MathContext(38, HALF_UP))` rounds it to `7890123456789012345.6789012345678901235` (38 digits). Please note that the trailing `456` is rounded half-up to `5`.
Next, casting (or type coercion) to `decimal(38, 18)` rounds it again, to `7890123456789012345.678901234567890124` (18 fraction scale). Please note that the trailing `35` is rounded half-up to `4`. This is not correct: `7890123456789012345.678901234567890123` (no round-up) is the right answer, because those digits were originally followed by `456`, which is less than `500`, so there should not have been any rounding up.
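The double-rounding effect described above can be reproduced in plain Java (a minimal sketch; the `DoubleRoundingDemo` class and its method names are made up for illustration):

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

public class DoubleRoundingDemo {
    // The hypothetical 40-digit exact result used in the walkthrough above.
    static final BigDecimal EXACT =
            new BigDecimal("7890123456789012345.678901234567890123456");

    // Round twice: first to 38 significant digits, then to 18 fraction digits.
    static BigDecimal roundedTwice() {
        BigDecimal step1 = EXACT.round(new MathContext(38, RoundingMode.HALF_UP));
        return step1.setScale(18, RoundingMode.HALF_UP);
    }

    // Round once, straight to 18 fraction digits.
    static BigDecimal roundedOnce() {
        return EXACT.setScale(18, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        System.out.println(roundedTwice()); // ends in ...890124 (off by one)
        System.out.println(roundedOnce());  // ends in ...890123 (correct)
    }
}
```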

This PR changes the multiplication and division not to round with `MathContext`. Hopefully this clarifies it. Some relevant comments are here: https://github.com/apache/spark/pull/43705/files#diff-87807d437248d04876eac9e116a527577f1a8e53e28337bf26423e5bf94630e1R567-R571

@kazuyukitanimura
Contributor Author

This is a backport of #43678, but Spark 3.3's decimal handling works differently from 3.4's, so the tests need further updates.

@dongjoon-hyun
Member

How do you want to proceed with this, @kazuyukitanimura?

@dongjoon-hyun
Member

In any case, Apache Spark 3.3 will reach its end of life in two weeks.

If you don't have enough time, we can close this anyway, @kazuyukitanimura.

@kazuyukitanimura
Contributor Author

Thanks @dongjoon-hyun, closing.

@dongjoon-hyun
Member

Thank you for the decision.
