[SPARK-45786][SQL] Fix inaccurate Decimal multiplication and division results #43678
Thank you for making a PR, @kazuyukitanimura .
Could you fix the UT failure, @kazuyukitanimura ?
[info] *** 1 TEST FAILED ***
[error] Failed: Total 3196, Failed 1, Errors 0, Passed 3195, Ignored 3
[error] Failed tests:
[error] org.apache.spark.sql.SQLQueryTestSuite
+1, LGTM.
cc @cloud-fan , too
… results
### What changes were proposed in this pull request?
This PR fixes inaccurate Decimal multiplication and division results.
### Why are the changes needed?
Decimal multiplication and division results may be inaccurate due to rounding issues.
#### Multiplication:
```
scala> sql("select -14120025096157587712113961295153.858047 * -0.4652").show(truncate=false)
+----------------------------------------------------+
|(-14120025096157587712113961295153.858047 * -0.4652)|
+----------------------------------------------------+
|6568635674732509803675414794505.574764 |
+----------------------------------------------------+
```
The correct answer is `6568635674732509803675414794505.574763`.
Note that the last digit should be `3`, not `4`, as the exact product computed with `java.math.BigDecimal` shows:
```
scala> java.math.BigDecimal("-14120025096157587712113961295153.858047").multiply(java.math.BigDecimal("-0.4652"))
val res21: java.math.BigDecimal = 6568635674732509803675414794505.5747634644
```
Since the fractional part `.574763` is followed by `4644`, it should not be rounded up.
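The description does not say where the extra rounding comes from. One reading that fits the user-facing note about 38 versus 39 digits and half-up versus round-down is double rounding: the exact product is first rounded half-up to 38 significant digits, then rounded half-up again to the result scale of 6. The Scala sketch below reproduces both results with plain `java.math.BigDecimal`. The 38-digit `HALF_UP` context for the old path and the 39-digit `DOWN` context for the new path are assumptions about Spark's internals, and `MultiplyRoundingSketch` is a hypothetical name.

```scala
import java.math.{BigDecimal => JBigDecimal, MathContext, RoundingMode}

object MultiplyRoundingSketch {
  // Operands from the SQL query above.
  val a = new JBigDecimal("-14120025096157587712113961295153.858047")
  val b = new JBigDecimal("-0.4652")
  // Result scale of the SQL expression, as shown in the output above.
  val resultScale = 6

  // Exact product: 6568635674732509803675414794505.5747634644 (41 digits).
  val exact: JBigDecimal = a.multiply(b)

  // Assumed old path (hypothetical): round half-up to 38 significant digits,
  // then round half-up again to the result scale.
  // .5747634|644 -> .5747635 -> .574764
  def doubleRounded: JBigDecimal =
    a.multiply(b, new MathContext(38, RoundingMode.HALF_UP))
      .setScale(resultScale, RoundingMode.HALF_UP)

  // Assumed new path (hypothetical): truncate to 39 significant digits, so the
  // digit right after the result scale survives unchanged, then round once.
  // .57476346|44 -> .57476346 -> .574763
  def guardedRounded: JBigDecimal =
    a.multiply(b, new MathContext(39, RoundingMode.DOWN))
      .setScale(resultScale, RoundingMode.HALF_UP)

  // Reference: a single half-up rounding of the exact product.
  def directRounded: JBigDecimal = exact.setScale(resultScale, RoundingMode.HALF_UP)

  def main(args: Array[String]): Unit = {
    println(s"exact   = $exact")
    println(s"double  = $doubleRounded")  // ...4505.574764
    println(s"guarded = $guardedRounded") // ...4505.574763
    println(s"direct  = $directRounded")  // ...4505.574763
  }
}
```

Truncating the extra digit instead of rounding it leaves the digit that decides the final rounding (`4` here) untouched, so the second rounding sees the same information a single rounding of the exact product would.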
#### Division:
```
scala> sql("select -0.172787979 / 533704665545018957788294905796.5").show(truncate=false)
+-------------------------------------------------+
|(-0.172787979 / 533704665545018957788294905796.5)|
+-------------------------------------------------+
|-3.237521E-31 |
+-------------------------------------------------+
```
The correct answer is `-3.237520E-31`.
Note that the last digit should be `0`, not `1`, as the exact quotient computed with `java.math.BigDecimal` shows:
```
scala> java.math.BigDecimal("-0.172787979").divide(java.math.BigDecimal("533704665545018957788294905796.5"), 100, java.math.RoundingMode.DOWN)
val res22: java.math.BigDecimal = -3.237520489418037889998826491401059986665344697406144511563561222578738E-31
```
Since the fractional part `.237520` is followed by `4894...`, it should not be rounded up.
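The same double-rounding reading can explain the division, assuming (this is not confirmed by the description) that the quotient was first computed at scale 38 with half-up rounding and that the fix computes it at scale 39 rounding down. The result scale of 37 is inferred from the printed output: `-3.237521E-31` has 30 zeros after the decimal point followed by 7 digits. `DivideRoundingSketch` is a hypothetical name.

```scala
import java.math.{BigDecimal => JBigDecimal, RoundingMode}

object DivideRoundingSketch {
  // Operands from the SQL query above.
  val dividend = new JBigDecimal("-0.172787979")
  val divisor = new JBigDecimal("533704665545018957788294905796.5")
  // 30 zeros after the decimal point plus 7 significant digits -> scale 37.
  val resultScale = 37

  // Fractional digits 31..41 of the exact quotient are 3237520|4|894...
  // Assumed old path (hypothetical): divide at scale 38 half-up (...32375205),
  // then round half-up again to scale 37 (...3237521).
  def doubleRounded: JBigDecimal =
    dividend.divide(divisor, 38, RoundingMode.HALF_UP)
      .setScale(resultScale, RoundingMode.HALF_UP)

  // Assumed new path (hypothetical): divide at scale 39 rounding down
  // (...323752048), then round half-up once (...3237520).
  def guardedRounded: JBigDecimal =
    dividend.divide(divisor, 39, RoundingMode.DOWN)
      .setScale(resultScale, RoundingMode.HALF_UP)

  // Reference: a single half-up rounding straight to the result scale.
  def directRounded: JBigDecimal =
    dividend.divide(divisor, resultScale, RoundingMode.HALF_UP)

  def main(args: Array[String]): Unit = {
    println(doubleRounded)  // -3.237521E-31
    println(guardedRounded) // -3.237520E-31
    println(directRounded)  // -3.237520E-31
  }
}
```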
### Does this PR introduce _any_ user-facing change?
Yes, users will see correct Decimal multiplication and division results.
Directly multiplying and dividing with `org.apache.spark.sql.types.Decimal()` (not via SQL) will return at most 39 digits instead of at most 38, and will round down instead of rounding half-up.
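A small randomized sketch of why one extra truncated digit is enough: as long as the intermediate result keeps at least one digit beyond the target scale, truncating and then rounding half-up once gives the same answer as rounding the exact value directly, while rounding half-up twice does not. The operand shapes mirror the multiplication example (a 38-digit value with scale 6 times a 4-digit fraction with scale 4), the two `MathContext` values are assumptions about the old and new code paths rather than code taken from the patch, and all names are hypothetical.

```scala
import java.math.{BigDecimal => JBigDecimal, BigInteger, MathContext, RoundingMode}
import scala.util.Random

object GuardDigitCheck {
  // Round the product through an intermediate MathContext, then to `scale`.
  def viaContext(x: JBigDecimal, y: JBigDecimal, mc: MathContext, scale: Int): JBigDecimal =
    x.multiply(y, mc).setScale(scale, RoundingMode.HALF_UP)

  // Round the exact product straight to `scale`.
  def direct(x: JBigDecimal, y: JBigDecimal, scale: Int): JBigDecimal =
    x.multiply(y).setScale(scale, RoundingMode.HALF_UP)

  // Random n-digit string with a non-zero leading digit.
  def randomDigits(rnd: Random, n: Int): String =
    (1 + rnd.nextInt(9)).toString + Seq.fill(n - 1)(rnd.nextInt(10)).mkString

  // Returns (mismatches of the assumed old path, mismatches of the assumed new path).
  def run(trials: Int, seed: Long): (Int, Int) = {
    val rnd = new Random(seed)
    val oldMc = new MathContext(38, RoundingMode.HALF_UP) // assumed old context
    val newMc = new MathContext(39, RoundingMode.DOWN)    // assumed new context
    var oldMismatches = 0
    var newMismatches = 0
    for (_ <- 1 to trials) {
      // x: 32 integer digits + 6 fractional digits; y: 0.1000 .. 0.9999.
      val x0 = new JBigDecimal(new BigInteger(randomDigits(rnd, 38)), 6)
      val x = if (rnd.nextBoolean()) x0.negate() else x0
      val y = new JBigDecimal(new BigInteger(randomDigits(rnd, 4)), 4)
      val expected = direct(x, y, 6)
      if (viaContext(x, y, oldMc, 6) != expected) oldMismatches += 1
      if (viaContext(x, y, newMc, 6) != expected) newMismatches += 1
    }
    (oldMismatches, newMismatches)
  }

  def main(args: Array[String]): Unit = {
    val (oldBad, newBad) = run(trials = 10000, seed = 42L)
    println(s"38-digit HALF_UP mismatches: $oldBad, 39-digit DOWN mismatches: $newBad")
  }
}
```

For these operand shapes the product has 31 or 32 integer digits, so 39 significant digits always keep at least 7 fractional digits and the 39-digit `DOWN` path never disagrees with direct rounding; the 38-digit `HALF_UP` path occasionally does when the product has 31 integer digits.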
### How was this patch tested?
Test added
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #43678 from kazuyukitanimura/SPARK-45786.
Authored-by: Kazuyuki Tanimura <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 5ef3a84)
Signed-off-by: Dongjoon Hyun <[email protected]>
Merged to master/3.5/3.4. Thank you, @kazuyukitanimura . Could you make a backporting PR to branch-3.3 too?
Thank you all
I will @dongjoon-hyun
`test("SPARK-45786: Decimal multiply, divide, remainder, quot") {`
This test fails when `spark.sql.ansi.enabled` is true:
https://github.com/apache/spark/actions/runs/6885072758/job/18728675619
You can reproduce the issue locally by running `SPARK_ANSI_SQL_MODE=true build/sbt clean "catalyst/testOnly org.apache.spark.sql.catalyst.expressions.ArithmeticExpressionSuite"`
@kazuyukitanimura Can you take a look at this issue?
Also cc @dongjoon-hyun: since this patch has been backported to branch-3.4, I'm not sure whether this will affect the Spark 3.4.2 release.
Thanks @LuciferYang
Yes, this test assumes the default `spark.sql.ansi.enabled=false`. By default an overflow returns NULL instead of throwing an exception, but ANSI mode throws. Since this is a random-value test, some input combinations may overflow:
Cause: org.apache.spark.SparkArithmeticException: [NUMERIC_VALUE_OUT_OF_RANGE] 431393072276642444045219979063553045.571 cannot be represented as Decimal(38, 4). If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error, and return NULL instead. SQLSTATE: 22003
Sorry, I wasn't aware that there is a GHA job for `spark.sql.ansi.enabled=true`. I can modify the test to ignore those cases.
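One way to ignore those cases in a random-value test is to skip any input whose result cannot be represented in the result type. The sketch below is a hypothetical helper, not the code from #43853: it approximates the overflow condition by rounding half-up to the target scale and checking that at most `precision` significant digits remain. `OverflowFilterSketch` and `fitsDecimal` are illustrative names.

```scala
import java.math.{BigDecimal => JBigDecimal, RoundingMode}

object OverflowFilterSketch {
  // Hypothetical helper: true if `value`, rounded half-up to `scale`, fits in
  // Decimal(precision, scale). Per the error message above, a value failing
  // this check raises NUMERIC_VALUE_OUT_OF_RANGE with ANSI on, NULL with it off.
  def fitsDecimal(value: JBigDecimal, precision: Int, scale: Int): Boolean =
    value.setScale(scale, RoundingMode.HALF_UP).precision <= precision

  def main(args: Array[String]): Unit = {
    // Value from the ANSI failure: 36 integer digits + 4 fractional = 40 > 38.
    println(fitsDecimal(new JBigDecimal("431393072276642444045219979063553045.571"), 38, 4)) // false
    // Product from the multiplication example: 31 + 6 = 37 digits.
    println(fitsDecimal(new JBigDecimal("6568635674732509803675414794505.574763"), 38, 6)) // true
  }
}
```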
Addressed #43853
…th ANSI enabled
### What changes were proposed in this pull request?
This follow-up PR fixes the test for SPARK-45786 that is failing in GHA with SPARK_ANSI_SQL_MODE=true
### Why are the changes needed?
The issue was discovered in #43678 (comment)
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Test updated
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #43853 from kazuyukitanimura/SPARK-45786-FollowUp.
Authored-by: Kazuyuki Tanimura <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>